Detection and Impact of Somatic Blood Mutations on Inflammation in Solid Tumors

A Dissertation Presented to the Faculty of the Graduate School of Cornell University In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

by Eti Sinha
December 2024

© 2024 Eti Sinha

DETECTION AND IMPACT OF SOMATIC BLOOD MUTATIONS ON INFLAMMATION IN SOLID TUMORS
Eti Sinha, Ph.D.
Cornell University 2024

Clonal hematopoiesis (CH) is an age-associated phenomenon that has implications for multiple diseases, including hematological malignancies, cardiovascular diseases, and solid tumor cancers. CH is defined by somatic mutations in the blood that gain a fitness advantage, proliferate, and then alter immune system dynamics and disease outcomes. This thesis is divided into two chapters that explore CH in solid tumor patients through complementary approaches. In the first chapter, I developed a non-invasive targeted molecular profiling assay and analysis pipeline that uses DNA from peripheral blood samples to detect germline and CH variants at low variant allele fractions. I accurately classified CH variants and revealed a strong correlation between CH and germline variants. My findings suggest that CH can serve as a valuable biomarker in the context of hematological and solid tissue malignancies. The second part of the thesis investigates the direct impact of CH on the immune landscape of solid tumors. I demonstrated that CH significantly alters leukocyte infiltration patterns, leading to increased immune dysregulation and poorer clinical outcomes. Through statistical analysis of immune profiling and clinical data, I show that CH amplifies anti-inflammatory, pro-tumor signals in the tumor microenvironment, which may contribute to tumor progression. Together, these studies highlight the importance of CH in shaping immune responses in cancers and its potential as a biomarker for guiding treatment strategies.
BIOGRAPHICAL SKETCH

eRA COMMONS USER NAME: SINHAETI (78030508)
POSITION TITLE: Graduate Student Research Assistant

EDUCATION/TRAINING
INSTITUTION AND LOCATION | DEGREE | START DATE | COMPLETION DATE | FIELD OF STUDY
University of California, San Diego | B.S. | 09/2014 | 06/2018 | Bioengineering (Biotechnology)
Cornell University | M.S. | 07/2019 | 08/2021 | Biomedical Engineering
Cornell University | Ph.D. | 07/2019 | 12/2024 | Biomedical Engineering

A. Personal Statement
At Weill Cornell Medicine, Eti’s work focused on understanding how clonal hematopoiesis progresses alongside solid and advanced tumors. Clonal hematopoiesis occurs when subclonal hematopoietic stem cell populations expand and impede normal immune system function. It is not well understood why this phenomenon is associated with certain cancers or therapies more than others. Alongside developing a genetic screening panel, Eti tracked clonal hematopoiesis in patients with comorbidities over multiple timepoints.

a. Afaf E. G. Osman, Nuria Mencia-Trinchant, Caner Saygin, Luke Moma, Aelin Kim, Genevieve Housman, Matthew Pozsgai, Eti Sinha, Pooja Chandra, Duane C. Hassane, Andrea Sboner, Kishan Sangani, Nick DiNardi, Christopher Johnson, Sara S. Wallace, Bana Jabri, Hue Luu, Monica L. Guzman, Pinkal Desai, Lucy A. Godley; Paired bone marrow and peripheral blood samples demonstrate lack of widespread dissemination of some CH clones. Blood Adv 2023; 7 (9): 1910–1914. doi: https://doi.org/10.1182/bloodadvances.2022008521
b. Dennis Lee, Tyler Augi, Kata Alilovic, Nuria Mencia Trinchant, Eti Sinha, Jorge Contreras, Michael Samuel, Pinkal Desai, Michael Kluk, Brianna N Smith, Gail J. Roboz, Ellen Ritchie, Michael R. Savona, Monica L. Guzman, Justin D. Kaner; PTPN11 mutations Confer Adverse Outcomes and Therapy Resistance in Older Patients with Acute Myeloid Leukemia (AML). Blood 2022; 140 (Supplement 1): 3208–3209. doi: https://doi.org/10.1182/blood-2022-167068
c. Abhay Singh, Nuria Mencia-Trinchant, Elizabeth A.
Griffiths, Mahesh Swaminathan, Matthew Gravina, Rutaba Tajammal, Mark G. Faber, Marc S. Ernstoff, LunBiao Yan, Eti Sinha, Megan M. Herr, Duane C Hassane, Monica L. Guzman, Amanda Przespolewski, Eunice S. Wang, Swapna Thota. DNMT3A and TET2 mutant Clonal Hematopoiesis May Drive a Proinflammatory State and Predict Enhanced Response to Immune Checkpoint Inhibitors. Blood, 2021. https://doi.org/10.1182/blood-2021-150347.
d. Abhay Singh et al., Mutant PPM1D and TP53 populate the hematopoietic compartment after peptide receptor radionuclide therapy (PRRT) exposure. JCO 39, 10605–10605 (2021). DOI:10.1200/JCO.2021.39.15_suppl.10605
e. Philipp J. Rauch, Alexander J. Silver, Jk Gopakumar, Marie McConkey, Eti Sinha, Eugenia Shvartz, Galina Sukhova, Peter Libby, Benjamin L. Ebert, and Siddhartha Jaiswal. Loss-of-function Mutations in Dnmt3a and Tet2 Lead to Accelerated Atherosclerosis and Convergent Macrophage Phenotypes in Mice. American Society of Hematology. 2018.
f. Eti Sinha, Bhavneet Binder, Olivier Elemento, Duane Hassane. Immune Sculpting of Clonal Hematopoiesis in Advanced and Metastatic Solid Tumors. American Society of Hematology. 2019.

B. Positions and Honors

Positions and Employment
2014 - 2018 Research Assistant, University of California, San Diego. Laboratory for Biomaterials and Regenerative Medicine, Sanford Consortium of Regenerative Medicine.
2016 Teaching Assistant, University of California, San Diego. Department of Biological Sciences.
2017 Analytical Chemistry Intern, Genentech Inc.
2018 Teaching Assistant, University of California, San Diego. Department of Biological Sciences.
2017 - 2018 Research Assistant, J. Craig Venter Institute, San Diego, CA.
2018 - 2019 Life Sciences Research Professional 1, Stanford University, Palo Alto, CA. Department of Medicine.
2019 Teaching Assistant, Cornell University. Meinig School of Biomedical Engineering.
2019 - 2024 Graduate Student Research Assistant, Cornell University.
Meinig School of Biomedical Engineering.
2021 Bioinformatics and Computational Biology Intern, Foundation Medicine.
2022 - 2023 Practicant, Center for Technology Licensing, Cornell University.

Honors
2023 2nd Place, 2023 Biomedical Business Plan Challenge, Weill Cornell Medicine
2020 Grand Prize, Stanford University CardinalKit Buildathon
2019 Phool Prakash and Rukmini Sahai Graduate Fellowship, Cornell University
2018 Cum Laude, UC San Diego
2018 Department of Bioengineering Excellence in Leadership and Service, UC San Diego
2017 Biomedical Engineering Society Chapter Outstanding Achievement Award
2016 Genentech USBTD Outstanding Student Award

Other Experience and Professional Memberships
2019 - American Society of Hematology
2015 - 2020 Biomedical Engineering Society
2016 - 2018 Tau Beta Pi (Engineering Honors Society)
2014 - 2018 Society of Women Engineers

C. Contribution to Science

1. Undergraduate Research: Eti’s first research experience, under Dr. Karen Christman at UC San Diego, focused on understanding how tissue can be regenerated in the context of myocardial infarction. Eti worked on characterizing the mechanical properties of a naturally derived hydrogel extracted from the extracellular matrix proteins in swine hearts and lungs. This work led to the discovery that the novel hydrogel promotes regeneration of heart tissue. Furthermore, Eti examined the bioactivity of a novel nanoparticle as a drug delivery system, which has the potential to deliver therapies alongside the hydrogel. The naturally derived hydrogel is currently in Phase 1 clinical trials for an indication in myocardial infarction treatment.
a. Eti Sinha, Jessica L. Ungerleider, Cassandra E. Callmann, Nathan C. Gianneschi, Karen L. Christman. Novel Growth Factor Drug Delivery System Promotes Cell Proliferation In Vitro. Biomedical Engineering Society Annual Meeting. Phoenix, AZ. October 12-15, 2017. (Poster)

2. Undergraduate Research: Under the mentorship of Dr.
Yo Suzuki at the J. Craig Venter Institute, Eti transitioned her research focus to genetics and computational biology. Eti worked on designing and optimizing standard operating procedures for creating a CRISPR interference library of a synthetic minimal cell. The synthetic minimal cell is a model of the essential genes required for life, with only 473 genes in the entire genome. Eti worked on understanding 149 essential genes with unknown biological function. A major outcome of this minimal cell program has been new tools and semi-automated processes for whole genome synthesis.
a. Eti Sinha, Komal Dani, Ayesha Khan, Tharini Siddappa, Emanuel Vasquez, Hamilton Smith, Yo Suzuki. Systematic Gene Activity Attenuation via CRISPRi in Synthetic Minimal Bacteria. Biomedical Engineering Society Annual Meeting. Atlanta, Georgia. October 17-20, 2018. (Poster)

3. Postbaccalaureate Research: Eti’s research in Dr. Siddhartha Jaiswal’s lab at Stanford University focused on understanding how humans age. Using single cell RNA sequencing, Eti analyzed differential gene expression in mouse models of clonal hematopoiesis. Eti’s work showed that there is a direct causal link between somatic mutations in hematopoietic stem cells and the development of atherosclerosis in aging mice. This work won ‘Best Abstract’ at the American Society of Hematology Conference in 2018. Furthermore, Eti worked on developing a low-cost diagnostic for detecting clonal hematopoiesis through DNA sequencing.
a. Philipp J. Rauch, Alexander J. Silver, Jk Gopakumar, Marie McConkey, Eti Sinha, Eugenia Shvartz, Galina Sukhova, Peter Libby, Benjamin L. Ebert, and Siddhartha Jaiswal. Loss-of-function Mutations in Dnmt3a and Tet2 Lead to Accelerated Atherosclerosis and Convergent Macrophage Phenotypes in Mice. American Society of Hematology Conference. 2018. (Poster)
b. Rauch, P.J., Gopakumar, J., Silver, A.J. et al.
Loss-of-function mutations in Dnmt3a and Tet2 lead to accelerated atherosclerosis and concordant macrophage phenotypes. Nat Cardiovasc Res 2, 805–818 (2023). https://doi.org/10.1038/s44161-023-00326-7

4. Graduate Research: At Weill Cornell Medicine, Eti’s work focused on understanding how clonal hematopoiesis progresses alongside solid and advanced tumors. Clonal hematopoiesis occurs when subclonal hematopoietic stem cell populations expand and impede normal immune system function. It is not well understood why this phenomenon is associated with certain cancers or therapies more than others. Alongside developing a genetic screening panel, Eti tracked clonal hematopoiesis in patients with comorbidities over multiple timepoints.

D. Scholastic Performance

YEAR COURSE TITLE GRADE
UC SAN DIEGO
2014 Introduction to Computer Science: Java I A
2014 Engineering Design for Development A+
2014 Calculus for Scientists and Engineers III A+
2015 Introduction to Bioengineering P
2015 Introduction to Computer Science: Java II A-
2015 Linear Algebra A-
2015 Physics: Mechanics B-
2015 MATLAB Programming A
2015 Physics: Electricity and Magnetism A
2015 Introduction to Differential Equation A
2015 Physics: Fluids, Waves, Thermodynamics, Optics A
2015 Physics Lab: Electricity, Magnetism, Waves, Optics A
2015 General Chemistry Lab B
2015 Basic Data Structures & Object-Orientated Programming A-
2015 Vector Calculus A
2016 Biochemical Techniques B-
2016 Genetics A+
2016 Organic Chemistry I A+
2016 Statistical Reasoning for Bioengineering Applications B
2016 Organic Chemistry II A
2016 Math, Algorithms and Systems Analysis A-
2016 Seminar in Bioinformatics A
2016 Human Physiology I P
2016 Introductory Fluid Mechanics B+
2016 Engineering Experimental Techniques B
2017 Dynamic Simulation in Bioengineering B+
2017 Biotechnology Thermodynamics and Kinetics A
2017 Biomolecular Engineering A
2017 Bioengineering Mass Transfer A
2017 Chemical and Molecular Bioengineering Techniques B+
2017 Principles in Biomaterials Design A
2017 Human
Nutrition A
2017 Senior Design Project: Genetic Circuits A+
2018 Bioreactor Engineering A
2018 Biotechnology Lab A-
2018 Cell and Tissue Engineering A-
2018 Biochemical Engineering B
2018 Data Science in Practice A
CORNELL UNIVERSITY
2019 Precision and Genomic Medicine A-
2020 Introduction to Design and Innovation A-
2020 Core Concepts in Disease A
2020 Biomedical Data Science A
2019 – 2022 Graduate Seminar in Biomedical Engineering P
2020 AI Strategy and Applications SX
2020 Project Management A-
2021 Pharma/Biotech A+
2021 Machine Learning Applications A

Dedicated to my father, mother, twin sister, husband, and friends that have become family.

ACKNOWLEDGMENTS

I want to acknowledge all the mentors I have had along the way. Thank you Dr. Olivier Elemento for your unwavering support along this journey. Your patience for me is truly unmatched. Thank you to Dr. Monica Guzman for your academic guidance and helping me re-adjust to NYC after the pandemic. Thank you to Dr. Duane Hassane for inspiring the beginnings of my thesis work. Thank you to Dr. Siddhartha Jaiswal, who has been a trailblazer in this field and gave me my first real opportunity to learn bioinformatics and computational biology. Your work inspired me to continue down the path of translational research.

Thank you to my friends and family. I want to acknowledge Dr. Vikash Morar and soon-to-be-Dr. Katherine Nguyen, who have been my pseudo-PhD cohort in a remote/hybrid-PhD world. I want to thank my twin sister, Eva Sinha. I want to thank my mother, Monica Sinha, and my father, Dr. Swapna Sandesh Sinha, for your unwavering support. Your unwavering positivity has been inspiring. This PhD is as much mine as it is yours.

Lastly, I want to acknowledge my partner Ravin Sardal. You have been a rock, keeping us steady since the day we met. You remind me to keep going and trying despite the countless failures I experience.
Seeing you be passionate about your job is a constant inspiration for me every day. Thank you for encouraging me to go across the country for graduate school, even though it meant we would be long distance.

TABLE OF CONTENTS
BIOGRAPHICAL SKETCH
DEDICATION
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1
CHAPTER 2
CHAPTER 3
CHAPTER 4

LIST OF FIGURES
Figure 1.1: Workflow of PreCISE1 assay and bioinformatics pipeline for CH and germline variant detection
Figure 1.2: Quality control benchmarks and limit of detection of PreCISE1 test
Figure 1.3: Expected vs Observed VAF of Control Samples
Figure 1.4: Observed Variants Across All Dilution Series
Figure 1.5: Summary of reported somatic and germline variants from sample cohort at Weill Cornell Medicine
Figure 1.6: Number of reported somatic and germline variants
Figure 1.7: Associations between reported germline and CH variant
Figure 2.1: Summary of CH Variants in the TCGA Cohort
Figure 2.2: Distribution of VAF
Figure 2.3: Effects of CH on the Tumor Microenvironment in Pan-Cancer
Figure 2.4: Total Leukocyte Fraction
Figure 2.5: Effects of CH on the Tumor Microenvironment in Lung Adenocarcinoma
Figure 2.6: Impact of CH variants on pan-cancer differential gene expression
Figure 2.7: Odds of CH and specific MHC Type 1 HLA major alleles
Figure 2.8: Effects of CH on the tumor microenvironment of GBM
Figure 2.9: Infiltration of CH in Tumor Microenvironment

LIST OF TABLES
Table 1.1 List of 119 Genes in Panel and Panel Coordinates

CHAPTER 1
INTRODUCTION

Initiation of Clonal Hematopoiesis

Over time, tissues throughout the body acquire mutations in cells through natural aging processes or environmental exposures. The bone marrow starts out with healthy hematopoietic stem cells, or blood stem cells [1]. It is estimated that humans have 50,000 to 200,000 hematopoietic stem cells (HSCs). Each HSC acquires approximately one exonic mutation per decade; by the age of 70, an individual is estimated to carry up to 1.4 million protein-coding variants, averaging 70 mutations per gene, in at least one HSC [2]. A mutation that confers a fitness advantage can proliferate and become a sizeable population, or clone. The expansion of mutated HSCs is called clonal hematopoiesis (CH) or clonal hematopoiesis of indeterminate potential (CHIP); it is known to increase the risk of hematological malignancies, inflammation, and age-related diseases in general, such as cardiovascular disease.

CH mutations are acquired in the blood stem cells. These subclonal stem cell populations differentiate into other immune cell types and circulate in the blood. Innovations in liquid biopsy and sequencing make it possible to detect these mutations with a simple blood draw [3]. These mutations are considered somatic mutations, or variants in the DNA acquired over one's lifetime. In contrast, germline mutations are mutations or variants that are inherited. Furthermore, because somatic mutations are present in only a small subset of cells, the variant allele fraction (VAF), or the proportion of sequencing reads that support the "variant" or "mutation", is lower. The VAF of a germline variant is either 50% (heterozygous allele) or 100% (homozygous allele).
The VAF of a somatic mutation can be anywhere from 0.0001%, the limit of detection of current-day sequencing technologies, to around 30-35%, a cut-off that ensures germline variants with statistically sparse reads are not misclassified as somatic.

Clonal Hematopoiesis in Healthy Individuals

In 2014, several key studies explored and defined CH in a healthy population [3,4,5]. Clonal hematopoiesis is typically characterized by single somatic mutations in the exonic regions of known leukemia driver genes [3]. DNMT3A, TET2, and ASXL1 account for the majority of CH mutations. These three genes are epigenetic modifiers, and thus a loss-of-function (LOF) mutation in any of them can alter the expression landscape of the entire cell across many genes. The specific epigenetic changes are not completely understood. DNMT3A catalyzes the addition of a methyl group to DNA; in other words, DNMT3A increases methylation and subsequently decreases expression of genetic loci. Thus, a LOF mutation in DNMT3A is expected to increase expression of previously repressed genes. TET2 does the opposite by catalyzing a step in demethylation that subsequently increases expression of genetic loci. Interestingly, a LOF mutation in either gene is associated with worse outcomes across many diseases, such as cardiovascular disease and hematological malignancies.

The prevalence of CH increases with age. Depending on the depth of sequencing, at least 40% of individuals above age 70 harbor CH clones [2]. With error-corrected sequencing, almost everyone above age 70 has CH. It is important to note that a larger clone size is more likely to be associated with a clinical outcome. In this thesis, two cohorts are covered, one of which used error-corrected DNA sequencing and the other whole-exome sequencing (WES). In the cohort that used WES, small CH clones that may be clinically relevant may not be detectable, and thus a patient's CH status can be misclassified.
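To make the VAF definition and cut-offs discussed in this chapter concrete, the following is a minimal Python sketch, not the pipeline used in this thesis. The 30% somatic ceiling comes from the 30-35% range given in the text; the heterozygous window (40-60%), the homozygous floor (90%), the `VariantCall` container, and the label names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Read counts at one variant site (hypothetical container for this sketch)."""
    alt_reads: int    # reads supporting the variant allele
    total_reads: int  # all reads covering the site


def variant_allele_fraction(call: VariantCall) -> float:
    """VAF = reads supporting the variant / total reads covering the site."""
    if call.total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return call.alt_reads / call.total_reads


def classify_by_vaf(vaf: float,
                    somatic_max: float = 0.30,      # from the 30-35% cut-off in the text
                    het_window=(0.40, 0.60),        # assumed tolerance around 50%
                    hom_min: float = 0.90) -> str:  # assumed tolerance below 100%
    """Coarse VAF-only label; real pipelines combine this with other evidence."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    if vaf < somatic_max:
        return "possible_somatic"
    if het_window[0] <= vaf <= het_window[1]:
        return "likely_germline_het"
    if vaf >= hom_min:
        return "likely_germline_hom"
    return "ambiguous"


# Example: 13 supporting reads out of 1300 gives a VAF of 1%.
example = VariantCall(alt_reads=13, total_reads=1300)
print(classify_by_vaf(variant_allele_fraction(example)))
```

VAF alone cannot separate a low-coverage germline variant from a large somatic clone, which is why values between the assumed windows are labeled ambiguous rather than forced into a class.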
Clonal Hematopoiesis and Disease Mechanisms

CH has been associated with many diseases, including cancers (explored in the third chapter of this thesis), HIV [6], heart disease [7,8], and gum disease [9]. It is important to note that the incidence of CH and the specific genes that are mutated vary from disease to disease. The mechanisms by which LOF mutations in CH genes contribute to worse outcomes are well defined in the context of cardiovascular disease. For example, Tet2-deficient macrophages display heightened inflammatory gene expression and produce higher levels of interleukin-1β (IL-1β) when stimulated with oxidized LDL, TNF-α, and IFN-γ (101). Additionally, increased IL-1β transcript and protein levels were observed in the atherosclerotic plaques of mice transplanted with Tet2-deficient cells, suggesting a potential causal role for IL-1β. Further experiments with isolated macrophages demonstrated that Tet2 helps suppress IL-1β transcription through histone deacetylation. IL-1β propagates a positive pro-inflammatory feedback loop that promotes atherosclerotic plaque growth.

Methods for Detecting Clonal Hematopoiesis

Two methods have been published for detecting clonal hematopoiesis from peripheral blood samples. The ArCH method is a 9-gene panel and bioinformatics pipeline that uses 4 different variant callers to detect CH [10]. The panel is limited to 9 genes (DNMT3A, TET2, ASXL1, TP53, CHEK2, AK2, SRSF2, SF3B1, PPM1D). Similar to the method described in chapter 2 of this thesis, it utilizes an input of 250 ng DNA and UMI barcodes and aims for 2000X coverage. The second method leverages ultra-sensitive sequencing with smMIPs to achieve 4000X coverage [11]. This test is limited to known SNPs and indels in 104 loci in 74 genes (genomic region corresponding to the mutated region ±6 base pairs).
Variant calling is conducted via SAMtools [12], which is prone to more false positives than more recent methods like Mutect2 [13], VarScan2 [14], Pindel [15], and VarDict-Java [16]. This test is optimized for minimal residual disease (MRD) monitoring, or re-sequencing of known variants, rather than discovery of novel variants [11].

CH has also been detected as a secondary finding in Cancer Genomic Profiling (CGP) assays, such as FoundationOne [17,18] and MSK-Access, which sequence tumor and matched-blood samples to detect mutations localized in primary tumor cells. CGP assays aim to report only tumor-derived mutations, and thus aim to filter out confounding CH mutations. Differentiating between CH and tumor-derived mutations is more challenging for assays that rely only on peripheral blood samples. Once mutations are characterized as CH-related, they are often reported but not included in the treatment decision-making process for cancer patients, because research on the effects of CH on solid tumors and their treatments is limited.

Clonal Hematopoiesis and Treatment Decision Making

Two studies were the first to include CH in patient stratification for treatment decision making. In the CANTOS trial, individuals who were administered canakinumab were less likely to develop solid tumor cancers [19]. This suggests that canakinumab is a potential treatment for preventing cancer and that IL-1β blockade can prevent disease effects of CH mutations [20]. Colchicine prevented atherosclerosis progression in TET2-mutant mouse models and in patients in a retrospective analysis of the UK Biobank [21].

Summary of Thesis Aims

This thesis aims to create a new method for characterizing CH so that it can be further studied in multiple disease contexts. The first aim focuses on developing and benchmarking an assay and pipeline for detecting CH and characterizing the related comorbidities. Comorbidities studied include HIV, melanoma, cancer (prostate, lung, and breast cancer), COVID-19, obesity, and chronic myeloid leukemia.
The second aim zooms in on one disease context: solid tumor cancer, where the role of CH is uncertain. More specifically, I explore whether CH has an impact on the tumor microenvironment (TME) in a large solid tumor cohort, with a focus on specific contexts such as lung cancer and DNMT3A mutations. A multi-omics analytical approach was used to evaluate the effects of CH on the TME and to propose possible mechanisms that explain worse outcomes in certain cancer populations. A cell-type deconvolution method estimated the proportion of cell types present in the TME. In summary, I aim to identify CH in healthy and diseased patient populations and propose potential mechanisms that explain worse outcomes in solid tumor cancers.

REFERENCES

1. Buttigieg, M. M. & Rauh, M. J. Clonal Hematopoiesis: Updates and Implications at the Solid Tumor-Immune Interface. JCO Precis. Oncol. e2300132 (2023) doi:10.1200/PO.23.00132.
2. Jaiswal, S. & Ebert, B. L. Clonal hematopoiesis in human aging and disease. Science 366, eaan4673 (2019).
3. Jaiswal, S. et al. Age-Related Clonal Hematopoiesis Associated with Adverse Outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
4. Xie, M. et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat. Med. 20, 1472–1478 (2014).
5. Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).
6. Dharan, N. J. et al. HIV is associated with an increased risk of age-related clonal hematopoiesis among older adults. Nat. Med. 27, 1006–1011 (2021).
7. Jaiswal, S. et al. Clonal Hematopoiesis and Risk of Atherosclerotic Cardiovascular Disease. N. Engl. J. Med. 377, 111–121 (2017).
8. Yu, B. et al. Supplemental Association of Clonal Hematopoiesis With Incident Heart Failure. J. Am. Coll. Cardiol. 78, 42–52 (2021).
9. Hajishengallis, G., Li, X., Divaris, K. & Chavakis, T.
Maladaptive trained immunity and clonal hematopoiesis as potential mechanistic links between periodontitis and inflammatory comorbidities. Periodontol. 2000 89, 215–230 (2022).
10. Chan, I. C. C. et al. ArCH: Improving the performance of clonal hematopoiesis variant calling and interpretation. Bioinformatics btae121 (2024) doi:10.1093/bioinformatics/btae121.
11. Acuna-Hidalgo, R. et al. Ultra-sensitive Sequencing Identifies High Prevalence of Clonal Hematopoiesis-Associated Mutations throughout Adult Life. Am. J. Hum. Genet. 101, 50–64 (2017).
12. Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).
13. Van Der Auwera, G. A. et al. From FastQ Data to High‐Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. Curr. Protoc. Bioinforma. 43, (2013).
14. Koboldt, D. C. et al. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
15. Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinforma. Oxf. Engl. 25, 2865–2871 (2009).
16. Lai, Z. et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 44, e108–e108 (2016).
17. Severson, E. A. et al. Detection of clonal hematopoiesis of indeterminate potential in clinical sequencing of solid tumor specimens. Blood 131, 2501–2505 (2018).
18. Sun, D. et al. Abstract 2289: Predicting tumor somatic versus clonal hematopoiesis origin for short variants in liquid assay. Cancer Res. 83, 2289–2289 (2023).
19. Woo, J. et al. Effect of Clonal Hematopoiesis Mutations and Canakinumab Treatment on Incidence of Solid Tumors in the CANTOS Randomized Clinical Trial. Cancer Prev. Res. (Phila. Pa.) OF1–OF8 (2024) doi:10.1158/1940-6207.CAPR-23-0342.
20. Ridker, P. M. et al.
Antiinflammatory Therapy with Canakinumab for Atherosclerotic Disease. N. Engl. J. Med. 377, 1119–1131 (2017).
21. Zuriaga, M. A. et al. Colchicine prevents accelerated atherosclerosis in TET2-mutant clonal haematopoiesis. Eur. Heart J. ehae546 (2024) doi:10.1093/eurheartj/ehae546.

CHAPTER 1
PRECISE1: TARGETED DNA PANEL FOR DETECTION OF CLONAL HEMATOPOIESIS AND RELATED COMORBIDITIES

Summary

Clonal hematopoiesis (CH) is a condition characterized by the expansion of a single or a few genetically aberrant hematopoietic stem cells, leading to the detection of somatic mutations in the bone marrow and blood. CH has garnered significant attention in recent years due to its clinical implications in various hematological and non-hematological cancers. This paper describes the development and application of PreCISE1, a targeted DNA panel designed for high-coverage sequencing of genes associated with CH, together with accompanying analytics. We show how the targeted panel (1300x coverage) can accurately and precisely (93%) detect CH at low variant allele frequencies (<1% VAF). We discuss the rationale behind the importance of detecting somatic mutations defined under clonal hematopoiesis and their potential clinical significance.

Introduction

Clonal hematopoiesis (CH) is described as the phenomenon where acquired somatic mutations in hematopoietic stem cells confer a fitness advantage that leads to a higher proportion of these cells in the bone marrow [1]. As the mutant clone differentiates, its progeny make up a greater proportion of cells in the blood. Previous studies have shown that the presence of CH, despite the absence of an overt hematological malignancy, is associated with a higher risk of hematological malignancies, heart disease, and worse overall survival [2]. Detecting and monitoring CH, along with other multi-omic or patient data, paves the way for understanding disease initiation and progression.
Patients who are at higher risk of developing disease can be identified, while a greater understanding of the biology can lead to the development of personalized interventions and preventive strategies.

Advances in next generation sequencing (NGS) and high-throughput molecular biology methods, like amplicon sequencing and Unique Molecular Identifier (UMI) barcodes, allow somatic mutations to be detected at lower levels within the blood, well before the development of disease [3]. Decreases in the cost and turn-around time of sequencing have also led to sequencing of serial samples from the same individual, making it possible to track not only the progression of clones but also the initiation or progression of disease.

Current methods of detecting CH rely on either NGS or digital PCR. Many academic centers have developed their own experimental and computational workflows for detecting CH. The type of NGS technology directly impacts the limit of detection of clones [2], with error-corrected sequencing able to detect variants as low as 0.1% variant allele fraction (VAF) and whole-exome sequencing having a cut-off of around 5-10% due to the lower coverage per base. CH was first defined with a VAF of at least 2%, but previous studies have shown disease-risk associations with variants lower than 1% VAF [4,5]. Other targeted DNA tests and associated bioinformatics pipelines have been developed [6,7], but they target only a subset of genes related to clonal hematopoiesis and not the genes associated with co-occurring diseases with increased risk.

Here, we describe the development and implementation of a new targeted DNA panel and bioinformatics pipeline that detects CH and its associated at-risk diseases accurately and precisely at VAFs as low as 1%. PreCISE1 is an end-to-end assay and pipeline that takes in UMI-barcoded fastq reads and calls variants for any targeted sequencing panel.
PreCISE1 can detect somatic variants at low VAF (<1%) with 90% sensitivity and 100% positive predictive value (PPV), while retaining 96% sensitivity and 100% PPV for variants at higher VAF (>10%). Furthermore, the test also detects germline variants related to many diseases associated with clonal hematopoiesis, such as cardiovascular disease and solid tumor cancers. PreCISE1 has already been used to monitor patients to study basic biology, such as initiation of CH in the bone marrow [8], and to answer translational questions, such as monitoring clonal hematopoiesis clones in cancer cohorts with treatments [9].

Methods

Calculating Limit of Detection

Four acute myeloid leukemia (AML) cell lines, with 22 known somatic variants of primary VAFs ranging from 1% to 38%, were used as the technical positive controls. A peripheral blood (PB) sample from a healthy donor (NPB177) and the reference sample NA12878 (ref) were used as the panel of normals (PoN). For the positive controls, the mutant sample was diluted with normal NA12878 at dilutions of 1:2, 1:4, 1:10, 1:50, and 1:80. DNA input limitations were also tested with 75 ng, 150 ng, and 250 ng (default DNA amount). Based on which variants were observed in the technical positive controls, specificity and sensitivity were calculated. Observed variants for the 22 known somatic variants are reported in Supplementary Table 2.

DNA Capture and Sequencing

All samples were sequenced using a custom targeted DNA panel (TWIST Biosciences, South San Francisco, CA). The panel, with a bait territory of 461,647 base pairs, captures 119 genes commonly mutated in CH and its comorbidities, such as cardiovascular disease and solid tumors. Experimental steps for DNA extraction, library preparation, and sequencing have been previously described. DNA was extracted using the Quick-DNA Miniprep Plus Kit, 50PREPS (Zymo Research, Irvine, CA).
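As a worked illustration of the dilution series and benchmark metrics described under Calculating Limit of Detection, the sketch below computes the VAF one would expect after each dilution, together with sensitivity and PPV. It is not part of the PreCISE1 pipeline. It assumes that "1:N" means one part mutant DNA in N parts total and that the normal diluent contributes no variant reads; the function names and the read-out counts in the final lines are hypothetical placeholders, not results from this study.

```python
# Dilution factors from the Methods text (1:2, 1:4, 1:10, 1:50, 1:80).
DILUTION_FACTORS = (2, 4, 10, 50, 80)


def expected_vaf(primary_vaf: float, dilution_factor: float) -> float:
    """Expected VAF after dilution, assuming '1:N' = 1 part mutant in N parts total."""
    if not 0.0 <= primary_vaf <= 1.0:
        raise ValueError("primary_vaf must be between 0 and 1")
    if dilution_factor < 1:
        raise ValueError("dilution_factor must be >= 1")
    return primary_vaf / dilution_factor


def dilution_series(primary_vaf: float) -> dict:
    """Expected VAF at every dilution level for one known variant."""
    return {f"1:{d}": expected_vaf(primary_vaf, d) for d in DILUTION_FACTORS}


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of known variants that were detected: TP / (TP + FN)."""
    if true_pos + false_neg == 0:
        raise ValueError("no known variants to evaluate")
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of reported variants that are real: TP / (TP + FP)."""
    if true_pos + false_pos == 0:
        raise ValueError("no variants were called")
    return true_pos / (true_pos + false_pos)


# A variant with the highest primary VAF mentioned in the text (38%).
for label, vaf in dilution_series(0.38).items():
    print(f"{label}: expected VAF = {vaf:.4%}")

# Hypothetical counts for illustration only.
print(sensitivity(true_pos=9, false_neg=1))
print(positive_predictive_value(true_pos=9, false_pos=0))
```

Under the alternative reading of "1:N" as one part mutant to N parts normal, the divisor becomes N + 1, so the convention should be fixed before comparing expected and observed VAFs.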
Libraries were prepared from 250 ng of genomic DNA using library preparation kits with amplification from Kapa Biosystems (Wilmington, MA), following the manufacturer’s specifications. The performance of the assay was compared using two fragmentation methods, either sonication (Covaris) or enzymatic fragmentation via the Hyper Plus Kit (Kapa Biosystems), targeting an average size of 350 bp. DNA molecules were tagged with dual-index barcodes containing UMIs for error correction. Libraries were sequenced (Illumina, San Diego, CA) using 150 bp paired-end reads. Sequencing yielded 19 million reads per sample, on average.

Bioinformatics Workflow

For every sample, R1 (forward read), R2 (UMI barcode), and R3 (reverse read) files were generated by the sequencing core at Weill Cornell Medicine. Adapters were trimmed and UMI barcodes were placed in read headers using fastp v0.20.1 [10] with the flags `-Q -A -L -w 1 -u 100 -Y 100 -G --correction`. The remaining reads were aligned with bwa mem v0.7.17 [11] and deduplicated with gencore v0.13.0 [12]. BAM files for patient samples treated with enzymatic fragmentation went through an additional re-alignment step using FADE v0.2.2 [13] to remove enzymatic fragmentation-related alignment artifacts. Somatic variants were called using VarDict-Java v1.5.1 [14] and its auxiliary scripts teststrandbias.R and var2vcf_valid.pl. Flags for VarDict-Java were `-z 1 -k 1 -r 2 -x 10 -I 75 -Q 25 -c 1 -S 2 -E 3 -g 4`. Variants were annotated with VEP v101.0 [15] and the additional datasets COSMIC v91 [16] and TCGA [17]. Germline variants were called using GATK’s HaplotypeCaller and annotated with Funcotator. The bioinformatics workflow can be found at https://github.com/eipm/PreCISE.

Quality Control

The Picard v2.23.3 [18] tools CollectHsMetrics (with the flag `--COVERAGE_CAP 2000`) and CollectInsertSizeMetrics were used to obtain mapping quality control values for all 1037 samples. Mosdepth v0.3.2 [19] was used to obtain depth at the base-pair level for each gene for 742 samples.
Samples with at least 10 million total reads, a mean bait coverage of 1000x, 100 million on-target bases, and a “fold 80 base penalty” threshold of 20 were used in the analysis.

Variant calling and filtration

Germline variants were called following GATK’s best practices using MarkDuplicates, BaseRecalibrator, ApplyBQSR, and HaplotypeCaller. Variants were filtered via VariantFiltration using the following options: `-filter "QD < 2.0" --filter-name "QD2" -filter "QUAL < 30.0" --filter-name "QUAL30" -filter "SOR > 3.0" --filter-name "SOR3" -filter "FS > 60.0" --filter-name "FS60" -filter "MQ < 40.0" --filter-name "MQ40" -filter "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" -filter "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" -filter "FS > 200.0" --filter-name "FS200" -filter "ReadPosRankSum < -20.0" --filter-name "ReadPosRankSum-20"`.

Somatic variants were filtered using the following steps:
1. Select variants with loss of function, or a consequence of missense, splice acceptor/donor, stop gain/loss, frameshift, or inframe mutation
2. A panel of normals was generated from X healthy samples. Any variants found in the panel of normals were removed.
3. Remove variants annotated as low_complexity
4. Remove potentially contaminating SNPs with a VAFq50 > 0.4
5. Keep variants with VAF > 0.01 and VAF < 0.3; variants with <1% VAF are kept when force-calling for serial samples
6. Exclude likely germline variants (gnomAD_AF threshold: AF < 0.25%)
7. Quality control values:
   a. At least 3 supporting forward and reverse reads: AltFwd >= 3, AltRef >= 3
   b. At least 50 supporting reads total: Depth > 50
   c. VEP entropy values of EntropyLeft > 1 & EntropyCenter > 1 & EntropyRight > 1
   d. Quality score greater than 45: QUAL > 45
   e. Mean base position in read greater than 25 and less than 75, with a non-zero standard deviation: PSTD != 0 & PMEAN > 25 & PMEAN < 75
   f. Strand-bias odds ratio less than 2.5 and non-zero: !((ODDRATIO > 2.5 & SBF < 0.1) | ODDRATIO == 0)
   g. Mean base quality greater than 30: QMEAN > 30
   h. Microsatellite length less than 10: MSI < 10 or, for VAF greater than 0.1, MSI*MSILEN > 5
8. Remove variants recurring in >10% of samples

Somatic variants that did not pass the above filters were rescued if they met any of the following conditions:
1. Variants that were previously defined as pathogenic, with an occurrence of at least 10 in COSMIC, labeled likely_pathogenic by VEP, and detected in TCGA. Pathogenic variants were defined, based on the literature, as those that lead to a loss of function (LOF) in these specific regions:
   a. ASXL1 exons 12-13
   b. SF3B1 exons 12-17
   c. FLT3 exons 12-13, or protein changes Asp835, Ile836, Asn841, Tyr842
   d. DNMT3A frameshift, stop_gain, splice-site variants, or protein changes Arg882, Arg736, Arg635, Ser714, Gly631, Leu737
   e. TET2 frameshift, stop_gain, splice-site variants, or protein changes Arg175, Gly245, Arg248, Arg273, Arg249, Arg282
   f. RUNX1
   g. ATRX
   h. CBL protein change TER
   i. CEBPA protein change TER
   j. NOTCH1 protein change TER
   k. RAD21 protein change TER
   l. U2AF1
   m. TP53 protein changes Arg175, Gly245, Arg248, Arg273, Arg249, Arg282
   n. EZH2 protein changes Phe670, Cys539
   o. BCOR
   p. ETV6
   q. NPM1 with a duplication or insertion of base pairs “TCTG” or in exon 13
   r. KIT protein changes Asp816, Asp820, Asn822, Val825, Lys826, Val540, Val560
   s. SF3B1 protein changes Lys700, Glu622, Arg625, His662, Lys666, Asp742
   t. IDH protein changes Arg140, Arg132, Arg172
   u. JAK2 protein changes
   v. SRSF2 protein change Pro95
   w. MPL protein changes Trp515, Tyr591, Ala519, Leu510, Ala506, Thr487, Tyr252, Ser204
2. Variants in known hotspot regions:
   a. KIT protein changes Asp816, Asp820, Asn822, Val825, Lys826, Val540, Val560, or insertions/deletions in exons 10 and 11
   b. SF3B1 protein changes Lys700, Glu622, Arg625, His662, Lys666, Asp742
   c. FLT3 protein changes Asp835, Ile836, Asn841, Tyr842
   d. IDH protein changes Arg140, Arg132, Arg172
   e. NPM1 duplication or insertion in exon 13
   f. JAK2 mutation in exon 12 or protein change Val617
   g. DNMT3A protein changes Arg882, Arg736, Arg635, Ser714, Gly631, Leu737
   h. TP53 protein changes Arg175, Gly245, Arg248, Arg273, Arg249, Arg282
   i. SRSF2 protein change Pro95
   j. MPL protein changes Trp515, Tyr591, Ala519, Leu510, Ala506, Thr487, Tyr252, Ser204
   k. EZH2 protein changes Phe670, Cys539
3. Variants with an M-CAP score greater than 0.025

Germline variants were filtered using the following steps:
1. Select variants with loss of function, or a consequence of missense, splice acceptor/donor, stop gain/loss, frameshift, or inframe mutation
2. Annotated as benign by SIFT or PolyPhen (Adzhubei et al., 2010; Ng & Henikoff, 2003)
3. Potentially contaminating somatic variants from patient samples
4. Variants found frequently in gnomAD_AF and the 1000 Genomes Project, with a cut-off of X
5. Quality control values:
   a. At least 10 supporting alternative reads
   b. VAF above 30%

The variant filtration steps can be found at https://github.com/eipm/PreCISE.

Table 1.1 List of 119 Genes in Panel and Panel Coordinates. A targeted DNA panel was designed to capture the exon regions of 119 genes that are related to clonal hematopoiesis and related comorbidities.
ACTA2     GATA2     PIK3CA
ACTC1     GLA       PIM1
ANGPTL4   GNAS      PKP2
ANKRD26   GNB1      PMS2
APC       GREM1     POLD1
APOA5     HIST1H1E  POLE
APOB      IDH1      POT1
APOC3     IDH2      POU2AF1
ARID1A    JAK2      PPM1D
ASXL1     KCNH2     PRKAG2
ATM       KCNQ1     PTEN
BAP1      KDM1A     RAD51C
BARD1     KMT2D     RAD51D
BCL2      KRAS      RUNX1
BCOR      LDLR      RYR2
BCORL1    LMNA      SCN5A
BMPR1A    LPA       SF3B1
BRAF      LPL       SMAD3
BRCA1     MBD4      SMAD4
BRCA2     MITF      SPEN
BRIP1     MLH1      SRCAP
CARD11    MPL       SRP72
CBL       MSH2      SRSF2
CDH1      MSH6      STK11
CDK4      MUTYH     TET2
CDKN2A    MYBPC3    TGFBR1
CEBPA     MYD88     TGFBR2
CHD2      MYH11     TMEM43
CHEK2     MYH7      TNFRSF14
COL3A1    MYL2      TNNI3
DDX41     MYL3      TNNT2
DNMT3A    NBN       TOE1
DSC2      NOTCH1    TP53
DSG2      NPC1L1    TPM1
DSP       NPM1      U2AF1
EPCAM     NRAS      U2AF1L4
ETV6      PALB2     XPC
FBN1      PAX5      XPO1
FLT3      PCSK9     ZRSR2
GATA1     PIGA

Results

Quality Control

The PreCISE1 panel covers 119 genes that are commonly associated with CH or leukemia (Table 1.1), cardiovascular disease, and solid malignancies. Briefly, the PreCISE1 assay takes DNA from either peripheral blood or bone marrow aspirates. DNA is sheared, barcoded, amplified, and captured with probes that cover the exonic regions of the genes of interest (Figure 1.1A). The corresponding PreCISE1 bioinformatics analysis pipeline takes in sequencer-generated fastq files and reports germline and CH mutations (Figure 1.1B).

958 patient samples and 68 control samples (1026 samples in total) have been used for the PreCISE1 panel and pipeline validation and analysis for this paper. To confidently detect somatic variants above 1% VAF, only samples with sufficient coverage were passed on for analysis. 959 samples (93%), of which 898 are patient samples and 61 are controls, passed the quality control filters described in the Methods section. Across the 959 samples, the mean target coverage is 1470x (median 1108x), the mean bait coverage is 2016x, the mean number of total reads is 18.8 million, and the average on-target rate is 63% (Figure 1.2A-C). The mean depth across all genes is 1327 reads (Figure 1.2D).
The gene with the lowest coverage is CEBPA, with a mean median coverage below 750x; CEBPA is historically known to be difficult to sequence via short-read sequencing due to its high GC content [20].

Figure 1.1: Workflow of PreCISE1 assay and bioinformatics pipeline for CH and germline variant detection. Visual representation of bioinformatics pipeline. (A) The experimental workflow isolates DNA material from either blood PBMCs or bone marrow. (B) The bioinformatics pipeline can be broken down into three sections; it begins with sequencer-generated fastq files and outputs germline and somatic variant calls relevant to the panel.

Limit of Detection

61 positive and negative control samples passed quality control, from which 825 variants were detected. The positive control samples have 22 unique known variants, each with an expected VAF. A dilution series, made by mixing in NA12878 as the negative control, was used to determine whether known variants could be detected at lower VAF and to establish a threshold for the panel. The expected and observed VAF for each of the control variants were compared via a linear model and showed high correlation, with an R^2 value of 0.97 (p < 0.001) (Figure 1.2E). Each control variant was sequenced multiple times and at a variety of dilutions. A comparison of the VAF for each of the 22 control variants is shown in Figure 1.3.

The amount of DNA input was also a factor in determining the panel’s sensitivity. While 250 ng of DNA was the default input amount, inputs of 125 ng and 75 ng yielded sensitivities of 90% and 95%, respectively, with similar observed VAFs (Figure 1.2F). Only variants that pass all filters and have a VAF above 1% are reported, and these are defined as high confidence (Figure 1.2G). Forced calls, or variants that do not pass all filters, are reported if there are multiple timepoints for one patient for disease monitoring.
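
The arithmetic behind the dilution series and the sensitivity and PPV figures reported in this section can be illustrated with a short sketch. It is not part of the PreCISE1 pipeline: the function names (`expected_vaf`, `sensitivity`, `ppv`) are hypothetical, and the sketch assumes that a dilution of 1:n means one part mutant DNA in n total parts, with the normal diluent carrying none of the variant. The counts in the usage lines are made-up numbers, not results from this study.

```python
def expected_vaf(primary_vaf: float, dilution_factor: float) -> float:
    """Expected VAF after a 1:dilution_factor dilution with variant-free DNA.

    Assumption: 1:n means one part mutant sample in n total parts.
    """
    if dilution_factor < 1:
        raise ValueError("dilution_factor must be >= 1")
    return primary_vaf / dilution_factor


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of expected (known) variants that were observed."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no expected variants")
    return true_positives / total


def ppv(true_positives: int, false_positives: int) -> float:
    """Fraction of reported variants that were expected (known) variants."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no reported variants")
    return true_positives / total


# Hypothetical usage: a variant with a primary VAF of 38% diluted 1:10
# is expected at roughly 3.8% VAF.
diluted = expected_vaf(0.38, 10)

# Hypothetical counts: 9 of 10 expected variants observed, no unexpected calls.
example_sensitivity = sensitivity(true_positives=9, false_negatives=1)
example_ppv = ppv(true_positives=9, false_positives=0)
```

Under the alternative reading of 1:n (one part mutant to n parts normal), the divisor would be n + 1 rather than n, so the convention should be confirmed before comparing expected and observed VAFs.
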
Variants with an expected VAF over 2% were successfully observed in 93% of control samples across all dilutions, whereas variants with an expected VAF below 1% were observed in 90% of control samples. Across batches, VAFs of the positive control variants remained consistent, even at lower concentrations of 10% and 5% sample DNA (Figure 1.3). The positive predictive value (PPV) above a 1% expected VAF is 100%.

CEBPA had the lowest coverage across samples. In the control samples, the CEBPA variant Arg286ProfsTer35 was detected in 77% of all control cases, and in over 87% of all cases with a VAF > 2%.

Figure 1.2: Quality control benchmarks and limit of detection of PreCISE1 test. Quality control values for the panel describe the (a) mean target coverage, (b) total reads captured, and (c) on-target rate across the different fragmentation methods used in the library preparation steps. Expected and observed VAF for variants that were detected based on the different library preparation methods of (d) DNA fragmentation and (e) DNA input. (f) The percentage of observed expected variants above 1% and 2% VAF are reported. (g) Boxplot for the average depth of coverage across 45 selected genes in the panel.

Figure 1.3: Expected vs Observed VAF of Control Samples. VAF of observed control variants across different dilutions of the control samples. Variants above the 1% dashed line are reported; some variants below it are still detected by the pipeline.

Figure 1.4: Observed Variants Across All Dilution Series. VAF of observed control variants across unique sequencing runs, with the different amounts of DNA and dilutions. The mean and standard deviation of the observed VAF are reported for each control variant used in sequencing of control samples through multiple batches.

Figure 1.5: Summary of reported somatic and germline variants from sample cohort at Weill Cornell Medicine. Somatic and germline variants observed in the samples that passed quality controls.
(a) Somatic variants are categorized by consequence and ranked according to the most frequently mutated gene. (b) Germline variants are color-coded by consequence.

Variant Calling Analysis

Not all patient variants are included in this paper; variant calls from a subset of 410 samples are reported. We detected 4726 variants prior to applying the filters described in the Methods section. After applying all post-variant-calling filters and conducting a manual review, 100 samples were found to have 129 somatic mutations in 26 genes covered by the panel (Figure 1.5A). The incidence rate of CH is 24% (100/410 samples) in our cohort. 63 samples have 1 CH variant (Figure 1.6A). The most mutated genes are ASXL1, DNMT3A, and TET2. 78 samples have 1 somatic mutation, 15 have 2 somatic mutations, and 7 samples have 3 or more somatic mutations.

Figure 1.6: Number of reported somatic and germline variants. (a) Number of CH and (b) germline variants detected post-filtering for each sample in the discovery cohort. (c) Fisher's exact test was used to determine the odds ratio of the association between the presence of germline and somatic mutations, calculated for DDR germline mutations and DTA CH mutations.

73 samples (18%) have a total of 123 germline variants in 43 genes covered by the panel (Figure 1.5A). 37 samples have 1 germline mutation, 25 have 2 germline mutations, and 11 samples have 3 or more germline mutations (Figure 1.6B). While the germline variant calling did not have positive control samples for each sequencing batch, 8 patients whose blood was sequenced at two different timepoints had the same 13 germline variants reported at both timepoints. 26 samples have germline mutations in the DNA Damage Response (DDR) genes ATM, PPM1D, and TP53.

[Figure 1.7 plot: odds ratio (95% CI) by group. Groups: Any Germline / Any CH; DDR Germline / Any CH; DDR Germline / DTA CH. p-values shown in the plot: 0.023, 0.003, 0.022.]

Figure 1.7: Associations between reported germline and CH variants. Fisher's exact test was used to determine the odds ratio of the association between the presence of germline and somatic mutations, calculated for DDR germline mutations and DTA CH mutations.

Fisher's exact test was used to evaluate the odds ratio of acquiring CH given the presence of a germline variant (Figure 1.7). Across all germline variants, there is no statistically significant increase or decrease in the odds of acquiring CH (OR = 1.05, p = 0.726, CI: 0.79-1.40). The DNA Damage Response (DDR) genes TP53, PPM1D, ATM, and CHEK2 are associated with progression of clonal hematopoiesis and hematological malignancies [21]. There is a trend towards increased odds of CH for the subset of samples with germline mutations in DDR genes (OR = 1.76, p = 0.039, CI: 1.03-2.21) and for the subset of samples with DDR germline mutations and somatic variants in the DNMT3A, TET2, or ASXL1 (DTA) genes (OR = 2.00, p = 0.022, CI: 1.00-2.29).

Discussion

We developed PreCISE1 as an end-to-end assay and pipeline for detecting CH, from patient samples to clinically relevant germline and somatic variant calls. The assay and pipeline consider quality metrics as well as known variants from the literature and from databases of large cohorts of cancer patients. The additional filtering steps that incorporate clinical knowledge ensure that the remaining variants are clinically relevant for clonal hematopoiesis and associated comorbidities. Through dilution series of AML cell lines, we demonstrated that the pipeline can detect somatic variants at low VAF accurately (93% of expected variants above 1%).

Applications of method across various co-morbidities

Other NGS-based assays have been used to detect variants in solid malignancies, and the corresponding tests have been commercialized as personalized cancer genomic profiling tests. While reporting tumor-related variants, clonal hematopoiesis variants are often discovered as false positives [22,23,24].
As a result, many assays detect clonal hematopoiesis variants only to discard them because they are not tumor-derived; however, whether these variants influence tumor biology is not well studied. Previous studies have reported that certain combinations of CH and types of treatment can lead to expansion of the CH clones, and thus to a secondary hematological malignancy [25]. These variants should therefore be included in treatment decision making as more is learned about their influence. More research needs to be done on the connection between clonal hematopoiesis and solid tumors.

The PreCISE1 targeted panel and pipeline have been used to understand the associations between CH and comorbidities such as hematological malignancies and solid tumor cancers. Through sequencing of multiple timepoints of blood draws and the bone marrow, the panel yielded insights into how CH clones disseminate unevenly throughout the hematological compartment [8]. PreCISE1 was also used to track clonal hematopoiesis in patients with neuroendocrine tumors treated with peptide receptor radionuclide therapy to evaluate outcomes of therapy-related neoplasms [26].

Limitations

The PreCISE1 panel is limited to the subset of genes known to be associated with leukemia, cardiovascular risk, and tumorigenesis risk. Genes related to other comorbidities shown to be associated with CH are not included in the targeted panel. For example, TCL1A and TERT germline variants have been shown via a GWAS to increase the risk of CH [27,28], but we cannot evaluate whether patients in our cohort have germline variants in genes beyond the panel coverage.

REFERENCES

1. Jaiswal, S. et al. Age-Related Clonal Hematopoiesis Associated with Adverse Outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
2. Jaiswal, S. & Ebert, B. L. Clonal hematopoiesis in human aging and disease. Science 366, eaan4673 (2019).
3. Desai, P. et al. Somatic mutations precede acute myeloid leukemia years before diagnosis. Nat. Med. 24, 1015–1023 (2018).
4. Assmus, B. et al. Clonal haematopoiesis in chronic ischaemic heart failure: prognostic role of clone size for DNMT3A- and TET2-driver gene mutations. Eur. Heart J. 42, 257–265 (2021).
5. Steensma, D. P. et al. Clonal hematopoiesis of indeterminate potential and its distinction from myelodysplastic syndromes. Blood 126, 9–16 (2015).
6. Chan, I. C. C. et al. ArCH: Improving the performance of clonal hematopoiesis variant calling and interpretation. Bioinformatics btae121 (2024) doi:10.1093/bioinformatics/btae121.
7. Vlasschaert, C. et al. A practical approach to curate clonal hematopoiesis of indeterminate potential in human genetic data sets. Blood 141, 2214–2223 (2023).
8. Osman, A. E. G. et al. Paired bone marrow and peripheral blood samples demonstrate lack of widespread dissemination of some CH clones. Blood Adv. 7, 1910–1914 (2023).
9. Singh, A. S. et al. DNMT3A and TET2 mutant Clonal Hematopoiesis May Drive a Proinflammatory State and Predict Enhanced Response to Immune Checkpoint Inhibitors. Blood 138, 4295–4295 (2021).
10. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
11. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at http://arxiv.org/abs/1303.3997 (2013).
12. Chen, S. et al. Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data. BMC Bioinformatics 20, 606 (2019).
13. Gregory, T. et al. Characterization and mitigation of fragmentation enzyme-induced dual stranded artifacts. NAR Genomics Bioinforma. 2, lqaa070 (2020).
14. Lai, Z. et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 44, e108–e108 (2016).
15. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
16. Tate, J. G. et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 47, D941–D947 (2019).
17. The Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
18. Picard toolkit. Broad Institute, GitHub repository (2019).
19. Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868 (2018).
20. Behdad, A., Weigelin, H. C., Elenitoba-Johnson, K. S. J. & Betz, B. L. A Clinical Grade Sequencing-Based Assay for CEBPA Mutation Testing. J. Mol. Diagn. 17, 76–84 (2015).
21. Bowman, R. L., Busque, L. & Levine, R. L. Clonal Hematopoiesis and Evolution to Hematopoietic Malignancies. Cell Stem Cell 22, 157–170 (2018).
22. Finkle, J. D. et al. Validation of a liquid biopsy assay with molecular and clinical profiling of circulating tumor DNA. Npj Precis. Oncol. 5, 63 (2021).
23. Ptashkin, R. N. et al. Prevalence of Clonal Hematopoiesis Mutations in Tumor-Only Clinical Genomic Profiling of Solid Tumors. JAMA Oncol. 4, 1589 (2018).
24. Severson, E. A. et al. Detection of clonal hematopoiesis of indeterminate potential in clinical sequencing of solid tumor specimens. Blood 131, 2501–2505 (2018).
25. Coombs, C. C. et al. Therapy-Related Clonal Hematopoiesis in Patients with Non-hematologic Cancers Is Common and Associated with Adverse Clinical Outcomes. Cell Stem Cell 21, 374-382.e4 (2017).
26. Singh, A. et al. Mutant PPM1D- and TP53-Driven Hematopoiesis Populates the Hematopoietic Compartment in Response to Peptide Receptor Radionuclide Therapy. JCO Precis. Oncol. e2100309 (2022) doi:10.1200/PO.21.00309.
27. Silver, A. J., Bick, A. G. & Savona, M. R. Germline risk of clonal haematopoiesis. Nat. Rev. Genet. 22, 603–617 (2021).
28. Bick, A. G. et al. Inherited causes of clonal haematopoiesis in 97,691 whole genomes. Nature 586, 763–768 (2020).
CHAPTER 2
CLONAL HEMATOPOIESIS IS ASSOCIATED WITH CHANGES IN THE SOLID TUMOR MICROENVIRONMENT

Summary

Clonal hematopoiesis (CH), a condition characterized by somatic mutations in blood cells, has been associated with increased cancer risk and adverse outcomes in solid tumors. This study investigates the prevalence and impact of CH on the tumor microenvironment (TME) and clinical outcomes across 24 cancer types in The Cancer Genome Atlas (TCGA) cohort. CH was identified in 5% of patients (280/5607) using whole-exome sequencing, with DNMT3A, TET2, and ASXL1 (DTA mutations) being the most frequently mutated genes. CH incidence increased with age and varied significantly across cancer types. Through deconvolution of RNA-sequencing data, we found CH to be associated with distinct alterations in TME composition, including decreased tumor fraction, increased fibroblast infiltration, and trends towards elevated leukocyte proportions. In lung adenocarcinoma (LUAD) and glioblastoma (GBM), CH-positive patients exhibited decreased infiltration of key immune cell populations, such as CD8+ T cells and T regulatory cells, correlating with worse survival outcomes. Specifically, LUAD patients with DNMT3A mutations and high monocyte infiltration had significantly poorer survival, indicating a pro-tumor inflammatory environment.

CH was linked to reduced overall survival (OS) and progression-free intervals (PFI) in pan-cancer analyses, with particularly pronounced effects in LUAD, GBM, and colon adenocarcinoma (COAD). These findings highlight the multifaceted role of CH in modulating immune infiltration and tumor progression, offering insights into its contribution to the immunosuppressive TME and unfavorable clinical outcomes in solid tumor patients.

Introduction

Clonal hematopoiesis (CH) is the phenomenon in which hematopoietic stem cells acquire advantageous somatic mutations and expand into a dominant subclonal population.
Furthermore, these mutations, mostly in epigenetic and DNA-damage response genes, are passed to progeny and circulate through the body, and thus have the potential to alter the function of the immune system. These subclonal populations of mutated immune cells are linked to adverse outcomes, such as elevated risk of hematologic malignancies and cardiovascular disease [1].

CH is common and associated with adverse outcomes in non-hematological cancers [2]. In the Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT) cohort, 26.5% of solid tumor cancer patients harbored CH mutations [3]. In the Trans-Omics for Precision Medicine (TOPMed) cohort, CHIP was associated with a higher risk of cancer overall, but not of colorectal or lung cancer [4]. While patients with solid tumors treated with cytotoxic therapies are at greater risk of developing a therapy-related myeloid neoplasm (tMN), in two separate solid tumor cancer cohorts mortality was likely due to progression of the primary solid tumor rather than transformation to MDS/AML [2,5]. In the MSK-IMPACT cohort, canonical DTA-CH mutations were not associated with an increased risk of secondary hematological malignancies [6]. It is not well understood why CH leads to higher mortality rates in cancer patients.

The three canonical mutated genes in clonal hematopoiesis are DNMT3A, TET2, and ASXL1, also known as “DTA” mutations. They represent the genes in which a majority of CH variants are observed [1]. These DTA genes play essential roles in regulating DNA methylation, epigenetic programming, and gene expression in hematopoietic cells. DNMT3A (DNA Methyltransferase 3 Alpha) encodes an enzyme involved in adding methyl groups to DNA, a process called de novo DNA methylation [7]. In contrast, TET2 (Ten-Eleven Translocation Methylcytosine Dioxygenase 2) encodes a protein that catalyzes the oxidation of 5-methylcytosine, a crucial step in DNA demethylation [7].
ASXL1 (Additional Sex Combs Like 1) influences chromatin structure and transcriptional regulation [7]. Two additional genes important to this paper are PPM1D and TP53, which regulate the DNA-Damage Response (DDR). TP53 encodes the p53 protein, a crucial tumor suppressor that responds to cellular stress, such as DNA damage, by halting cell division to allow for DNA repair or by initiating apoptosis if the damage is irreparable. PPM1D (Protein Phosphatase, Mg2+/Mn2+ Dependent 1D) encodes a phosphatase that helps deactivate certain proteins involved in cellular stress responses, especially in the p53 pathway. This enzyme acts as a negative regulator of p53 by dephosphorylating it, thus preventing the p53 protein from triggering cell cycle arrest or apoptosis.

In the context of cardiovascular diseases, mice with TET2 mutations have increased infiltration of macrophages with pro-inflammatory markers, worsening atherosclerotic plaque and heart failure [8]. Similarly, several studies explore how infiltrating leukocytes impact treatment outcomes in cancer patients [9,10]. Thorsson et al. explored how variation in tumor somatic mutations, gene expression, immune cell type composition, and TCR diversity affects overall survival in The Cancer Genome Atlas (TCGA) cohort [11]. Overall survival was lowest in tumors dominated by M2 macrophages in low-grade gliomas and in tumors that have the highest relative infiltration of M1 macrophages and CD8 T cells and high TCR diversity.

Few studies address the presence and role of CH in malignant transformation or mortality from solid tumors, such as if and how progression of CH alters the tumor microenvironment (TME). CH mutations in infiltrating leukocytes have been studied in the context of breast cancer, where tumor-infiltrating leukocytes were isolated and observed to have a higher frequency of CH than the leukocytes in peripheral blood [12]. Kleppe et al. suggest that somatic mutations in infiltrating leukocytes have the potential to affect the tumor microenvironment. Another study observed CH-mutated immune cells infiltrating the TME in non-small cell lung cancer (NSCLC). Individuals with CH infiltration of the tumor microenvironment had higher odds of adverse outcomes [13]. In a cohort of 103 ovarian cancer patients, CH was associated with worse PFS and trended towards worse OS [14].

Detecting CH can be an important step in treatment decision making. In the CANTOS trial, individuals who were administered canakinumab were less likely to develop non-hematological malignancies [15]. Canakinumab is therefore a potential treatment for preventing cancer, and the trial provides evidence that IL-1β blockade can prevent disease effects of CH mutations. Colchicine prevented atherosclerosis progression in TET2+ mouse models and in patients in a retrospective analysis of the UK Biobank [16].

In this study, we investigate the role of clonal hematopoiesis in the immune system of solid tumor patients. Using DNA derived from blood-normal samples in The Cancer Genome Atlas (TCGA) cohort, we perform variant calling to identify CH variants. By characterizing the association between CH and leukocyte infiltration, we aim to uncover how CH may contribute to adverse outcomes in cancer patients, with a particular focus on the influence of CH on anti-inflammatory cell types within the tumor microenvironment.

Methods

TCGA Dataset

From the TCGA cohort, patients were stratified on whether they had both tumor and matched blood samples. Individuals with hematological malignancies were removed from all analyses. Clinical data (age, sex, treatment outcomes) were taken from the original TCGA cohort. Imputed ethnicities were used instead of reported ethnicities [17].

Calling CH in the TCGA Dataset

A variant calling pipeline was used to detect somatic mutations from BAM files of Whole Exome Sequencing (WES) of the blood-normal samples.
The variant calling pipeline follows the same steps as a previous study. Briefly, variants were called using Mutect2, annotated with Funcotator, and filtered for coding-region mutations in known CH-driver genes. Variants detected in the blood were then called in the tumor samples, using Mutect2 with the `-L` parameter to specify the loci of CH mutations detected from blood WES. A VAF cut-off of 2% was used because of the lower sequencing depth in WES.

Modeling Tumor Infiltration of Immune Cell Types

Kassandra was used for cell-type deconvolution from RNA-seq of tumor samples. The dataset of infiltration within tumor tissues from the TCGA project, as predicted by Kassandra, can be downloaded from https://www.science.bostongene.com/kassandra/downloads [18]. A penalized rank-normalized logistic model was used to evaluate whether infiltration of immune cell types in cancers was associated with a CH mutation. The model was corrected with age, gender, and tumor stage as covariates.

Total leukocyte proportion was calculated by combining the proportions of NK cells, B cells, T cells, neutrophils, monocytes, and M0 macrophages. Total lymphocyte fraction was already measured by Kassandra.

Associations between Outcomes and CH

Cox models were used to find associations between CH and four outcomes: overall survival (OS), disease-specific survival (DSS), progression-free interval (PFI), and disease-free interval (DFI). The Cox model included the following covariates: age, cancer type, gender (where applicable), clinical stage, and treatment outcome. P-values were adjusted for multiple-test correction via the Benjamini-Hochberg procedure. The models also calculated the effects of different CH genes, and the genes were classified into categories: DNMT3A, TET2, ASXL1, JAK2, Splicing factors (SRSF2), DNA-Damage Response (DDR: PPM1D or TP53), Multiple (patients with multiple CH mutations), and Other. For further analysis, only CH genes with at least N = 3 for at least 1 cancer type were included.
As a result, DNMT3A, TET2, and DDR remained for further analysis. Six cancer types had both N > 3 for DNMT3A mutations and N > 3 mutations: BRCA, COAD, GBM, HNSC, LUAD, and LUSC.

Interaction Model between Survival, CH and Cell-type Infiltration

A Cox proportional hazards regression model was used to model the effects of CH on survival outcomes, with cell type-specific infiltration as an interaction term and age and gender as co-variates. The model iterated through each combination of cancer type, cell type, and CH mutation. P-values were adjusted for multiple-test correction via the Benjamini-Hochberg procedure. Gender was included only where applicable. Only cancer type and CH classification combinations with N > 3 were included in the analysis, as in the outcome survival models.

Differential Gene Expression and Gene Set Enrichment Analysis (GSEA)

Differential gene expression was measured with DESeq2 from gene counts from RNA-seq of tumor samples [19]. GSEA was employed to compare expression patterns according to CH status based on their concordance with the hallmark pathways in the Molecular Signatures Database (MSigDB) [20] (36). For GSEA of scRNA-seq data, we used gseGO from clusterProfiler [21] for the Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) ontologies. P-values were adjusted via the Benjamini-Hochberg procedure, with a cut-off of 0.05.

HLA Typing and Associations with CH

OptiType HLA typing calls were downloaded from the Genomic Data Commons (GDC) [11]. On a per-cancer basis, a logistic regression model was used to evaluate the odds of CH for MHC class I HLA subtypes A, B, and C and their major alleles. Only the major alleles were used for the model, as including the minor alleles would reduce the statistical power. Samples with inconsistent or missing HLA allele calls across multiple OptiType calls were removed from analysis. Odds ratios were estimated, and p-values were adjusted for multiple-test correction.
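
As an illustration of the Benjamini-Hochberg adjustment referenced throughout the Methods above, the following Python sketch computes adjusted p-values from a list of raw p-values. It is a minimal sketch and not the code used in this study; the function name `benjamini_hochberg` and the example values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Hypothetical helper for illustration only; not taken from the study's pipeline.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone: each is the minimum of p * m / rank over all
    # ranks at or above the current one, capped at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up raw p-values, e.g. one per cancer type and cell type combination.
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
```

Adjusted values at or below the chosen threshold (0.05 in the GSEA analysis above) would be reported as significant.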
Results

CH in TCGA Cohort

Figure 2.1: Summary of CH Variants in the TCGA Cohort. (a) The proportion of patients with CH increases with age. (b) The proportion of patients with CH varies among solid tumor cancer types. (c) Age of individuals varies for each cancer type. (d) A summary of the CH mutations detected in the cohort, characterized by gene name, cancer type, mutation type, and VAF.

CH status was characterized for 5607 patients within the TCGA cohort, across 24 cancer types, who have both tumor and matched-blood DNA sequencing. 5% of the TCGA cohort (280/5607) had at least 1 CH mutation detected in the blood sample, with a total of 300 CH mutations detected. The incidence of CH increases with age (Figure 2.1A), a trend that is consistent with other cohorts [1,3]. The proportion of individuals with CH differs across the cancer types, with Uterine Carcinosarcoma (UCS), Colon adenocarcinoma (COAD), and Kidney Chromophobe (KICH) having the highest proportions (Figure 2.1B). This can be attributed to the age of the cancer cohorts (Figure 2.1C). For example, UCS and COAD have the highest proportion of CH+ patients, but these two cohorts are also older than the median of the entire TCGA cohort. Cancer types with CH+ proportions that are not associated with age are: PRAD, LIHC, KIRP, OV, GBM, and SARC. Breast invasive carcinoma (BRCA) and Uterine Corpus Endometrial Carcinoma (UCEC) have the largest number of CH-positive patients, with 55 and 46 patients, respectively. The most commonly mutated genes are DNMT3A (106 patients, 38%), TET2 (23%), and ASXL1 (9%), which are the canonical DTA mutations expected with CH (Figure 2.1D). The mean VAF across all 300 CH variants is 17.3%. A breakdown of VAF for each gene is in Figure 2.2A.

Figure 2.2: Distribution of VAF. (a) Breakdown of VAF for each gene mutated.

Effects of CH on Tumor Infiltration and Outcomes

Figure 2.3: Effects of CH on the Tumor Microenvironment in Pan-Cancer and Lung Adenocarcinoma.
Based on CH status, this study measured (a) changes in the proportion of major cell types present in the tumor microenvironment pan-cancer. (b) Kaplan-Meier curves depicting the overall survival probability of patients with and without CH across all cancer types within the TCGA cohort, and overall survival probability with a cutoff of VAF > 20% for patients with CH. (c) Summary of odds ratios from Cox proportional hazards models for overall survival (OS), disease specific survival (DSS), progression-free interval (PFI), and disease-free interval (DFI) across all cancers (meta-analysis) and 6 cancer types. The model estimated the association between CH and outcomes, while correcting for age and gender as co-variates. Stars denote significant adjusted p-values. (d) Heatmaps from a penalized generalized linear model of the effects of CH-positive status and infiltrating immune cell types for each cancer type. The model was corrected for age and gender; Z-scores and adjusted p-values are shown.

To explore the tumor microenvironment, Kassandra was used for cell type deconvolution on the RNA-seq of the tumor biopsy samples. The 4 major cell types within the tumor microenvironment are tumor cells, leukocytes, fibroblasts, and endothelial cells. Across solid tumor samples, tumors from patients with clonal hematopoiesis (labeled CH+) show a decrease in total cancer cell proportion (p = 0.0351, Wilcoxon test), a trending increase in leukocyte fraction (p = 0.114, Wilcoxon test), an increase in fibroblast proportion (p = 2.86 x 10^-6, Wilcoxon test), and a decrease in endothelial cell proportion (p = 0.0471, Wilcoxon test) (Figure 2.3A). When each cancer type is examined separately, leukocyte fraction remains consistent between patients with and without CH, with non-significant differences in the proportions (Wilcoxon test, BH-adjusted p-values) (Figure 2.4).

Figure 2.4: Total Leukocyte Fraction.
Barplot of total leukocyte cell type proportion in CH+ and CH- patients across the 22 cancer types studied from the TCGA cohort with at least 2 CH+ patients.

A Cox model was used to evaluate the effects of CH on 4 reported cancer outcomes: overall survival (OS), disease specific survival (DSS), progression-free interval (PFI), and disease-free interval (DFI) across all cancers. The model included age, gender, cancer type, and the specific CH gene mutated as co-variates. Kaplan-Meier curves show a non-significant trend towards decreased survival probability (p = 0.13) and decreased progression-free interval (p = 0.13). When comparing pan-cancer CH-positive patients with a VAF of at least 20%, there is a significant decrease in OS (p = 0.019) and a borderline decrease in PFI (p = 0.054) (Figure 2.3B). Effects of CH were calculated for 6 cancers and CH-specific mutations (Figure 2.3C). Meta-analysis refers to the pan-cancer Cox model in Supplemental Figure 2. Lung adenocarcinoma (LUAD), glioblastoma (GBM), and colon adenocarcinoma (COAD) have a significant decrease in overall survival probability, but only GBM and COAD have a decrease in disease-specific survival. Other cancer types were not included due to low power, as they had too few mutations for each of the CH genes (see Methods).

To understand the effects of CH on leukocyte infiltration in the tumor microenvironment across all cancer types, a rank-normalized generalized linear model was used to find associations between CH-positive status and survival outcomes, with infiltrating cell type proportion as an interaction term. The model was corrected for the co-variates age and gender. P-values were adjusted with the Bonferroni method. The effect of CH and cell type fraction varied across cancer types.
The model also calculated effects of different CH genes, and the genes were classified into categories: DNMT3A, TET2, ASXL1, JAK2, Splicing factors (SRSF2), DNA-Damage Response (DDR: PPM1D and TP53), Multiple (patients with multiple CH mutations), and Other (Figure 2.3D). The categories JAK2, Splicing factors, and Other did not have significant associations with cell type infiltration proportions, likely due to low statistical power.

Focusing on LUAD

Figure 2.5: Effects of CH on the Tumor Microenvironment in Lung Adenocarcinoma. Stratified by the CH variant present, changes in specific leukocyte cell types were measured for the Lung Adenocarcinoma (LUAD) cohort, from a rank-normalized inverse transformation generalized linear model. For DNMT3A-positive LUAD patients, plots show (a) effects of CH status and changes in cell type proportions on overall survival via a Kaplan-Meier curve, (b) effects on differential gene expression, (c) gene-set enrichment analysis of pathways that were up- and down-regulated, and (d) cell type proportions for B cells, CD8+ T cells, lymphocytes, M0 macrophages, monocytes, and Tregs, stratified by CH status. (e) Odds of MHC class I HLA major alleles being present with a positive CH status in all LUAD patients.

Within the lung adenocarcinoma (LUAD) cohort of 345 patients, 17 patients had CH, where 35% of CH+ patients harbored a TET2 mutation (n = 6), 35% harbored a DNMT3A mutation (n = 6), and 18% (n = 3) had a mutation in a DDR gene. A CH mutation was associated with a significant decrease in survival odds (p adj < 0.01) (Figure 2.5A). In LUAD patients with a DNMT3A mutation, a higher proportion of monocytes is associated with significantly worse survival (estimate 702.18, HR > 1; p = 0.00089) (Figure 2.5A). Decreases in CD8 T cells, NK cells, and Tregs were also associated with significantly worse survival probability in LUAD patients with a DNMT3A mutation.
Mutations in DNMT3A, TET2, and DDR genes were significantly associated with an increased proportion of infiltrating lymphocytes, T cells, and M0 macrophages (Figure 2.5D). DNMT3A was also associated with increased infiltration of Tregs, while TET2 was associated with an increase in M2 macrophages. A DNMT3A mutation was not associated with changes in the proportion of monocytes, CD8 T cells, or B cells (Figure 2.5D).

Figure 2.6: Impact of CH variants on pan-cancer differential gene expression. (a) Differential gene expression in solid tumor patients with and without DNMT3A somatic mutations. Gene expression is relative to CH-negative solid tumors. (b) Gene-set enrichment analysis of significantly differentially expressed genes in CH-positive patients.

To determine the impact of CH variants on gene expression in the tumor microenvironment, differential gene expression analysis was conducted on RNA-seq-derived gene counts from the tumor on a per-CH variant basis. 346 genes were up- or down-regulated in DNMT3A-positive patients compared to CH-negative patients (Figure 2.6A). Gene-set enrichment analysis (GSEA) revealed pathways that are differentially expressed in the presence of specific CH mutations. A DNMT3A mutation resulted in decreased expression of pathways involved in chromatin separation, chromosome organization, and mitotic cell division (Figure 2.6B). In DNMT3A-positive LUAD patients, 32 genes were differentially expressed (Figure 2.5B). GSEA revealed that pathways for leukocyte-mediated immunity, adaptive immunity, and leukocyte activation were significantly suppressed (Figure 2.5C).

Figure 2.7: Odds of CH and specific MHC class I HLA major alleles. The forest plot is based on logistic regression measuring the odds of CH given a specific MHC class I HLA major allele and is stratified by cancer type.
[Figure 2.7 forest plot. Rows: cancer types UCEC, THCA, STAD, SARC, READ, LUSC, LUAD, LGG, KIRP, HNSC, COAD, CESC, BRCA, BLCA. Horizontal axis: Odds Ratio (0 to 100). Points are labeled with HLA major alleles.]

A logistic regression model was used to evaluate the odds of CH status for MHC class I HLA subtypes A, B, and C and their major alleles. Across 5225 TCGA samples, there were 70 unique major alleles for HLA type A (20 alleles), B (35 alleles), and C (17 alleles) (Figure 2.7). 13 HLA-A, 14 HLA-B, and 5 HLA-C major alleles were associated with a positive CH status. The HLA-A*30 allele was associated with CH in BLCA, THCA, and LGG cancers. The HLA-B*18 allele had the highest odds of occurrence in the HNSC, LUSC, STAD, COAD, and UCEC cohorts. For LUAD, 4 HLA major alleles had significantly higher odds ratios: B*45, A*11, A*02, and B*08 (Figure 2.5E).

Other Cancers

Figure 2.8: Effects of CH on the tumor microenvironment of GBM.
(a) Changes in specific leukocyte cell types were measured for the Glioblastoma (GBM) cohort, from a rank-normalized inverse transformation generalized linear model. Z-scores represent large changes in proportion compared to the mean, and p-values are adjusted. For all CH-positive and TET2-positive GBM patients, (b) cell type proportions for M0 macrophages, M2 macrophages, plasma B cells, and Tregs are shown, stratified by CH status; (c) effects of CH status and changes in cell type proportions on overall survival are shown via a Kaplan-Meier curve; (d) effects on differential gene expression; and (e-f) gene-set enrichment analysis of expression pathways that were up- and down-regulated.

In the GBM cohort of 145 patients, 13 have CH, where 3 (23%) have a DNMT3A mutation, 3 (23%) have a TET2 mutation, 1 (7.7%) has a DDR mutation, 3 (23%) have more than 1 mutation, and 3 (23%) have other mutations. A CH mutation in the GBM cohort is associated with worse overall survival and disease specific survival outcomes (Figure 2.8A). Mutations in TET2 and DNMT3A were associated with a decrease in infiltration of several immune cell types (plasma B cells, T helper cells, Tregs, and monocytes) and an increase in M0 macrophages (Supplemental Figure 5A-B; Supplemental Figure 4). A TET2 mutation was also associated with a decrease in CD4 T cells. A TET2 mutation in the GBM cohort was associated with a significant decrease in OS and DSS probability (Figure 2.8C). When accounting for cell types as an interaction term, a TET2 mutation with lower Tregs was associated with worse survival outcomes (p adj = 0.0011). A TET2 mutation in the GBM cohort was also associated with worse outcomes in patients with lower median proportions of plasma B cells (p adj = 0.0013), M2 macrophages (p = 0.0062), M0 macrophages (p adj = 0.007), neutrophils (p adj = 0.0077), PD-1+ CD8 T cells (p adj = 0.012), and T cells (p adj = 0.013) (Figure 2.8B).
In GBM patients, 11 genes had increased expression in CH-positive samples; however, GSEA did not link them to any specific pathways.

Colon adenocarcinoma (COAD) has a CH incidence rate of 11.8% (28/238), where the most common CH mutations are DNMT3A (n = 7) and TET2 (n = 5). CH-positive COAD patients had worse overall and disease-specific survival, even after correcting for age and gender (?). No specific CH mutation was associated with outcomes. TET2 mutations were associated with decreased infiltration of NK cells, B cells, neutrophils, and monocytes. DNMT3A mutations were associated with increased infiltration of lymphocytes, T cells, and CD4 T cells and decreased infiltration of NK cells and monocytes. CH-positive patients with higher lower non-plasma B cells and PD-1-negative cells trended towards lower survival probability, but the association was not significant (?).

Breast invasive carcinoma (BRCA) has the largest number of CH patients in TCGA. While a positive CH status does not confer worse outcomes, there is a trend for DDR mutations (p = 0.056). 6 of the 55 CH-positive BRCA patients have a DDR mutation. The most common mutations are DNMT3A (n = 22), TET2 (n = 9), DDR (n = 6), and ASXL1 (n = 2). Infiltration of NK cells, neutrophils, and monocytes is lower, and infiltration of lymphocytes and T cells is higher, in the CH-positive BRCA subset. No specific cell types conferred a significant change in survival outcomes.

Figure 2.9: Infiltration of CH in Tumor Microenvironment. Barplot of gene-specific variants detected in the BRCA cohort in only peripheral blood (PB) and in both PB and tumor biopsy.

CH does not alter TCR and BCR diversity, MSI, and TMB

Higher TCR and BCR diversity is associated with age in the TCGA cohort. However, there are no statistically significant associations between CH status and Shannon entropy, evenness, or richness scores for either TCRs or BCRs. CH status did not affect MSI or TMB scores.
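
For readers unfamiliar with the three repertoire diversity scores named above, the following Python sketch shows how richness, Shannon entropy, and evenness can be computed from clonotype counts. It assumes the standard ecological definitions (natural-log Shannon entropy and Pielou evenness); the exact definitions behind the TCGA repertoire scores are not stated in this chapter, so this is an assumption, and the function name `repertoire_diversity` and the example counts are hypothetical.

```python
import math


def repertoire_diversity(clone_counts):
    """Compute richness, Shannon entropy, and Pielou evenness.

    clone_counts: read or cell counts per TCR/BCR clonotype (hypothetical input).
    Assumes natural-log Shannon entropy; other bases only rescale the value.
    """
    counts = [c for c in clone_counts if c > 0]
    richness = len(counts)  # number of distinct clonotypes observed
    if richness == 0:
        return {"richness": 0, "shannon": 0.0, "evenness": 0.0}
    total = sum(counts)
    # Shannon entropy: H = -sum(p_i * ln(p_i)) over clonotype frequencies p_i.
    shannon = abs(-sum((c / total) * math.log(c / total) for c in counts))
    # Pielou evenness: H / ln(richness); defined as 0 for a single clonotype.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return {"richness": richness, "shannon": shannon, "evenness": evenness}


if __name__ == "__main__":
    # A perfectly even repertoire versus one dominated by a single clone.
    print(repertoire_diversity([10, 10, 10, 10]))
    print(repertoire_diversity([97, 1, 1, 1]))
```

A repertoire dominated by one expanded clone has the same richness as an even one but lower entropy and evenness, which is why the three scores are reported together.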
Discussion

To address the influence of CH in solid tumor cancer patients, we sought to determine the following: (i) changes in the infiltrating immune cells by deconvolution of the RNA-seq, (ii) the relationship between CH burden and transcriptome profile, and (iii) their potential implications for clinical outcomes. Here we report that CH prevalence increased with age and varied across cancer types. We also showed that a higher mutation burden, with VAF greater than 20%, was associated with worse survival probability, even after correcting for age.

Using RNA-seq of the tumor biopsy, we found that CH alters the proportion of cell types in the tumor microenvironment in many solid tumor cancers. Across all cancers, there is a trend towards increased leukocyte fraction and a significant increase in fibroblasts. A recent study observed CH-mutated immune cells infiltrating the TME in non-small cell lung cancer (NSCLC). Individuals with CH infiltration of the tumor microenvironment had higher odds of adverse outcomes [13]. Likewise, we observed that LUAD patients with CH had worse outcomes in the TCGA cohort. In LUAD, TET2-positive and DNMT3A-positive patients had a decreased proportion of lymphocytes, T cells, CD4 T cells, and M0 macrophages, which suggests an anti-inflammatory (pro-tumor) environment, where immune suppression may allow the tumor to grow more freely without being attacked by the immune system. DNMT3A+ LUAD patients with high monocytes, low CD8+ T cells, low Tregs, and low NK cells had even worse survival probability. High monocytes might contribute to a pro-tumor environment, particularly if they are differentiating into tumor-associated macrophages. Differential gene expression analysis highlighted suppression of leukocyte-mediated immunity, leukocyte activation, and lymphocyte activation pathways in DNMT3A+ patients.
Overall, a CH-positive status suggests a predominantly pro-tumor (anti-inflammatory) environment with weak immune surveillance and poor tumor-killing capacity in lung adenocarcinoma, which could explain the lower survival probability.

In another study of 135 patients with invasive gliomas, CH mutations correlated with poorer outcomes, but the analysis was limited to genes outside the canonical CH mutations because the targeted panel did not include the ASXL1, DNMT3A, PPM1D, and TET2 genes [22]. Another study showed that overall survival was lowest in low-grade gliomas dominated by M2 macrophages and in tumors with the highest relative infiltration of M1 macrophages and CD8 T cells and high TCR diversity [11]. Similarly, the CH-positive GBM patients had worse survival outcomes. In the GBM cohort, we showed that CH was associated with decreased plasma B cells, T helper cells, Tregs, and monocytes and increased M0 macrophages. Decreased plasma B cells, T helper cells, and monocytes suggest an overall reduction in immune activation and surveillance, which favors a pro-tumor (anti-inflammatory) environment. TET2+ GBM patients with lower plasma B cell and lower M0 macrophage proportions also have lower survival probability.

Associations between colon adenocarcinoma (COAD) and CH have not previously been described. In our analysis, CH+ COAD patients have worse outcomes and an altered immune state in the tumor microenvironment. In another cohort of 103 ovarian cancer (OV) patients, CH is associated with worse PFS and trends towards worse OS [14]. We did not observe this finding in OV, likely because of the small number of CH-positive patients (n = 9/149, 6%) in the OV cohort.

The increase in fibroblasts across all tumors may be due to cancer-associated fibroblasts (CAFs) that remodel the tumor microenvironment to support growth and metastasis. CH has been correlated with an increase in fibroblasts in cardiovascular disease patients in the CANTOS trial [23].
During the adaptive immune response, V(D)J recombination results in the highly diverse repertoire of antibodies/immunoglobulins and T cell receptors (TCRs) found in B cells and T cells, respectively. Previously, TCR and BCR repertoire analysis revealed variance in repertoire diversity across the 33 tumor types in the TCGA cohort [14]. Why certain tumors have less diversity in their TCR and BCR repertoires is not well understood; however, the authors suggest differential B-cell infiltration and clonal expansion as possible causes. Our results did not support a significant association between CH prevalence and diversity of the TCR and BCR repertoire.

Major histocompatibility complex (MHC) class I human leukocyte antigen (HLA) alleles play a critical role in antigen presentation during the self and non-self immune recognition processes that ultimately initiate the adaptive immune response. Due to selective pressure from the immune system, cancer cells with HLA alleles that are less expressed on the cell surface are better able to evade the adaptive immune response and persist as the tumor matures, and selection for those alleles leads to a loss of HLA heterozygosity [13]. We did observe several HLA alleles that are associated with CH. In another study, higher HLA presentation was associated with older age in a pan-cancer analysis [24].

Limitations of this study include tumor sample purity and sample size, both of which influence the analysis of infiltrating leukocyte proportions in the tumor microenvironment. Low sample sizes limited analysis to a small subset of cancer types. Infiltration was assessed via deconvolution of RNA-seq, where some cell types are difficult to deconvolute or differentiate from one another. Furthermore, the absence of single-cell analysis prevents pinpointing the source of dysregulated gene expression pathways in CH patients.
As future large-scale cancer datasets become available, they will enable more in-depth investigations into CH and its impact on the tumor microenvironment (TME). Furthermore, CH status was assessed from WES, so smaller clones were not detected and many patients with CH may be misclassified as CH-negative. Deeper and more targeted sequencing of CH is warranted, especially to further characterize the relationship between lower-VAF CH and outcomes.

REFERENCES

1. Jaiswal, S. et al. Age-Related Clonal Hematopoiesis Associated with Adverse Outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
2. Coombs, C. C. et al. Therapy-Related Clonal Hematopoiesis in Patients with Non-hematologic Cancers Is Common and Associated with Adverse Clinical Outcomes. Cell Stem Cell 21, 374-382.e4 (2017).
3. Ptashkin, R. N. et al. Prevalence of Clonal Hematopoiesis Mutations in Tumor-Only Clinical Genomic Profiling of Solid Tumors. JAMA Oncol. 4, 1589 (2018).
4. Desai, P. et al. Association of clonal hematopoiesis and mosaic chromosomal alterations with solid malignancy incidence and mortality. Cancer cncr.35455 (2024) doi:10.1002/cncr.35455.
5. Chien, K. S. et al. Cancer patients with clonal hematopoiesis die from primary malignancy or comorbidities despite higher rates of transformation to myeloid neoplasms. Cancer Med. 13, e7093 (2024).
6. Stonestrom, A. J. et al. High risk and silent clonal hematopoietic genotypes in patients with non-hematologic cancer. Blood Adv. bloodadvances.2023011262 (2023) doi:10.1182/bloodadvances.2023011262.
7. Gaulin, C., Kelemen, K. & Arana Yi, C. Molecular Pathways in Clonal Hematopoiesis: From the Acquisition of Somatic Mutations to Transformation into Hematologic Neoplasm. Life 12, 1135 (2022).
8. Fuster, J. J. et al. Clonal hematopoiesis associated with TET2 deficiency accelerates atherosclerosis development in mice. Science 355, 842–847 (2017).
9. Del Prete, A., Schioppa, T., Tiberio, L., Stabile, H. & Sozzani, S. Leukocyte trafficking in tumor microenvironment. Curr.
Opin. Pharmacol. 35, 40–47 (2017).
10. Lança, T. & Silva-Santos, B. The split nature of tumor-infiltrating leukocytes: Implications for cancer surveillance and immunotherapy. Oncoimmunology 1, 717–725 (2012).
11. Thorsson, V. et al. The Immune Landscape of Cancer. Immunity 48, 812-830.e14 (2018).
12. Kleppe, M. et al. Somatic mutations in leukocytes infiltrating primary breast cancers. Npj Breast Cancer 1, 15005 (2015).
13. Pich, O. et al. 8O Clinical implications of clonal hematopoiesis in the tumor microenvironment of non-small cell lung cancer. ESMO Open 8, 101654 (2023).
14. Arends, C. M. et al. Dynamics of clonal hematopoiesis under DNA-damaging treatment in patients with ovarian cancer. Leukemia 38, 1378–1389 (2024).
15. Woo, J. et al. Effect of Clonal Hematopoiesis Mutations and Canakinumab Treatment on Incidence of Solid Tumors in the CANTOS Randomized Clinical Trial. Cancer Prev. Res. (Phila. Pa.) OF1–OF8 (2024) doi:10.1158/1940-6207.CAPR-23-0342.
16. Zuriaga, M. A. et al. Colchicine prevents accelerated atherosclerosis in TET2-mutant clonal haematopoiesis. Eur. Heart J. ehae546 (2024) doi:10.1093/eurheartj/ehae546.
17. Romanel, A., Zhang, T., Elemento, O. & Demichelis, F. EthSEQ: ethnicity annotation from whole exome sequencing data. Bioinformatics 33, 2402–2404 (2017).
18. Zaitsev, A. et al. Precise reconstruction of the TME using bulk RNA-seq and a machine learning algorithm trained on artificial transcriptomes. Cancer Cell 40, 879-894.e16 (2022).
19. Michael Love, S. A. DESeq2. Bioconductor https://doi.org/10.18129/B9.BIOC.DESEQ2 (2017).
20. Liberzon, A. et al. The Molecular Signatures Database Hallmark Gene Set Collection. Cell Syst. 1, 417–425 (2015).
21. Xu, S. et al. Using clusterProfiler to characterize multiomics data. Nat. Protoc. (2024) doi:10.1038/s41596-024-01020-z.
22. Okamura, R. et al. High prevalence of clonal hematopoiesis-type genomic abnormalities in cell-free DNA in invasive gliomas after treatment. Int. J.
Cancer 148, 2839–2847 (2021).
23. Li, X. & Clarke, M. C. H. Expansion of fibroblast-like cells may explain the CANTOS meta-analysis findings for patients with clonal hematopoiesis. Nat. Cardiovasc. Res. 3, 23–25 (2024).
24. Castro, A. et al. Strength of immune selection in tumors varies with sex and age. Nat. Commun. 11, 4128 (2020).

CONCLUSION

Aim 1: Novel End-to-End Assay and Pipeline for CH Detection

In the first aim of my thesis, the PreCISE1 assay and pipeline were developed to accurately and consistently detect clonal hematopoiesis (CH) and germline mutations at low variant allele fractions (VAF). This is the first assay designed to explore CH and its associated comorbidities. It integrates quality metrics and known variants from literature and cancer patient databases to ensure clinical relevance, focusing on CH-associated co-morbidities. Validation with AML cell lines showed that the pipeline reliably detects somatic variants at low VAFs, achieving 93% accuracy for variants above 1% VAF. PreCISE1 addresses a gap among current NGS-based assays by allowing both discovery of new variants and characterization of co-morbidities in the same patient.

Additionally, this work enabled the establishment of a CH clinic at Weill Cornell Medicine (WCM), where the PreCISE1 assay has been used to analyze over 1,000 samples through one-time and serial blood and bone marrow samples. The assay has enabled research studies across a wide range of diseases, including cardiovascular conditions, cancer, HIV, spaceflight-related health challenges, and COVID-19. The codebase for the pipeline is reproducible and publicly available, promoting transparency and collaboration.

The assay has already been used to address fundamental biology, including the initiation of CH in the bone marrow [1], and to address translational questions by tracking clonal hematopoiesis clones in astronauts after spaceflight [2] and in cancer cohorts undergoing treatment [3].
However, the panel is limited to genes linked to leukemia, cardiovascular, and tumorigenesis risks, excluding other relevant co-morbidities. For example, it does not cover germline variants in genes such as TCL1A and TERT, which have been associated with increased CH risk through GWAS, limiting its ability to assess broader genetic contributions [4,5].

Aim 2: CH in Solid Tumors, Outcomes, and Proposed Mechanisms

The second aim of my thesis investigates the impact of clonal hematopoiesis (CH) on the immune landscape and clinical outcomes in solid tumor patients. CH prevalence increases with age and varies across cancer types, with higher mutation burdens (VAF > 20%) linked to poorer survival. RNA-seq analysis revealed that CH reshapes the tumor microenvironment (TME) by increasing leukocytes and fibroblasts, creating a pro-tumor, anti-inflammatory environment that suppresses immune response. In lung adenocarcinoma (LUAD), patients with TET2 and DNMT3A mutations showed reduced lymphocytes and increased monocytes, correlating with worse survival. Similarly, CH in glioblastoma (GBM) patients was associated with lower immune activation, increased M0 macrophages, and poor prognosis.

Although prior studies identified CH's adverse impact in some cancers [6,7,8], this work highlights CH's tumor-specific influence, particularly the lack of a pan-cancer immune signature. CH in colon adenocarcinoma (COAD) and ovarian cancer (OV) is linked to worse outcomes, though the small sample size limited statistical power in OV. The study also notes an increase in fibroblasts across cancers, consistent with findings in cardiovascular disease, suggesting shared mechanisms. While the diversity of TCR/BCR repertoires across tumor types was assessed, no significant association was found between CH status and immune diversity. The study emphasizes the role of MHC class I HLA alleles in immune evasion by cancer cells, with several HLA alleles being associated with CH.
Limitations include challenges with sample purity, small cohort sizes, and the use of RNA-seq deconvolution to infer immune cell infiltration. Future studies should utilize single-cell RNA-seq and deeper targeted sequencing to better understand CH's role in the TME and assess smaller clones overlooked by whole-exome sequencing. Overall, this study underscores the importance of using blood as a reference point in precision oncology to detect CH and include its effects in treatment decision-making.

Future Directions of CH Research

Future directions include investigating the mechanisms by which clonal hematopoiesis (CH) influences aging and inflammation. In vitro studies will be important for validating findings from in silico analysis of sequencing data. Expanding studies to larger cohorts, such as those from the UK Biobank and All of Us, will provide further insights. Further research needs to be conducted on CH's impact on other immune-mediated diseases such as HIV, autoimmune conditions (e.g., rheumatoid arthritis), gum disease, and solid tumor cancers. Additionally, incorporating CH into treatment decision-making processes could enhance personalized medicine approaches.

REFERENCES

1. Osman, A. E. G. et al. Paired bone marrow and peripheral blood samples demonstrate lack of widespread dissemination of some CH clones. Blood Adv. 7, 1910–1914 (2023).
2. Mencia-Trinchant, N. et al. Clonal Hematopoiesis Before, During, and After Human Spaceflight. Cell Rep. 33, 108458 (2020).
3. Singh, A. S. et al. DNMT3A and TET2 mutant Clonal Hematopoiesis May Drive a Proinflammatory State and Predict Enhanced Response to Immune Checkpoint Inhibitors. Blood 138, 4295–4295 (2021).
4. Bick, A. G. et al. Inherited causes of clonal haematopoiesis in 97,691 whole genomes. Nature 586, 763–768 (2020).
5. Weinstock, J. S. et al. Aberrant activation of TCL1A promotes stem cell expansion in clonal haematopoiesis. Nature 616, 755–763 (2023).
6. Pich, O. et al.
8O Clinical implications of clonal hematopoiesis in the tumor microenvironment of non-small cell lung cancer. ESMO Open 8, 101654 (2023). 7. Sim, H. et al. Increased inflammatory signature in myeloid cells of non-small cell lung cancer patients with high clonal hematopoiesis burden. Preprint at https://doi.org/10.1101/2024.01.02.573827 (2024). 8. Kleppe, M. et al. Somatic mutations in leukocytes infiltrating primary breast cancers. Npj Breast Cancer 1, 15005 (2015). A. Personal Statement