UNDERSTANDING OF BIOPHYSICAL PROPERTIES OF GLYCOCALYX AND ITS APPLICATION IN CANCER IMMUNOTHERAPY

A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

by
Sangwoo Park
August 2023

© 2023 Sangwoo Park

ABSTRACT

The cancer glycocalyx, a layer of complex sugar chains covering eukaryotic cell surfaces, plays a crucial role in defending against immune surveillance. Cancer cells strategically produce cell-surface mucins, which are major components of the glycocalyx, to evade immune recognition and elimination. Whether the structural properties of the glycocalyx also physically shield cancer cells from immune recognition has not been fully resolved. Here, I report the physical and chemical properties of mucins and recombinant production methods for synthesizing homogeneous mucin materials with precise O-glycosylation patterns for biomedical applications. To assess how structural changes in mucins affect glycocalyx structure, I use an interference-based imaging technique called Scanning Angle Interference Microscopy (SAIM) to measure glycocalyx thickness with nanoscale precision. I present a detailed protocol for optical setup and live-sample preparation for SAIM imaging. Using glycoengineering strategies and SAIM, I reveal how the surface density, glycosylation, and crosslinking of cancer-associated mucins contribute to the nanoscale material thickness of the glycocalyx, and I further analyze the effect of glycocalyx thickness on resistance to effector cell attack. I uncovered a strong inverse relationship between glycocalyx thickness and immune cell killing.
Natural Killer (NK) cell-mediated cytotoxicity exhibits a nearly perfect inverse correlation with the glycocalyx thickness of target cells, regardless of the specific glycan structures present. This suggests that the physical properties of the glycocalyx may be key determinants of cancer immune evasion. Changes in glycocalyx thickness as small as 10 nanometers can significantly alter susceptibility to immune cell attack. I further propose strategies for overcoming the physical barrier of the glycocalyx through cellular engineering of immune cells. These strategies include displaying glycocalyx-editing enzymes on the NK cell surface to improve penetration of the glycocalyx barrier. Furthermore, I explore the use of pharmacological and metabolic inhibitors to disrupt glycocalyx structure for immunotherapy, highlighting their potential to enhance immune cell killing and improve the efficacy of future cancer treatments.

BIOGRAPHICAL SKETCH

Sangwoo Park was born in Changwon, South Korea, in 1991. Sangwoo began his scientific career as an undergraduate in the Department of Physics at the Korea Advanced Institute of Science and Technology (KAIST). In 2013, he served in the Korean military as a firefighter and an emergency medical technician (EMT) for 911 in South Korea. Upon returning to KAIST in 2015, he joined Professor Tae-Young Yoon's lab to learn single-molecule techniques for cancer research. After receiving his B.S. from KAIST, Sangwoo joined Proteina, Inc. to develop a single-molecule co-IP technology. Sangwoo then pursued a Ph.D. in Biophysics at Cornell University under the mentorship of Professor Matthew Paszek. There he uncovered the inverse correlation between nanoscale glycocalyx structure and immune cell attack. Building on this finding, he is developing a new cell-based immunotherapy targeting the glycocalyx of cancer cells. In 2022, Sangwoo was selected as the Pamela Delp Polashenski M.D.
Breast Cancer Research Fellow by the Breast Cancer Coalition of Rochester to develop an immunotherapy targeting triple-negative breast cancer. Also in 2022, Sangwoo was selected for a cohort of the BioEntrepreneurship Initiative at Cornell University and the Nucleate Activator program, with the aims of maximizing the impact of his research and developing a business model for the new immunotherapy.

ACKNOWLEDGMENTS

First and foremost, I would like to express my deepest gratitude to my advisor, Professor Matthew Paszek, for his unwavering support, invaluable guidance, and mentorship throughout my PhD journey. Matt has been more than a mentor to me; he has been a true research father, always pushing me to excel and providing me with the tools and knowledge necessary to achieve my goals. Despite my initial struggles in understanding the research topic, Matt stood by me, offering guidance and steady support. His belief in my abilities and his willingness to wait until the end allowed me to find my footing and grasp the intricacies of the research. I am truly grateful for his patience, expertise, and constant encouragement.

I would like to express my sincere gratitude to Professor Jan Lammerding for his exceptional expertise in cancer biology and cell biology. Throughout my research journey, his guidance, knowledge, and support have been invaluable. He always responded quickly to my inquiries, cheered me on during challenging times, and generously shared the cell lines and information needed for my experiments. I am truly grateful for his mentorship and the significant role he has played in shaping my work.

I would also like to extend my deepest appreciation to Professor Warren Zipfel for his guidance and contributions as a member of my committee. His dedication to academic excellence and his passion for advancing microscopy technologies have been a constant source of inspiration.
I am immensely grateful for his invaluable insights and the transformative impact he has had on my work.

I would like to extend my heartfelt appreciation to all the members of the Paszek lab. Their collaboration, support, and camaraderie have created a stimulating and nurturing environment for research. Their diverse perspectives and expertise have enriched my own understanding and have contributed significantly to the success of my work.

My gratitude also extends to my family, whose unwavering love, encouragement, and sacrifices have been the cornerstone of my academic journey. They instilled in me a passion for learning and have always believed in my potential, even when I doubted myself. Their support and belief in me have been a source of strength and inspiration.

I would like to acknowledge Professor Gregory Ray of the BioEntrepreneurship Initiative for his valuable insights and mentorship. His expertise in entrepreneurship has broadened my perspective and helped me navigate the complex intersection of science and business. I am grateful for his guidance and the opportunities he provided me during my time as a fellow. I would also like to extend a special thank you to the BioEntrepreneurship fellows, particularly Christian Peitz, for their invaluable contribution to my PhD journey. Our collaboration and shared experiences in developing Candy Therapeutics have been nothing short of remarkable. Christian's entrepreneurial mindset, innovative experiments, and dedication to advancing immunotherapy have been truly inspiring to me.

I would like to express my heartfelt gratitude to the individuals involved in the Breast Cancer Coalition of Rochester, especially Silvia Gambacorta-Hoffman and Holly Anderson, for their generous support and guidance throughout my year-long endeavor to develop immunotherapies for the treatment of breast cancer.
Their commitment to advancing cancer research and their invaluable assistance have played a pivotal role in my journey. Through this experience, I have developed a deep sense of purpose and dedication to conducting research that is essential for cancer patients. I am truly grateful for their belief in my work and for the opportunities they have provided me.

I would like to express my heartfelt appreciation to all members of the Korean Graduate Student Association (KGSA) for their support and the enriching experiences I have had as a member and as President of the organization. As President of KGSA, I had the privilege of working with a dedicated team who showed unwavering commitment to the betterment of our community. Through KGSA, I have been able to meet and connect with many Koreans at Cornell, forming lifelong friendships and support networks.

I would like to extend my sincere gratitude to Professor Tae-Young Yoon for giving me invaluable opportunities to study single-molecule technologies, teaching me how to approach research in general, and guiding me on the path I should pursue. His guidance and mentorship have been instrumental in shaping my research journey and fostering my passion for the field. Thank you for your steadfast support and for being an inspiring mentor.

Lastly, I want to express my deepest gratitude to my wife, Claudia Ro. Her unwavering love, understanding, and encouragement have been my greatest source of strength throughout this challenging journey. Her patience, sacrifice, and belief in me have been the driving force behind my success. I am forever grateful for her support and for being my pillar of strength. I couldn't have achieved anything without you, Claudia. Thank you for standing by my side. Thank you for being my rock and for sharing this incredible journey with me.
To all those mentioned above and to the countless others who have contributed in ways both big and small, I offer my heartfelt thanks. Your support and encouragement have been invaluable, and I am truly grateful for the role each of you has played in my academic and personal growth.

TABLE OF CONTENTS
BIOGRAPHICAL SKETCH
ACKNOWLEDGMENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1: INTRODUCTION
CHAPTER 2: RECOMBINANT MUCIN BIOTECHNOLOGY AND ENGINEERING
  INTRODUCTION
  RECOMBINANT MUCIN PRODUCTION
  O-GLYCOENGINEERING IN MAMMALIAN CELLS
  QUALITY CONTROL AND EVALUATION OF GLYCOSYLATION
  CHEMICAL AND ENZYMATIC MODIFICATION OF RECOMBINANT MUCINS
  APPLICATIONS AND OUTLOOK
  ACKNOWLEDGEMENTS
CHAPTER 3: AZIMUTHAL BEAM SCANNING MICROSCOPE DESIGN AND IMPLEMENTATION FOR AXIAL LOCALIZATION WITH SCANNING ANGLE INTERFERENCE MICROSCOPY
  INTRODUCTION
  MATERIALS
  METHODS
  NOTES
CHAPTER 4: IMMUNOENGINEERING CAN OVERCOME THE GLYCOCALYX ARMOR OF CANCER CELLS
  INTRODUCTION
  RESULTS
  DISCUSSION
  MATERIALS AND METHODS
  ACKNOWLEDGEMENTS
  SUPPLEMENTAL FIGURES AND TABLE
CHAPTER 5: IDENTIFY PHARMACOLOGICAL AND METABOLIC INHIBITORS THAT DISRUPT THE NANOSCALE STRUCTURE OF CANCER CELL GLYCOCALYX AND INVESTIGATE THEIR APPLICABILITY IN IMMUNOTHERAPY
  INTRODUCTION
  RESULTS
  DISCUSSION
  MATERIALS AND METHODS
CHAPTER 6: CONCLUSIONS AND FUTURE DIRECTIONS

LIST OF FIGURES
Figure 1.1 Schematic of the glycocalyx
Figure 2.1 Mucin backbone and glycosylation
Figure 2.2 Applications of recombinant mucins
Figure 2.3 Human O-glycosylation pathway map
Figure 2.4 Validated probes for evaluation of glycosylation
Figure 2.5 Approaches for engineering and functionalization of mucin O-glycans
Figure 3.1 Surface generated interference and axial localization in SAIM
Figure 3.2 Principles of circle-scanning excitation
Figure 3.3 Comparison of static and circle-scanned image quality in TIRFM and SAIM
Figure 3.4 Circle-scanning reduces reconstruction artifacts in cellular imaging with SAIM
Figure 3.5 Mapping the plasma membrane topography of adherent cells with SAIM
Figure 3.6 Sample preparation for SAIM imaging
Figure 3.7 Circle-scanning system construction
Figure 3.8 Arrangement of lenses within the laser scanning subsystem
Figure 3.9 Definition of laser orientation and alignment within the optical system
Figure 3.10 Scanning mirror power supplies and driver layout
Figure 3.11 Circle-scanning system calibration
Figure 3.12 Control flow diagram of a generalized experimental sequence
Figure 4.1 Cellular model for investigating Muc1-mediated protection against cytotoxic effector cells
Figure 4.2 Molecular determinants of the mucin material barrier and mucin-mediated protection
Figure 4.3 Enhanced receptor-mediated activation can overcome the mucin barrier
Figure 4.4 Tethering StcE mucinase to the NK-92 cell surface enhances their cytolytic efficiency
Figure 4.5 Zip-NK-92 cells for modular display of glycocalyx-editing enzymes
Figure 4.S1 Additional analysis of Muc1-mediated protection against NK cell cytotoxic function
Figure 4.S2 Measurement of the glycocalyx material thickness with Ring Scanning Angle Interference Microscopy (Ring-SAIM)
Figure 4.S3 Validation of glycoengineered cell lines
Figure 4.S4 Galectin-1 and galectin-3 interactions with Muc1 O-glycans
Figure 4.S5 Galectin-1 does not significantly affect the mobility and mobile fraction of cell surface Muc1
Figure 4.S6 Exogenous treatment and NK-92 GzmB-Gamillus validation
Figure 4.S7 StcE-NK cell validation and cytotoxicity
Figure 5.1 Ac5GalNTGc inhibits O-glycosylation
Figure 5.2 Ac5GalNTGc reduces the physical glycocalyx thickness and increases protection against cytotoxic effector cells
Figure 5.3 P-3FAX-Neu5Ac reduces the physical glycocalyx thickness and increases protection against cytotoxic effector cells

LIST OF TABLES
Table 2.1 Synthetic mucin tandem repeats (TR) fabricated through custom gene synthesis (CGS)
Table 4.S1 Complete sequence for HER2-specific CAR
Table 4.S2 Complete sequence for cytosolic mScarlet-I
Table 4.S3 Complete sequence for Glycocalyx-editing (GE) enzyme variants and leucine zipper
LIST OF ABBREVIATIONS
ADC, Antibody-drug conjugate
CBM40, Carbohydrate-binding module 40
cDNA, Complementary DNA
CeGL, Chemoenzymatic glycan labelling
CGS, Custom gene synthesis
CHO, Chinese hamster ovary
CMAH, Cytidine monophospho-N-acetylneuraminic acid hydroxylase
CMP, Cytidine monophosphate
CRISPR/Cas9, Clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 endonuclease
FTIR, Fourier transform infrared
GalNAc, N-acetylgalactosamine
GalNAzME, N-azidoalaninyl galactosamine
GBP, Glycan-binding protein
GGTA1, Glycoprotein α-galactosyltransferase 1
GlcNAc, N-acetylglucosamine
HEK, Human embryonic kidney
ISOGlyP, Isoform-Specific O-Glycosylation Prediction
KI/KO, Knockin/Knockout
MGE, Metabolic glycoengineering
Neu5Ac, N-acetylneuraminic acid
Neu5Gc, N-glycolylneuraminic acid
PNA, Peanut agglutinin
ppGalNAcT, Polypeptide N-acetylgalactosaminyltransferase
Pro, Proline
PTS, Proline Threonine Serine-rich domain
Ser, Serine
SNA, Sambucus nigra lectin
StcE, Secreted protease of C1 esterase inhibitor
Thr, Threonine
TR, Tandem repeat
UDP, Uridine diphosphate
ZFN, Zinc-finger nuclease

CHAPTER 1
INTRODUCTION

The mammalian glycocalyx is a layer of complex sugar chains that covers the surface of eukaryotic cells [1]. This glycocalyx layer is involved in many essential biological processes, including cell migration [2], the immune response [3–5], and embryonic development [6]. The composition and structure of the glycocalyx are dynamically regulated by various factors, including the expression levels of glycan scaffolds such as glycoproteins, glycolipids, and proteoglycans [1,7], the activity of glycan-editing enzymes such as glycosyltransferases [8,9], and cellular metabolism associated with the hexosamine biosynthetic pathway [10].
Mucins, which are major components of the glycocalyx, are biopolymers composed of an unstructured polypeptide backbone densely grafted with O-glycans linked to serine (Ser) and threonine (Thr) residues [11,12]. These glycan side chains, which can account for 50% or more of the glycoprotein molecular weight, can be linear or branched and are often negatively charged at neutral pH because they are capped with sialic acids or sulfate groups [13,14]. The exceptional physical properties of mucins arise from this unique molecular and chemical structure. The densely arrayed O-glycans interact strongly with water molecules, which enables mucins to hydrate, lubricate, and protect biological interfaces [15]. Secreted and cell-surface mucins are primarily responsible for safeguarding mucosal surfaces against dehydration, mechanical stresses, oxidative degradation, and viral and bacterial infections.

Fig. 1.1. Schematic of the glycocalyx. a. Cartoon showing the glycocalyx barrier against immune cell interaction. Thicker glycocalyx states impede the engagement of immune cells, leading to reduced immune cell-mediated killing. b. Schematic of Mucin-1 (Muc1) in the glycocalyx.

The use of mucins in biomedical applications is limited by the difficulty of purifying large volumes of high-quality mucins from natural sources. Contributing factors include batch-to-batch variation, limited bodily secretion volumes, poor scalability of collection procedures, risk of pathogen contamination, lack of established sterilization protocols, and differences in mucin expression and glycosylation patterns in inflamed or diseased animal organs [16]. The 2007 heparin contamination crisis, which resulted in several deaths and hundreds of adverse reactions, has led to increased regulatory scrutiny of naturally sourced bioproducts. Heparin, a sulfated polysaccharide typically sourced from pig intestines, has been used as an antithrombotic medicine since the 1930s.
However, the 2007 crisis highlighted the risks and supply-chain vulnerabilities of animal-derived therapeutics [17]. Recombinant production is an alternative to natural sourcing with several advantages. First, it allows stable and controlled production of homogeneous mucin materials. Second, recent advances in host cell production systems have made it possible to produce mucins with more precise O-glycosylation patterns, which matters because the O-glycans on mucins play a critical role in their biochemical and physical properties. Third, glycoengineering efforts, including the reconstruction of O-glycosylation pathways in bacteria, yeast, and plants and the development of engineered mammalian production systems, can generate desired O-glycan structures with minimal heterogeneity. As a result, designer mucins with precise biochemical and physical properties can be created for use in advanced biomaterials, therapeutics, immunomodulatory agents, and drug delivery systems.

Cancer cells strategically produce large amounts of cell-surface mucins, resulting in a thicker glycocalyx that can help them evade recognition and elimination by immune cells [18]. These tumor-associated mucins, which carry truncated and more highly sialylated glycans, can directly influence the tumor immune microenvironment [19–24]. For example, through engagement of Siglecs, the macrophage galactose receptor, and other immune cell receptors, tumor-associated mucins or O-glycans have been shown to suppress the activities of different types of immune cells, such as CD4+ T, CD8+ T, and NK cells [4,23,25]. Notably, an inverse correlation between NK cell-mediated cytotoxicity and the expression of cell-surface mucins on target cells has previously been observed [26]. Truncation of mucin-type O-glycans has been shown to enhance NK cell antibody-dependent cell-mediated cytotoxicity, whereas elongation of O-glycans has been reported to protect tumor cells [26–29].
However, the impact of such glycan structural changes on the biophysical properties of the glycocalyx, including its thickness, remains to be elucidated. It is also unclear whether the effects on NK cell resistance are attributable to specific receptor interactions, changes in mucin structural attributes, or other consequences of altered glycosylation [26]. Addressing these questions requires new tools to investigate the physical structure and properties of the glycocalyx.

In recent years, various imaging techniques have been developed to visualize the live-cell glycocalyx with nanometer-scale resolution [30–33]. Among these is Scanning Angle Interference Microscopy (SAIM), a localization microscopy technique that uses standing waves of excitation light to axially localize fluorescently labeled structures with nanoscale precision [34–38]. To enhance the precision of SAIM for glycocalyx materials research, a new implementation was recently introduced that uses a pair of high-speed, galvanometer-controlled mirrors to generate a revolving circle, or "ring," of excitation light at specific sample incidence angles. This approach, called Ring-SAIM, aims to improve sample illumination and measurement accuracy compared with standard SAIM implementations [36].

In Chapter 2, we present the potential of mucins as a biomaterial for various biotechnological applications. Recombinant technologies can be used to fabricate mucins with specific physical and biochemical properties. The pendent O-glycans of mucins have diverse biochemical activities, including immunomodulation and suppression of pathogen virulence. Engineered cell production systems allow scalable synthesis of recombinant mucins with precisely tuned glycan side chains, which can be further modified using metabolic and chemoenzymatic strategies. These advances offer exciting possibilities for biomedical applications ranging from in vitro models to therapeutics.
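The standing-wave principle behind SAIM can be made concrete with a short numerical sketch. The Python below assumes the thin-film interference model commonly used for SAIM on reflective silicon substrates (cf. refs. 30, 37): a fluorophore at height h above a silicon oxide layer is excited in proportion to |1 + r_TE(θ) exp(iφ)|², where r_TE is the TE reflection coefficient of the medium/oxide/silicon stack and φ = 4πn h cosθ/λ. The refractive indices, the 1900 nm oxide thickness, the 488 nm wavelength, and the helper names (saim_intensity, fit_height) are illustrative assumptions rather than the parameters or software used in this work. Height is recovered by a grid search that solves for an amplitude and a background offset by linear least squares at each candidate height.

```python
import numpy as np


def cos_refracted(n_med, theta_med, n_layer):
    """Complex cosine of the propagation angle in a layer, via Snell's law."""
    sin_layer = n_med * np.sin(theta_med) / n_layer
    return np.sqrt(1 - sin_layer**2 + 0j)


def fresnel_te(n_i, cos_i, n_j, cos_j):
    """Fresnel reflection coefficient for TE (s-polarized) light at an i|j interface."""
    return (n_i * cos_i - n_j * cos_j) / (n_i * cos_i + n_j * cos_j)


def saim_intensity(h_nm, theta_med, wavelength_nm=488.0, d_ox_nm=1900.0,
                   n_med=1.36, n_ox=1.463, n_si=4.37 + 0.08j):
    """Relative excitation |1 + r_TE * exp(i*phi)|^2 at height h above the oxide.

    All optical constants here are illustrative assumptions, not calibrated values.
    """
    c_med = np.cos(theta_med) + 0j
    c_ox = cos_refracted(n_med, theta_med, n_ox)
    c_si = cos_refracted(n_med, theta_med, n_si)
    r12 = fresnel_te(n_med, c_med, n_ox, c_ox)  # medium | oxide interface
    r23 = fresnel_te(n_ox, c_ox, n_si, c_si)    # oxide | silicon interface
    film = np.exp(2j * 2 * np.pi * n_ox * d_ox_nm * c_ox / wavelength_nm)
    r_stack = (r12 + r23 * film) / (1 + r12 * r23 * film)  # thin-film reflection
    phi = 4 * np.pi * n_med * h_nm * c_med / wavelength_nm  # round-trip phase to height h
    return np.abs(1 + r_stack * np.exp(1j * phi)) ** 2


def fit_height(theta_med, intensity, h_grid_nm):
    """Grid search over height; amplitude and offset are solved by linear least squares."""
    best_h, best_sse = None, np.inf
    for h in h_grid_nm:
        model = saim_intensity(h, theta_med)
        design = np.column_stack([model, np.ones_like(model)])
        coef, *_ = np.linalg.lstsq(design, intensity, rcond=None)
        sse = float(np.sum((design @ coef - intensity) ** 2))
        if sse < best_sse:
            best_h, best_sse = float(h), sse
    return best_h


# Synthetic check: simulate a fluorophore 80 nm above the oxide, then recover it.
theta = np.deg2rad(np.arange(0.0, 45.0, 2.5))
synthetic = 1000.0 * saim_intensity(80.0, theta) + 50.0
h_fit = fit_height(theta, synthetic, np.arange(0.0, 200.5, 0.5))
print(f"recovered height: {h_fit:.1f} nm")
```

On noise-free synthetic data the grid search should recover the simulated height. Real data would be fit pixel by pixel. Because the excitation is periodic in φ, heights separated by roughly λ/(2n cosθ) can be hard to distinguish from a single angle, which is one reason acquisitions span many incidence angles.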
In Chapter 3, we describe Ring-SAIM, which applies azimuthal beam scanning to Scanning Angle Interference Microscopy. The approach eliminates coherence artifacts in widefield microscopy, such as interference fringes caused by dirt on optics and by internal reflections, so that glycocalyx thickness can be measured accurately in live mammalian cells. Azimuthal beam scanning works by rapidly rotating the excitation beam through its azimuth, which eliminates uneven illumination. The method can be applied not only to TIRF microscopy but also to SAIM. In this work, we explain the design and construction of an optimized SAIM instrument, including the optical configuration, peripheral devices, and system calibration.

In Chapter 4, we discuss the role of the cancer cell glycocalyx in immune evasion and how immunoengineering can overcome it. Cancer-associated mucins and their glycosylation contribute to the nanoscale material thickness of the glycocalyx and modulate functional interactions with cytotoxic immune cells. Natural Killer (NK) cell-mediated cytotoxicity is inversely correlated with the glycocalyx thickness of target cells, and changes in glycocalyx thickness can alter susceptibility to immune cell attack. Immunoengineering strategies, such as chimeric antigen receptors or glycocalyx-editing enzymes, can enhance cytotoxicity against mucin-bearing target cells and overcome the glycocalyx armor of cancer cells.

In Chapter 5, we identify pharmacological and metabolic inhibitors that disrupt the nanoscale structure of the cancer cell glycocalyx and test their applicability in immunotherapy. Small-molecule inhibitors of cellular glycosylation and tumor metabolism have been explored in clinical development for oncology applications. We focus on the efficacy of two inhibitors, Ac5GalNTGc (peracetylated C-2 sulfhydryl-substituted GalNAc) and P-3FAX-Neu5Ac (a pan-sialyltransferase inhibitor), in reducing glycocalyx thickness.
Because reducing glycocalyx thickness increases immune cell killing, we also examined how these inhibitors affect target-cell susceptibility to immune cell attack. Finally, we explored combining a CD19 CAR with these inhibitors to enhance CAR efficacy in future applications.

REFERENCES
1. Essentials of Glycobiology. (Cold Spring Harbor Laboratory Press, 2022).
2. Janik, M. E., Lityńska, A. & Vereecken, P. Cell migration—The role of integrin glycosylation. Biochimica et Biophysica Acta (BBA) - General Subjects 1800, 545–555 (2010).
3. Park, S. et al. Mucins form a nanoscale material barrier against immune cell attack. bioRxiv (2022) doi:10.1101/2022.01.28.478211.
4. Hudak, J. E., Canham, S. M. & Bertozzi, C. R. Glycocalyx engineering reveals a Siglec-based mechanism for NK cell immunoevasion. Nat Chem Biol 10, 69–75 (2014).
5. Marth, J. D. & Grewal, P. K. Mammalian glycosylation in immunity. Nat Rev Immunol 8, 874–887 (2008).
6. Haltiwanger, R. S. & Lowe, J. B. Role of Glycosylation in Development. Annu. Rev. Biochem. 73, 491–537 (2004).
7. Reitsma, S., Slaaf, D. W., Vink, H., Van Zandvoort, M. A. M. J. & Oude Egbrink, M. G. A. The endothelial glycocalyx: composition, functions, and visualization. Pflugers Arch - Eur J Physiol 454, 345–359 (2007).
8. Nairn, A. V. et al. Regulation of Glycan Structures in Animal Tissues. Journal of Biological Chemistry 283, 17298–17313 (2008).
9. Schachter, H. Biosynthetic controls that determine the branching and microheterogeneity of protein-bound oligosaccharides. Biochem. Cell Biol. 64, 163–181 (1986).
10. Chatham, J. C., Nöt, L. G., Fülöp, N. & Marchase, R. B. Hexosamine biosynthesis and protein O-glycosylation: the first line of defense against stress, ischemia, and trauma. Shock 29, 431–440 (2008).
11. Hattrup, C. L. & Gendler, S. J. Structure and Function of the Cell Surface (Tethered) Mucins. Annu. Rev. Physiol. 70, 431–457 (2008).
12. Bansil, R. & Turner, B. S. Mucin structure, aggregation, physiological functions and biomedical applications. Current Opinion in Colloid & Interface Science 11, 164–170 (2006).
13. Hang, H. C. & Bertozzi, C. R. The chemistry and biology of mucin-type O-linked glycosylation. Bioorganic & Medicinal Chemistry 13, 5021–5034 (2005).
14. Sun, L. et al. Installation of O-glycan sulfation capacities in human HEK293 cells for display of sulfated mucins. J Biol Chem 298, 101382 (2022).
15. Crouzier, T. et al. Modulating Mucin Hydration and Lubrication by Deglycosylation and Polyethylene Glycol Binding. Adv. Mater. Interfaces 2, 1500308 (2015).
16. Marczynski, M., Winkeljann, B. & Lieleg, O. Advances in Mucin Biopolymer Research: Purification, Characterization, and Applications. Biopolymers for Biomedical and Biotechnological Applications 181–208 (2021) doi:10.1002/9783527818310.ch6.
17. Szajek, A. Y. et al. The US regulatory and pharmacopeia response to the global heparin contamination crisis. Nat Biotechnol 34, 625–630 (2016).
18. Ghasempour, S. & Freeman, S. A. The glycocalyx and immune evasion in cancer. FEBS J (2021) doi:10.1111/febs.16236.
19. Burchell, J. M., Beatson, R., Graham, R., Taylor-Papadimitriou, J. & Tajadura-Ortega, V. O-linked mucin-type glycosylation in breast cancer. Biochem Soc Trans 46, 779–788 (2018).
20. Gupta, R., Leon, F., Rauth, S., Batra, S. K. & Ponnusamy, M. P. A Systematic Review on the Implications of O-linked Glycan Branching and Truncating Enzymes on Cancer Progression and Metastasis. Cells 9, E446 (2020).
21. Mereiter, S., Balmaña, M., Campos, D., Gomes, J. & Reis, C. A. Glycosylation in the Era of Cancer-Targeted Therapy: Where Are We Heading? Cancer Cell 36, 6–16 (2019).
22. Rodríguez, E., Schetters, S. T. T. & van Kooyk, Y. The tumour glyco-code as a novel immune checkpoint for immunotherapy. Nat Rev Immunol 18, 204–211 (2018).
23. van de Wall, S., Santegoets, K. C. M., van Houtum, E. J. H., Büll, C. & Adema, G. J. Sialoglycans and Siglecs Can Shape the Tumor Immune Microenvironment. Trends in Immunology 41, 274–285 (2020).
24. Rømer, T. B. et al. Mapping of truncated O-glycans in cancers of epithelial and non-epithelial origin. Br J Cancer 125, 1239–1250 (2021).
25. Perdicchio, M. et al. Sialic acid-modified antigens impose tolerance via inhibition of T-cell proliferation and de novo induction of regulatory T cells. Proc Natl Acad Sci U S A 113, 3329–3334 (2016).
26. Madsen, C. B. et al. Glycan Elongation Beyond the Mucin Associated Tn Antigen Protects Tumor Cells from Immune-Mediated Killing. PLoS ONE 8, e72413 (2013).
27. Suzuki, Y. MUC1 carrying core 2 O-glycans functions as a molecular shield against NK cell attack, promoting bladder tumor metastasis. Int J Oncol (2012) doi:10.3892/ijo.2012.1411.
28. Okamoto, T. et al. Core2 O-glycan-expressing prostate cancer cells are resistant to NK cell immunity. Molecular Medicine Reports 7, 359–364 (2013).
29. Tsuboi, S. et al. A novel strategy for evasion of NK cell immunity by tumours expressing core2 O-glycans. EMBO J 30, 3173–3185 (2011).
30. Paszek, M. J. et al. Scanning angle interference microscopy reveals cell dynamics at the nanoscale. Nat Methods 9, 825–827 (2012).
31. Möckl, L. et al. Quantitative Super-Resolution Microscopy of the Mammalian Glycocalyx. Developmental Cell 50, 57–72.e6 (2019).
32. Son, S. et al. Molecular height measurement by cell surface optical profilometry (CSOP). Proc Natl Acad Sci U S A 117, 14209–14219 (2020).
33. Kuo, J. C.-H. & Paszek, M. J. Glycocalyx Curving the Membrane: Forces Emerging from the Cell Exterior. Annu Rev Cell Dev Biol 37, 257–283 (2021).
34. Paszek, M. J. et al. The cancer glycocalyx mechanically primes integrin-mediated growth and survival. Nature 511, 319–325 (2014).
35. Ajo-Franklin, C. M., Ganesan, P. V. & Boxer, S. G. Variable Incidence Angle Fluorescence Interference Contrast Microscopy for Z-Imaging Single Objects. Biophysical Journal 89, 2759–2769 (2005).
36. Colville, M. J., Park, S., Zipfel, W. R. & Paszek, M. J. High-speed device synchronization in optical microscopy with an open-source hardware control platform. Sci Rep 9, 12188 (2019).
37. Carbone, C. B., Vale, R. D. & Stuurman, N. An acquisition and analysis pipeline for scanning angle interference microscopy. Nat Methods 13, 897–898 (2016).
38. Lambacher, A. & Fromherz, P. Fluorescence interference-contrast microscopy on oxidized silicon using a monomolecular dye layer. Appl. Phys. A 63, 207–216 (1996).

CHAPTER 2*
RECOMBINANT MUCIN BIOTECHNOLOGY AND ENGINEERING

Abstract
Mucins represent a largely untapped class of polymeric building blocks for biomaterials, therapeutics, and other biotechnologies. Because the mucin polymer backbone is genetically encoded, sequence-specific mucins with defined physical and biochemical properties can be fabricated using recombinant technologies. The pendent O-glycans of mucins are increasingly implicated in immunomodulation, suppression of pathogen virulence, and other biochemical activities. Recent advances in engineered cell production systems are enabling the scalable synthesis of recombinant mucins with precisely tuned glycan side chains, offering exciting possibilities to tune the biological functionality of mucin-based products. New metabolic and chemoenzymatic strategies enable further tuning and functionalization of mucin O-glycans, expanding the chemical diversity and functionality of mucin building blocks. In this review, we discuss these advances and the opportunities for engineered mucins in biomedical applications ranging from in vitro models to therapeutics.

* This chapter is reprinted with permission from: Park S, Kuo JCH, Reesink HL, Paszek MJ. Recombinant Mucin Biotechnology and Engineering. Advanced Drug Delivery Reviews (2023). doi.org/10.1016/j.addr.2022.114618
Author Contribution: S.P. managed the project and generated figures and tables.

Graphical abstract
1.
Introduction – History of mucin bioproducts Mucins evolved in early Metazoan life as protective biopolymers for cell and tissue interfaces1. Molecularly, mucins are protein and sugar co-polymers defined by their unstructured polypeptide backbone and dense grafting with serine (Ser) and threonine (Thr) linked O-glycans2,3. The glycan side chains, which can constitute 50% or more of the glycoprotein molecular weight, are linear or branched and often negatively charged at neutral pH due to their capping with sialic acids or sulfate groups4,5 (Fig. 2.1). The extraordinary physical properties of mucins emerge from their unique molecular and chemical structure. The arrayed O-glycans strongly interact with water molecules, enabling mucins to hydrate, lubricate, and protect biological interfaces 6. Secreted and cell-surface mucins are chiefly responsible for protecting mucosal surfaces against dehydration, mechanical stresses, oxidative degradation, and infection from viruses and bacteria. From an applied perspective, known human interest in mucins dates to antiquity, where ancient Greeks were said to apply snail mucus as anti-inflammatory and anti-aging balms 7. Until recently, purified mucin products have primarily been considered for hydration 12 and lubrication of tissues, include the eye, cartilage, tendons, mouth, and vocal cords. Interest in mucin biotechnology has surged in the past decade with increasing appreciation that mucins can have potent biological activity in addition to remarkable physical properties. Recent studies have implicated mucin O-glycans in the regulation of the immune system, as well as the phenotype of symbiotic and pathogenic microorganisms8–13. For instance, mucin O-glycans can suppress the virulence of Pseudomonas aeruginosa, an opportunistic pathogen that is a common source of hospital infections14,15. Fig 2.1. Mucin backbone and glycosylation. (A) Mucins can be divided into secreted and membrane-associated family members. 
All mucins contain a serine-, threonine-, and proline-rich region, often composed of a variable number of tandem repeats (TRs), that serves as the site of O-glycosylation. In membrane-associated mucins, the heavily glycosylated domain is linked to a single-pass transmembrane anchor and a short cytoplasmic tail. Secreted mucins can be divided into gel-forming and non-gel-forming subtypes. Gel-forming mucins contain N- and C-terminal cysteine-rich domains (orange, round) and contribute to the viscoelastic properties of mucus. (B) Mucins are defined by their densely grafted O-glycan structures on serine and threonine residues. O-glycosylation in mammals is initiated by transfer of N-acetylgalactosamine (GalNAc; shown) to the side chain of serine or threonine. Abbreviations: Gal, galactose; GlcNAc, N-acetylglucosamine; Neu5Gc, N-glycolylneuraminic acid; Ser, serine; Thr, threonine.

Shorter mucin fragments, including small mucin peptides for anti-tumor vaccines, can be constructed through direct glycopeptide synthesis or chemoenzymatic approaches16. Larger mucins are conventionally harvested from natural sources, including bovine submaxillary glands17, pig gastric mucosa18, chicken eggs (ovomucin)19, jellyfish bodies (qniumucin)20, and snail pedal mucus21. These larger mucins are attracting attention for a wide array of applications that include anti-fouling coatings22–25, anti-aging creams26,27, wound-healing agents28,29, antimicrobials30,31, saliva and tear film substitutes32–35, lubricants36–38, contact lens drops and coatings39, advanced biomaterials40,41, and drug delivery agents42 (Fig. 2.2). Despite this potential, broader use of larger mucins in biomedical applications is held back by the current reliance on harvesting them from natural sources.
Purifying large volumes of high-quality mucins from natural sources is challenging because of batch-to-batch variation, the limited volume of mucins secreted by animal organs, reliance on collection procedures with poor scalability, risk of pathogen contamination, lack of established sterilization protocols, and altered mucin expression and glycosylation when the source organ is inflamed or diseased43. Naturally sourced bioproducts have faced increasing regulatory scrutiny since the 2007 contamination crisis involving the sulfated polysaccharide heparin. Typically sourced from pig intestines, heparin has been injected as an antithrombotic medicine since the 1930s. In 2007, administration of contaminated heparin resulted in several deaths in the United States and hundreds of adverse reactions worldwide, drawing attention to the risks and supply-chain vulnerability of animal-derived therapeutics44.

Fig 2.2. Applications of recombinant mucins. Biomaterials: mucin biopolymers are under active investigation as building blocks for biocompatible materials and hydrogels. Therapeutics: mucin glycans have bioactivities that are being explored for immune modulation, attenuation of microbial virulence, and other applications. Drug delivery: mucins can be chemically functionalized to serve as carriers for drugs and other therapeutic agents. Lubrication: mucin-based lubricants such as lubricin (shown) can hydrate, protect, and lubricate materials ranging from cartilage to contact lenses. Non-fouling coatings: mucin surface coatings can resist protein deposition and microbial interactions. Anti-adhesive coatings: mucins have potential as non-immunogenic alternatives to PEG and other synthetic polymers for surface coatings on liposomes and nanoparticles.

As an alternative to natural sourcing, recombinant production offers an opportunity for stable, controlled biosynthesis of homogeneous mucin materials.
Recent advances in host cell production systems have enabled the recombinant biosynthesis of mucins with more precise O-glycosylation patterns. Notable glycoengineering efforts include the bottom-up reconstruction of O-glycosylation pathways in bacteria, yeast, and plants, as well as the development of engineered mammalian production systems that generate desired O-glycan structures with minimal heterogeneity (see below). With these advances, recombinant technology offers an opportunity to fabricate designer mucins with precise biochemical and physical attributes that may be critical for advanced biomaterials, therapeutics, immunomodulatory agents, and drug delivery systems.
Production strategies for larger mucins are still at an early stage of research and development compared with processes for other biologics, such as antibodies. However, clinical demand for new biolubricants and biomaterials has motivated research into high-titer, cost-effective production of larger mucins. As a notable example, the mucin-like glycoprotein lubricin has long attracted commercial attention because its central mucin domain hydrates and lubricates tissue, while its N- and C-terminal globular domains allow it to adhere to diverse tissue matrices45. Owing to these properties, lubricin is being pursued as a biolubricant for diverse medical conditions, including post-traumatic osteoarthritis, dry mouth, dry eye disease, carpal tunnel syndrome, and surgical adhesions46–48. Full-length recombinant lubricin was first produced at small scale nearly two decades ago at Wyeth Corp.49
Using strategies originally developed for recombinant antibody production, Greg Jay and colleagues later achieved higher-titer production of human lubricin in Chinese Hamster Ovary (CHO) cells, suggesting that scalable production of therapeutic mucin products is feasible and commercially viable on industry-standard production platforms50. More recently, new customized and functional lubricin products have been produced using fully synthetic coding DNA (cDNA) sequences constructed through custom gene synthesis (CGS), suggesting how modern advances in DNA “printing” may be leveraged to encode native and customized mucins for scalable manufacture51,52.

Fig 2.3. Human O-glycosylation pathway map. Graphic depiction of O-glycosylation pathways with mucin-related glycosyltransferase genes. The basic O-GalNAc structure (Tn antigen) is generated on serine or threonine residues by a family of ppGalNAcT enzymes, primarily in the Golgi apparatus. The Tn antigen can be extended to form one of the primary core structures (cores 1–4) or capped with sialic acid to form the sialyl-Tn antigen. The core structures can subsequently be elongated or branched and capped by sialyltransferases, fucosyltransferases, sulfotransferases, and other enzymes. Main core structures and examples of their extended structures are shown in bold. Glycan symbols are drawn according to the Symbol Nomenclature for Glycans (SNFG) format.

In this review, we discuss the technical advances that are leading to increased consideration of recombinant mucins as building blocks and scaffolds for biomaterials, drug delivery systems, and other biomedical applications, including in vitro model systems. The review emphasizes larger mucins, which, unlike short glycopeptides, cannot be chemically synthesized at scale and have conventionally been obtained only from natural sources.
Progress in CGS, combined with new algorithms for optimally encoding repetitive biopolymers, has substantially accelerated the design and testing of new mucins. Advances in mammalian cell production platforms and genome editing have made scalable biosynthesis of mucins with precisely tuned glycosylation patterns a practical reality, and comprehensive cellular resources are now available for manufacturing mucins with defined O-glycans53,54. Technologies for further refining mucin O-glycans through metabolic and chemoenzymatic methods can endow mucins with new functionalities and transform them into carriers of drugs and other bioactive payloads. Together, these advances point toward an era in which customized mucins are designed and manufactured to solve pressing biomedical challenges.

2. Recombinant mucin production

2.1. Mucin encoding for recombinant manufacturing
Mucins are defined by their proline (Pro)-, Thr-, and Ser-rich polymeric regions (PTS domains), which are heavily O-glycosylated at Thr and Ser residues. Canonical mucin family members include the gel-forming mucins (Muc2, 5AC, 5B, 6, and 19), transmembrane mucins (Muc1, 3A, 3B, 4, 12, 13, 15, 16, 17, 18, 20, and 21), secreted mucins (Muc7 and 8), and several other family members that are not fully classified (Muc9, 14, and 22)55. In addition to the canonical family members, multiple glycoproteins contain densely O-glycosylated PTS domains. In our definition of mucins, we include all canonical mucin family members, as well as additional glycoproteins that contain an O-glycosylated sequence of at least 100 contiguous amino acids with 5% or greater Pro content and 20% or greater Ser/Thr content. By this definition, lubricin and densely O-glycosylated transmembrane proteins, such as podocalyxin, are classified as mucins.
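The compositional part of this definition lends itself to a simple sequence scan. The Python sketch below is an illustration under stated assumptions, not a published tool: it slides a 100-residue window along a protein sequence and reports windows with at least 5% Pro and at least 20% Ser/Thr. The names `CompositionWindow`, `mucin_like_windows`, and `meets_composition_criteria` are hypothetical, and the sketch checks composition only; whether the Ser/Thr sites are actually O-glycosylated must be established experimentally or with glycosite prediction tools.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CompositionWindow:
    start: int               # 0-based index of the first residue in the window
    pro_fraction: float      # fraction of Pro residues in the window
    ser_thr_fraction: float  # fraction of Ser + Thr residues in the window


def mucin_like_windows(seq: str, window: int = 100, min_pro: float = 0.05,
                       min_ser_thr: float = 0.20) -> list[CompositionWindow]:
    """Return every run of `window` contiguous residues that meets the Pro and
    Ser/Thr thresholds. O-glycosylation itself is NOT assessed (assumption)."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        pro = segment.count("P") / window
        ser_thr = (segment.count("S") + segment.count("T")) / window
        if pro >= min_pro and ser_thr >= min_ser_thr:
            hits.append(CompositionWindow(start, pro, ser_thr))
    return hits


def meets_composition_criteria(seq: str) -> bool:
    """True if at least one 100-residue window satisfies both thresholds."""
    return len(mucin_like_windows(seq)) > 0


# TR arrays taken from Table 2.1
print(meets_composition_criteria("KEPAPTTP" * 59))              # True
print(meets_composition_criteria("PDTRPAPGSTAPPAHGVTSA" * 21))  # True
print(meets_composition_criteria("PDARPAPGATAPPAHGVTAA" * 21))  # False
```

In this sketch, the native Muc1 repeat and the lubricin KEPAPTTP repeat pass. The Muc1 mutant PDARPAPGATAPPAHGVTAA carries only 2 Thr per 20 residues (10% Ser/Thr) and falls below the threshold, which shows how site mutagenesis can move a sequence outside this compositional definition.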
The PTS domains of mucins often consist of a variable number of tandem repeats (TRs) that differ in sequence and length across mucin family members. For instance, the TR of Muc1 is a degenerate 60-base-pair sequence encoding 20 amino acids, and most human alleles comprise between 20 and 125 copies of the repeat (roughly 1.2 to 7.5 kb of repeat sequence; each allele can have a unique number of repeats). Recombinant production of shorter mucin fragments with a limited number of TRs has generally been successful using standard approaches56,57. Partial PTS domains of a variety of human mucins, including Muc1, Muc2, Muc3A/B, Muc5AC, Muc5B, Muc6, Muc7, Muc9, Muc13, Muc16, Muc17, Muc19, Muc20, Muc21, and Muc22, have been recombinantly expressed51,52,54,58–60. For larger mucins with complete PTS domains, the highly repetitive nature of the repeats, as well as the high GC content of some variable number tandem repeats (VNTRs), can pose significant technical challenges in cloning, sequencing, and recombinant biosynthesis. These challenges are underscored by the fact that the full sequences of all human mucins have been revealed only recently61. Specialized cloning methods for repetitive DNA sequences, including overlap-extension polymerase chain reaction (PCR)62, overlap-extension rolling circle amplification63, and recursive directional ligation64–68, often generate heterogeneous products of different sizes and involve iterative procedures that are tedious to optimize. Highly repetitive DNA sequences are also prone to recombination during propagation of plasmids and viral vectors. Moreover, the fidelity of nearly all DNA processing steps, including replication, can be compromised in host production systems by slippage and other errors linked to repetitive sequences69. Thus, the genomic stability of repetitive mucin cDNAs during extended cell cultivation may be of greater concern for mucin manufacturing than for other bioproducts.
For instance, probable recombination of TRs has been reported during vector propagation and cell line development for production of full-length recombinant Muc1 and lubricin from native cDNAs51,56,57.

2.1.1. Genetic strategies for encoding and stable expression of larger mucins
The emergence of rapid, cost-effective de novo DNA synthesis affords new opportunities to avoid the inherent pitfalls of working with repetitive, native mucin cDNAs70. CGS has revolutionized protein engineering and synthetic biology by enabling low-cost “printing” of non-repetitive DNA sequences. In typical CGS workflows, complete genes are assembled from short, overlapping, chemically synthesized DNA oligonucleotides. Highly repetitive DNA sequences are not amenable to efficient CGS because the unique sequence overlaps that guide precise assembly cannot be designed for them. As a powerful workaround, de novo genes for repetitive polypeptide sequences can be optimized for CGS using codon-scrambling algorithms that identify the least-repetitive synonymous coding sequences for the desired polypeptide repeats52. Originally developed by Tang and colleagues for elastin-like polymers, this strategy exploits codon redundancy to minimize nucleotide repetition in recombinant cDNA designs while conserving the native amino acid sequence of the desired protein71. Codon scrambling has the potential to improve the stability of recombinant mucin genes because it sidesteps the genomic instability of the highly repetitive nucleotide sequences innate to native mucin TR domains. Consistent with this idea, codon-scrambled synonymous cDNAs can faithfully produce full-length recombinant Muc1 and lubricin in suspension-adapted human embryonic kidney (HEK293-F) cells, products that are difficult to obtain with the corresponding native cDNAs51.
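To illustrate the idea behind codon scrambling, the Python sketch below uses a simplified greedy stand-in; it is not the published algorithm of Tang and colleagues. For each residue, it picks the synonymous codon whose newly created k-mers collide least with sequence already written. The k-mer length (k = 12) and the function names are hypothetical choices. The codon table is the standard genetic code, and, like the scrambler described in the text, the sketch ignores host codon usage. A naive back-translation that always uses the same codon is included for comparison. It reproduces the perfectly repetitive cDNA that defeats unique overlap design.

```python
from __future__ import annotations

# Standard genetic code, sense codons only, grouped by amino acid.
CODONS: dict[str, list[str]] = {
    "A": ["GCT", "GCC", "GCA", "GCG"], "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAT", "AAC"], "D": ["GAT", "GAC"], "C": ["TGT", "TGC"],
    "Q": ["CAA", "CAG"], "E": ["GAA", "GAG"], "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"], "I": ["ATT", "ATC", "ATA"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"], "K": ["AAA", "AAG"],
    "M": ["ATG"], "F": ["TTT", "TTC"], "P": ["CCT", "CCC", "CCA", "CCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"], "T": ["ACT", "ACC", "ACA", "ACG"],
    "W": ["TGG"], "Y": ["TAT", "TAC"], "V": ["GTT", "GTC", "GTA", "GTG"],
}
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence that contains no stop codons."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def naive_backtranslate(peptide: str) -> str:
    """One fixed codon per residue: a TR protein yields a perfectly repetitive cDNA."""
    return "".join(CODONS[aa][0] for aa in peptide)


def repeated_kmers(dna: str, k: int = 12) -> int:
    """Count k-mer occurrences that duplicate an earlier occurrence."""
    seen: set[str] = set()
    repeats = 0
    for i in range(len(dna) - k + 1):
        kmer = dna[i:i + k]
        if kmer in seen:
            repeats += 1
        seen.add(kmer)
    return repeats


def _last_codon_kmers(dna: str, k: int) -> list[str]:
    """k-mers of `dna` that end inside its final codon (last 3 nt)."""
    n = len(dna)
    return [dna[i:i + k] for i in range(max(0, n - k - 2), n - k + 1)]


def greedy_scramble(peptide: str, k: int = 12) -> str:
    """Greedily choose synonymous codons that avoid re-creating earlier k-mers."""
    dna = ""
    seen: set[str] = set()
    for aa in peptide:
        costs = []
        for codon in CODONS[aa]:
            new_kmers = _last_codon_kmers(dna + codon, k)
            costs.append((sum(kmer in seen for kmer in new_kmers), codon))
        # Fewest collisions wins; ties go to the codon listed first.
        best_codon = min(costs, key=lambda pair: pair[0])[1]
        dna += best_codon
        seen.update(_last_codon_kmers(dna, k))
    return dna


peptide = "KEPAPTTP" * 59  # lubricin consensus repeat, 59 copies (Table 2.1)
scrambled = greedy_scramble(peptide)
assert translate(scrambled) == peptide  # synonymous: the protein is unchanged
print("repeated 12-mers, naive:", repeated_kmers(naive_backtranslate(peptide)))
print("repeated 12-mers, scrambled:", repeated_kmers(scrambled))
```

The greedy choice keeps the sketch short but does not guarantee a globally least-repetitive sequence. Dedicated scrambling algorithms search more broadly and can also enforce constraints such as GC content.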
As a notable example, a lubricin-like glycoprotein with 59 perfect repeats of the consensus repeat KEPAPTTP has been stably produced for over 2 months in continuous culture using a fully synthetic, codon-scrambled gene51. Mucin genes can be stably integrated into the genomes of standard mammalian production systems by selecting cells after transient transfection with vectors containing a suitable selection marker54. For more efficient genomic integration, commercial transposase systems have proven reliable for stable integration of large mucin cDNAs. The piggyBac™ system for genome engineering is particularly well suited for delivering mucin cDNAs because of its large cargo capacity of over 200 kilobases72. Notably, no overt signs of recombination in long mucin TRs have been observed in stable cell lines generated using the piggyBac transposase52,73–75. Viral systems for stable cell line generation should be used with caution. In addition to concerns about viral contamination of recombinant products, highly repetitive mucin cDNAs are susceptible to homologous recombination in retroviruses and lentiviruses76. For example, attempts at lentivirus-mediated integration of full-length Muc1 cDNAs have resulted in expression of highly truncated products, consistent with recombination of the repetitive TRs73. By comparison, stable lines generated with the piggyBac transposase for the same cDNA produced homogeneous products of high molecular weight73. Additional genetic strategies have proven useful in developing high-productivity cell lines for recombinant mucins. Epigenetic regulators that mediate high and consistent expression of recombinant proteins77 have been reported to enhance the recombinant production of full-length lubricin50.

Table 2.1. Synthetic mucin tandem repeats (TRs) fabricated through custom gene synthesis (CGS)

TR sequence                    Mucin type         Number of repeats   Reference
PDTRPAPGSTAPPAHGVTSA           Muc1               10, 21, 42, 84      [46]
                               Muc1               7                   [58]
PDTRPAPGATAPPAHGVTSA           Muc1 (mutant)      21                  [52]
PDTRPAPGATAPPAHGVTAA           Muc1 (mutant)      21                  [52]
PDARPAPGATAPPAHGVTAA           Muc1 (mutant)      21                  [52]
PSPPITTTTTPPPTTT               Muc2               10                  [58]
GTQTPTPTPITTTTTVTPTPTPT        Muc2               7                   [58]
PLPVTDTSSASTGHAT               Muc3A/B            9                   [58]
STTSAPTT                       Muc5AC             18                  [58]
TTAVPPTPSATTLDPSSASAPPE        Muc7               7                   [58]
TSDIITASSPNDGLIT               Muc13              9                   [58]
TSTPSEGSTPFTSMPVSTMPVVTSEAST   Muc17              5                   [58]
SESSASSDGPHPVITPSRA            Muc20              8                   [58]
SSGASTATNSESSTV                Muc21              10                  [58]
SETTVTSTAG                     Muc22              15                  [58]
KEPAPTTP                       Lubricin           59                  [51]
                               Lubricin           20                  [52]
DAATPAP                        Synthetic mucin    40, 80              [52]
DAATPAPP                       Synthetic mucin    40, 80              [52]
PPASTSAPG                      Synthetic mucin    40, 80              [52]

2.1.2. De novo mucin production
Can we design de novo mucin sequences with unique glycosylation patterns? Several high-molecular-weight, fully synthetic mucins have been rationally designed and constructed using CGS combined with codon-scrambling algorithms. For instance, new mucins have been constructed with TRs designed from statistical analysis of mucin O-glycosylation sites (PPASTSAPG) or from analysis of N-acetylgalactosamine (GalNAc) transfer efficiency (DAATPAP and DAATPAPP)52,78. These approaches have also been applied to the recombinant production of Muc1 variants carrying dozens to hundreds of mutations at their Ser and Thr glycosylation sites, creating mucin variants with distinct glycosylation patterns52. The codon scrambler algorithm does not attempt to optimize codon usage for a particular host organism and, therefore, may not provide the optimal sequence for host cell productivity71. For lower TR numbers, standard host-specific codon optimization routines can provide sufficient codon scrambling for CGS.
For instance, Narimatsu and colleagues have applied standard codon optimization tools for successful CGS of over 20 mucin constructs with distinct TRs54,79. A list of synthetic mucin TR sequences that have been fabricated through CGS after encoding by either standard codon optimization or specialized codon scrambling is presented in Table 2.1.
Researchers are beginning to uncover the polypeptide sequence determinants of O-glycan initiation and extension, providing valuable insight for the rational design of new or engineered mucin sequences. Mucin O-glycosylation is initiated by a family of polypeptide N-acetylgalactosaminyltransferases (ppGalNAcTs) that catalyze the transfer of GalNAc to a Ser or Thr residue on the protein substrate80. The ppGalNAcTs are type II transmembrane proteins that typically reside in the eukaryotic Golgi apparatus. Each ppGalNAcT possesses a luminal-facing catalytic domain and a lectin domain that together determine substrate preference and enzyme specificity. Many of the 20 ppGalNAcTs in mammals have overlapping sequence preferences and can therefore function with some redundancy in O-glycan initiation81. However, unique position-sensitive features have been uncovered for the ppGalNAcT isoforms, and some of the isoform structural differences that determine peptide substrate preference have been resolved81–83. Technological advances are poised to further accelerate the understanding of glycosyltransferase specificities. For instance, rapid cell-free synthesis and mass spectrometry analysis have been combined to uncover the peptide specificities of human glycosyltransferases, including ppGalNAcT-1 and ppGalNAcT-2 84. Such understanding may be leveraged in the future to engineer transferase-specific glycosylation patterns.
Several powerful bioinformatics tools have been developed for O-glycosylation site prediction.
The popular NetOGlyc 4.0 server uses neural networks trained on O-glycoproteomics data to predict mucin-type O-glycosylation in mammalian glycoproteins with an accuracy of approximately 75%85,86. Isoform-Specific O-Glycosylation Prediction (ISOGlyP) predicts O-glycosites de novo while accounting for the specific ppGalNAcT isoforms expressed in the host cell86; it has a reported glycosite prediction accuracy of 70%86. Context-specific mining of the O-glycoproteome can also be conducted in the GlycoDomain Viewer, which incorporates data from multiple human and animal cell lines, as well as some organs and body fluids86. Together, better definition of the sequence determinants of mucin O-glycosylation, predictive bioinformatics tools, and advances in gene synthesis are making the fabrication of de novo mucins increasingly feasible.

2.2. Cellular host production systems for recombinant mucins
The repertoire of cellular O-glycosylation machinery is a primary criterion for selecting an appropriate host cell system for recombinant mucin production. The initiating GalNAc, often referred to as the Tn antigen, is subsequently elongated into the four primary core structures through the sequential actions of enzymes organized throughout the Golgi stacks (Fig. 2.3). The most common extension is by the core 1 synthase (C1GalT-1), which forms the T antigen, Galβ1-3GalNAc; this structure can be branched by C2GnT-1 or C2GnT-3 to form the core 2 structure, Galβ1-3(GlcNAcβ1-6)GalNAc. C1GalT-1 function depends on the essential chaperone COSMC87. The core 3 synthase, β3GnT-6, whose expression is largely restricted to the gastrointestinal and respiratory mucosa, appends N-acetylglucosamine (GlcNAc) to the Tn antigen to form the core 3 structure, GlcNAcβ1-3GalNAc. The core 3 structure can subsequently be branched by C2GnT-2 to form the core 4 structure.
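The core-extension steps described above can be written as a small rule set. This makes it easy to ask which core structures a host could build from the genes it expresses. The Python sketch below is a hypothetical illustration. `ppGalNAcT` stands in for any isoform of the family, and each step is treated as all-or-nothing. The sketch ignores enzyme competition, peptide preference, Golgi localization, sialyl-Tn formation, and the later elongation and capping steps.

```python
from __future__ import annotations

# (product, substrate, enzymes that can each catalyse the step, required chaperones)
CORE_RULES = [
    ("Tn", "Ser/Thr", {"ppGalNAcT"}, set()),
    ("core 1", "Tn", {"C1GalT-1"}, {"COSMC"}),
    ("core 2", "core 1", {"C2GnT-1", "C2GnT-3"}, set()),
    ("core 3", "Tn", {"β3GnT-6"}, set()),
    ("core 4", "core 3", {"C2GnT-2"}, set()),
]


def reachable_cores(expressed: set[str]) -> set[str]:
    """Structures a host can build on Ser/Thr, given its expressed genes."""
    made = {"Ser/Thr"}
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for product, substrate, enzymes, chaperones in CORE_RULES:
            if product in made or substrate not in made:
                continue
            # Needs at least one capable enzyme plus every required chaperone.
            if expressed & enzymes and chaperones <= expressed:
                made.add(product)
                changed = True
    made.discard("Ser/Thr")
    return made


print(sorted(reachable_cores({"ppGalNAcT", "C1GalT-1", "COSMC", "C2GnT-1"})))
# ['Tn', 'core 1', 'core 2']
print(sorted(reachable_cores({"ppGalNAcT", "C1GalT-1"})))  # no COSMC
# ['Tn']
```

The second call shows the COSMC dependence stated in the text. Without the chaperone, C1GalT-1 cannot act, and the model stops at the Tn antigen.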
The primary core structures can be further elongated and branched by N-acetyllactosamine chains and/or terminated by blood group ABH-related structures, fucose, sialic acids, and sulfate groups (Fig. 2.3).
Only mammalian cells natively express the full repertoire of glycosyltransferases needed to recapitulate human O-glycosylation patterns. However, other eukaryotic cell platforms are under development in which human O-glycosylation pathways are being constructed from the ground up for recombinant mucin production. Yeast cells have a functional Golgi apparatus that can generate mucin-type O-glycans if the appropriate machinery is introduced through genetic engineering. Yeast natively produce UDP-GlcNAc and UDP-Glc but lack an epimerase to convert these nucleotide sugars into UDP-GalNAc and UDP-Gal, the building blocks of mucin O-glycans. Yeast also lack genes homologous to the ppGalNAcT family that initiates O-glycosylation, as well as genes for core O-glycan extension. To create strains capable of producing mucin-type glycopeptides, genes encoding Bacillus subtilis UDP-Gal/GalNAc 4-epimerase, human UDP-Gal/GalNAc transporter, human ppGalNAcT-1, and Drosophila melanogaster core 1 β1–3 GalT have been introduced into Saccharomyces cerevisiae88. The engineered strain can produce mucin-type glycopeptides containing O-linked GalNAc and core 1 structures. In yeast, the endogenous O-mannosylation pathway, which is initiated in the ER, likely competes with Golgi-localized GalNAc-type O-glycosylation for substrate sites88,89. Future engineering may be needed to attenuate O-mannosylation of mucin peptides and to mitigate concerns about immunogenicity and product heterogeneity. Nevertheless, the rapid cell growth, high protein yields, and commercial scalability of yeast expression systems make them an attractive candidate for producing mucins with simple O-glycans.
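The yeast engineering described above can be modeled as a chain of requirements. The donor sugars must exist, reach the Golgi, and be used by the right transferases. The Python sketch below follows the simplified description in the text and is hypothetical throughout. The string labels are descriptive tags for the four introduced genes, not official gene symbols. The sketch tracks only UDP-GalNAc and UDP-Gal in the Golgi and ignores native yeast transporters and O-mannosylation.

```python
from __future__ import annotations

# Descriptive tags for the transgenes named in the text (hypothetical labels).
EPIMERASE = "B. subtilis UDP-Gal/GalNAc 4-epimerase"
TRANSPORTER = "human UDP-Gal/GalNAc transporter"
PPGALNACT1 = "human ppGalNAcT-1"
CORE1_GALT = "D. melanogaster core 1 β1-3 GalT"

NATIVE_YEAST_DONORS = {"UDP-GlcNAc", "UDP-Glc"}  # per the text


def golgi_donors(transgenes: set[str]) -> set[str]:
    """UDP-GalNAc / UDP-Gal available in the Golgi lumen (simplified model)."""
    cytosol = set(NATIVE_YEAST_DONORS)
    if EPIMERASE in transgenes:
        # 4-epimerization: UDP-GlcNAc -> UDP-GalNAc and UDP-Glc -> UDP-Gal
        cytosol |= {"UDP-GalNAc", "UDP-Gal"}
    if TRANSPORTER not in transgenes:
        return set()  # without the transporter, these donors stay in the cytosol
    return cytosol & {"UDP-GalNAc", "UDP-Gal"}


def mucin_type_glycans(transgenes: set[str]) -> set[str]:
    """Mucin-type structures the engineered strain could make in this model."""
    donors = golgi_donors(transgenes)
    made: set[str] = set()
    if PPGALNACT1 in transgenes and "UDP-GalNAc" in donors:
        made.add("Tn")
    if "Tn" in made and CORE1_GALT in transgenes and "UDP-Gal" in donors:
        made.add("core 1")
    return made


ALL_FOUR = {EPIMERASE, TRANSPORTER, PPGALNACT1, CORE1_GALT}
print(sorted(mucin_type_glycans(ALL_FOUR)))                 # ['Tn', 'core 1']
print(sorted(mucin_type_glycans(ALL_FOUR - {EPIMERASE})))   # []
print(sorted(mucin_type_glycans(ALL_FOUR - {CORE1_GALT})))  # ['Tn']
```

Removing any single component in this model breaks the chain at a predictable point. That helps explain why all four genes were introduced together.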
Plants are another non-animal platform that offers the potential to design and build mucin O-glycosylation pathways from the bottom up. Human O-glycosylation has been successfully reconstructed in plants through the introduction of Pseudomonas aeruginosa UDP-Gal/GalNAc C4-epimerase and a combination of human ppGalNAcTs90. Unlike yeast, plants do not have competing O-glycosylation pathways for Ser and Thr residues91. However, plants produce another type of protein O-glycosylation, in which Pro is converted to hydroxyproline and subsequently modified with various O-glycans92. Hydroxyproline-linked O-glycans have been found on recombinant IgA1 and Muc1 peptides produced in plant cells93,94. Because hydroxyproline conversion does not appear to be essential for plant cell growth and viability, it may be possible to remove the hydroxyproline-linked glycosylation pathway through genetic engineering95.
Insect cell lines have garnered attention as a lower-cost alternative to mammalian cells for recombinant protein production. They offer several manufacturing advantages, including minimal risk of human-specific viral contamination and straightforward scalability owing to optimized media formulations, no CO2 requirement for growth, and lower temperature requirements. Advances in baculovirus technology have simplified high-yield transgene expression in insect cells96. The most common insect cell platforms are derived from the fall armyworm Spodoptera frugiperda (Sf9, Sf21), the cabbage looper Trichoplusia ni (High Five™), and the fruit fly Drosophila melanogaster (S2)97. All insect cell lines investigated have multiple genes homologous to the human ppGalNAcT gene family and can natively generate mucin-type O-glycosylation98.
Tn antigens and core 1 structures have been confirmed in insect cell lines, and extended core 1 and core 2 structures have been reported in Drosophila embryos, confirming the innate capacity of insect cells to construct complex O-glycans99–101. Analysis of mucin-type O-glycans released from recombinant PSGL-1 produced in Sf9 and High Five cells has revealed a large repertoire of complex O-glycan structures102. However, these structures include O-glycans with hexuronic acid and phosphocholine substitutions, which are not found in humans and may be immunogenic. Insect cells also have very limited capacity to produce glycans with terminal sialic acid, and introduction of sialylation machinery is most likely required for abundant generation of mucin-type sialoglycans102,103. Thus, while insect cells are a promising platform for mucin production, careful glycoengineering is required to synthesize sialylated human O-glycans without competing non-human glycan structures.
CHO cells are the most widely used mammalian platform for recombinant protein production, including therapeutic antibodies. Advantages of CHO cells include robust growth, ease of clonal selection, adaptability to suspension culture for increased yields, capacity for complex glycosylation, and an extensive safety and regulatory history. While CHO is an excellent candidate platform for mucin biosynthesis, O-glycosylation in conventional CHO production systems differs from that in humans in several important ways, some of which could elicit immunological responses in humans104 (Fig. 2.1B). First, CMP-Neu5Ac hydroxylase is inactive in human cells, but its activity in CHO cells allows proteins to be glycosylated with Neu5Gc sialic acids, which can provoke antibody responses105. Interestingly, the ability to synthesize Neu5Gc enhances the sialylation of CHO-derived proteins compared with those from human HEK293 cells106,107.
Second, CHO cells may also possess α1,3-galactosyltransferase activity, which is absent in humans, and generate protein products with Galα1,3-Gal residues (α-Gal) that are known to elicit adverse anaphylactic reactions108. Although α-Gal is typically associated with N-glycans, α-Gal has been reported on Muc1 in cell lines that have α1,3-galactosyltransferase activity109. Third, several glycosylation enzymes that are expressed in humans are missing in CHO cells. For example, CHO cells lack the α1,3/4-fucosyltransferase and galactose α2,6-sialyltransferase found in humans104,110–112. Concerns about adverse immunological reactions to CHO-derived mucins are mitigated by the low expression in CHO cells of cytidine monophospho-N-acetylneuraminic acid hydroxylase (CMAH) and glycoprotein α-galactosyltransferase 1 (GGTA1), which primarily govern CMP-Neu5Ac hydroxylase and α1,3-galactosyltransferase activity, respectively. Recent advances in gene editing technology also enable straightforward knockout (KO) of CMAH and GGTA1, as well as knockin (KI) of genes to humanize O-glycosylation in CHO cells (see below), which may further improve the safety profile of CHO-derived mucins113.

Recombinant mucin production in human cell lines can circumvent these problems by avoiding the risk of immunogenic reactions from non-human glycans and by generating products that bear only native human glycans. Advances in cell expression systems and media formulations have allowed for substantially increased productivity with human cell lines. Biotherapeutic products produced from the HEK293 and fibrosarcoma HT-1080 cell lines are now approved by the US Food and Drug Administration (FDA)114. Additional biotherapeutic products produced in the PER.C6, HKB-11, CAP and HuH-7 human cell lines are currently being evaluated by the FDA114. Of the human host production systems, HEK293 has emerged as a leading candidate to become the standard bearer for mucin manufacturing.
In-depth pathway understanding has led to remarkable glycoengineering efforts to create HEK293 progeny that generate precise and homogenous O-glycan structures (see below)53,54. Several HEK293 cell lines, including 293-F, 293-H, and 293-6E, have been adapted to serum-free suspension growth, enabling high-density cultivation in bioreactors115–117. At present, dozens of mucins and mucin TR sequences with expected glycosylation patterns have been successfully generated in HEK293 platforms52,54. Notably, recombinant mucins produced in HEK293 and CHO cells display different glycosylation patterns. For instance, lubricin produced in 293-F cells displays a mix of core 1 and core 2 O-glycan structures, similar to the glycosylation patterns of native lubricin in humans118,119, whereas recombinant lubricin from CHO cells predominantly displays core 1 O-glycan structures, some of which may be sulfated50,119.

A possible disadvantage of using human cell lines is the potential for human-specific viral contamination, although this risk can be mitigated with viral inactivation and removal steps during downstream bioprocessing120. Compared to CHO cells, HEK293 cells have a higher tendency to clump during growth in suspension, ultimately limiting cell proliferation, viability, and productivity due to oxygen limitations. Interestingly, expression of recombinant cell-surface mucins can substantially mitigate HEK293-F clumping in suspension121. Like CHO cells and other immortalized cell lines, HEK293 cells exhibit genomic instability characterized by chromosomal translocations, copy number alterations, and other events, raising concerns of karyotypic and phenotypic drift during prolonged cultivation122–124. The potential for drift in O-glycosylation patterns during long-term cultivation and subcloning in human cell lines has yet to be addressed.
Since the HEK293 cell line originates from the kidney of an aborted human embryo125, ethical and religious concerns related to its use as a manufacturing platform have been raised.

Fed-batch and perfusion cultures, in which additional media components and nutrients are added in small batches, semi-continuously, or continuously, are the typical standards for high-yield biomanufacturing126. The optimal development of these processes requires detailed knowledge of the production cell line and its specific metabolic requirements for a particular product. Metabolic waste products, such as ammonia, can interfere with the sialylation of O-glycan structures, highlighting the potential for generating different O-glycan structures depending on the specific metabolic conditions of the bioreactor127. Careful selection of media formulations, feeding schedules, and bioreactor operating conditions may be necessary to minimize the generation of unwanted glycan structures, including antigenic glycans and those with undesired biochemical activity. Knowledge of how nutrient availability and cellular metabolic programs influence O-glycosylation during commercial production is rudimentary compared to the corresponding knowledge for N-glycosylation, which has been investigated in detail in the context of antibody production. Seemingly minor but important downstream bioprocessing strategies, including chromatography operations, must still be developed for mucins. For instance, solutions containing large mucins may not be amenable to final sterilization through dead-end filtration using 0.22 µm or smaller filters, an industry-standard approach. Nevertheless, overall advances in genetic tools and cell production systems have improved the production quantity and quality of recombinant mucin and mucin-like proteins, paving the way for commercially viable mucin biomanufacturing.

3. O-Glycoengineering in mammalian cells

There is a long history of genetic approaches to engineer the glycosylation machinery in mammalian cells for desired glycan phenotypes128. The earliest examples of these efforts come from the isolation of CHO glycosylation mutants by random mutagenesis and selection for resistance against cytotoxic plant lectins129–131. Careful characterization of an impressive range of lectin-resistant CHO mutants has led to key discoveries of glycosylation genes and pathways that regulate glycan biosynthesis130. With advancing knowledge of the glycosylation machinery, focus has also been placed on engineering mammalian cells by overexpressing, silencing, or knocking out specific glycosylation genes to create products with improved functionality, pharmacokinetics, and safety132,133. Decades of cumulative understanding of glycan biosynthesis have also enabled the assembly of detailed glycosylation gene maps to guide the rational design of genetic engineering efforts in cells133. Combinatorial KI and KO strategies to globally engineer multiple glycosylation genes are providing tremendous opportunities for generating stable cell lines with desired glycosylation capabilities128. As one of many glycoengineering examples, mammalian cells have been rationally engineered by KI of ST6GAL1 in a B4GALNT3/4 KO background to eliminate terminal GalNAc and achieve near-complete α2,6-linked sialylation of N-glycans on biologics for improved pharmacokinetics134. Precise and stable manipulation of glycosylation genes in mammalian cells can now be rapidly achieved at high efficiency through recent developments in nuclease-based gene editing tools such as zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9)128.
Untargeted manipulation of glycosylation genes for modulating protein glycosylation status remains a popular and powerful strategy for recombinant protein production135,136. However, untargeted integration of ectopic glycosylation genes typically does not offer fine control over expression levels, and low knockdown efficiency using RNA interference can leave residual gene activity; both can lead to recombinant products with unintended glycosylation137. Precision KI and KO strategies avoid these pitfalls and can provide stable, homogenous glycosylation patterns. An outstanding early example is the ZFN-targeted KO of COSMC in CHO cells to construct stable ‘SimpleCells’ capable of generating homogenous O-GalNAc glycans, marking a key advance towards precision glycoengineering138. The ZFN-mediated SimpleCell strategy has also been applied to HEK293 cells and 12 human cancer cell lines from different organs85, indicating that such strategies are broadly applicable in mammalian systems. The CRISPR/Cas9 genome editing system has emerged as a highly versatile tool for precise gene targeting in glycoengineering, particularly for combinatorial KO of genes. A validated guide RNA library is available for highly efficient CRISPR/Cas9 targeting of the complete set of human glycosyltransferase genes139. Multi-gene targeting can be achieved with CRISPR/Cas9 by simultaneously introducing multiple guide RNAs into cells along with the CRISPR/Cas9 machinery140. Multi-gene targeting can greatly shorten the generation and screening of higher-order mutants, as evidenced by the large library of glycoengineered HEK293 cell lines that Narimatsu, Clausen, and colleagues have generated in recent years using these approaches53,54,141.
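Why multiplexed editing efficiency matters for generating higher-order mutants can be made concrete with a simple screening calculation. The sketch below assumes, hypothetically, that each target gene is fully knocked out (all alleles) independently and with the same probability in any given clone; real editing outcomes are correlated and vary by guide RNA and cell line, and the function name `clones_to_screen` is illustrative only.

```python
import math


def clones_to_screen(per_gene_ko: float, n_genes: int, confidence: float = 0.95) -> int:
    """Number of single-cell clones to pick so that, with probability `confidence`,
    at least one clone carries a complete KO of all `n_genes` targets.

    Hypothetical assumption: each gene is knocked out independently with
    probability `per_gene_ko` per clone, so the all-gene KO probability is p**n.
    """
    if not 0.0 < per_gene_ko <= 1.0 or n_genes < 1 or not 0.0 < confidence < 1.0:
        raise ValueError("need per_gene_ko in (0, 1], n_genes >= 1, confidence in (0, 1)")
    p_all = per_gene_ko ** n_genes
    if p_all >= 1.0:
        return 1
    # Smallest integer n satisfying 1 - (1 - p_all)**n >= confidence
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all))


for p in (0.8, 0.5):
    print(f"per-gene KO efficiency {p}: screen {clones_to_screen(p, 3)} clones for a triple KO")
```

Under these assumptions, lowering the per-gene efficiency from 0.8 to 0.5 raises the number of clones needed for a triple KO at 95% confidence from 5 to 23, which illustrates why highly efficient multiplexed editing shortens higher-order mutant generation.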
An isogenic library has been created for edited O-glycosylation pathways that includes engineered lines for relatively homogenous construction of Tn, sialyl-Tn, T, sialyl-T, disialyl-T, and core 3 O-glycans, as well as increased O-glycan sulfation and core 2 extension53,54,141. Although these engineered systems have primarily been used for basic research, such as evaluation of lectin-glycan binding specificities, their potential application in mucin manufacturing is obvious.

Control over where glycans are initiated and elongated along the mucin backbone is still difficult to achieve through glycoengineering alone. Part of this challenge stems from the presumably redundant nature of the 20 ppGalNAcTs in initiating O-glycosylation. However, the realization that ppGalNAcT isoenzymes may have stronger preferences for specific mucin substrate sequences than previously appreciated raises the possibility of tuning glycan site occupancy through rational manipulation of ppGalNAcT expression58,142,143. O-glycans have largely been overlooked as a design parameter in biomanufacturing, as most attention has focused on N-glycosylation. Recent advances in precision O-glycan editing should usher in a more complete understanding of the importance of specific O-glycan structures in the in vivo function of mucins and other biologics containing mucin domains, including how O-glycans may mediate desired or undesired interactions with the immune system. For instance, comparatively little is known about the biological importance of O-glycan sulfation or of O-glycan heterogeneity on a single mucin backbone.

4. Quality control and evaluation of glycosylation

Evaluating glycosylation patterns on recombinant proteins is essential for developing the proteins for therapeutic applications. Mass spectrometry (MS) analysis remains the gold standard for comprehensive analysis of glycan structures and glycan site occupancy.
In typical glycomics workflows, glycans are released from the polypeptide backbone and analyzed to generate detailed information about the glycan structures. Although structural details can be obtained, information about the occupancy of the glycans along the polypeptide backbone is inevitably lost when the glycans are released. Since no single enzyme can release all O-glycan structures, chemical methods are currently employed to harvest glycans. Under alkaline conditions, the O-glycosidic linkage between the reducing-end sugar of the glycan and the Ser/Thr residue is labile and readily cleaved by β-elimination144. The annotation of glycan structures in MS datasets has historically required a high level of human expertise. However, advances in software, construction of structural databases, and standardization of experimental workflows are increasingly simplifying MS analyses145. Standardized guidelines, such as the Minimum Information Required for A Glycomics Experiment (MIRAGE), should be followed to ensure rigorous and reproducible glycomics analyses146.

MS analysis of O-glycopeptides can provide additional information regarding glycan site occupancy and heterogeneity. Historically, O-glycopeptide analysis of larger mucins has been challenging due to 1) the resistance of mucins to complete proteolytic digestion with conventional enzymes, 2) the large number of glycopeptides that can be generated from a single large mucin, and 3) the dozens of possible O-glycan structures that can potentially occupy any Ser or Thr site. Standard algorithms in commercial software packages are ill-equipped to handle the massive computational task of considering all theoretically possible glycopeptides for a large mucin. However, highly efficient algorithms have recently been developed that substantially decrease the processing time required to confidently complete glycosite localization and glycopeptide identification for mucin-derived samples147.
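The scale of the search problem in point 3) can be illustrated with a simple count. The sketch below assumes, for illustration only, that each Ser/Thr site in a glycopeptide independently carries one of k candidate O-glycans or is unoccupied; the numbers of sites and glycans are arbitrary, and real site occupancy is neither independent nor uniform. The function names are hypothetical.

```python
import math


def positional_glycoforms(n_sites: int, n_glycans: int, allow_unoccupied: bool = True) -> int:
    """Distinct site-specific glycoforms: every Ser/Thr independently takes one state."""
    states = n_glycans + (1 if allow_unoccupied else 0)
    return states ** n_sites


def composition_count(n_sites: int, n_glycans: int, allow_unoccupied: bool = True) -> int:
    """Distinct glycan compositions (multisets of site states). Positional isomers
    within one composition share a precursor mass, so this is an upper bound on
    how many glycopeptide masses could be told apart."""
    states = n_glycans + (1 if allow_unoccupied else 0)
    return math.comb(n_sites + states - 1, n_sites)


# Illustrative glycopeptide: 5 Ser/Thr sites, 20 candidate O-glycans vs. 1 homogeneous glycan
for k in (20, 1):
    print(k, positional_glycoforms(5, k), composition_count(5, k))
```

With 5 sites and 20 candidate glycans there are 4,084,101 site-specific glycoforms but only 53,130 distinct compositions, so precursor mass alone cannot localize glycans. Restricting the structures to a single homogeneous glycan collapses the space to 32 glycoforms, which illustrates why homogeneous O-glycan structures simplify glycosite identification.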
Glycosite identification can also be simplified through mucin synthesis in glycoengineered cell lines that generate homogenous O-glycan structures148. A notable advancement in O-glycoproteomics is the identification of a growing number of bacterial O-glycan-specific proteases (O-glycoproteases) that can specifically degrade mucins. These enzymes include OgpA from Akkermansia muciniphila (OpeRATOR), secreted protease of C1 esterase inhibitor (StcE) from enterohaemorrhagic Escherichia coli, SmEnhancin (SmE) from Serratia marcescens, and IMPa from Pseudomonas aeruginosa, which have been successfully applied for O-glycopeptide generation or enrichment in glycoproteomics workflows that are well suited to the characterization of recombinant mucins149–153.

Fig 2.4. Validated probes for evaluation of glycosylation. All indicated probes have been validated for specificity on printed or cell-based glycan arrays in the indicated references. Additional binding motifs are indicated, if known.

For more routine or targeted analyses, biomolecules such as lectins or antibodies against specific glycans can serve as resources for quality control of glycosylation features in recombinant mucin production. Printed glycan arrays from the Consortium for Functional Glycomics and new cell-based glycan arrays have provided a more complete definition of lectin binding specificities141,154 (Fig. 2.4). Naturally derived lectins that recognize the Tn-antigen include Helix pomatia agglutinin (HPA) and Helix aspersa agglutinin (HAA), which can also recognize terminal GlcNAc. Core O-glycans are specifically recognized by Amaranthus caudatus lectin (ACL), peanut agglutinin (PNA), Maclura pomifera lectin (MPL), and Artocarpus integrifolia lectin (AIA, Jacalin). ACL recognizes the T (Galβ1-3GalNAc), sialyl T (α2-3-sialylated Galβ1-3GalNAc), and disialyl T (α2-3-sialylated Galβ1-3[α2-6-sialylated]GalNAc) structures, as well as core 2 O-glycans.
Jacalin and MPL specifically bind the non-substituted, initiating GalNAc as well as GalNAc carrying various substituents at the 3-position, including galactose (T structure) and GlcNAc (core 3). PNA recognizes the T structure and tolerates GlcNAc substitution at the 6-position of GalNAc (core 2). Maackia amurensis hemagglutinin (MAH/MAL-II) recognizes the α2-3-linked sialic acid of the sialyl T and disialyl T structures, as well as 3-O-sulfated galactose155. Sambucus nigra lectin (SNA) and Maackia amurensis leucoagglutinin (MAL-I/MAA) probe α2,6- and α2,3-linked sialic acid, respectively, but they likely have a much stronger preference for N-glycans154,155. SNA has been reported to have minimal reactivity with the sialyl Tn structure79. Natural lectins with selectivity for additional O-glycan epitopes, including N-acetyllactosamine chains, blood group ABH-related structures, and fucose, have been identified154.

Although most lectins are derived from natural sources, several recombinant glycan-binding proteins (GBPs) with affinity for O-glycans have been described in recent years. Compared to natural lectins, recombinant GBPs offer advantages that can include lower batch-to-batch variability, supply chain consistency, and opportunities for further engineering. The carbohydrate-binding module 40 (CBM40) from Clostridium perfringens NanI has high affinity for α2,3-linked sialosides (KD of ~30 µM)156. This GBP has been engineered into a higher-affinity divalent form, diCBM40156. Unlike MAL-II, diCBM40 specifically recognizes α2,3-linked sialosides with minimal cross-reactivity with 3-O-sulfated epitopes. Siglec-like adhesins derived from Streptococcus mitis (10712BR) and Streptococcus gordonii (HsaBR) have specific reactivity with sialylated O-glycans53,157. HsaBR specifically binds the sialyl T structure and may tolerate the core 2 branch53,158.
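The specificities summarized above can be treated as a simple lookup table when choosing a probe panel for quality control. The sketch below is a hypothetical, simplified encoding: it lists only the epitopes stated in the text for each probe and treats binding as all-or-none, omitting the weaker or context-dependent interactions reported on glycan arrays. A probe missing from a result therefore means only that the text does not list that epitope. The dictionary and function names are illustrative.

```python
from typing import Dict, FrozenSet, List

# Hypothetical, simplified encoding of the probe specificities stated in the text.
LECTIN_EPITOPES: Dict[str, FrozenSet[str]] = {
    "HPA": frozenset({"Tn"}),
    "HAA": frozenset({"Tn"}),
    "ACL": frozenset({"T", "sialyl-T", "disialyl-T", "core 2"}),
    "PNA": frozenset({"T", "core 2"}),
    "Jacalin": frozenset({"Tn", "T", "core 3"}),
    "MPL": frozenset({"Tn", "T", "core 3"}),
    "MAL-II": frozenset({"sialyl-T", "disialyl-T", "3-O-sulfo-Gal"}),
    "HsaBR": frozenset({"sialyl-T"}),
}


def lectins_binding(structure: str) -> List[str]:
    """Probes whose listed epitopes include the given structure."""
    return sorted(name for name, eps in LECTIN_EPITOPES.items() if structure in eps)


def discriminating_lectins(a: str, b: str) -> List[str]:
    """Probes that bind exactly one of two structures, e.g. before/after a processing step."""
    return sorted(
        name for name, eps in LECTIN_EPITOPES.items() if (a in eps) != (b in eps)
    )


# Which probes could report core 1 extension (Tn -> T)?
print(discriminating_lectins("Tn", "T"))
```

For example, the Tn to T comparison returns HPA, HAA, ACL, and PNA as candidate reporters of core 1 extension, whereas Jacalin and MPL, which are listed as binding both structures, would not discriminate between them.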
The preferred binding epitope of 10712BR is core 2 O-glycans with two terminal α2-3-linked sialic acid residues, making it one of the only GBPs or lectins available with a preference for core 2 structures53. Recently, mucin-binding reagents have been developed from catalytically inactivated mucinases. The C-terminal glycan-binding domain of StcE, named X409, has general binding properties for mucins with a wide variety of O-glycan substitutions, as well as specific affinity for the disialyl T structure54,150,159. Similarly, GBPs called Lectenz® have been developed from catalytically inactivated enzymes for detecting α2,3-linked sialosides and sialic acid in a linkage-independent manner79. While the specificities of most commercial glycan antibodies against printed or cell-based glycan arrays are not available, several sialosyl-Tn antibodies (B72.3, CC49, STn219) have been validated as highly specific to their intended target160.

In addition to affinity probes for quality control of glycosylation patterns on target proteins, several biochemical and label-free approaches for glycan screening are applicable to mucins. Periodic acid-Schiff (PAS) reagent staining has been widely used for the analysis of carbohydrates and glycoconjugates. O-glycan constituents including GalNAc, GlcNAc, and sialic acid provide strong responses in standard colorimetric assays based on the PAS reagent161. Fourier transform infrared (FTIR) spectrometry has been developed for one-step, label-free evaluation of protein glycosylation. FTIR spectra provide unique signatures that are sensitive to minor changes in glycosylation and may be useful for rapid detection of changes in mucin glycosylation162,163. FTIR is also an excellent tool for monitoring metabolic incorporation of azido sugars into glycans (see below), since azide moieties have a strong FTIR peak at ~2120 cm−1 in the “bio-silent” region of the IR spectrum164.
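As a rough illustration of how the azide band might be quantified, the sketch below integrates absorbance in a narrow window around 2120 cm−1 after subtracting a straight baseline drawn between the window edges. The integration window (2080 to 2160 cm−1), the synthetic Gaussian spectrum spanning roughly 1800 to 2800 cm−1, and the function names are assumptions for illustration only, not a validated FTIR protocol.

```python
import math
from typing import List, Sequence, Tuple


def band_area(
    wavenumbers: Sequence[float],
    absorbance: Sequence[float],
    lo: float = 2080.0,  # hypothetical window around the ~2120 cm^-1 azide band
    hi: float = 2160.0,
) -> float:
    """Trapezoidal band area after subtracting a linear baseline between window edges."""
    pts: List[Tuple[float, float]] = sorted(
        (w, a) for w, a in zip(wavenumbers, absorbance) if lo <= w <= hi
    )
    if len(pts) < 2:
        raise ValueError("need at least two points inside the window")
    (w0, a0), (w1, a1) = pts[0], pts[-1]
    slope = (a1 - a0) / (w1 - w0)
    corrected = [(w, a - (a0 + slope * (w - w0))) for w, a in pts]
    return sum(
        0.5 * (ya + yb) * (wb - wa)
        for (wa, ya), (wb, yb) in zip(corrected, corrected[1:])
    )


def synthetic_spectrum(azide_height: float) -> Tuple[List[float], List[float]]:
    """Toy spectrum: sloped baseline plus a Gaussian band (sigma 10 cm^-1) at 2120 cm^-1."""
    wn = [1800.0 + 2.0 * i for i in range(501)]
    sigma = 10.0
    ab = [
        0.05 + 1e-5 * (w - 1800.0)
        + azide_height * math.exp(-((w - 2120.0) ** 2) / (2 * sigma ** 2))
        for w in wn
    ]
    return wn, ab


treated = band_area(*synthetic_spectrum(0.02))
control = band_area(*synthetic_spectrum(0.0))
print(f"treated: {treated:.4f}, control: {control:.4f}")
```

Comparing the band area of an azido-sugar-treated sample with that of an untreated control would, under these assumptions, give a relative readout of azide incorporation; absolute quantification would require calibration.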
Similarly, Raman spectroscopy can provide sensitive detection of alkyne-functionalized glycans165. Together, the combination of such approaches with MS-based glycomics and affinity-probe analysis can provide the glycosylation analysis needed for process development, process monitoring, and final product assessment.

Fig 2.5. Approaches for engineering and functionalization of mucin O-glycans. (A) Aldehydes are selectively introduced into sialic acids by periodate oxidation or into terminal galactose and N-acetylgalactosamine with galactose oxidase. The aldehydes can subsequently be coupled with aminooxy-containing chemistries using aniline as a catalyst. (B) Glycan structures can be labelled, extended, and modified in vitro through chemoenzymatic approaches. (C) The cellular glycome and the glycosylation patterns on individual mucins can be tuned through individual or combinatorial knockout and knockin of glycosyltransferases. (D) Metabolic glycoengineering exploits exogenous unnatural sugars, such as Ac4ManNAz, to functionalize glycan structures for subsequent bioconjugation with click chemistry.

5. Chemical and enzymatic modification of recombinant mucins

Mucin biopolymers have complex chemistries that make them ideal building platforms for diverse chemical conjugations40. In addition to a plethora of protein engineering approaches134,166–169, the O-glycans that heavily decorate mucin backbones present tantalizing and as-yet mostly untapped opportunities for bioconjugation to increase mucin functionality. In this regard, glycoengineering has become a critical tool for the bioconjugation of drug payloads and for introducing chemical handles to enable molecular assembly and targeted conjugation of bioactive materials41,170,171.
GalNAc and sialic acids on mucins are ideally suited for chemical modification and subsequent bioconjugation through reactions that, as is typical of bioorthogonal click chemistry, are highly specific, easy to perform in aqueous solutions, and proceed with fast kinetics while generating only minimal, inoffensive by-products169. In this section, we focus on the enormous efforts in chemical glycoengineering and briefly discuss how these efforts may be applied to recombinant mucins (Fig. 2.5).

5.1. Glycan tagging by direct oxidation

Glycans are rich in natural monosaccharides displaying a vicinal diol that can be oxidatively cleaved by sodium periodate to generate an aldehyde handle172,173, which can subsequently undergo imine/oxime ligation to conjugate molecules bearing hydrazine or aminooxy groups174. Periodate oxidation has emerged as a popular approach to directly modify glycans, as the oxidation can be tuned with mild conditions to selectively target sialic acids175,176. The practical applications of these approaches have been greatly propelled by nucleophilic catalysts, such as aniline, which allow imine/oxime ligation to proceed with fast kinetics at neutral pH to label biomolecules177–179. An important example is the development of periodate oxidation and aniline-catalyzed oxime ligation (PAL), which has permitted glycan labelling of live cells without cytotoxicity180,181. Note that periodate indiscriminately labels a variety of sialic acid-containing structures on both N- and O-glycans175, which may be useful to maximally derivatize mucins if the lack of conjugation site-specificity does not impede downstream applications41.

Alternatively, enzymatic oxidation may be employed as a highly selective strategy. Terminal GalNAc monosaccharides, which are exclusively found on mucin O-glycans, can be enzymatically and selectively oxidized to generate aldehyde handles using a specific galactose oxidase182.
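The oxidation and ligation chemistry described above can be summarized in generic form. In the simplified schemes below, R, R′, and R″ stand for the remainder of the glycan or probe, aniline appears only as a catalyst over the arrow, and the specific carbons of sialic acid or GalNAc that are oxidized are not shown.

```latex
\[
\begin{aligned}
&\text{Periodate cleavage of a vicinal diol:}\\
&\quad \mathrm{R{-}CH(OH){-}CH(OH){-}R' + IO_4^{-} \longrightarrow R{-}CHO + R'{-}CHO + IO_3^{-} + H_2O}\\[4pt]
&\text{Galactose oxidase acting on the C6 primary alcohol of Gal/GalNAc:}\\
&\quad \mathrm{R{-}CH_2OH + O_2 \longrightarrow R{-}CHO + H_2O_2}\\[4pt]
&\text{Aniline-catalyzed oxime ligation with an aminooxy probe:}\\
&\quad \mathrm{R{-}CHO + H_2N{-}O{-}R'' \xrightarrow{\ \text{aniline}\ } R{-}CH{=}N{-}O{-}R'' + H_2O}
\end{aligned}
\]
```

In the PAL approach, the first and third steps are combined; hydrazine-bearing probes react analogously with the aldehyde to form hydrazones.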
Genetically engineered variants of galactose oxidase have been developed to specifically target a range of other monosaccharides183. Note that homogenous decoration of mucins with terminal GalNAc can be achieved in SimpleCell platforms138. Enzymatic approaches may also sidestep potential off-target oxidation by sodium periodate of the vicinal amino alcohols of N-terminal Ser and Thr184,185, which may alternatively be circumvented by N-terminal protection strategies186. Note that imine/oxime ligations can suffer from off-target reactions with carbonyl groups present in complex biological systems, but these groups should be absent from purified recombinant mucins unless deliberately introduced. Their high efficiency and their independence from an intact live-cell system make chemical and enzymatic oxidation, in combination with imine/oxime ligation, particularly attractive conjugation strategies for purified mucin products. With the fast-growing interest in using recombinant mucins as building materials for pharmaceutical and tissue engineering applications187, it is highly conceivable that these approaches hold great potential for conjugating desirable functional groups and cargos to mucins for increased functionality.

5.2. Glycoengineering with tagged monosaccharide analogs

Instead of directly modifying natural monosaccharides already installed on glycans, chemical handles can also be introduced onto mucin glycans using modified unnatural monosaccharide building blocks that contain bioorthogonal functional groups. Glycosyltransferases are known to tolerate substrate analogs that share high structural similarity with their natural counterparts188,189. Notably, studies in the late 1980s demonstrated that even large fluorophores attached to the C9 position of sialic acids can be tolerated by sialyltransferases190.
Since then, unnatural monosaccharides with a variety of functional groups, including ketone, thiol, azide, alkyne, alkene, cyclopropene, isonitrile and cyclooctyne groups, have been developed and successfully incorporated onto glycans by glycosyltransferases191–193. Unnatural monosaccharides bearing an azide or alkyne are among the most widely used bioorthogonal handles for chemical glycoengineering. While azides are fully abiotic and alkynes are mostly found outside of vertebrates194, both functional groups are small and remain intact in biological systems until the desired reaction, as they are essentially inert to other biomolecules169. Azides and alkynes are complementary reaction partners: once tagged, azide-modified glycans can be stably conjugated to alkyne-containing molecules (and vice versa) through azide-alkyne cycloaddition, which can be copper-catalyzed or accelerated by strain-promoted cyclooctynes to avoid the complication of copper in living systems. The rapid growth of the field of chemical glycoengineering has vastly expanded the number of available unnatural monosaccharides and complementary bioorthogonal functional groups, and has greatly improved reaction conditions for bioorthogonal click chemistry to enable fast and efficient bioconjugation192. In general, two main approaches have been employed to incorporate unnatural monosaccharides for chemical glycoengineering: metabolic glycoengineering (MGE) and chemoenzymatic glycan labelling (CeGL).

5.2.1. Metabolic glycoengineering

MGE relies on live cells for the uptake, processing, and incorporation of unnatural monosaccharides into glycans via cellular biosynthetic pathways192. This approach is particularly powerful because simple monosaccharide precursors for glycan biosynthesis are relatively easy to chemically synthesize and modify with desired functional groups.
Once inside the cell, MGE exploits endogenous enzymes in glycan biosynthetic pathways to further process the precursor analogs for nucleotide activation and incorporation onto glycoconjugates by glycosyltransferases. MGE has expanded over the past three decades and emerged as a major tool for introducing chemical handles into a broad range of glycans in living systems for bioconjugation, using a variety of monosaccharide analogs and numerous ligation chemistries that are now readily available195–199. Notably, MGE has made use of unnatural analogs of GalNAc and of N-acetylmannosamine (ManNAc)193,200, the committed biosynthetic precursor of the sialic acids that frequently cap glycans. Thus, besides the impressive collection of toolkits, one major advantage of MGE is the potential to decorate recombinant mucins with useful functional groups during biosynthesis195.

The main drawbacks of MGE lie in the variability of glycan labelling efficiency and specificity. The success of MGE depends on both the efficiency of incorporation of unnatural monosaccharides into glycans and the efficiency of the subsequent bioconjugation201–204. Monosaccharide analog design and the choice of host cell type can significantly impact the specificity, efficiency, and rate of metabolic incorporation196,203,205–208. Optimized ligation reactions can also drastically increase the efficiency of the subsequent bioconjugation204. These are thus important considerations for efficient and reliable mucin derivatization via MGE. Crosstalk between biosynthetic pathways can lead to the interconversion of monosaccharide analogs into unintended precursors and skew their incorporation into “off-target” glycan structures. An important example is the epimerization of GalNAc analogs, catalyzed by the UDP-GlcNAc/GalNAc epimerase (GALE), into the corresponding GlcNAc counterparts, which are then incorporated into glycoproteins at high efficiency209.
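Because labelling requires both metabolic incorporation and a successful click reaction, the two efficiencies compound. The toy model below, which is an assumption rather than an established description of MGE, treats each target glycan site on a mucin as independently carrying a conjugate with probability equal to the product of the incorporation and ligation efficiencies, giving a binomial count per molecule. The site number, efficiency values, and function name are hypothetical.

```python
import math
from typing import Tuple


def conjugates_per_mucin(n_sites: int, incorporation: float, ligation: float) -> Tuple[float, float]:
    """Mean and standard deviation of conjugated handles per mucin molecule.

    Toy assumption: each of `n_sites` target glycans independently carries the
    analog with probability `incorporation`, and each installed handle independently
    reacts in the click step with probability `ligation` (binomial model).
    """
    for name, value in (("incorporation", incorporation), ("ligation", ligation)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    p = incorporation * ligation  # per-site probability of ending up conjugated
    mean = n_sites * p
    sd = math.sqrt(n_sites * p * (1.0 - p))
    return mean, sd


mean, sd = conjugates_per_mucin(n_sites=100, incorporation=0.4, ligation=0.9)
print(f"{mean:.1f} ± {sd:.1f} conjugates per molecule")
```

Even at these illustrative efficiencies, individual molecules would carry roughly 36 ± 5 conjugates, which illustrates why molecule-to-molecule heterogeneity, in addition to off-target incorporation, matters when chemically defined functionalized mucins are sought.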
As mucins have the potential to carry a variety of glycan structures, such crosstalk may pose significant challenges if homogenous and chemically or structurally defined functionalized mucin products are sought. Tremendous efforts to unravel enzyme specificity in cellular biosynthetic pathways have led to improved metabolism and incorporation of unnatural monosaccharides200,210–213. Notably, the realization that native enzyme active sites may not optimally accommodate and catalyze unnatural monosaccharides has motivated the bump-and-hole strategy210,212, in which mutant enzymes are rationally designed and engineered to contain an enlarged active site (hole) for novel analogs bearing a bulky functional group (bump). These efforts have led to the development of a remarkable analog, N-(S)-azidoalaninyl galactosamine (GalNAzME), which is not epimerized into GlcNAc, and a mutant pyrophosphorylase structurally engineered to accommodate and metabolize GalNAzME, enabling precision metabolic labelling of mucin O-glycans in live cells214. Metabolic incorporation of GalNAzME can be further boosted by expression of a rationally designed bump-and-hole GalNAc transferase. Once functionalized with GalNAzME, cell-surface and secreted mucins can then undergo azide-alkyne cycloaddition for bioconjugation. Interestingly, bypassing metabolic bottlenecks by introducing custom-designed biosynthetic enzymes may also dramatically improve the incorporation of poorly metabolized alkyne-bearing sugar analogues215. It is conceivable that similar approaches can likewise enrich mucins with specific chemical handles of choice. With the advent of powerful cell glycoengineering strategies for tuning specific mucin O-glycosylation patterns53,54, these approaches should have great potential for producing recombinant mucins with defined functionalization via MGE128,216.

5.2.2. Chemoenzymatic glycan labelling

Complementary to MGE, CeGL employs cell-free recombinant glycosyltransferases to directly install modified monosaccharides from a suitable glycosyl donor onto mature glycoproteins217. This strategy circumvents the need for metabolic processing of cell-permeable monosaccharide analogs by live-cell biosynthetic pathways, which involve several enzymatic steps and metabolic bottlenecks. CeGL can therefore potentially accommodate unnatural sugars with bulkier functional groups to maximally exploit the promiscuity of glycosyltransferases toward their substrates. On the other hand, the lack of an endogenous biosynthetic system means that CeGL approaches must use activated sugar-nucleotide analogs that are ready to be transferred onto a suitable glycan acceptor by glycosyltransferases, and these analogs can be challenging to synthesize and modify. Although first described in 1979218, CeGL has recently re-emerged as a powerful technique that can be combined with bioorthogonal click chemistry to introduce chemical handles into glycoconjugates217,219. As with MGE, once tagged, glycoproteins such as mucins can undergo bioconjugation for further functionalization and derivatization via click reactions. However, the widespread use of CeGL for glycoengineering has been largely hampered by the small number of available unnatural monosaccharides217. CeGL can be used to selectively functionalize mucin O-glycans by exploiting the inherent substrate specificity of known glycosyltransferases. One common approach has employed a mucin O-glycan-specific sialyltransferase, ST3Gal1, to selectively incorporate CMP-sialic acid analogs bearing either an azide or alkyne handle into mucin O-glycans for subsequent conjugation220–222. Efforts to expand the CeGL toolbox have led to the development of novel nucleotide sugar derivatives of UDP-GalNAc and UDP-GlcNAc