Artificial Intelligence-Assisted Single-Cell Ramanome for Large-Scale Microbiome Data Mining towards Biological Discovery and Sustainable Development
Access to this document is restricted. Some items have been embargoed at the request of the author, but will be made publicly available after the "No Access Until" date.
Single-cell technologies are transforming microbiome research by enabling high-resolution analysis of individual microbial cells across complex systems, opening routes to novel biological discovery and sustainable development. However, the scale, complexity, and heterogeneity of modern datasets demand computational frameworks capable of extracting robust functional and taxonomic insights to inform system-level diagnostics and applications. In this dissertation, we developed both the hardware and software components of the artificial intelligence (AI)-assisted PyRamanome platform, which integrates single-cell Raman spectroscopy (SCRS) and cell-sorting modules with multi-omics datasets. The platform establishes a scalable framework for large-scale microbiome phenotyping, genotype-phenotype mapping, and discovery of novel biological agents, advancing sustainable system diagnostics and applications in the context of global climate change and the United Nations Sustainable Development Goals (SDGs). First, we developed a label-free strategy combining SCRS, Raman-activated cell sorting (RACS), and targeted metagenomics to identify functionally novel microorganisms. This led to the discovery and conceptualization of methanotrophic polyphosphate-accumulating organisms (PAOs) capable of simultaneous methane oxidation and phosphorus recovery, a previously unrecognized function in low-carbon resource management. If applied to 10% of U.S. manure wastewater, this function has the potential to generate $180–509 million and reduce greenhouse gas emissions by 3.8- to 13.9-fold, supporting the circular economy and multiple SDGs. Second, to scale single-cell Ramanome-enabled phenotypic analysis, we developed the PyRamanome software, a Python-based workflow for standardized SCRS data processing, operational phenotypic unit (OPU)-enabled functional characterization, and deep learning-based classification of taxonomy and phenotypes.
Third, this framework was applied to >90,000 single cells spanning pure cultures, rhizosphere microbiomes across varying genotypes and crop types, and wastewater microbiomes across North America. We uncovered niche-driven phenotypic structuring in the rhizosphere linked to plant fitness and identified OPUs that are predictive of wastewater treatment health. Across all ecosystems, AI-assisted Ramanome analytics enabled high-resolution functional profiling and classification with >95% accuracy. Fourth, the AI-assisted PyRamanome framework was integrated with fluorescence-activated cell sorting (FACS) to examine targeted microbial responses to climate warming in grassland soils. Warming increased both the taxonomic and functional diversity of PAOs and enhanced microbial network stability, supporting their role in resilient phosphorus cycling for sustainable resource management. Lastly, we synthesized the global role of PAOs across terrestrial, aquatic, and engineered systems, highlighting their ubiquity, evolutionary origin, co-cycling of phosphorus with carbon and nitrogen, and central function in biogeochemical models and sustainable nutrient management. Together, this work provides an innovative and unified AI-assisted single-cell omics platform, PyRamanome, for large-scale microbiome research, advancing novel biological discovery and enabling explainable, predictive, and climate-resilient solutions for sustainable development across complex environmental and engineered systems.