High-Throughput Characterization of Foodborne Pathogens using Next-Generation Sequencing
Next-generation sequencing (NGS) is increasingly employed to characterize food-associated microbes and communities, including those that pose a threat to human health. As the amount of publicly available genomic data from these organisms increases, (i) rapid, scalable methods for inferring biological function from large amounts of NGS data are needed, and (ii) meaningful biological conclusions derived using these methods can be leveraged to improve safety along the food supply chain. The studies reported here detail the application of whole-genome sequencing (WGS) to two groups of organisms that differ in the challenges they pose to human health: (i) non-typhoidal Salmonella enterica, a well-characterized, Gram-negative foodborne pathogen for which a large repertoire of established computational methods for analyzing WGS data exists, and (ii) the less frequently sequenced Bacillus cereus group, which consists of closely related, Gram-positive, spore-forming species that vary in their ability to cause disease in humans. For Salmonella enterica, antimicrobial resistance (AMR) was of particular concern; WGS was used to characterize 90 AMR strains isolated from either human or bovine hosts in New York or Washington State. In addition to predicting phenotypic resistance to a panel of twelve antimicrobials with high accuracy (mean sensitivity and specificity of 97.2% and 85.2%, respectively), in silico characterization of AMR determinants present in all isolates revealed significant geographic and host associations, including quinolone resistance, which was observed only in human isolates from Washington State. Additionally, one multidrug-resistant, colistin-susceptible Salmonella Typhimurium strain was found to harbor mcr-9, a novel plasmid-mediated colistin resistance gene. For Bacillus cereus, classification of isolates based on virulence potential was the primary focus. An in silico typing tool designed to rapidly identify B. cereus group virulence factors and taxonomic affiliation using WGS data is described. This application, named BTyper, was used to query all Bacillus cereus group genomes submitted to NCBI’s GenBank database (n = 662, accessed April 6, 2017). Additionally, BTyper was used to characterize the genomes of 33 B. cereus group strains isolated in conjunction with a 2016 outbreak. Thirty genomes were classified as emetic Bacillus cereus and predicted to be the cause of a single-source outbreak using a combination of computational, microbiological, and epidemiological methods. Overall, the results presented here showcase how NGS can be used to characterize food-associated microbes at greater resolution than preceding technologies. Additionally, computational and statistical methods used to analyze Illumina data derived from foodborne pathogens are emphasized. The tools and methods detailed here can serve as a guide for deriving biologically informed conclusions from WGS data.
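The sensitivity and specificity reported above for in silico AMR prediction are conventionally computed by comparing genotype-based predictions against phenotypic susceptibility results, with the phenotype treated as the reference. The following is a minimal sketch of that calculation for a single antimicrobial; the function name and the toy data are hypothetical and are not taken from the studies described here.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(
    predicted: Iterable[bool], observed: Iterable[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one antimicrobial.

    predicted: True if resistance was predicted from WGS data (in silico).
    observed:  True if the isolate was phenotypically resistant (reference).
    """
    tp = fp = tn = fn = 0
    for pred, obs in zip(predicted, observed):
        if obs and pred:
            tp += 1  # resistant isolate, correctly predicted resistant
        elif obs and not pred:
            fn += 1  # resistant isolate, missed by the prediction
        elif not obs and pred:
            fp += 1  # susceptible isolate, wrongly predicted resistant
        else:
            tn += 1  # susceptible isolate, correctly predicted susceptible

    # Guard against division by zero when a class is absent.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data for five isolates (not real results).
predicted = [True, True, False, False, True]
observed = [True, True, True, False, False]
sens, spec = sensitivity_specificity(predicted, observed)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

A mean across a panel of antimicrobials, such as the one reported above, would then be the average of the per-drug values; whether the original analysis averaged per drug or pooled all isolate-drug pairs is not stated here, so that choice is left open.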