HIGH-THROUGHPUT CHARACTERIZATION OF
FOODBORNE PATHOGENS USING
NEXT-GENERATION SEQUENCING
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Laura M. Carroll
August 2019
© 2019 Laura M. Carroll
ALL RIGHTS RESERVED
HIGH-THROUGHPUT CHARACTERIZATION OF FOODBORNE
PATHOGENS USING NEXT-GENERATION SEQUENCING
Laura M. Carroll, Ph.D.
Cornell University 2019
Next-generation sequencing (NGS) is being increasingly employed to char-
acterize food-associated microbes and communities, including those which
pose a threat to human health. As the amount of publicly available genomic
data from these organisms increases, (i) rapid, scalable methods for inferring
biological function from large amounts of NGS data are needed, and (ii) mean-
ingful biological conclusions derived using these methods can be leveraged to
improve safety along the food supply chain. The studies reported here detail
the application of whole-genome sequencing (WGS) to two groups of organ-
isms which differ in terms of the challenges they pose to human health: (i) non-
typhoidal Salmonella enterica, a well-characterized, Gram-negative foodborne
pathogen which boasts a large repertoire of established computational methods
for analyzing WGS data derived from it, and (ii) the lesser-sequenced Bacillus
cereus group, which consists of closely related, Gram-positive, spore-forming
species which vary in their ability to cause disease in humans.
For Salmonella enterica, antimicrobial resistance (AMR) was of particular con-
cern; WGS was used to characterize 90 AMR strains isolated from either hu-
man or bovine hosts from New York or Washington State. In addition to pre-
dicting phenotypic resistance to a panel of twelve antimicrobials with high
accuracy (mean sensitivity and specificity of 97.2% and 85.2%, respectively),
in silico characterization of AMR determinants present in all isolates unveiled
significant geographic and host associations, including quinolone resistance,
which was only observed in human isolates from Washington State. Addi-
tionally, one multidrug-resistant, colistin-susceptible Salmonella Typhimurium
strain was found to harbor mcr-9, a novel plasmid-mediated colistin resistance
gene.
For Bacillus cereus, classification of isolates based on virulence potential was
the primary focus. An in silico typing tool designed to rapidly identify B. cereus
group virulence factors and taxonomic affiliation using WGS data is described.
This application, named BTyper, was used to query all Bacillus cereus group
genomes submitted to NCBI’s GenBank database (n = 662, accessed April 6,
2017). Additionally, BTyper was used to characterize the genomes of 33 B. cereus
group strains isolated in conjunction with a 2016 outbreak. Thirty genomes
were classified as emetic Bacillus cereus and predicted to be the cause of a single-
source outbreak using a combination of computational, microbiological, and
epidemiological methods.
Overall, the results presented here showcase how NGS can be used to char-
acterize food-associated microbes at greater resolution than preceding technolo-
gies. Additionally, computational and statistical methods used to analyze Illu-
mina data derived from foodborne pathogens are emphasized. The tools and
methods detailed here can serve as a guide for deriving biologically informed
conclusions from WGS data.
BIOGRAPHICAL SKETCH
Laura M. Carroll grew up in Houghton, Michigan. She attended Michigan State
University from 2009 to 2014, where she received a Professorial Assistantship
through the university’s Honors College to conduct research under the direc-
tion of Professor Brad Marks. As a member of the Biosystems Engineering Food
Safety Laboratory, Laura spent five years developing mathematical models to
predict the thermal inactivation of foodborne pathogens in various food ma-
trices, with an emphasis on modeling the physiological response of Salmonella
enterica to prolonged periods of sublethal thermal stress.
After graduating with a B.S. in Genomics and Molecular Genetics and a B.A.
in History, Laura began her graduate studies at Cornell University under the
direction of Professor Martin Wiedmann. As a doctoral student, Laura’s re-
search focused on (i) developing bioinformatic pipelines to rapidly character-
ize bacteria in silico using next-generation sequencing data, and (ii) using those
pipelines to analyze large genomic data sets from bacterial isolates and microbial
communities. During her time at Cornell, Laura received a National Science
Foundation (NSF) Graduate Research Fellowship and, later, an NSF Graduate
Research Opportunities Worldwide (NSF GROW) award, which allowed
her to spend time as a visiting researcher with Professor Tanja Stadler’s Com-
putational Evolution Group at ETH Zurich in Switzerland. She additionally
spent several months as a graduate intern with IBM’s Industrial and Applied
Genomics Group, where she was first introduced to metagenomic and meta-
transcriptomic data analysis methods. After completing her Ph.D., Laura will
be focusing primarily on metagenomic and metatranscriptomic data analysis as
a Postdoctoral Fellow in the group of Dr. Georg Zeller at the European Molecu-
lar Biology Laboratory (EMBL) in Heidelberg, Germany.
To my parents, for their unwavering love and support
ACKNOWLEDGEMENTS
It is impossible to allocate the space necessary to adequately thank all of
those who have helped me reach this point in my career. I am indebted to my
committee members, Drs. James Booth and Michael Stanhope, for their guid-
ance and mentorship, as well as the National Science Foundation Graduate Re-
search Fellowship Program (NSF GRFP) and associated NSF Graduate Research
Opportunities Worldwide (NSF GROW) program for their generous funding.
I would not be here, in the most literal sense, without the support of my fam-
ily: my mother, who, as a woman in STEM, has been a role model available to
me for the entirety of my life; my father, for his unconditional love and support,
even when I pushed the boundaries of “unconditional”; my brother, for his willingness
to drop everything to help me, even when I probably (read: definitely)
don’t deserve it; and my sister, who has been, and will forever be, my confidant,
favorite labmate, and best friend.
Professionally, I am beholden to my undergraduate research advisor, Dr.
Brad Marks, for the essential mentorship he provided while I navigated my
undergraduate years and transition to graduate school; Nicole Hall, who even-
tually molded me into a semi-functioning member of a laboratory; Dr. Teresa
Bergholz, whose guidance (and patience) nurtured my love of research (but not
RNA); Dr. Henk den Bakker, who helped me hit the ground running in my first
few weeks of graduate school (and continues to help me, even when I pester
him from afar); Dr. Richard Pereira, who guided me through my first research
project at Cornell; Drs. Simone Bianco and Kristen Beck, whose mentorship
during my time at IBM fostered my love of shotgun metagenomic and meta-
transcriptomic data analysis; Dr. Ahmed Gaballa, who is possibly the only per-
son on the planet as enthusiastic about colistin resistance as I am; and Dr. Tanja
Stadler, who welcomed me into her group and made my time in Switzerland
easily one of the most transformative experiences I have had as a graduate stu-
dent and researcher.
I am especially grateful for the guidance I have received from Drs. Jasna Kovac
(my “Bacillus advisor”), Claudia Guldimann (my “Salmonella advisor”), and
Rachel Cheng, who have been incredible mentors, role models, collaborators,
and friends throughout my graduate career. I consider myself incredibly fortu-
nate to be able to work alongside such brilliant researchers who display such an
aspirational work ethic and level of scientific creativity.
Continuing on a personal level, I am indebted to all of those on whom I have
leaned at various times during my graduate career and beyond: my soulmates,
Rachel Allison, Geoff Pleiss, and Tobias Schnabel; my sisters, Corinna Noel
and Jillian Jastrzembski; my “sisters”, Ariel Buehler and Lory Henderson; moja
sestra, Svetlana Lyalina; my Swiss-ters, Jana Huisman, Rachel Warnock, Joelle
Barido-Sottani, and Julia Pecherska; Venelin Mitov and Daniel Scain Farenzena,
who went out of their way to make Basel feel like my second home; and to
all those who have been there for me in more ways than they can possibly
know: Pedro Menchik; Bryan Peele; Madeleine Bee; Emily Griep; Jeff Tokman;
Sophia Harrand; Beth Burzynski; Gorjan Dukovski; Richard Goater; Veronica
Guariglia; Dave Kent; Vlad Niculae; Madelyn Shoup; Hilary Podgers; Brittany
Massa; Morgan Frost; Kylie Gignac; Ian Hildebrandt; Dani Smith; and Sarah
Buchholz.
I owe additional gratitude to all of my labmates, past and present, whom
I was unable to list here, particularly my friends and colleagues in the Biosys-
tems Engineering Food Safety Laboratory at Michigan State University, IBM’s
Industrial and Applied Genomics Group, the Computational Evolution Group
at ETH Zurich, and the Food Safety Laboratory and Milk Quality Improvement
Program at Cornell University.
Finally, I would like to thank my advisor of the past five years, Dr. Martin
Wiedmann. Articulating how grateful I am to have him as a mentor is com-
pletely futile; the level of independence and flexibility he has afforded me as a
doctoral student to pursue nearly every research question that I could dream up
is incomparable (and, as he would probably argue, excessive). I consider my-
self infinitely fortunate to have been a member of his laboratory, and I will never
take the knowledge, skills, and lessons he has taught me, both as a researcher
and as a person, for granted.
TABLE OF CONTENTS
Biographical Sketch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1 Introduction1 1
1.1 Next-Generation Sequencing: an Overview . . . . . . . . . . . . . 2
1.2 NGS Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 NGS Applications: Whole-Genome Sequencing of Microbial
Contaminants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 NGS Applications: RNA Sequencing (RNA-Seq) of Food-
Relevant Organisms . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 NGS Applications: High-Throughput Amplicon Sequencing . . . 10
1.6 NGS Applications: Shotgun Metagenomic and Metatranscrip-
tomic Sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.8 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2 Whole-genome sequencing of drug-resistant Salmonella enterica iso-
lates from dairy cattle and humans in New York and Washington states
reveals source and geographic associations 2 26
2.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.1 Isolate selection . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.2 Phenotypic AMR testing . . . . . . . . . . . . . . . . . . . . 32
2.3.3 Whole-genome sequencing . . . . . . . . . . . . . . . . . . 33
2.3.4 Initial data processing and genome assembly . . . . . . . . 33
2.3.5 In silico serotyping and MLST . . . . . . . . . . . . . . . . . 34
2.3.6 In silico AMR gene detection . . . . . . . . . . . . . . . . . 34
2.3.7 Initial phylogenetic tree construction and reference
genome selection . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.8 Reference-based variant calling . . . . . . . . . . . . . . . . 36
1 From Wiedmann, Martin and Laura M. Carroll (2019). “Next-Generation Sequencing”. In:
Encyclopedia of Food Chemistry, pp. 376-383. DOI: 10.1016/b978-0-08-100596-5.21792-7.
2 From Carroll, Laura M., Martin Wiedmann, Henk den Bakker, Julie Siler, Steven Warchocki,
David Kent, Svetlana Lyalina, Margaret Davis, William Sischo, Thomas Besser, Lorin D. Warnick,
and Richard V. Pereira (2017). “Whole-Genome Sequencing of Drug-Resistant Salmonella
enterica Isolates from Dairy Cattle and Humans in New York and Washington States Reveals
Source and Geographic Associations”. In: Applied and Environmental Microbiology 83, pp. e00140-17.
DOI: 10.1128/AEM.00140-17.
2.3.9 Plasmid replicon detection . . . . . . . . . . . . . . . . . . 37
2.3.10 Statistical analyses . . . . . . . . . . . . . . . . . . . . . . . 38
2.3.11 Accession number(s) and supplemental material . . . . . . 40
2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4.1 Overall distribution of SNPs, AMR genes, AMR pheno-
types, and plasmid replicons . . . . . . . . . . . . . . . . . 40
2.4.2 In silico AMR gene detection is correlated with phenotypic
AMR patterns. . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.4.3 S. Typhimurium phylogeny, AMR genes, AMR pheno-
types, and plasmid replicons . . . . . . . . . . . . . . . . . 44
2.4.4 S. Newport phylogeny, AMR genes, AMR phenotypes,
and plasmid replicons . . . . . . . . . . . . . . . . . . . . . 50
2.4.5 S. Dublin phylogeny, AMR genes, AMR phenotypes, and
plasmid replicons . . . . . . . . . . . . . . . . . . . . . . . . 53
2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.5.1 WGS can be used to predict phenotypic resistance in
bovine and human-associated Salmonella Typhimurium,
Newport, and Dublin with high sensitivity and specificity 57
2.5.2 Both phenotypic and genomic data show geographic dif-
ferences in resistance-related characteristics for Salmonella,
suggesting a need for location-specific AMR control
strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.5.3 S. enterica isolates from humans contain a more diverse
range of AMR genes and plasmid replicons than those iso-
lated from bovine populations . . . . . . . . . . . . . . . . 62
2.6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3 Identification of novel mobilized colistin resistance gene mcr-9 in a
multidrug-resistant, colistin-susceptible Salmonella enterica serotype
Typhimurium isolate3 74
3.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.2 Observation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.2.1 In silico identification of mcr-9 in an MDR S. Typhimurium
genome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2.2 mcr-9 confers resistance to colistin when cloned into
colistin-susceptible E. coli NEB5α . . . . . . . . . . . . . . . 79
3.2.3 Mcr-3, Mcr-4, Mcr-7, and Mcr-9 are highly similar at the
structural level . . . . . . . . . . . . . . . . . . . . . . . . . 80
3 From Carroll, Laura M., Ahmed Gaballa, Claudia Guldimann, Genevieve Sullivan, Lory
O. Henderson, and Martin Wiedmann (2019). “Identification of Novel Mobilized Colistin Resistance
Gene mcr-9 in a Multidrug-Resistant, Colistin-Susceptible Salmonella enterica Serotype
Typhimurium Isolate”. In: mBio 10, pp. e00853-19. DOI: 10.1128/mBio.00853-19.
3.2.4 Numerous genera of Enterobacteriaceae harbor mcr-9 on
IncHI2 plasmids. . . . . . . . . . . . . . . . . . . . . . . . . 84
3.2.5 Accession number(s) and supplemental material . . . . . . 87
3.3 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4 Rapid, High-Throughput Identification of Anthrax-Causing and
Emetic Bacillus cereus Group Genome Assemblies via BTyper, a Com-
putational Tool for Virulence-Based Classification of Bacillus cereus
Group Isolates by Using Nucleotide Sequencing Data4 91
4.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.3 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.3.1 Database construction . . . . . . . . . . . . . . . . . . . . . 97
4.3.2 Construction of BTyper tool . . . . . . . . . . . . . . . . . . 98
4.3.3 PCR detection of virulence genes . . . . . . . . . . . . . . . 99
4.3.4 MLST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.3.5 rpoB allelic typing . . . . . . . . . . . . . . . . . . . . . . . . 101
4.3.6 Validation of BTyper using additional B. cereus group
whole-genome sequences . . . . . . . . . . . . . . . . . . . 102
4.3.7 Construction of BMiner companion application . . . . . . 102
4.3.8 Application of BTyper and BMiner to whole-genome se-
quencing data . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.3.9 Post hoc statistical analyses . . . . . . . . . . . . . . . . . . 104
4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.4.1 Construction and validation of BTyper using in vitro meth-
ods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.4.2 Characteristics associated with B. cereus group phyloge-
netic clade III are most prevalent among genome assem-
blies currently available at NCBI . . . . . . . . . . . . . . . 106
4.4.3 Application of BTyper to identify B. anthracis-associated
genes in non-anthracis Bacillus isolates reveals virulence
gene heterogeneity within genome assemblies from an-
thrax toxin-encoding isolates . . . . . . . . . . . . . . . . . 108
4.4.4 Application of BTyper to identify assemblies associated
with emetic B. cereus group isolates . . . . . . . . . . . . . 118
4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4 From Carroll, Laura M., Jasna Kovac, Rachel A. Miller, and Martin Wiedmann (2017).
“Rapid, High-Throughput Identification of Anthrax-Causing and Emetic Bacillus cereus Group
Genome Assemblies via BTyper, a Computational Tool for Virulence-Based Classification of
Bacillus cereus Group Isolates by Using Nucleotide Sequencing Data”. In: Applied and Environmental
Microbiology 83, pp. e01096-17. DOI: 10.1128/AEM.01096-17.
4.5.1 Accessible whole-genome sequence analysis tools can fa-
cilitate improved taxonomic classification and characteri-
zation of B. cereus group isolate virulence potential . . . . 120
4.5.2 Analysis of publicly available B. cereus group assemblies
using BTyper and BMiner identifies virulence gene-based
clusters that capture phylogenetic heterogeneity in iso-
lates with similar phenotypes . . . . . . . . . . . . . . . . . 122
4.6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5 Characterization of Emetic and Diarrheal Bacillus cereus Strains From
a 2016 Foodborne Outbreak Using Whole-Genome Sequencing: Ad-
dressing the Microbiological, Epidemiological, and Bioinformatic
Challenges 5 138
5.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.3 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.3.1 Collection of Epidemiological Data . . . . . . . . . . . . . . 142
5.3.2 Isolation and Initial Characterization of B. cereus Strains . 142
5.3.3 rpoB Allelic Typing . . . . . . . . . . . . . . . . . . . . . . . 143
5.3.4 Bacterial Growth Conditions and Collection of Bacterial
Supernatants . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.3.5 Hemolysin BL and Non-hemolytic Enterotoxin Detection . 144
5.3.6 WST-1 Metabolic Activity Assay . . . . . . . . . . . . . . . 145
5.3.7 Statistical Analysis of Cytotoxicity Data . . . . . . . . . . . 146
5.3.8 Whole-Genome Sequencing . . . . . . . . . . . . . . . . . . 146
5.3.9 Initial Data Processing and Genome Assembly . . . . . . . 147
5.3.10 In silico Typing and Virulence Gene Detection . . . . . . . 147
5.3.11 Construction of k-mer Based Phylogeny Using Outbreak
Strains and Genomes of 18 B. cereus Group Species . . . . 148
5.3.12 Variant Calling and Phylogeny Construction Using Out-
break Isolates . . . . . . . . . . . . . . . . . . . . . . . . . . 149
5.3.13 Variant Calling and Statistical Comparison of Emetic Out-
break Isolates to Publicly Available Genomes . . . . . . . . 152
5.3.14 Statistical Comparison of Phylogenetic Trees . . . . . . . . 153
5.3.15 Calculation of Average Nucleotide Identity Values . . . . . 154
5.3.16 Supplementary Material and Availability of Data . . . . . 154
5.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5 From Carroll, Laura M., Martin Wiedmann, Manjari Mukherjee, David C. Nicholas, Lisa A.
Mingle, Nellie B. Dumas, Jocelyn A. Cole, and Jasna Kovac (2019). “Characterization of Emetic
and Diarrheal Bacillus cereus Strains From a 2016 Foodborne Outbreak Using Whole-Genome
Sequencing: Addressing the Microbiological, Epidemiological, and Bioinformatic Challenges”.
In: Frontiers in Microbiology 10, p. 144. DOI: 10.3389/fmicb.2019.00144.
5.4.1 Both Emetic and Diarrheal Symptoms Were Reported
Among Cases Associated With the B. cereus Foodborne
Outbreak . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.4.2 WGS Confirms Presence of Multiple B. cereus Group
Species Represented Among Outbreak Strains . . . . . . . 157
5.4.3 Emetic and Diarrheal B. cereus Isolates Associated With
the Foodborne Outbreak do Not Differ in Cytotoxicity . . 159
5.4.4 Core SNPs Identified Among B. cereus Group Outbreak
Isolates From Two Phylogenetic Groups Are Dependent
on Variant Calling Pipeline and Reference Genome Selection 161
5.4.5 Choice of Variant Calling Pipeline Has Greater Influence
on Core SNP Identification Than Choice of Closely Re-
lated Closed or Draft Reference Genome for Emetic Group
III B. cereus Group Isolates . . . . . . . . . . . . . . . . . . . 162
5.4.6 Phylogenies Constructed Using Core SNPs Identified in
55 Emetic ST 26 B. cereus Genomes by kSNP3 and Parsnp
Yield Similar Topologies . . . . . . . . . . . . . . . . . . . . 169
5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
5.5.1 Addressing the Microbiological and Epidemiological
Challenges Associated With Determining the Causative
Agent of an Emetic Foodborne Outbreak . . . . . . . . . . 172
5.5.2 Considerations for Addressing the Unique Challenges As-
sociated With Characterization of Foodborne Outbreaks
Linked to the B. cereus Group Using WGS . . . . . . . . . . 174
5.5.3 Recommendations for Analyzing Illumina WGS Data
From B. cereus Group Isolates Potentially Linked to a
Foodborne Outbreak . . . . . . . . . . . . . . . . . . . . . . 179
5.5.4 As WGS Becomes Routinely Integrated Into Food Safety,
Clinical, and Epidemiological Realms, It Is Likely That the
Number of Illnesses Attributed to B. cereus Will Increase . 183
5.6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
5.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
6 Conclusion 197
6.1 NGS can be used to replicate many microbiological assays in silico
with high accuracy, speed, and throughput . . . . . . . . . . . . . 197
6.2 NGS can be used to identify novel genomic elements associated
with clinically relevant phenotypes . . . . . . . . . . . . . . . . . . 199
6.3 NGS can be used to query pathogens associated with foodborne
outbreaks at higher resolution than its predecessors . . . . . . . . 200
6.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
LIST OF TABLES
1.1 Overview of next-generation sequencing technologies discussed
in this chapter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Overview of food science-relevant next-generation sequencing
applications discussed in this chapter. . . . . . . . . . . . . . . . . 5
2.1 Ranking of the five most common antimicrobial resistance
(AMR) gene groups, phenotypic AMR profiles, and plasmid
replicons for all serotypes, S. Typhimurium, S. Newport, and S.
Dublin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.2 ANOSIM and PERMANOVA statistics and their respective
mean P values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.3 Sensitivity and specificity of genotype predictions of AMR phe-
notype for all 90 Salmonella isolates in the study. . . . . . . . . . . 44
2.4 Comparison of mean zone diameters between (i) Salmonella iso-
lates with at least one AMR gene (ARG) that has been known
to confer resistance to a particular antimicrobial and (ii) isolates
with no genes known to confer resistance to that antimicrobial. . 46
2.5 Odds ratios for association of AMR gene groups, AMR phenotype,
and plasmid replicons with source or location (only associations
with P values of < 0.05 are shown). . . . . . . . . . . . . . 48
2.6 S. Typhimurium isolates with qnr and/or oqx genes and/or point
mutations in gyrA and/or gyrB and/or parC. . . . . . . . . . . . 50
4.1 Percentage of isolates in which BTyper correctly identified the
presence/absence of eight virulence genes, MLST, rpoB AT, and
panC clade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.2 Virulence genes significantly associated with 5 B. cereus group
phylogenetic clades after a Bonferroni correction . . . . . . . . . 110
4.3 Non-anthracis Bacillus assemblies in which anthrax toxin genes
cya, lef, and/or pagA were detected using BTyper . . . . . . . . . 115
4.4 Non-anthracis Bacillus assemblies in which B. anthracis-associated
genes were detected, excluding anthrax toxin genes cya, lef, and
pagA and regulator atxA . . . . . . . . . . . . . . . . . . . . . . . . 117
4.5 B. cereus group assemblies in which emetic toxin genes cesABCD
were detected. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.1 Description of variant calling pipelines and associated input
data formats tested in this study. . . . . . . . . . . . . . . . . . . . 149
5.2 Reference genomes used for reference-based variant calling in
this study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
5.3 List of outbreak isolates and corresponding metadata, single-
and multi-locus sequence types, and species. . . . . . . . . . . . . 158
5.4 Maximum likelihood phylogenies of 30 emetic group III out-
break isolates considered to be more topologically similar than
would be expected by chance (P < 0.05). . . . . . . . . . . . . . . 166
LIST OF FIGURES
2.1 Nonmetric multidimensional scaling (NMDS) plots for all iso-
lates based on antimicrobial resistance (AMR) gene sequences
(A), phenotypic antimicrobial resistance/susceptibility profiles
(B), and presence/absence of plasmid replicons (C). Points rep-
resent isolates, while shaded regions and convex hulls corre-
spond to isolate serotypes. For an interactive plot of these data,
as well as interactive NMDS plots for individual serotypes, visit
https://github.com/lmc297/2017_AEM_Figure_S2. . . . . . . . . 44
2.2 Frequency of different phenotypic and genotypic resistance
determinants for each serotype-source group (e.g., Salmonella
Dublin isolates obtained from humans [S. Dublin Human]).
Genotypic resistance was determined using nucleotide BLAST
(blastn) and the ARG-ANNOT database; isolates were classified
as having a resistant genotype if the AMR gene was detected by
BLAST with a minimum coverage of 50% and a minimum se-
quence identity of 75%. Phenotypic resistance was tested using
Kirby-Bauer disk diffusion. Percentages were calculated using
the ratio of resistant isolates to total isolates in each serotype-
source group (n = 17 for S. Typhimurium Bovine, n = 20 for S.
Typhimurium Human, n = 14 for S. Newport Bovine, n = 18 for
S. Newport Human, n = 10 for S. Dublin Bovine, and n = 11 for
S. Dublin Human). Nalidixic acid (NAL)- and sulfamethoxazole-
trimethoprim (SXT)-resistant isolates (6 and 12 of the 90 isolates,
respectively) each had one isolate for which genotypic resistance
did not correlate with phenotypic resistance. . . . . . . . . . . . . 45
2.3 Phylogenetic tree of S. Typhimurium isolates constructed using
BEAST. Gene groups for AMR genes detected in each genome
sequence at more than 50% coverage and 75% identity using
BLAST (blastn) and ARG-ANNOT are indicated in green. An-
timicrobials to which each isolate is resistant are indicated in red,
and intermediate resistance to an antimicrobial is indicated in or-
ange. Plasmid replicons detected in each genome sequence us-
ing PlasmidFinder are indicated in purple. Branch lengths are
reported in substitutions per site, while posterior probabilities
are reported at tree nodes. . . . . . . . . . . . . . . . . . . . . . . . 47
2.4 Phylogenetic tree of S. Newport isolates constructed using
BEAST. Gene groups for AMR genes detected in each genome
sequence at more than 50% coverage and 75% identity using
BLAST (blastn) and ARG-ANNOT are indicated in green. An-
timicrobials to which each isolate is resistant are indicated in red,
and intermediate resistance to an antimicrobial is indicated in or-
ange. Plasmid replicons detected in each genome sequence us-
ing PlasmidFinder are indicated in purple. Branch lengths are
reported in substitutions per site, while posterior probabilities
are reported at tree nodes. . . . . . . . . . . . . . . . . . . . . . . . 51
2.5 Phylogenetic tree of S. Dublin isolates constructed using BEAST.
Gene groups for AMR genes detected in each genome sequence
at more than 50% coverage and 75% identity using BLAST
(blastn) and ARG-ANNOT are indicated in green. Antimicro-
bials to which each isolate is resistant are indicated in red, and
intermediate resistance to an antimicrobial is indicated in or-
ange. Plasmid replicons detected in each genome sequence us-
ing PlasmidFinder are indicated in purple. Branch lengths are
reported in substitutions per site, while posterior probabilities
are reported at tree nodes. . . . . . . . . . . . . . . . . . . . . . . . 54
3.1 Comparison of mcr-9 to all previously described mcr homo-
logues, based on amino acid sequence. The maximum likeli-
hood phylogeny was constructed using RAxML version 8.2.12
with the amino acid sequences of novel mobilized colistin resis-
tance gene mcr-9 (in blue) and all previously described mcr genes
(mcr-1 to -8 [in black]). The phylogeny is rooted at the midpoint,
with branch lengths reported in substitutions per site. Branch
labels correspond to bootstrap support percentages out of 1,000
replicates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.2 Colistin killing assay of E. coli NEB5α harboring a pLIV2 empty
vector (negative control), mcr-3 (positive control), or mcr-9, ex-
pressed under the control of the IPTG-controlled SPAC/lacOid
promoter. Cells were grown in MH-II (Mueller-Hinton II)
medium with IPTG to the mid-exponential phase. Colistin was
added at concentrations of 0, 1, 2, 2.5, or 5 mg/liter, and the bacteria
were incubated at 37°C for 1 h. The samples were diluted in
phosphate-buffered saline (PBS) and plated on LB agar plates for
the determination of CFU. Log CFU reduction was calculated by
comparing CFU after each treatment to CFU levels obtained at 0
mg/liter colistin, using three independent biological replicates.
Asterisks denote significant differences compared to empty vec-
tor treatment (P < 0.05 by Student’s t test relative to the concen-
tration’s respective negative control after a Bonferroni correction). 81
3.3 Structural models of all published Mcr proteins (Mcr-1 to -8)
and Mcr-9, based on lipooligosaccharide phosphoethanolamine
transferase EptA. Models were constructed using the Phyre2
server, and structures were viewed and edited using UCSF
Chimera. Structural models show conservation of two EptA
domains: transmembrane-anchored and soluble periplasmic do-
mains. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.4 Similarity matrix (composed of Dali Z-scores) of all previously
described Mcr groups (Mcr-1 to -8) and Mcr-9, based on protein
structure. The Dali server was used to perform all-against-all
comparisons of 3D structural models based on all mcr homo-
logues (Figure 3.3); for this analysis, amino acid sequences of
mcr-5.3 and mcr-8.2, which were not available in ResFinder, were
additionally included from the National Database of Antibiotic
Resistant Organisms (NDARO). . . . . . . . . . . . . . . . . . . . 83
3.5 Location of Mcr-9 secondary structure elements within the align-
ment of Mcr amino acid sequences, constructed using the ES-
Pript 3 server. The top track denotes Mcr-9 secondary struc-
ture elements (alpha helixes and beta sheets). Green digits be-
low the alignment denote cysteine residues forming a disulfide
bridge (e.g., 1 forms a bridge with 1, 2 with 2, etc.). Within the
amino acid sequence alignment itself, a strict identity (i.e., iden-
tical amino acid residue at a site) is denoted by a red box and a
white character. A yellow box around an amino acid residue de-
notes similarity across groups, where groups were defined using
the default "all" specification in ESPript 3 (ESPript 3 total score
[TSc] > in-group threshold [ThIn]), while a residue in boldface de-
notes similarity within a group (ESPript 3 in-group score [ISc] >
ThIn). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.6 Organization of the mcr-9 locus in S. Typhimurium. A cupin
fold metalloprotein of unknown function is encoded by the
gene downstream of mcr-9 (unlabeled black arrow). The mcr-9
locus is flanked by two different terminal repeat sequences
(IRR) from the IS5 (orange box) and IS6 (red box) families. The
mcr-9 upstream region contains highly conserved putative -35
and -10 σ70-dependent promoter elements (blue boxes and blue
text). Moreover, the mcr-9 promoter region contains an inverted
repeat motif (green box, green text, and sequence logo) that is
conserved in more than 95% of 321 mcr-9 genes, as shown by the
sequence logo (constructed using WebLogo) (Crooks et al. 2004). 86
4.1 BTyper command line workflow for various types of data and
default typing methods. Input datum type is listed in the left
margin, while typing methods are listed at the top of the chart.
Command line parameters associated with a particular typing
method are shown in parentheses. FSL, Food Safety Lab. . . . . . 100
4.2 Percentage (%) of B. cereus group assemblies in which a particu-
lar virulence gene was detected. Minimum identity and cover-
age thresholds of 50 and 70%, respectively, were used for viru-
lence gene detection. . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.3 Closest-matching phylogenetic clade using the panC loci from
662 B. cereus group genome assemblies. A panC locus could not
be assigned in 4 genome assemblies, which is denoted by NA. . . 109
4.4 Principal-component analysis (PCA) of 662 B. cereus group
genome assemblies based on presence/absence of virulence
genes. Virulence gene typing was carried out using BTyper,
while PCA was performed using BMiner. Principal components
1 (PC1) and 2 (PC2) are plotted on the x and y axes, respectively,
while principal component 3 (PC3) corresponds to point size.
Plots are colored by isolate species, as found in NCBI (A), and
assigned cluster using k-medoids (B). To view interactive ver-
sions of these plots containing isolate names and metadata, all
BTyper final results files and metadata can be downloaded from
https://github.com/lmc297/BTyper/tree/master/sample_data and
viewed in BMiner. . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.5 k-medoids clusters based on presence/absence of virulence
genes detected using BTyper. Size corresponds to the number of
assemblies assigned to a given cluster, while panC corresponds to
panC clades found in the cluster, with an asterisk denoting one
or more assemblies that could not be placed into a panC clade.
Numbers within cells correspond to the proportion of assemblies
in a given cluster in which the corresponding virulence gene was
detected. Green shading corresponds to a virulence gene de-
tected in more than 90% of all assemblies in a cluster, while red
shading corresponds to a virulence gene detected in fewer than
10% of all assemblies in a cluster. Yellow shading corresponds
to B. anthracis-associated genes detected in fewer than 90% but
greater than 0% of assemblies in a cluster. . . . . . . . . . . . . . . 112
4.6 Nonmetric multidimensional scaling (NMDS) plot of Bacillus
cereus group clusters that (i) possessed at least one assembly that
was classified as Bacillus anthracis in NCBI, and/or (ii) possessed
at least one assembly in which at least one B. anthracis-associated
virulence gene (cya, lef, pagA, atxA, hasA, and/or capABCDE) was
detected using BTyper. NMDS was performed in BMiner using
virulence gene presence/absence data and a Jaccard dissimilar-
ity metric. Isolates are represented by points, and convex hulls
and shading correspond to the assigned k-medoids cluster. Vir-
ulence genes are plotted in dark gray. . . . . . . . . . . . . . . . . 114
5.1 Maximum likelihood phylogeny of core SNPs identified in 33
isolates sequenced in conjunction with a B. cereus outbreak,
as well as genomes of the 18 currently recognized B. cereus
group species (shown in gray). Core SNPs were identified
in all genomes using kSNP3. Heatmap corresponds to pres-
ence/absence of B. cereus group virulence genes detected in each
sequence using BTyper. Tip labels in maroon and teal correspond
to the seven human clinical isolates and 26 isolates from food
sequenced in conjunction with this outbreak, respectively. Phy-
logeny is rooted at the midpoint, and branch labels correspond
to bootstrap support percentages out of 500 replicates. Due to
the short lengths and low bootstrap support (all values < 10) of
branches within the outbreak clade, bootstrap support percent-
ages are not shown on branches within the outbreak clade. . . . . 159
5.2 Percentage viability of HeLa cells when treated with supernatants
of each isolate as determined by the WST-1 assay. Viability
was calculated as the ratio of the corrected absorbance of the solution
when HeLa cells were treated with supernatants to the
corrected absorbance of the solution when HeLa cells were treated
with BHI (i.e., negative control), converted to a percentage. The
columns represent the mean viabilities, while the error bars rep-
resent standard deviations for 12 technical replicates. Any two
bars that do not share a common alphabetic character had signif-
icantly different percentage viability values (P < 0.05). . . . . . . 161
5.3 Number of core SNPs identified in 33 B. cereus group isolates
from two phylogenetic groups (30 and 3 isolates from groups III
and IV, respectively), sequenced in conjunction with a foodborne
outbreak. Combinations of five reference-based variant calling
pipelines and three reference genomes, as well as one reference-
free SNP calling method (kSNP3), were tested. . . . . . . . . . . . 163
5.4 Comparison of core SNP positions reported by five reference-
based variant-calling pipelines for 33 B. cereus group strains iso-
lated in association with a foodborne outbreak, with the chromo-
somes of (A) B. cereus AH187 (group III), (B) B. cereus s.s. ATCC
14579 (group IV), and (C) B. cytotoxicus NVH 391-98 (group VII)
used as reference genomes. Ellipses represent each pipeline. . . . 164
5.5 (A) Number of core SNPs and (B) total number of SNPs identi-
fied in 30 emetic B. cereus group III strains isolated in association
with a foodborne outbreak. Combinations of (A) five and (B)
four reference-based variant calling pipelines and two reference
genomes (either dustmasked or unmasked) were tested, along
with one reference-free SNP calling method (kSNP3). Because
the Parsnp pipeline reports core SNPs by definition, it was ex-
cluded from Figure 5.5B (total SNPs). For quantification of the
total number of SNPs (Figure 5.5B), all sites with more than one
unique character were counted. . . . . . . . . . . . . . . . . . . . 166
5.6 Ranges of pairwise (A) core SNP differences and (B) total SNP
differences between 30 emetic group III B. cereus group strains
isolated in conjunction with a foodborne outbreak. Combi-
nations of (A) five and (B) four reference-based variant call-
ing pipelines and two reference genomes (either dustmasked or
unmasked), as well as one reference-free SNP calling method
(kSNP3) were tested. Lower and upper box hinges correspond
to the first and third quartiles, respectively. Lower and upper
whiskers extend from the hinge to the smallest and largest values
no more distant than 1.5 times the interquartile range from the
hinge, respectively. Points represent pairwise distances that fall
beyond the ends of the whiskers. Because the Parsnp pipeline re-
ports core SNPs by definition, it was excluded from Figure 5.6B
(pairwise differences in total SNPs). For quantification of pair-
wise differences in the total number of SNPs (Figure 5.6B), all
sites with more than one unique character were included. . . . . 167
5.7 Comparison of core SNP positions reported by five variant-
calling pipelines for 30 emetic group III B. cereus group outbreak
isolates. Ellipses represent each pipeline, all of which used the
chromosome of emetic group III B. cereus AH187 as a reference
for variant calling. . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
5.8 Maximum likelihood phylogenies of 30 emetic group III isolates
(ST 26) sequenced in conjunction with a B. cereus outbreak, as
well as all other emetic group III ST 26 genomes available in
NCBI (n = 25; shown in black). Trees were constructed using
core SNPs identified using (A) kSNP3 or (B) Parsnp. Tip labels
in maroon and teal correspond to the six human clinical iso-
lates and 24 isolates from food sequenced in conjunction with
this outbreak, respectively. Branch labels correspond to boot-
strap support percentages out of 1,000 replicates. Due to the
short lengths and low bootstrap support of branches within the
outbreak clade, bootstrap support percentages are not shown on
branches within the outbreak clade. . . . . . . . . . . . . . . . . . 170
CHAPTER 1
INTRODUCTION [1]
[1] From Wiedmann, Martin and Laura M. Carroll (2019). "Next-Generation
Sequencing". In: Encyclopedia of Food Chemistry, pp. 376-383.
DOI: 10.1016/B978-0-08-100596-5.21792-7.
1.1 Next-Generation Sequencing: an Overview
Next-generation sequencing (NGS) encompasses sequencing technologies that
are capable of sequencing many DNA strands in parallel, resulting in higher
throughput than can be achieved using Sanger sequencing. As NGS has become
cheaper and more accessible, it has been used to address an expanding range of
biological problems, including many relevant to food safety and quality.
Contemporary NGS platforms employ either (i) a short-read or (ii) a
long-read sequencing approach (Table 1.1). Short-read sequencing approaches
typically yield read lengths of up to 700 base pairs (bp), which tend
to be shorter than those produced by Sanger sequencing (Goodwin, McPherson,
and McCombie 2016; Liu et al. 2012). Currently, sequencing-by-synthesis
(SBS) approaches are the dominant paradigm in short-read sequencing.
These approaches (e.g. Illumina sequencing, Roche 454 pyrosequencing, Ion
Torrent semiconductor-based sequencing) rely on DNA polymerase to
synthesize a strand complementary to the template (Goodwin, McPherson, and McCombie 2016). SBS
approaches to short-read sequencing can be contrasted with the sequencing-by-
ligation (SBL) approach employed by the SOLiD (Small Oligonucleotide Liga-
tion and Detection) platform, which employs DNA ligase to join fluorescently-
labelled probe and anchor sequences to a DNA strand (Goodwin, McPherson,
and McCombie 2016). Among the SBS approaches and short-read sequencing
methods as a whole, Illumina sequencing has emerged as the dominant technology
(Goodwin, McPherson, and McCombie 2016). In Illumina sequencing,
fluorescently-tagged nucleotides are added in complement to amplified strands of DNA.
Upon the addition of a single nucleotide, the fluorescent dye is imaged, and
the identity of the corresponding base is recorded (Goodwin, McPherson, and
McCombie 2016).
Table 1.1: Overview of next-generation sequencing technologies discussed in
this chapter. [a]

Sequencing technology | Sequencing mechanism | Read length [b] | Error rate (type of error)

Sequencing-by-ligation (SBL)
SOLiD | Ligation; 2-base encoding | 50-75 bp | ≤ 0.1% (AT bias) [c]

Sequencing-by-synthesis (SBS)
454 | Pyrosequencing | Up to 1000 bp | 1% (indel) [d]
Illumina | Illumina SBS | 25-300 bp; can be 100 Kb if synthetic long-read library preparation is used | 0.1% to 1%, depending on platform/output (substitution)
Ion Torrent | Hydrogen ion detection | Up to 400 bp | 1% (indel)

Single-molecule long-read
Oxford Nanopore | Nanopore | Up to 200 Kb | 12% (indel)
Pacific Biosciences | Single-molecule real-time sequencing | 8-20 Kb | 13% for a single pass (indel)

[a] Summarized from reviews of NGS technologies by Goodwin et al., Liu et al., and Glenn
(Goodwin, McPherson, and McCombie 2016; Liu et al. 2012; Glenn 2011).
[b] bp, base pairs; Kb, kilobase pairs.
[c] AT, adenine and thymine.
[d] indel, insertion/deletion.
While short-read sequencing technologies have been the workhorse of NGS,
they are not without limitations; many genomic features, such as long, repetitive
regions or copy number variations, cannot be readily resolved using short reads
(Goodwin, McPherson, and McCombie 2016). Long-read sequencing technolo-
gies have been able to bridge the literal gaps that their short-read counterparts
have been unable to resolve, relying on either (i) synthetic long-read approaches
or (ii) single-molecule long-read sequencing approaches (Pacific Biosciences
and Oxford Nanopore) (Goodwin, McPherson, and McCombie 2016). Synthetic
long-read sequencing approaches employ existing short-read sequencing plat-
forms, but use barcoding during library preparation to link fragments (Good-
win, McPherson, and McCombie 2016). Single-molecule long-read sequencing
approaches, however, yield "true" long reads that can span kilobases, with the
approach most commonly employed as of late 2017 being the Pacific Biosciences
(PacBio) single-molecule real-time (SMRT) approach (Goodwin, McPherson,
and McCombie 2016). SMRT sequencing uses a DNA polymerase fixed to the
bottom of a well in a specialized flow cell through which a DNA strand is
passed (Goodwin, McPherson, and McCombie 2016). Upon the incorpora-
tion of a single, fluorescently-labelled nucleotide by the polymerase, light is
emitted and recorded by a camera to determine the identity of the nucleotide
(Goodwin, McPherson, and McCombie 2016). This can be contrasted with the
aforementioned short-read SBS approaches, which rely on DNA polymerase
traversing the DNA template to which it is bound (Goodwin, McPherson, and
McCombie 2016). In addition to the PacBio platform, the small and highly
portable MinION platform from Oxford Nanopore Technologies also employs
a single-molecule long-read sequencing approach, during which a strand of
DNA is passed through a protein pore along with an electric current (Goodwin,
McPherson, and McCombie 2016). As different combinations of nucleotides are
passed through the pore, shifts in the electric current are recorded (Goodwin,
McPherson, and McCombie 2016).
Long-read sequencing is becoming increasingly popular for many appli-
cations, including gap closure in reference genomes, characterization of long
genomic structures, and the generation of closed chromosomes or transcrip-
tomes (Goodwin, McPherson, and McCombie 2016). A notable considera-
tion when comparing short-read and long-read sequencing methods is the rel-
atively high error rates of long-read sequencing platforms (Goodwin, McPher-
son, and McCombie 2016). For example, the PacBio RS II, which yields
average read lengths of 10-15 Kb, has an error rate as high as 15% for a
single pass through a molecule of DNA (Goodwin, McPherson, and Mc-
Combie 2016). However, this error rate can be reduced to one that rivals
that of Sanger sequencing by increasing sequencing coverage through mul-
tiple passes; after 30 passes (i.e. at 30X coverage), the accuracy of the
consensus is greater than 99.999%
(http://www.pacb.com/smrt-science/smrt-sequencing/accuracy/ and
https://www.pacb.com/uncategorized/a-closer-look-at-accuracy-in-pacbio/)
(Goodwin, McPherson, and McCombie 2016).
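The effect of repeated passes on consensus accuracy can be illustrated with a deliberately simple model. The sketch below assumes that every pass miscalls a given base independently and with the same probability, and it counts a base as wrong whenever at least half of the passes are wrong. Neither assumption comes from the sources cited above: real long-read errors are dominated by indels, and real consensus callers work on alignments, so the function (a hypothetical helper named consensus_error) is only a rough upper bound under this toy model.

```python
from math import comb


def consensus_error(per_pass_error, passes):
    """Probability that at least half of the passes miscall a given base.

    Toy model: each pass errs independently with probability
    `per_pass_error`. Under that model this is an upper bound on the
    majority-vote consensus error, since erroneous passes would also have
    to agree with one another to outvote the correct call.
    """
    threshold = (passes + 1) // 2  # smallest number of passes that is at least half
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(threshold, passes + 1)
    )


if __name__ == "__main__":
    for n in (1, 5, 10, 30):
        print(n, consensus_error(0.15, n))
```

Under these assumptions, a 15% single-pass error rate falls below 0.001% after 30 passes, which is of the same order as the consensus accuracy quoted above.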
1.2 NGS Data Analysis
Processing and analysis of NGS data is dependent on the sequencing technol-
ogy used, as well as the experimental goals. Regardless of sequencing method
or experimental design, the first steps in the analysis of NGS data usually in-
volve an assessment of read quality, using metrics such as the total number of
reads, the distribution of read lengths, sequence quality scores, etc. This can be
followed by trimming of adapters and/or low-quality bases, filtering out low-
quality reads, and filtering of contaminant DNA, steps for which a number of
programs are available (Breitwieser, Lu, and Salzberg 2017). After these
pre-processing steps, data analysis can be carried out according to the goals of the
experiment, with possible food science-relevant applications discussed below
(Table 1.2).
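As an illustration of the quality assessment and trimming steps described above, the following minimal sketch decodes a FASTQ quality string and trims low-quality bases from the 3' end of a read. It assumes Phred+33 encoding and a minimum quality of 20, both of which are illustrative choices rather than values taken from the text, and the function names are hypothetical. Dedicated trimming programs use more elaborate rules (e.g. sliding windows and adapter detection).

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(character) - offset for character in quality_string]


def mean_error_probability(quality_string):
    """Mean per-base error probability implied by the Phred scores."""
    scores = phred_scores(quality_string)
    return sum(10 ** (-score / 10) for score in scores) / len(scores)


def trim_3prime(sequence, quality_string, min_quality=20):
    """Remove trailing bases whose Phred score is below `min_quality`."""
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


if __name__ == "__main__":
    # Made-up read: six high-quality bases followed by two low-quality bases.
    print(trim_3prime("ACGTACGT", "IIIIII##"))
```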
Table 1.2: Overview of food science-relevant next-generation sequencing
applications discussed in this chapter.

Next-generation sequencing application | Number of organisms queried | Nucleic acid extracted/sequenced | Genomic elements queried | Current food science-relevant applications

Whole-genome sequencing (WGS) | 1 | DNA/DNA | Entire genome | Characterization of food-relevant organisms at the genomic level
RNA sequencing (RNA-Seq) | 1 | RNA/cDNA reverse-transcribed from RNA | Entire transcriptome | Characterization of food-relevant organisms at the transcriptional level
High-throughput amplicon sequencing (e.g. 16S rDNA sequencing, DNA-barcoding) | ≥ 1 | DNA/DNA | Selected amplicon(s) present in sample (usually 16S rDNA for bacterial/archaeal communities; other loci for eukarya) | Taxonomic characterization of food-relevant microbial communities (usually bacterial/archaeal communities); authentication of eukaryotic food matrices (e.g. seafood, meat products)
Shotgun metagenomic sequencing | > 1 | DNA/DNA | All genomes present in sample | Characterization of food-relevant communities at the genomic level (queries eukarya, bacteria, archaea, and viruses)
Shotgun metatranscriptomic sequencing | > 1 | RNA/cDNA reverse-transcribed from RNA | All transcriptomes present in sample | Characterization of food-relevant communities at the transcriptional level (queries eukarya, bacteria, archaea, and viruses)
1.3 NGS Applications: Whole-Genome Sequencing of Micro-
bial Contaminants
Traditionally, microbial contaminants isolated from food undergo various
organism-specific phenotypic or biochemical tests (e.g. testing for motility, toxin
production, growth at various temperatures) to elucidate or confirm their iden-
tity (FDA 1998). These tests may be supplemented with additional typing
methods, such as serotyping, pulsed-field gel electrophoresis (PFGE), Sanger
sequencing of a single taxonomic marker gene or genomic region (i.e. single-
locus sequence typing; SLST), or Sanger sequencing of multiple loci used in
a multi-locus sequence typing (MLST) scheme (Kovac et al. 2017; Sabat et al.
2013). However, the per-isolate cost of whole-genome sequencing (WGS) has
decreased to the point at which it is comparable to, or even below, the price of
many of these traditional subtyping methods (Kovac et al. 2017), making it an
increasingly popular method for characterizing microbial contaminants isolated
from food matrices, food-associated environments (e.g. farm environments,
processing environments), and, in the case of pathogenic microbes, from hosts (e.g.
in human- or animal-clinical settings) (Kovac et al. 2017). Furthermore, many of
these typing methods (e.g. serotyping, SLST, MLST) can be performed in silico
using WGS data, with the advantage that one can query the majority of a micro-
bial genome from a single data set, rather than just a small fraction of it (< 0.01%
for a traditional 7-gene MLST scheme) (Kovac et al. 2017). In addition to in sil-
ico subtyping, WGS data from microbial contaminants can be used to predict
functional characteristics of isolates, query genes or genomic elements of inter-
est within a genome (e.g. plasmids, bacteriophage, and genes contributing to
antimicrobial resistance or virulence), and, in the case of pathogenic microor-
ganisms, detect and track outbreaks (Kovac et al. 2017).
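To make the idea of in silico typing concrete, the sketch below assigns a sequence type (ST) to an assembled genome by looking up which known allele of each locus occurs in the assembly. Everything in it is hypothetical: the two-locus scheme, the six-base "alleles", the profile table, and the function names are invented for illustration only. Real MLST schemes use seven or more loci of several hundred base pairs each, and real tools rely on alignment (and consider both strands) rather than exact substring matching.

```python
# Hypothetical two-locus scheme: locus -> {allele sequence: allele number}.
ALLELES = {
    "locusA": {"ATGGCT": 1, "ATGGCA": 2},
    "locusB": {"TTGACC": 1, "TTGACG": 2},
}

# Hypothetical profile table: (locusA allele, locusB allele) -> sequence type.
PROFILES = {(1, 1): "ST1", (2, 1): "ST2", (2, 2): "ST3"}


def call_alleles(assembly):
    """Return the allele number found for each locus, or None if none matches."""
    calls = {}
    for locus, alleles in ALLELES.items():
        calls[locus] = next(
            (number for sequence, number in alleles.items() if sequence in assembly),
            None,
        )
    return calls


def sequence_type(assembly):
    """Look up the sequence type for the allelic profile of an assembly."""
    calls = call_alleles(assembly)
    profile = tuple(calls[locus] for locus in ALLELES)
    return PROFILES.get(profile)  # None when the profile is not in the table


if __name__ == "__main__":
    print(sequence_type("CCCATGGCAGGGTTGACCAAA"))
```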
After sequencing the genomic DNA and pre-processing the resulting reads
from a microbial isolate (see "NGS Data Analysis" section above), possible analysis
steps that may be taken include (i) de novo genome assembly of the reads
into contiguous stretches of sequence (contigs) (Giordano et al. 2017; Liao, S.-H.
Lin, and H.-H. Lin 2015; Ekblom and Wolf 2014), (ii) mapping reads back to a
reference genome, (iii) identifying single-nucleotide polymorphisms (SNPs), in-
sertions, and deletions (indels) in NGS data through variant calling (Olson et
al. 2015), (iv) constructing phylogenetic trees to assess the evolutionary relation-
ship of multiple isolates, (v) assigning allelic types at a genomic scale using core
genome or whole genome multi-locus sequence typing (cgMLST and wgMLST,
respectively), and (vi) locating genes and features in NGS data via genome an-
notation (Richardson and Watson 2012; Mudge and Harrow 2016; Yandell and
Ence 2012). These data can be used to characterize isolates at high resolution,
making it possible to compare isolates geospatially and temporally at the whole-
genome scale.
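One simple way in which variant calls are compared across isolates is by counting pairwise SNP differences. The sketch below assumes that a set of aligned, equal-length SNP sequences (one per isolate) is already available, and it skips positions where either isolate has a gap or an ambiguous base; the isolate names and sequences are made up, and the function names are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions at which two aligned sequences differ.

    Positions where either sequence carries a character listed in `ignore`
    (ambiguous base or gap) are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_distances(alignment):
    """Return SNP distances for every pair of isolates in an alignment."""
    return {
        (name_a, name_b): snp_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Made-up alignment of SNP positions for three hypothetical isolates.
    example = {"iso1": "ACGTAC", "iso2": "ACGTAT", "iso3": "ATGNAT"}
    print(pairwise_distances(example))
```

A distance matrix of this kind is only a starting point; phylogenetic inference, as mentioned above, models the substitution process rather than simply counting differences.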
WGS is becoming an increasingly valuable tool for characterizing microbial
contaminants, particularly pathogens, isolated from food and food processing
environments. A notable example of the utility of WGS can be seen in the multi-
agency collaboration in the US to sequence all Listeria monocytogenes isolates
from human patients, food, and the environment (Jackson et al. 2016). Since its
implementation in 2013, the WGS-based surveillance program detected more
listeriosis clusters and solved more outbreaks each year, relative to the previous
year (Jackson et al. 2016). Similar findings have been seen for Salmonella enterica
serotype Enteritidis (S. Enteritidis); retrospective sequencing of 55 S. Enteritidis
isolates from clinical and environmental sources allowed isolates from known
outbreaks to be differentiated from sporadic isolates at greater resolution than
PFGE (Taylor et al. 2015). These examples showcase how WGS can be used
to characterize, at high resolution, not only foodborne pathogens but also the
outbreaks associated with them.
1.4 NGS Applications: RNA Sequencing (RNA-Seq) of Food-
Relevant Organisms
While WGS can be used to characterize the genome of an organism at unprece-
dented resolution, it offers no information on whether a genomic element of in-
terest is being actively transcribed or not. This is particularly important within
a food safety context; for example, the mere isolation of a pathogen from a food
matrix does not necessarily mean that particular isolate is viable, or that it is
transcribing the genes necessary to cause infection or intoxication in a human
host. Traditionally, quantitative reverse-transcription PCR (RT-qPCR) has been
employed to quantify or detect shifts in transcript levels of loci of interest. For
this method, reverse-transcription PCR (RT-PCR) is used to obtain complementary
DNA (cDNA) from an RNA template, and the resulting cDNA can be quantified
using quantitative PCR (qPCR). In a food science context, RT-qPCR has
been proposed as a method for detecting viable microorganisms, quantifying
virulence, toxin, or stress response gene transcription, and quantifying micro-
bial growth in food matrices (Postollec et al. 2011; Carroll et al. 2016). Studying
transcription at a genome-wide scale, however, was made possible with cDNA
microarrays, which have been used to study the stress responses of various
foodborne pathogens, as well as their transcription of toxin and virulence genes
(Postollec et al. 2011; Roy and Sen 2006; Rasooly and Herold 2008). As NGS
has become more feasible, however, it is now possible to query the transcrip-
tome of an organism in its entirety at low cost: RNA sequencing (RNA-Seq)
employs NGS technologies to sequence cDNA reverse-transcribed from RNA
that has been extracted from an organism of interest (Z. Wang, Gerstein, and
Snyder 2009). RNA-Seq allows one to quantitatively survey transcribed regions
of an entire genome, improving upon microarrays in both cost and flexibility
(i.e. the ability to characterize any organism that can be sequenced, rather than
relying on the availability of an array for a particular organism), which is par-
ticularly valuable for studying organisms or genomic regions that may not be
well-characterized.
After employing NGS to sequence cDNA from an organism of interest, and
determining that the quality of sequencing reads is adequate, reads are usually
aligned to a reference genome or an assembled transcriptome (McClure et al.
2013; Conesa et al. 2016). After assessing mapping quality and determining that
it is appropriate, reads mapping to various genes or genomic regions can be
quantified and normalized, taking into account biases such as gene length (Mc-
Clure et al. 2013; Conesa et al. 2016). After quantification and normalization,
analyses can be carried out according to the experimental goals (e.g. differen-
tial transcription under various conditions). Within the realm of food safety,
RNA-Seq has been applied to pathogenic and toxin-producing microorganisms
to identify differentially-transcribed genes during growth in various food ma-
trices (Tang et al. 2015; Deng, Z. Li, and W. Zhang 2012; Galia et al. 2017), after
exposure to various stressors (e.g. acid, starvation, or antimicrobial stressors)
(F. Zhang et al. 2014; Casey et al. 2014; Butcher and Stintzi 2013; K. Jia et al.
2017), and during the infection of a host (Avraham et al. 2016).
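The normalization step mentioned above can be illustrated with transcripts per million (TPM), one commonly used measure that corrects for gene length and sequencing depth. TPM is chosen here only as an example; the text does not prescribe a particular normalization method, and the gene names, counts, and lengths below are made up.

```python
def tpm(read_counts, gene_lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    Each count is first divided by the gene length (reads per base), and the
    resulting rates are then rescaled so that they sum to one million.
    """
    rates = {
        gene: read_counts[gene] / gene_lengths_bp[gene] for gene in read_counts
    }
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Two hypothetical genes with equal read counts but different lengths.
    counts = {"geneA": 100, "geneB": 100}
    lengths = {"geneA": 1000, "geneB": 2000}
    print(tpm(counts, lengths))
```

In this example the shorter gene receives twice the TPM value of the longer one, because the same number of reads is spread over half as many bases.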
1.5 NGS Applications: High-Throughput Amplicon Sequenc-
ing
WGS and RNA-Seq have allowed food-associated microorganisms to be charac-
terized at unprecedented resolution. However, these methods typically require
the microorganism in question to be in pure culture or isolated via culture-based
methods, a process which involves the use of organism-specific enrichment me-
dia, selective media, and isolation protocols (Kovac et al. 2017). Metagenomics,
which involves sequencing DNA directly from an environmental sample, at-
tempts to bypass the isolation step, making it possible to survey an entire com-
munity simultaneously (Kovac et al. 2017).
Until recently, NGS-based metagenomic methods have primarily involved
high-throughput amplicon sequencing. Also referred to as "metataxonomics",
"meta-genetics", or "marker gene metagenomics", high-throughput amplicon
sequencing employs NGS technologies to sequence targeted PCR products (am-
plicons) to characterize particular communities. When surveying bacterial and
archaeal communities, the 16S ribosomal DNA gene (16S rDNA) is usually the
amplicon of choice, as it is present in all bacterial and archaeal species. 16S
rDNA sequencing has been used to survey the microbiota of various foods (De
Filippis, Parente, and Ercolini 2016; Kergourlay et al. 2015; Ercolini 2013), in-
cluding fermented foods (De Filippis, Parente, and Ercolini 2016) and food
matrices subjected to pathogen-specific enrichments (Jarvis et al. 2015; Lusk
et al. 2012), as well as to monitor bacterial community shifts in food processing
environments (Stellato et al. 2016; Hultman et al. 2015).
One of the strengths of 16S rDNA amplicon sequencing is that there are
many freely available bioinformatic tools and pipelines for data analysis
and visualization of results (e.g. QIIME, Mothur). A typical workflow for
analyzing NGS data from high-throughput 16S rDNA experiments may include
pre-processing of the raw reads, clustering of sequences into operational taxo-
nomic units (OTUs) based on sequence similarity, and taxonomic assignment of
sequences using a database of 16S rDNA genes (e.g. RDP, Greengenes, SILVA)
(Oulas et al. 2015; Siegwald et al. 2017).
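The OTU clustering step can be sketched as a greedy, centroid-based procedure: each sequence joins the first existing cluster whose representative it matches at or above a similarity threshold, and otherwise founds a new cluster. The sketch below makes several simplifying assumptions that are not taken from the text: sequences are already aligned to equal length, identity is the fraction of matching positions, and the threshold is 97%. It is not the algorithm of any particular pipeline, and the function names are hypothetical.

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def greedy_otus(sequences, threshold=0.97):
    """Cluster sequences into OTUs; returns lists of sequence indices.

    The first member of each OTU serves as its centroid (representative).
    """
    centroids = []  # index of the representative sequence of each OTU
    otus = []       # indices of all member sequences of each OTU
    for index, sequence in enumerate(sequences):
        for otu_number, centroid_index in enumerate(centroids):
            if identity(sequence, sequences[centroid_index]) >= threshold:
                otus[otu_number].append(index)
                break
        else:
            # No existing centroid is similar enough: start a new OTU.
            centroids.append(index)
            otus.append([index])
    return otus


if __name__ == "__main__":
    # Made-up 100-base sequences at 100%, 98%, and 90% identity to the first.
    reads = ["A" * 100, "A" * 98 + "CC", "A" * 90 + "C" * 10]
    print(greedy_otus(reads))
```

Because the outcome of greedy clustering depends on input order, real pipelines typically sort sequences (e.g. by abundance) before clustering.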
In addition to querying bacterial and archaeal communities, the same principles
of amplicon sequencing can be applied to characterize eukarya. DNA-barcoding,
a practice in which a specific region of a genome is sequenced, is
a commonly-used method for food matrix authentication along the food sup-
ply chain (Ellis et al. 2016; Galimberti et al. 2013). For this approach, a genetic
marker (i.e. a "barcode") present in a range of taxa, but variable enough to be capable
of discriminating between taxa of interest, is sequenced (Galimberti et al.
2013), similar to the way the 16S rDNA gene is used to survey bacterial/archaeal
communities. When querying animal DNA in a matrix (e.g. for seafood or
meat authentication), the cytochrome b (cytB) and cytochrome c oxidase sub-
unit 1 (COI) genes are common amplicons of choice. For fungi, the internal
transcribed spacer (ITS) region of the genome is the locus of choice (Schoch et
al. 2012), while a number of loci have been proposed for querying plant DNA
present in a matrix (Hollingsworth, Graham, and Little 2011; Hollingsworth,
D.-Z. Li, et al. 2016). The sequences of these genes are then compared to the
barcodes of known taxa, such as those found in the Barcode of Life Database
(BOLD) (Ratnasingham and Hebert 2007) or the National Center for Biotechnol-
ogy Information’s (NCBI) GenBank database (Benson et al. 2013). Applications
of DNA-barcoding within the realm of matrix authentication and contaminant
detection along the food supply chain have included authentication of and con-
taminant detection in seafood (Carvalho, Palhares, Drummond, and Frigo 2015;
Armani et al. 2015; Pardo, Jimenez, and Perez-Villarreal 2016; Kim et al. 2015;
Chang et al. 2016; Carvalho, Palhares, Drummond, and Gadanho 2017), meat
(Kane and Hellberg 2016; Hellberg, B. C. Hernandez, and E. L. Hernandez 2017;
Naaum et al. 2018), poultry (Hellberg, B. C. Hernandez, and E. L. Hernandez
2017), dairy products (Galimberti et al. 2013), olive oil (Kumar, Kahlon, and
Chaudhary 2011), and spices (Swetha et al. 2017; De Mattia et al. 2011; Galim-
berti et al. 2013).
Until recently, DNA-barcoding was limited by the low throughput that
Sanger sequencing provides; however, NGS has emerged as a low-cost, high-
throughput alternative (Ellis et al. 2016; Shokralla et al. 2014) that has been
used for characterizing both raw ingredients and processed foods (Galimberti
et al. 2013). In this high-throughput approach, sequencing reads are mapped
to sequences in an appropriate database (often BOLD or GenBank) after de-
termining that read quality is appropriate. The proportion of reads mapping
to a particular species in the database corresponds to the proportion of that
particular species in the matrix. A notable example of the application of
high-throughput sequencing for food matrix authentication is provided by Carvalho
et al. (Carvalho, Palhares, Drummond, and Gadanho 2017), who identified
mislabeled cod products in Brazilian stores and restaurants by targeted
sequencing of the cytB and COI genes present in processed cod products
using NGS. In addition
to identifying mislabeled products, the composition of blended products com-
posed of multiple fish species could be determined by sequencing the selected
loci (Carvalho, Palhares, Drummond, and Gadanho 2017).
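The read-proportion calculation described above can be sketched as follows. This is a minimal illustration and not the method used in the cited studies: the function name, the species labels, and the read counts are hypothetical, and read proportions only approximate the true composition of a matrix, since factors such as marker copy number and amplification bias can distort them.

```python
from collections import Counter


def species_proportions(read_assignments):
    """Return the fraction of mapped reads attributed to each species.

    read_assignments: iterable of species labels, one per read; None marks
    a read that did not map to any reference barcode (hypothetical format).
    """
    counts = Counter(label for label in read_assignments if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in counts.items()}


# Hypothetical read-level assignments for a blended fish product.
reads = (
    ["Gadus morhua"] * 6
    + ["Pollachius virens"] * 3
    + ["Gadus macrocephalus"]
    + [None]  # unmapped read, excluded from the denominator
)
proportions = species_proportions(reads)
for species, fraction in sorted(proportions.items(), key=lambda kv: -kv[1]):
    print(f"{species}: {fraction:.2f}")
```

Unmapped reads are excluded from the denominator here; whether that is appropriate depends on the completeness of the reference database.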
1.6 NGS Applications: Shotgun Metagenomic and Metatran-
scriptomic Sequencing
Although high-throughput amplicon sequencing has offered a higher-
resolution glimpse into food and food-associated microbiomes, it has numer-
ous limitations that are particularly relevant within the realms of food safety
and food quality, perhaps most notably the inability to query organisms that
do not possess the amplicon of choice (e.g. eukarya in a community cannot
be queried if 16S rDNA amplicon sequencing is performed; see “NGS
Applications: High-Throughput Amplicon Sequencing” section above). For
16S rDNA amplicon sequencing of bacterial/archaeal communities, additional
drawbacks include (i) difficulty achieving species-level resolution (Janda and
Abbott 2007; Rossi-Tamisier et al. 2015) and reliably distinguishing pathogenic
bacteria from non-pathogenic species (e.g. L. monocytogenes from Listeria in-
nocua, human pathogens Bacillus anthracis from Bacillus cereus and biopesticide
Bacillus thuringiensis), (ii) PCR amplification and primer bias (Brooks et al. 2015),
and (iii) inability to query functionally-relevant genomic elements directly, such
as virulence or antimicrobial resistance determinants (Kovac et al. 2017).
An increasingly popular alternative to amplicon sequencing is shotgun
metagenomic sequencing, an approach in which all DNA present in a sample is
sequenced, rather than solely an amplicon. By sequencing all DNA present in
a sample, the amplification bias and low taxonomic and functional resolution
issues which plague amplicon sequencing can typically be bypassed (Kovac
et al. 2017). In addition to sequencing all of the bacterial and archaeal DNA
present in a sample, all viral and eukaryotic DNA is sequenced; this is partic-
ularly relevant when the community of interest is derived from a eukaryotic
matrix (e.g. from a host or from food), as the majority (as much as 99%) of DNA
will come from the eukaryotic matrix itself (Kovac et al. 2017; Noyes et al. 2016).
While large quantities of host DNA may not be a problem if the experimental
goal is to assess the composition of the food matrix itself, it may hinder the se-
quencing and detection of many microbial species. As a result, when extracting
DNA from a matrix containing high amounts of host DNA, additional steps
may be taken to deplete any background DNA originating from the matrix it-
self to increase the proportion of microbial DNA that is sequenced (Kovac et al.
2017). After sequencing the extracted DNA, analysis of the resulting sequenc-
ing reads is carried out according to the experimental goals, which may include
taxonomic assignment (Sharpton 2014), metagenomic assembly, functional an-
notation (Sharpton 2014), and/or conducting a metagenome-wide association
study by associating community data with a particular phenotype (J. Wang and
H. Jia 2016; Lynch and Pedersen 2016).
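The effect of host DNA on microbial sequencing depth can be illustrated with simple arithmetic. The sketch below assumes that reads are drawn in proportion to the DNA present in the sample; the function name, the total read count, and the post-depletion host fraction of 90% are hypothetical values chosen for illustration only.

```python
def expected_microbial_reads(total_reads, host_fraction):
    """Expected number of non-host reads at a given sequencing depth.

    Assumes reads are sampled in proportion to the DNA in the extract.
    """
    if not 0.0 <= host_fraction <= 1.0:
        raise ValueError("host_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - host_fraction))


total_reads = 10_000_000  # hypothetical sequencing depth

# 99% host DNA, the upper bound mentioned in the text.
without_depletion = expected_microbial_reads(total_reads, 0.99)

# Hypothetical outcome of a host-depletion step that lowers host DNA to 90%.
with_depletion = expected_microbial_reads(total_reads, 0.90)

print(without_depletion, with_depletion)
```

Under these assumptions, lowering the host fraction from 99% to 90% yields a tenfold increase in the number of reads available for characterizing the microbial community.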
As with all genomic approaches, shotgun metagenomic methods can offer
insight into the genomic composition of a community, but cannot offer infor-
mation as to which genes are being transcribed and possibly translated and ex-
pressed as protein products (Kovac et al. 2017). Similar to the way RNA-Seq
can be used to complement WGS of a bacterial isolate, metagenomic approaches
can be supplemented with shotgun metatranscriptomic sequencing, which in-
volves sequencing cDNA reverse-transcribed from RNA (typically messenger
RNA) extracted from an entire community (Kovac et al. 2017).
Analysis of shotgun metagenomic and metatranscriptomic data usually be-
gins with pre-processing steps such as assessing read quality and trimming
adapters (Breitwieser, Lu, and Salzberg 2017). This can be followed by (i) as-
sembly of the reads into contigs, or (ii) taxonomic or functional classification di-
rectly from sequencing reads (Breitwieser, Lu, and Salzberg 2017). For a review
of methods for metagenomic data analysis, see Breitwieser et al. (Breitwieser,
Lu, and Salzberg 2017).
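The read quality assessment step mentioned above can be sketched as a simple mean-quality filter. This is an illustrative example only: it assumes four-line FASTQ records with Phred+33 quality encoding, the thresholds are hypothetical, and it does not perform adapter trimming, for which dedicated tools are typically used in practice.

```python
import io


def parse_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def quality_filter(records, min_mean_quality=20, min_length=4):
    """Keep reads that meet hypothetical length and mean-quality thresholds."""
    for header, sequence, quality in records:
        if len(sequence) >= min_length and mean_phred(quality) >= min_mean_quality:
            yield header, sequence, quality


# Two toy reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGTACGT\n+\n########\n"
)
kept = [header for header, _, _ in quality_filter(parse_fastq(example))]
print(kept)
```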
The use of shotgun metagenomic and metatranscriptomic approaches to sur-
vey communities in foods has been undertaken only recently (De Filippis, Par-
ente, and Ercolini 2016). Goals of these studies have included characterization
of the microbiomes of various foods in the presence of foodborne pathogens
and/or spoilage organisms (Jarvis et al. 2015; Ottesen et al. 2013), tracking
foodborne pathogens and antimicrobial resistance genes along the food sup-
ply chain (Noyes et al. 2016; Yang et al. 2016), characterizing eukaryotic food
matrices composed of multiple species (Ripp et al. 2014), and characterizing
the microbiomes of various food matrices during processes such as fermenta-
tion (De Filippis, Parente, and Ercolini 2016; Kergourlay et al. 2015; Alkema
et al. 2016; Valdes et al. 2013; Lessard et al. 2014; De Filippis, Genovese, et al.
2016; Monnet et al. 2016). A notable example of the application of shotgun
meta-omics approaches to identify the cause of a food quality anomaly is
provided by Quigley et al. (Quigley et al. 2016); using high-throughput 16S
rDNA sequencing followed by shotgun metagenomic sequencing, Thermus
thermophilus was proposed (and later confirmed) to be the cause of a pink
discoloration defect in Continental-type cheeses.
1.7 Conclusion
NGS technologies are being employed increasingly in food science-relevant
realms, with applications ranging from surveying microbial communities
involved in food processing to rapidly characterizing bacterial isolates from
foodborne outbreaks. As sequencing costs continue to decrease, it is likely that
whole-genome and meta-omics approaches will be applied routinely at various
points along the food supply chain.
The following chapters detail how NGS can be used to query bacterial food-
borne pathogens, with an emphasis on rapid, high-throughput computational
methods which can be used to analyze short-read data produced by Illumina
platforms. Two model organisms are discussed: (i) non-typhoidal Salmonella
enterica, a widely studied Gram-negative pathogen which can be transmitted
between livestock and humans, as well as through food, and (ii) the lesser-queried
Gram-positive members of the Bacillus cereus group, which are spore-forming
organisms commonly isolated from soil. While both groups of organisms are
capable of causing foodborne illness in humans, they differ at a biological level
and, thus, necessitate different approaches to analyze NGS data derived from
them.
1.8 References
Alkema, Wynand, Jos Boekhorst, Michiel Wels, and Sacha A. F. T. van Hijum
(2016). “Microbial bioinformatics for food safety and production”. In: Brief
Bioinform 17.2, pp. 283–292. DOI: 10.1093/bib/bbv034.
Armani, A. et al. (2015). “DNA barcoding reveals commercial and health issues
in ethnic seafood sold on the Italian market”. In: Food Control 55, pp. 206–214.
Avraham, Roi et al. (2016). “A highly multiplexed and sensitive RNA-seq pro-
tocol for simultaneous analysis of host and pathogen transcriptomes”. In:
Nature Protocols 11, pp. 1477–1491.
Benson, Dennis A. et al. (2013). “GenBank”. In: Nucleic Acids Res 41.Database
issue, pp. D36–D42. DOI: 10.1093/nar/gks1195.
Breitwieser, Florian P., Jennifer Lu, and Steven L. Salzberg (2017). “A review of
methods and databases for metagenomic classification and assembly”. In:
Briefings in Bioinformatics. DOI: 10.1093/bib/bbx120.
Brooks, J. Paul et al. (2015). “The truth about metagenomics: quantifying and
counteracting bias in 16S rRNA studies”. In: BMC Microbiol 15, pp. 66–66.
DOI: 10.1186/s12866-015-0351-6.
Butcher, James and Alain Stintzi (2013). “The transcriptional landscape of
Campylobacter jejuni under iron replete and iron limited growth conditions”.
In: PLoS One 8.11, e79475–e79475. DOI: 10.1371/journal.pone.0079475.
Carroll, Laura M., Teresa M. Bergholz, Ian M. Hildebrandt, and Bradley P. Marks
(2016). “Application of a Nonlinear Model to Transcript Levels of Upregu-
lated Stress Response Gene ibpA in Stationary-Phase Salmonella enterica Sub-
jected to Sublethal Heat Stress”. In: Journal of Food Protection 79.7, pp. 1089–
1096. DOI: 10.4315/0362-028X.JFP-15-377. eprint: https://doi.
org/10.4315/0362-028X.JFP-15-377.
Carvalho, Daniel Cardoso, Rafael Melo Palhares, Marcela Goncalves Drum-
mond, and Tiago Bolan Frigo (2015). “DNA Barcoding identification of com-
mercialized seafood in South Brazil: A governmental regulatory forensic
program”. In: Food Control 50, pp. 784–788.
Carvalho, Daniel Cardoso, Rafael Melo Palhares, Marcela Goncalves Drum-
mond, and Mario Gadanho (2017). “Food metagenomics: Next generation
sequencing identifies species mixtures and mislabeling within highly pro-
cessed cod products”. In: Food Control 80, pp. 183–186.
Casey, Aidan et al. (2014). “Transcriptome analysis of Listeria monocytogenes ex-
posed to biocide stress reveals a multi-system response involving cell wall
synthesis, sugar uptake, and motility”. In: Front Microbiol 5, pp. 68–68. DOI:
10.3389/fmicb.2014.00068.
Chang, Chia-Hao, Han-Yang Lin, Qiu Ren, Yeong-Shin Lin, and Kwang-
Tsao Shao (2016). “DNA barcode identification of fish products in Tai-
wan: Government-commissioned authentication cases”. In: Food Control 66,
pp. 38–43.
Conesa, Ana et al. (2016). “A survey of best practices for RNA-seq data analy-
sis”. In: Genome Biology 17.1, p. 13. DOI: 10.1186/s13059-016-0881-8.
De Filippis, Francesca, Alessandro Genovese, Pasquale Ferranti, Jack A. Gilbert,
and Danilo Ercolini (2016). “Metatranscriptomics reveals temperature-
driven functional changes in microbiome impacting cheese maturation rate”.
In: Sci Rep 6, pp. 21871–21871. DOI: 10.1038/srep21871.
De Filippis, Francesca, Eugenio Parente, and Danilo Ercolini (2016). “Metage-
nomics insights into food fermentations”. In: Microb Biotechnol 10.1, pp. 91–
102. DOI: 10.1111/1751-7915.12421.
De Mattia, Fabrizio et al. (2011). “A comparative study of different DNA bar-
coding markers for the identification of some members of Lamiacaea”. In:
Food Research International 44.3, pp. 693–702.
Deng, Xiangyu, Zengxin Li, and Wei Zhang (2012). “Transcriptome sequenc-
ing of Salmonella enterica serovar Enteritidis under desiccation and starvation
stress in peanut oil”. In: Food Microbiology 30.1, pp. 311–315.
Ekblom, Robert and Jochen B. W. Wolf (2014). “A field guide to whole-
genome sequencing, assembly and annotation”. In: Evolutionary Applica-
tions 7.9, pp. 1026–1042. DOI: 10.1111/eva.12178. eprint: https://
onlinelibrary.wiley.com/doi/pdf/10.1111/eva.12178.
Ellis, David I., Howbeer Muhamadali, David P. Allen, Christopher T. Elliott, and
Royston Goodacre (2016). “A flavour of omics approaches for the detection
of food fraud”. In: Current Opinion in Food Science 10, pp. 7–15.
Ercolini, Danilo (2013). “High-throughput sequencing and metagenomics: mov-
ing forward in the culture-independent analysis of food microbial ecology”.
In: Appl Environ Microbiol 79.10, pp. 3148–3155. DOI: 10.1128/AEM.00256-
13.
FDA (1998). Bacteriological analytical manual, 8th edition, 1998 and Foodborne
pathogenic microorganisms and natural toxins handbook, 1998. Gaithersburg,
MD: AOAC International.
Galia, Wessam et al. (2017). “Strand-specific transcriptomes of Enterohemor-
rhagic Escherichia coli in response to interactions with ground beef micro-
biota: interactions between microorganisms in raw meat”. In: BMC Genomics
18.1, pp. 574–574. DOI: 10.1186/s12864-017-3957-2.
Galimberti, Andrea et al. (2013). “DNA barcoding as a new tool for food trace-
ability”. In: Food Research International 50.1, pp. 55–63.
Giordano, Francesca et al. (2017). “De novo yeast genome assemblies from Min-
ION, PacBio and MiSeq platforms”. In: Scientific Reports 7.1, p. 3935. DOI:
10.1038/s41598-017-03996-z.
Glenn, Travis C. (2011). “Field guide to next-generation DNA sequencers”. In:
Molecular Ecology Resources 11.5, pp. 759–769. DOI: 10.1111/j.1755-
0998.2011.03024.x. eprint: https://onlinelibrary.wiley.com/
doi/pdf/10.1111/j.1755-0998.2011.03024.x.
Goodwin, Sara, John D. McPherson, and W. Richard McCombie (2016). “Com-
ing of age: ten years of next-generation sequencing technologies”. In: Nature
Reviews Genetics 17, pp. 333–351.
Hellberg, Rosalee S., Brenda C. Hernandez, and Eduardo L. Hernandez (2017).
“Identification of meat and poultry species in food products using DNA bar-
coding”. In: Food Control 80, pp. 23–28.
Hollingsworth, Peter M., Sean W. Graham, and Damon P. Little (2011). “Choos-
ing and Using a Plant DNA Barcode”. In: PLOS ONE 6.5, e19254. DOI: 10.
1371/journal.pone.0019254.
Hollingsworth, Peter M., De-Zhu Li, Michelle van der Bank, and Alex D.
Twyford (2016). “Telling plant species apart with DNA: from barcodes to
genomes”. In: Philos Trans R Soc Lond B Biol Sci 371.1702, p. 20150338. DOI:
10.1098/rstb.2015.0338.
Hultman, Jenni, Riitta Rahkila, Javeria Ali, Juho Rousu, and K. Johanna
Bjorkroth (2015). “Meat Processing Plant Microbiome and Contamination
Patterns of Cold-Tolerant Bacteria Causing Food Safety and Spoilage Risks
in the Manufacture of Vacuum-Packaged Cooked Sausages”. In: Applied and
Environmental Microbiology 81.20. Ed. by H. L. Drake, pp. 7088–7097. DOI:
10.1128/AEM.02228-15. eprint: https://aem.asm.org/content/
81/20/7088.full.pdf.
Jackson, Brendan R. et al. (2016). “Implementation of Nationwide Real-time
Whole-genome Sequencing to Enhance Listeriosis Outbreak Detection and
Investigation”. In: Clinical Infectious Diseases 63.3, pp. 380–386.
DOI: 10.1093/cid/ciw242.
Janda, J. Michael and Sharon L. Abbott (2007). “16S rRNA gene sequencing for
bacterial identification in the diagnostic laboratory: pluses, perils, and pit-
falls”. In: J Clin Microbiol 45.9, pp. 2761–2764. DOI: 10.1128/JCM.01228-
07.
Jarvis, Karen G. et al. (2015). “Cilantro microbiome before and after nonselective
pre-enrichment for Salmonella using 16S rRNA and metagenomic sequencing”.
In: BMC Microbiology 15.1, p. 160. DOI: 10.1186/s12866-015-0497-2.
Jia, Kun et al. (2017). “Preliminary Transcriptome Analysis of Mature Biofilm
and Planktonic Cells of Salmonella Enteritidis Exposure to Acid Stress”. In:
Front Microbiol 8, pp. 1861–1861. DOI: 10.3389/fmicb.2017.01861.
Kane, Dawn E. and Rosalee S. Hellberg (2016). “Identification of species in
ground meat products sold on the U.S. commercial market using DNA-
based methods”. In: Food Control 59, pp. 158–163.
Kergourlay, Gilles, Bernard Taminiau, Georges Daube, and Marie-Christine
Champomier Vergès (2015). “Metagenomic insights into the dynamics of mi-
crobial communities in food”. In: International Journal of Food Microbiology
213, pp. 31–39.
Kim, Heejoong et al. (2015). “Utility of Stable Isotope and Cytochrome Oxidase
I Gene Sequencing Analyses in Inferring Origin and Authentication of Hair-
tail Fish and Shrimp”. In: Journal of Agricultural and Food Chemistry 63.22.
PMID: 25980806, pp. 5548–5556. DOI: 10.1021/acs.jafc.5b01469.
eprint: https://doi.org/10.1021/acs.jafc.5b01469.
Kovac, Jasna, Henk den Bakker, Laura M. Carroll, and Martin Wiedmann (2017).
“Precision food safety: A systems approach to food safety facilitated by
genomics tools”. In: TrAC Trends in Analytical Chemistry 96.Supplement C,
pp. 52–61.
Kumar, S., T. Kahlon, and S. Chaudhary (2011). “A rapid screening for adulter-
ants in olive oil using DNA barcodes”. In: Food Chemistry 127.3, pp. 1335–
1341.
Lessard, Marie-Helene, Catherine Viel, Brian Boyle, Daniel St-Gelais, and Steve
Labrie (2014). “Metatranscriptome analysis of fungal strains Penicillium
camemberti and Geotrichum candidum reveal cheese matrix breakdown and
potential development of sensory properties of ripened Camembert-type
cheese”. In: BMC Genomics 15, pp. 235–235. DOI: 10.1186/1471-2164-
15-235.
Liao, Yu-Chieh, Shu-Hung Lin, and Hsin-Hung Lin (2015). “Completing bacte-
rial genome assemblies: strategy and performance comparisons”. In: Scien-
tific Reports 5, p. 8747.
Liu, Lin et al. (2012). “Comparison of Next-Generation Sequencing Systems”. In:
Journal of Biomedicine and Biotechnology 2012. DOI: 10.1155/2012/251364.
Lusk, Tina S. et al. (2012). “Characterization of microflora in Latin-style cheeses
by next-generation sequencing technology”. In: BMC Microbiol 12, pp. 254–
254. DOI: 10.1186/1471-2180-12-254.
Lynch, Susan V. and Oluf Pedersen (2016). “The Human Intestinal Microbiome
in Health and Disease”. In: New England Journal of Medicine 375.24. PMID:
27974040, pp. 2369–2379. DOI: 10.1056/NEJMra1600266. eprint: https:
//doi.org/10.1056/NEJMra1600266.
McClure, Ryan et al. (2013). “Computational analysis of bacterial RNA-Seq
data”. In: Nucleic Acids Res 41.14, e140–e140. DOI: 10.1093/nar/gkt444.
Monnet, Christophe et al. (2016). “Investigation of the Activity of the Microor-
ganisms in a Reblochon-Style Cheese by Metatranscriptomic Analysis”. In:
Front Microbiol 7, pp. 536–536. DOI: 10.3389/fmicb.2016.00536.
Mudge, Jonathan M. and Jennifer Harrow (2016). “The state of play in higher
eukaryote gene annotation”. In: Nat Rev Genet 17.12, pp. 758–772. DOI: 10.
1038/nrg.2016.119.
Naaum, Amanda M. et al. (2018). “Complementary molecular methods detect
undeclared species in sausage products at retail markets in Canada”. In: Food
Control 84, pp. 339–344.
Noyes, Noelle R et al. (2016). “Resistome diversity in cattle and the environment
decreases during beef production”. In: eLife 5. Ed. by Ben Cooper, e13195.
DOI: 10.7554/eLife.13195.
Olson, Nathan D. et al. (2015). “Best practices for evaluating single nucleotide
variant calling methods for microbial genomics”. In: Front Genet 6, pp. 235–
235. DOI: 10.3389/fgene.2015.00235.
Ottesen, Andrea R. et al. (2013). “Co-enriching microflora associated with cul-
ture based methods to detect Salmonella from tomato phyllosphere”. In: PLoS
One 8.9, e73079. DOI: 10.1371/journal.pone.0073079.
Oulas, Anastasis et al. (2015). “Metagenomics: tools and insights for analyz-
ing next-generation sequencing data derived from biodiversity studies”. In:
Bioinform Biol Insights 9, pp. 75–88. DOI: 10.4137/BBI.S12462.
Pardo, Miguel Angel, Elisa Jimenez, and Begona Perez-Villarreal (2016). “Mis-
description incidents in seafood sector”. In: Food Control 62, pp. 277–283.
Postollec, Florence, Helene Falentin, Sonia Pavan, Jerome Combrisson, and
Daniele Sohier (2011). “Recent advances in quantitative PCR (qPCR) appli-
cations in food microbiology”. In: Food Microbiology 28.5, pp. 848–861.
Quigley, Lisa et al. (2016). “Thermus and the Pink Discoloration Defect in
Cheese”. In: mSystems 1.3. Ed. by Rachel J. Dutton.
DOI: 10.1128/mSystems.00023-16. eprint: https://msystems.asm.org/
content/1/3/e00023-16.full.pdf.
Rasooly, Avraham and Keith E. Herold (2008). “Food microbial pathogen detec-
tion and analysis using DNA microarray technologies”. In: Foodborne Pathog
Dis 5.4, pp. 531–550. DOI: 10.1089/fpd.2008.0119.
Ratnasingham, Sujeevan and Paul D. N. Hebert (2007). “bold: The Barcode of
Life Data System (http://www.barcodinglife.org)”. In: Mol Ecol Notes 7.3,
pp. 355–364. DOI: 10.1111/j.1471-8286.2007.01678.x.
Richardson, Emily J. and Mick Watson (2012). “The automatic annotation of
bacterial genomes”. In: Briefings in Bioinformatics 14.1, pp. 1–12.
DOI: 10.1093/bib/bbs007.
Ripp, Fabian et al. (2014). “All-Food-Seq (AFS): a quantifiable screen for species
in biological samples by deep DNA sequencing”. In: BMC Genomics 15.1,
p. 639. DOI: 10.1186/1471-2164-15-639.
Rossi-Tamisier, Morgane, Samia Benamar, Didier Raoult, and Pierre-Edouard
Fournier (2015). “Cautionary tale of using 16S rRNA gene sequence simi-
larity values in identification of human-associated bacterial species”. In: In-
ternational Journal of Systematic and Evolutionary Microbiology 65.6, pp. 1929–
1934.
Roy, Sashwati and Chandan K. Sen (2006). “cDNA microarray screening in food
safety”. In: Toxicology 221.1, pp. 128–133. DOI: 10.1016/j.tox.2005.12.
025.
Sabat, A J et al. (2013). “Overview of molecular typing methods for outbreak
detection and epidemiological surveillance”. In: Eurosurveillance 18.4, 20380.
DOI: https://doi.org/10.2807/ese.18.04.20380-en.
Schoch, Conrad L. et al. (2012). “Nuclear ribosomal internal transcribed spacer
(ITS) region as a universal DNA barcode marker for Fungi”. In: Proceedings
of the National Academy of Sciences 109.16, pp. 6241–6246. DOI: 10.1073/
pnas.1117018109. eprint: https://www.pnas.org/content/109/
16/6241.full.pdf.
Sharpton, Thomas J. (2014). “An introduction to the analysis of shotgun metage-
nomic data”. In: Front Plant Sci 5, pp. 209–209. DOI: 10.3389/fpls.2014.
00209.
Shokralla, Shadi et al. (2014). “Next-generation DNA barcoding: using next-
generation sequencing to enhance and accelerate DNA barcode capture from
single specimens”. In: Mol Ecol Resour 14.5, pp. 892–901. DOI: 10.1111/
1755-0998.12236.
Siegwald, Lea et al. (2017). “Assessment of Common and Emerging Bioinfor-
matics Pipelines for Targeted Metagenomics”. In: PLOS ONE 12.1, e0169563.
DOI: 10.1371/journal.pone.0169563.
Stellato, Giuseppina et al. (2016). “Overlap of Spoilage-Associated Microbiota
between Meat and the Meat Processing Environment in Small-Scale and
Large-Scale Retail Distributions”. In: Applied and Environmental Microbiology
82.13. Ed. by C. A. Elkins, pp. 4045–4054. DOI: 10.1128/AEM.00793-16.
eprint: https://aem.asm.org/content/82/13/4045.full.pdf.
Swetha, V. P., V. A. Parvathy, T. E. Sheeja, and B. Sasikumar (2017). “Authentica-
tion of Myristica fragrans Houtt. using DNA barcoding”. In: Food Control 73,
pp. 1010–1015.
Tang, Silin et al. (2015). “Transcriptomic Analysis of the Adaptation of Listeria
monocytogenes to Growth on Vacuum-Packed Cold Smoked Salmon”. In: Appl
Environ Microbiol 81.19, pp. 6812–6824. DOI: 10.1128/AEM.01752-15.
Taylor, Angela J. et al. (2015). “Characterization of Foodborne Outbreaks of
Salmonella enterica Serovar Enteritidis with Whole-Genome Sequencing Sin-
gle Nucleotide Polymorphism-Based Analysis for Surveillance and Out-
break Detection”. In: Journal of Clinical Microbiology 53.10. Ed. by D. J.
Diekema, pp. 3334–3340. DOI: 10.1128/JCM.01280-15. eprint: https:
//jcm.asm.org/content/53/10/3334.full.pdf.
Valdes, Alberto, Clara Ibanez, Carolina Simo, and Virginia Garcia-Canas (2013).
“Recent transcriptomics advances and emerging applications in food sci-
ence”. In: TrAC Trends in Analytical Chemistry 52, pp. 142–154.
Wang, Jun and Huijue Jia (2016). “Metagenome-wide association studies: fine-
mining the microbiome”. In: Nature Reviews Microbiology 14, pp. 508–522.
Wang, Zhong, Mark Gerstein, and Michael Snyder (2009). “RNA-Seq: a revo-
lutionary tool for transcriptomics”. In: Nat Rev Genet 10.1, pp. 57–63. DOI:
10.1038/nrg2484.
Yandell, Mark and Daniel Ence (2012). “A beginner’s guide to eukaryotic
genome annotation”. In: Nature Reviews Genetics 13, pp. 329–342.
Yang, Xiang et al. (2016). “Use of Metagenomic Shotgun Sequencing Technology
To Detect Foodborne Pathogens within the Microbiome of the Beef Produc-
tion Chain”. In: Appl Environ Microbiol 82.8, pp. 2433–2443. DOI: 10.1128/
AEM.00078-16.
Zhang, Feng et al. (2014). “RNA-Seq-based transcriptome analysis of aflatoxi-
genic Aspergillus flavus in response to water activity”. In: Toxins (Basel) 6.11,
pp. 3187–3207. DOI: 10.3390/toxins6113187.
CHAPTER 2
WHOLE-GENOME SEQUENCING OF DRUG-RESISTANT SALMONELLA
ENTERICA ISOLATES FROM DAIRY CATTLE AND HUMANS IN NEW
YORK AND WASHINGTON STATES REVEALS SOURCE AND
GEOGRAPHIC ASSOCIATIONS 1
1 FROM CARROLL, LAURA M., MARTIN WIEDMANN, HENK DEN BAKKER,
JULIE SILER, STEVEN WARCHOCKI, DAVID KENT, SVETLANA LYALINA, MARGARET
DAVIS, WILLIAM SISCHO, THOMAS BESSER, LORIN D. WARNICK, AND
RICHARD V. PEREIRA (2017). “WHOLE-GENOME SEQUENCING OF DRUG-RESISTANT
SALMONELLA ENTERICA ISOLATES FROM DAIRY CATTLE AND HUMANS IN NEW
YORK AND WASHINGTON STATES REVEALS SOURCE AND GEOGRAPHIC ASSOCIA-
TIONS”. IN: APPLIED AND ENVIRONMENTAL MICROBIOLOGY 83, PP. E00140-17. DOI:
HTTPS://DOI.ORG/10.1128/AEM.00140-17.
2.1 Abstract
Multidrug-resistant (MDR) Salmonella enterica can be spread from cattle to hu-
mans through direct contact with animals shedding Salmonella, as well as
through the food chain, making MDR Salmonella a serious threat to human
health. The objective of this study was to use whole-genome sequencing to com-
pare antimicrobial-resistant (AMR) Salmonella enterica serovars Typhimurium,
Newport, and Dublin isolated from dairy cattle and humans in Washington
State and New York State at the genotypic and phenotypic levels. A total of 90
isolates were selected for the study (37 S. Typhimurium, 32 S. Newport, and 21
S. Dublin isolates). All isolates were tested for phenotypic antibiotic resistance
to 12 drugs using Kirby-Bauer disk diffusion. AMR genes were detected in the
assembled genome of each isolate using nucleotide BLAST and ARG-ANNOT.
Genotypic prediction of phenotypic resistance resulted in a mean sensitivity of
97.2% and specificity of 85.2%. Sulfamethoxazole-trimethoprim resistance was
observed only in human isolates (P < 0.05), while resistance to quinolones and
fluoroquinolones was observed only in 6 S. Typhimurium isolates from humans
in Washington State. S. Newport isolates showed a high degree of AMR pro-
file similarity, regardless of source. S. Dublin isolates from New York State
differed from those from Washington State based on the presence/absence of
plasmid replicons, as well as phenotypic AMR susceptibility/nonsusceptibility
(P < 0.05). The results of this study suggest that distinct factors may contribute
to the emergence and dispersal of AMR S. enterica in humans and farm animals
in different regions.
IMPORTANCE: The use of antibiotics in food-producing animals has been
hypothesized to select for AMR Salmonella enterica and associated AMR deter-
minants, which can be transferred to humans through different routes. Pre-
vious studies have sought to assess the degree to which AMR livestock- and
human-associated Salmonella strains overlap, as well as the spatial distribution
of Salmonella’s associated AMR determinants, but have often been limited by
the degree of resolution at which isolates can be compared. Here, a compara-
tive genomics study of livestock- and human-associated Salmonella strains from
different regions of the United States shows that while many AMR genes and
phenotypes were confined to human isolates, overlaps between the resistomes
of bovine and human-associated Salmonella isolates were observed on numer-
ous occasions, particularly for S. Newport. We have also shown that whole-
genome sequencing can be used to reliably predict phenotypic resistance across
Salmonella isolated from bovine sources.
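The sensitivity and specificity figures reported in the abstract summarize how well AMR genotype predicts AMR phenotype. A minimal sketch of how such values can be computed is given below, assuming that the phenotypic test result is treated as the reference standard. The function name and the toy isolate-by-drug results are hypothetical and are not data from this study.

```python
def sensitivity_specificity(pairs):
    """Compute sensitivity and specificity of genotypic predictions.

    pairs: iterable of (genotype_predicts_resistance, phenotype_resistant)
    booleans, one per isolate-drug combination. The phenotype is treated
    as the reference standard (an assumption of this sketch).
    """
    tp = fp = tn = fn = 0
    for predicted, observed in pairs:
        if observed and predicted:
            tp += 1  # resistant phenotype, resistance determinant detected
        elif observed and not predicted:
            fn += 1  # resistant phenotype, no determinant detected
        elif predicted:
            fp += 1  # susceptible phenotype, determinant detected
        else:
            tn += 1  # susceptible phenotype, no determinant detected
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical results: 3 true positives, 1 false negative,
# 4 true negatives, 1 false positive.
toy_results = (
    [(True, True)] * 3
    + [(False, True)]
    + [(False, False)] * 4
    + [(True, False)]
)
sens, spec = sensitivity_specificity(toy_results)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A mean across drugs, as reported in the abstract, would be obtained by applying such a calculation to each antimicrobial separately and averaging the results.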
2.2 Introduction
Salmonella enterica is estimated to cause approximately 1.2 million illnesses and
450 deaths each year in the United States alone (Scallan et al. 2011). While most
individuals recover without medical intervention, severe infections require hos-
pitalization and treatment with antimicrobials (Scallan et al. 2011). An even
greater challenge is posed when those infections are caused by antimicrobial-
resistant (AMR) organisms. The Centers for Disease Control and Prevention
(CDC) estimates that 100,000 infections due to AMR non-typhoidal Salmonella
occur in the United States annually and has designated AMR in non-typhoidal
Salmonella as a serious threat to public health (CDC 2013). More specifically, the
World Health Organization (WHO) has listed fluoroquinolone-resistant non-
typhoidal Salmonella as a global health concern (WHO 2014).
Both the CDC and WHO have called for improved monitoring of AMR along
the food chain, particularly in food-producing animals (CDC 2013; WHO 2014).
Due to concerns about the misuse of antimicrobials in farm animals, the farm is
often viewed as a reservoir in which AMR can be acquired by bacteria that are
then transmitted from animals to humans (Van Boeckel et al. 2015; Silbergeld,
Graham, and Price 2008). In this context, S. enterica becomes particularly rel-
evant, as it can be transmitted between animal and human populations (Hen-
driksen et al. 2004; Fey et al. 2000; Hoelzer, Moreno Switt, and Wiedmann 2011),
as well as through food (White et al. 2001; Cody et al. 1999; Hald et al. 2016).
A number of studies have sought to assess the extent to which AMR is ac-
quired by bacteria in livestock environments and subsequently transmitted to
humans, and many have arrived at different conclusions (Johnson et al. 2007;
Price et al. 2012; A. E. Mather et al. 2013; Alison E. Mather et al. 2012). Of-
ten, the degree of resolution at which isolates can be compared is a limiting
factor in determining the origin of a particular bacterial isolate and its AMR
profile. Methods such as multilocus sequence typing (MLST), serotyping, and
pulsed-field gel electrophoresis (PFGE) may not offer enough discriminatory
power to detect differences between isolates from different sources or locations
(Kwong et al. 2016; Holmes et al. 2015; Taylor et al. 2015), while phenotypic test-
ing of AMR may not distinguish between AMR mechanisms in different isolates
(A. E. Mather et al. 2013).
The extent to which Salmonella and AMR genes associated with it are trans-
mitted between animal and human sources remains unclear. The objective
of this study was to use whole-genome sequencing (WGS) to compare AMR
Salmonella enterica isolates previously serotyped as Typhimurium, Newport, or
Dublin isolated from dairy cattle and humans in Washington State and New
York State at the genotypic and phenotypic levels. In addition, correlations
between AMR genotype and AMR phenotype were assessed. It was hypothesized
that source-associated and geographic differences between Salmonella isolates
could be elucidated at greater resolution through the implementation of WGS.
2.3 Materials and Methods
2.3.1 Isolate selection
A total of 93 Salmonella isolates were initially selected for the study. Bovine
isolates originated from the Washington Animal Disease Diagnostic Laboratory
(WADDL), the Washington State Zoonotic Research Unit, the Cornell Animal
Health Diagnostic Center (Ithaca, NY), and previous research sampling of
dairy cattle at dairy farms. Isolates from human clinical specimens were
obtained from the Washington State Department
of Health Public Health Laboratory and from the New York State Department
of Health Laboratory. Isolates were selected to (i) represent isolation dates be-
tween 2008 and 2012; (ii) represent one of the three serotypes of interest (Ty-
phimurium, Newport, and Dublin, as determined using traditional serotyping;
these serotypes were selected for their association with humans and cattle); and
(iii) represent isolates that had previously been tested for phenotypic resistance
to antimicrobials and were found to be resistant to at least one antimicrobial.
Bovine isolates originated from fecal samples, regardless of whether the host
presented clinical signs of salmonellosis, while human isolates were from
stool samples of patients presenting clinical signs of salmonellosis. Among the
isolates that met these criteria, “redundant” isolates were filtered out (those
known to come from the same animal/farm/farm visit), and selected isolates
were chosen to represent approximately equal numbers of human and bovine
isolates evenly distributed between New York State and Washington State. To
ensure consistency between phenotypic testing methods, all of the isolates se-
lected for this study were re-tested for phenotypic resistance using a single AMR
testing method and a panel of antimicrobial drugs (see ”Phenotypic AMR test-
ing” below).
Following WGS (see “Whole-genome sequencing” below), seven isolates
were found to belong to species/serotypes different from those to which they
were initially assigned. One isolate that had been initially classified as
S. enterica serotype Newport was found to belong to the genus Citrobacter. In
addition, in silico multilocus sequence typing (MLST) and in silico serotyping
using WGS data from the isolates (see “In silico serotyping and MLST” below)
revealed that
two of the isolates that had been classified as serotypes Typhimurium and New-
port using traditional serotyping methods actually belonged to serotypes Give
and Montevideo, respectively. These two isolates, as well as the Citrobacter iso-
late, were excluded from the study. Four isolates that were classified using
traditional serotyping as Newport, Typhimurium, Typhimurium, and Dublin
were reclassified as Dublin, Newport, Dublin, and Newport, respectively, and
remained in the study under the new serotype classifications. A total of 90 iso-
lates (37 S. Typhimurium, 32 S. Newport, and 21 S. Dublin isolates; see Table S1
in the supplemental material for details) were used in all subsequent analyses.
2.3.2 Phenotypic AMR testing
The antimicrobial susceptibility of each Salmonella isolate was tested using
a modified National Antimicrobial Resistance Monitoring System (NARMS)
panel of 12 antimicrobial drugs. Susceptibility testing was performed using a
Kirby-Bauer disk diffusion agar assay in accordance with the guidelines pub-
lished by the Clinical and Laboratory Standards Institute (CLSI) and a method-
ology previously described (CLSI 2012; CLSI 2013). Internal quality control
was performed by the inclusion of E. coli ATCC 25922, which had previously
been determined to be pan-susceptible, as well as an E. coli isolate that had been
previously characterized as positive for the blaCMY-2 gene and resistant to nine
of the antimicrobial agents tested. All isolates were tested using the following
panel: ampicillin (AMP) at 10 µg, amoxicillin-clavulanic acid (AMC) at 20 and
10 µg, respectively, cefoxitin (FOX) at 30 µg, ceftiofur (TIO) at 30 µg, ceftriax-
one (CRO) at 30 µg, chloramphenicol (CHL) at 30 µg, ciprofloxacin (CIP) at 5 µg,
nalidixic acid (NAL) at 30 µg, streptomycin (STR) at 10 µg, tetracycline (TET) at
30 µg, sulfisoxazole (SX) at 250 µg, and trimethoprim-sulfamethoxazole (SXT) at
23.75 and 1.25 µg, respectively. Results of the disk diffusion test for the inter-
nal quality control strains were within the anticipated standards. Isolates were
categorized as susceptible, intermediate, or resistant (SIR) by measuring the in-
hibition zone and using interpretive criteria and breakpoints established by the
CLSI guidelines for each antimicrobial (CLSI 2012).
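The SIR categorization step described above can be illustrated with a minimal Python sketch. The function names, the drug label DRUG_X, and the breakpoint values are hypothetical and are not CLSI values or values used in this study; actual breakpoints are drug specific and must be taken from the CLSI guidelines.

```python
def classify_sir(zone_diameter, susceptible_min, resistant_max):
    """Return 'S', 'I', or 'R' for one inhibition zone measurement.

    zone_diameter and both breakpoints must be expressed in the same unit.
    """
    if resistant_max >= susceptible_min:
        raise ValueError("resistant_max must be smaller than susceptible_min")
    if zone_diameter >= susceptible_min:
        return "S"
    if zone_diameter <= resistant_max:
        return "R"
    return "I"


# Hypothetical breakpoints for illustration only (not CLSI values).
EXAMPLE_BREAKPOINTS = {"DRUG_X": {"susceptible_min": 17, "resistant_max": 13}}


def classify_isolate(zone_diameters, breakpoints):
    """Map each drug's zone diameter to its SIR category."""
    return {
        drug: classify_sir(diameter, **breakpoints[drug])
        for drug, diameter in zone_diameters.items()
    }
```

Under these assumed breakpoints, a zone between the two thresholds is reported as intermediate, which is the category later pooled with resistant in the statistical analyses.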
2.3.3 Whole-genome sequencing
Isolates were plated on brain heart infusion (BHI) agar (Becton, Dickinson and
Company, Franklin Lakes, NJ), grown for 24 h, and inoculated into 1.0 ml BHI
broth in a Nunc U96 PP 2-ml DeepWell Natural plate (Fisher Scientific,
Pittsburgh, PA). Following overnight incubation at 37°C, cells were pelleted by
centrifugation at 3,320 relative centrifugal force (RCF) for 15 min. DNA
extraction for the majority of isolates was performed with the DNeasy 96 blood
and tissue kit (Qiagen, Valencia, CA) according to the manufacturer’s
specifications for high-throughput applications. DNA extraction for a smaller
group of isolates was performed using the QIAamp DNA minikit (Qiagen, Valencia,
CA) according to the manufacturer’s protocol for bacteria. DNA was eluted in
50 µl Tris-HCl at pH 8.0 and stored at 4°C prior to sequencing. Following an
initial spectrophotometry step to determine the ratio of the optical density at
260 nm to that at 280 nm (OD260/OD280), the genomic DNA from each isolate was
quantified using a fluorescent nucleic acid dye (Picogreen; Invitrogen, Paisley,
UK) and diluted to 200 pg/µl. Sequencing libraries were prepared using the
Nextera XT DNA sample preparation kit and the associated Nextera XT Index kit
with 96 indices (Illumina, Inc., San Diego, CA) according to the manufacturer’s
instructions. Pooled samples were sequenced on 2 lanes of an Illumina HiSeq
2500 rapid run with 2 × 100-bp paired-end sequencing.
2.3.4 Initial data processing and genome assembly
Illumina sequencing adapters and low-quality bases were trimmed using Trim-
momatic version 0.32 for Nextera paired-end reads (Bolger, Lohse, and Usadel
2014). FastQC version 0.11.2 was used to confirm that all adapter sequences
had been removed and that the read quality was appropriate (Andrews 2014).
Genomes were assembled de novo using SPAdes version 3.0.0, as SPAdes has
been shown to produce few misassemblies and yield contigs with high N50
values when assembling bacterial genomes de novo from Illumina short reads
(Bankevich et al. 2012). Genome coverage was determined using BBMap ver-
sion 35.49 (Bushnell 2015) and samtools version 0.1.19-96b5f2294a (H. Li et al.
2009).
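Because N50 is used above as a measure of assembly contiguity, a minimal Python sketch of its calculation may be helpful. The function name is illustrative and this is not the code used by SPAdes or any other tool in this study; it assumes only a list of contig lengths.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length
```

For contigs of lengths 80, 70, 50, 40, 30, and 20, the two longest contigs already cover more than half of the 290 total bases, so the N50 is 70.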
2.3.5 In silico serotyping and MLST
To assess the results of traditional serotyping, in silico serotyping was performed
using SeqSero and the assembled genome for each isolate (Zhang et al. 2015).
In addition, MLST was performed using the Short Read Sequence Typer 2 ver-
sion 0.1.5 (SRST2) and the trimmed Illumina paired-end reads (Inouye et al.
2014). Sequence types were associated with serotypes using the University of
Warwick’s MLST database for Salmonella (http://mlst.warwick.ac.uk).
2.3.6 In silico AMR gene detection
AMR genes were detected in all 90 assembled genomes using nucleotide BLAST
(blastn) version 2.4.0 (Camacho et al. 2009) and the formatted ARG-ANNOT
database included with SRST2 (Inouye et al. 2014; Gupta et al. 2014). To prevent
overlapping hits due to the presence of multiple alleles of the same gene in the
database, one gene was selected from each SRST2-ARG-ANNOT gene group
and used to build a reduced database (Inouye et al. 2014). Genes that were
detected using blastn and belonged to a particular gene group were categorized
as being present in a genome if they were detected with at least 50% coverage
and at least 75% nucleotide identity.
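The presence/absence rule above can be sketched in Python as follows. The Hit record and its field names are hypothetical, and coverage is assumed here to mean the aligned length divided by the length of the database gene; the exact coverage definition used in the study is not stated in this section.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One blastn hit against the reduced AMR gene database (hypothetical fields)."""
    gene_group: str
    percent_identity: float
    aligned_length: int
    gene_length: int


def passes_thresholds(hit, min_coverage=50.0, min_identity=75.0):
    """Apply the minimum coverage and nucleotide identity cutoffs."""
    coverage = 100.0 * hit.aligned_length / hit.gene_length
    return coverage >= min_coverage and hit.percent_identity >= min_identity


def detected_gene_groups(hits, min_coverage=50.0, min_identity=75.0):
    """Return the sorted gene groups with at least one passing hit."""
    return sorted(
        {h.gene_group for h in hits if passes_thresholds(h, min_coverage, min_identity)}
    )
```

Collapsing hits to the gene group level mirrors the use of one representative gene per SRST2-ARG-ANNOT gene group, so multiple alleles of the same gene are counted once.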
2.3.7 Initial phylogenetic tree construction and reference genome selection
The closed chromosomal sequences of S. Typhimurium strain LT2 (RefSeq
NC_003197.1), S. Newport strain SL254 (GenBank accession no. CP001113), and
S. Dublin strain CT_02021853 (RefSeq NC_011205.1) were chosen as candidate
reference sequences for reference-based SNP calling. To obtain an initial phy-
logeny of all isolates and determine if these candidate reference sequences clus-
tered appropriately with the genomes of the isolates used in this study, a phy-
logenetic tree was constructed using the assembled genomes of all 90 isolates
and the three candidate reference genomes using kSNP version 2.1.2 (Gardner
and Hall 2013). Kchooser was used to determine an optimum k-mer size of 19
(Gardner and Hall 2013). The resulting core SNP phylogeny, which included all
90 isolates used in the study and the three closed reference genomes, clustered
isolates into three distinct clades (see Fig. S1 in the supplemental material).
As a result, all subsequent analyses were performed within each serotype clade
to maximize resolution.
2.3.8 Reference-based variant calling
Variant calling was performed within each of the three serotypes using the
Cortex variant caller (cortex_var) (Iqbal et al. 2012). For S. Typhimurium
isolates, S. Typhimurium strain LT2 was used as a reference genome. For
S. Newport isolates, S. Newport strain SL254 was used as a reference, as all of
the Newport isolates in this study were predicted to have the same sequence
type (ST45) using SRST2 (Inouye et al. 2014). For S. Dublin isolates, strain
CT_02021853, which was used as a candidate reference in the initial phylogenetic
tree, clustered relatively far from the closely related S. Dublin isolates used
in this study. In order to obtain better resolution, variant calling was
performed a second time using the contigs of isolate BOV_DUBN_WA_10_R9_3233 as
a reference, as its assembly had the highest coverage of all of the S. Dublin
isolates used in the study. An additional 11 SNPs were found using isolate
BOV_DUBN_WA_10_R9_3233 as a reference; these SNPs were included in subsequent
analyses. SNPs were filtered from other variants using Plink/Seq version 0.10
(PLINK/Seq 2014), and
recombination events were filtered out using Gubbins version 1.4.2 (Croucher
et al. 2015). Within each serotype, only SNPs at positions present in all genomes
were used. MEGA6 was used to identify the best nucleotide substitution mod-
els for SNPs within each serotype (Tamura et al. 2013). For S. Typhimurium,
the general time-reversible (GTR) model was selected as the best model (Tavare
n.d.), while the Kimura 2-parameter model (Kimura 1980) was selected for both
S. Newport and S. Dublin.
For each serotype, BEAST version 1.8.2 (Alexei J. Drummond et al. 2012)
was used to construct rooted phylogenetic trees. An ascertainment bias correc-
tion was applied to account for the use of solely variant sites (Rambaut 2013).
The best nucleotide substitution model, as determined by MEGA6, was used for
each serotype, and base frequencies were estimated. Temporal signals, which
were assessed using Path-O-Gen version 1.4 (now TempEst) (Rambaut et al.
2016), were not strong enough to estimate evolutionary rates using sampling
dates (R < 0.10). As a result, the clock rate was set to 1.0 and tip dates were not
used. For each serotype, combinations of either a strict or lognormal relaxed
molecular clock (A. J. Drummond, Ho, et al. 2006) and either a coalescent
constant size or Bayesian skyline population model (A. J. Drummond, Rambaut,
et al. 2005) were tested. Trees were constructed using chain lengths of 100 million
generations, with sampling every 10,000 generations. Path sampling analyses
(Baele, Lemey, et al. 2012; Baele, W. L. S. Li, et al. 2013) were performed using
100 steps of 1 million generations, sampling every 1,000 generations. Bayes
factors were calculated to determine which combination of molecular clock and
population models best fit the data for each serotype. For S. Typhimurium and
S. Newport, the best model used a relaxed molecular clock with a coalescent
constant size population model. For S. Dublin, the best model used a strict
molecular clock with a coalescent constant size population model.
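The model comparison described above rests on the fact that the log Bayes factor for two models is the difference between their log marginal likelihoods, here estimated by path sampling. A minimal Python sketch of the ranking step follows; the function name and any log marginal likelihood values passed to it are hypothetical and are not results from this study.

```python
def rank_models(log_marginal_likelihoods):
    """Pick the model with the highest log marginal likelihood.

    Returns the name of the best model and, for every other model, the
    log Bayes factor in favor of the best model (best minus other).
    """
    if not log_marginal_likelihoods:
        raise ValueError("at least one model is required")
    ranked = sorted(
        log_marginal_likelihoods.items(), key=lambda item: item[1], reverse=True
    )
    best_name, best_value = ranked[0]
    log_bayes_factors = {name: best_value - value for name, value in ranked[1:]}
    return best_name, log_bayes_factors
```

Larger log Bayes factors indicate stronger support for the best model over the alternative; the threshold used to call a difference meaningful is a separate choice that this sketch does not make.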
2.3.9 Plasmid replicon detection
Plasmid replicons were detected in all whole-genome sequences using Plas-
midFinder version 1.3 (Carattoli et al. 2014). An identity cutoff of 80% was
used. PlasmidFinder was also used to confirm that plasmid replicons could not
be detected in the chromosomal sequences of S. Typhimurium LT2, S. Newport
SL254, and S. Dublin CT_02021853.
2.3.10 Statistical analyses
Matrices were created using (i) the sequences of all AMR genes detected us-
ing blastn, (ii) phenotypic antimicrobial resistance/susceptibility, and (iii) the
presence/absence of plasmid replicons detected using PlasmidFinder. For the
phenotypic resistance matrix, isolates showing resistance or intermediate resis-
tance to a particular antimicrobial, using NARMS breakpoints, were treated as
resistant and given a value of 1, while susceptible isolates were given a value
of 0. Fisher’s exact tests were conducted to test whether a given AMR gene,
AMR phenotype, or plasmid replicon was statistically associated with a par-
ticular source and/or geographic location using the fisher.test function in R
version 3.3.0 (R Core Team 2016). When performing Fisher’s exact tests for
each serotype category with n isolates, gene groups, AMR phenotypes, and
plasmid replicons present in fewer than 3 or more than n − 3 isolates were
not tested. A Holm-Bonferroni correction was applied to correct
for multiple comparisons (Holm 1979). Additionally, Fisher’s exact tests were
used to test if any AMR gene groups were statistically associated with any
plasmid replicons. Plasmid replicons present in fewer than 5 or more than
n − 5 isolates were not tested, and a Bonferroni correction was applied to
correct for multiple comparisons. Analysis of similarity (ANOSIM) (Clarke 1993)
using the anosim function in the vegan package (Oksanen et al. 2017) in R was
used to determine if the average ranks of within-serotype, within-source, and
within-geographic-group distances were greater than or equal to the average
ranks of between-group distances using AMR gene sequences, phenotypic resis-
tance to a particular antimicrobial, and/or plasmid replicon presence/absence
data (Anderson and Walsh 2013). For ANOSIM simulations using AMR
gene sequences, 5 runs of 10,000 permutations using unweighted UniFrac
distances (Lozupone and Knight 2005) were conducted. For all ANOSIM
simulations using phenotypic resistance/susceptibility and plasmid replicon
presence/absence matrices, 5 runs of 10,000 permutations using Raup-Crick
dissimilarities (Chase et al. 2011) were conducted. PERMANOVA (Anderson
2001) was performed to test whether the centroids of serotype, source, and ge-
ographic groups were equivalent for all groups (Anderson and Walsh 2013)
based on AMR gene sequences, phenotypic resistance to a particular antimicro-
bial, and/or plasmid replicon presence/absence using the adonis function in
R’s vegan package (Oksanen et al. 2017). Three runs of 10,000 permutations
using unweighted UniFrac distances were used to obtain mean PERMANOVA
test statistics (F) and P values for AMR gene sequences, while three runs
of 100,000 permutations and Raup-Crick dissimilarities were used for phenotypic
resistance/susceptibility and plasmid replicon presence/absence data. The
metaMDS function in the vegan package was used to perform nonmetric
multidimensional scaling (NMDS) (Kruskal 1964a; Kruskal 1964b) using monoMDS
(Oksanen et al. 2017), a maximum of 10,000 random starts, and an appropriate
distance metric (unweighted UniFrac distances for AMR gene sequence data and
Raup-Crick dissimilarities for phenotypic resistance/susceptibility and
plasmid replicon presence/absence data). Interactive NMDS plots can be found
at https://github.com/lmc297/2017_AEM_Figure_S2.
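The association testing described in this section (binary coding of phenotypes, exclusion of rare and near-universal features, two-sided Fisher’s exact tests, and a Holm-Bonferroni correction) can be sketched in Python. The study itself used the fisher.test function in R; the pure-Python functions below are an illustrative stand-in with hypothetical names, not the code used for the analyses.

```python
from math import comb

# Intermediate and resistant isolates are both coded as 1, susceptible as 0.
SIR_TO_BINARY = {"S": 0, "I": 1, "R": 1}


def is_testable(n_present, n_isolates, margin=3):
    """Skip features present in fewer than margin or more than n - margin isolates."""
    return margin <= n_present <= n_isolates - margin


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables that are no more
    probable than the observed table (small relative tolerance for ties).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    total = sum(p for p in map(probability, support) if p <= observed * (1 + 1e-7))
    return min(1.0, total)


def holm_adjust(p_values):
    """Holm-Bonferroni adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[index]))
        adjusted[index] = running_max
    return adjusted
```

In this sketch, a 2 x 2 table would hold counts of isolates with and without a feature in each of two groups (for example, bovine and human), and holm_adjust would be applied to the P values from all tests run within a serotype.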
Descriptive analyses of the susceptible/intermediate/resistant (SIR) distri-
bution of Salmonella isolates by antimicrobial drug and distribution of AMR
phenotypes and genes were performed using PROC FREQ in SAS (SAS Insti-
tute Inc., USA). To evaluate the effect of presence or absence of resistance genes
on the mean zone diameter (in centimeters) of the Kirby-Bauer disk diffusion
test, multivariable mixed logistic regression models were fitted to the data
using the Glimmix procedure of SAS. The independent variables (i) isolate source
(bovine or human), (ii) isolation location (New York State or Washington State),
and (iii) serotype were included in all models.
2.3.11 Accession number(s) and supplemental material
Paired-end reads for the 90 isolates used in this study have been deposited
in the National Center for Biotechnology Information’s (NCBI) Sequence Read
Archive (SRA) under study accession number SRP068320. Supplemental mate-
rial for this article may be found at https://doi.org/10.1128/AEM.00140-17.
2.4 Results
2.4.1 Overall distribution of SNPs, AMR genes, AMR phenotypes, and plasmid replicons
Of the three serotypes studied, S. Typhimurium displayed the highest degree of
phylogenetic diversity. Variant calling revealed a total of 2,976 variants
in the S. Typhimurium isolates, 2,723 of which were called as single
nucleotide polymorphisms (SNPs). In S. Newport, only 327 variants were called,
263 of which were SNPs. The fewest variants occurred in S. Dublin,
with 183 variants, 131 of which were SNPs.
AMR genes belonging to 42 different groups were detected in the 90
genomes (see Table S2 in the supplemental material). The most common genes
belonged to groups associated with resistance to penicillins (penicillin-binding
protein [PBP] gene), aminoglycosides [aac(6’)-Iaa, strA, and strB], phenicols
(floR), tetracyclines [tet(A) and tet(R)], cephalosporins (CMY), and sulfonamides
(sul2) (Table 2.1). At the phenotypic level, all isolates displayed resistance or
intermediate resistance to between 1 and 11 antimicrobials. Resistance to
ampicillin (AMP) was the most common phenotype, observed in 88 of 90 isolates
(Table 2.1). In addition, a total of 20 different plasmid
replicons were detected in the genomes of the 90 isolates used in the study.
The three most common replicons (ColRNAI, ColpVC, and IncA/C2) were each
detected in over one-half of all isolates (Table 2.1). Several significant (P < 0.001)
associations between plasmid replicons and AMR gene groups were observed,
including the IncA/C2 replicon and gene groups CMY, floR, strA-strB, sul2, and
tet(A)-tet(R) (see Table S3 in the supplemental material). These genes had pre-
viously been found on an IncA/C2 plasmid isolated from S. Newport (Fricke
et al. 2009).
Serotypes were found to differ with regard to AMR gene sequences, pheno-
typic resistance/susceptibility, and the presence/absence of plasmid replicons
when using analysis of similarity (ANOSIM) and/or permutational multivari-
ate analysis of variance (PERMANOVA; P < 0.001 after a Holm-Bonferroni cor-
rection) (Table 2.2). Of the three serotypes studied, S. Typhimurium showed
the widest range of AMR gene profiles, phenotypic AMR profiles, and plasmid
replicon presence/absence profiles (Figure 2.1).
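The ANOSIM R statistic referred to above compares the mean rank of between-group dissimilarities with the mean rank of within-group dissimilarities, scaled by N(N − 1)/4 for N isolates. The analyses in this study used the anosim function in the vegan package in R; the pure-Python sketch below, with hypothetical function names, only illustrates how the statistic and a permutation P value are obtained from a square dissimilarity matrix and a list of group labels.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(dissimilarity, groups):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (N(N-1)/4)."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dissimilarity[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    if not within or not between:
        raise ValueError("need at least one within-group and one between-group pair")
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


def anosim_p_value(dissimilarity, groups, permutations=999, seed=0):
    """Permutation P value: share of label shuffles with R at least as large."""
    rng = random.Random(seed)
    observed = anosim_r(dissimilarity, groups)
    labels = list(groups)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if anosim_r(dissimilarity, labels) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (permutations + 1)
```

An R value near 1 indicates that isolates within a group are more similar to each other than to isolates in other groups, while values near 0 indicate no grouping structure.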
Table 2.1: Ranking of the five most common antimicrobial resistance (AMR)
gene groups, phenotypic AMR profiles, and plasmid replicons for all serotypes,
S. Typhimurium, S. Newport, and S. Dublinᵃ
Rankᵇ | All isolates (n = 90) | S. Typhimurium (n = 37) | S. Newport (n = 32) | S. Dublin (n = 21)
AMR gene groups
1 | aac(6’)-Iaa, PBP gene (90) | aac(6’)-Iaa, PBP gene (37) | aac(6’)-Iaa, CMY, PBP gene, strA, strB, sul2, tet(A), tet(R) (32) | aac(6’)-Iaa, CMY, PBP gene, sul2 (21)
2 | floR (72) | aadA (25) | floR (30) | strA, strB, tet(A), tet(R) (20)
3 | CMY, tet(A), tet(R) (68) | floR (23) | aph(3”)-Ia (22) | floR (19)
4 | sul2 (67) | sul1 (21) | aadA, dfrA, sul1 (3) | aph(3”)-Ia (18)
5 | strA, strB (64) | aph(3”)-Ia (20) | | blaTEM-1D (15)
Phenotypic AMR profile
1 | AMP (88) | AMP (35) | AMC; AMP; CRO; FOX; STR; SX; TIO; TET (32) | AMP; CRO; TIO (21)
2 | TET (82) | TET (31) | CHL (30) | AMC; FOX; SX (20)
3 | AMC; SX (81) | STR (30) | SXT (3) | CHL; TET (19)
4 | CHL; STR (72) | AMC; SX (29) | | STR (10)
5 | CRO; TIO (71) | CHL (23) | | SXT (1)
Plasmid replicons
1 | ColRNAI (77) | ColRNAI (27) | ColRNAI; IncA/C2 (32) | IncX1 (21)
2 | ColpVC (63) | IncFII(S) (25) | ColpVC (26) | IncA/C2 (20)
3 | IncA/C2 (60) | ColpVC (20) | IncI1 (2) | ColRNAI (18)
4 | IncFII(S) (36) | IncFIB(S) (17) | Col(BS512) (1) | ColpVC (17)
5 | IncX1 (22) | IncI1 (10) | | IncFII(S) (11)
ᵃNumbers in parentheses indicate the number of isolates (i) carrying genes classified into a given AMR gene group, (ii) resistant to a
given antimicrobial, or (iii) carrying a given plasmid replicon.
ᵇRank is based on the frequency of (i) AMR gene group presence, (ii) phenotypic resistance, and (iii) plasmid replicon presence.
2.4.2 In silico AMR gene detection is correlated with phenotypic AMR patterns
Genotypic and phenotypic AMR data were used to evaluate the ability of geno-
typic data to predict phenotypic resistance (Figure 2.2). Ciprofloxacin (CIP) was
not included in these analyses due to the rarity of resistant isolates in this data
set (1 of the 90 isolates). Based on the 11 remaining antimicrobials, genotypic
prediction of phenotypic resistance resulted in a mean sensitivity of 97.2% and
specificity of 85.2% (Table 2.3). Genotypic prediction of phenotypic resistance
to AMP, cefoxitin (FOX), chloramphenicol (CHL), streptomycin (STR), sulfisox-
azole (SX), and tetracycline (TET) had a sensitivity of 100%, while the prediction
of phenotypic resistance to AMP, ceftiofur (TIO), ceftriaxone (CRO), nalidixic
acid (NAL), and trimethoprim-sulfamethoxazole (SXT) had a specificity of 100%
(Table 2.3). With the exception of NAL, genotypic prediction of phenotypic
resistance resulted in sensitivities greater than 90% for all drugs (Table 2.3).
For all antimicrobials other than AMC, STR, SX, and TET, genotypic prediction
of phenotypic resistance had specificity above 90% (Table 2.3). Consistent with
these findings, significant differences in resistance (determined by the mean
zone diameters from the Kirby-Bauer disk diffusion assays) were observed between
isolates carrying at least one AMR gene conferring resistance to a given
antimicrobial and those isolates that did not carry said AMR gene (P < 0.05
after a Holm-Bonferroni correction) (Table 2.4).
Table 2.2: ANOSIM and PERMANOVA statistics and their respective mean P valuesᵃ
Serotype(s) | Grouping factor/responseᵇ | ANOSIM R statistic | ANOSIM mean uncorrected P value | PERMANOVA F statistic | PERMANOVA mean uncorrected P value
Antimicrobial resistance gene sequences
All | Serotype | 0.234ᶜ | < 0.001ᶜ | 15.598ᵈ | < 0.001ᵈ
Typhimurium | Source | 0.079 | 0.040 | 2.937 | 0.020
Typhimurium | Location | 0.045 | 0.105 | 2.093 | 0.074
Newport | Source | 0.034 | 0.169 | 3.405 | 0.004
Newport | Location | 0.241ᶜ | 0.002ᶜ | 3.185 | 0.008
Dublin | Source | 0.041 | 0.188 | 1.578 | 0.231
Dublin | Location | 0.145 | 0.064 | 5.366 | 0.004
Phenotypic antimicrobial resistance/susceptibility profiles
All | Serotype | 0.200ᶜ | < 0.001ᶜ | 1.037 | 0.433
Typhimurium | Source | 0.122 | 0.015 | 6.796 | 0.012
Typhimurium | Location | −0.003 | 0.417 | 0.181 | 0.727
Newport | Source | −0.030 | 1.000 | 1.739 | 0.053
Newport | Location | 0.103 | 0.072 | 1.699 | 0.074
Dublin | Source | 0.089 | 0.053 | 1.060 | 0.477
Dublin | Location | 0.481ᶜ | < 0.001ᶜ | 4.717ᵈ | < 0.001ᵈ
Plasmid replicon presence/absence profiles
All | Serotype | 0.350ᶜ | < 0.001ᶜ | 21.800ᵈ | < 0.001ᵈ
Typhimurium | Source | 0.025 | 0.201 | −0.299 | 0.853
Typhimurium | Location | 0.107 | 0.009 | 6.077 | 0.011
Newport | Source | −0.030 | 0.934 | 2.118 | 0.042
Newport | Location | 0.098 | 0.074 | 1.572 | 0.105
Dublin | Source | 0.040 | 0.146 | 1.521 | 0.116
Dublin | Location | 0.408ᶜ | < 0.001ᶜ | 4.466ᵈ | < 0.001ᵈ
ᵃRows in boldface indicate that at least one test was significant (P < 0.05) after a Holm-Bonferroni correction was applied.
ᵇGrouping factors used were serotype (only for “All isolates”), source (bovine or human), and location (New York or Washington State).
ᶜSignificant ANOSIM test (P < 0.05) after a Holm-Bonferroni correction was applied.
ᵈSignificant PERMANOVA test (P < 0.05) after a Holm-Bonferroni correction was applied.
[Figure 2.1 plots: three NMDS panels (A to C); x axis, NMDS1; y axis, NMDS2;
legend, serotype (Dublin, Newport, Typhimurium).]
Figure 2.1: Nonmetric multidimensional scaling (NMDS) plots for all isolates
based on antimicrobial resistance (AMR) gene sequences (A), phenotypic
antimicrobial resistance/susceptibility profiles (B), and presence/absence
of plasmid replicons (C). Points represent isolates, while shaded regions and
convex hulls correspond to isolate serotypes. For an interactive plot of
these data, as well as interactive NMDS plots for individual serotypes, visit
https://github.com/lmc297/2017_AEM_Figure_S2.
Table 2.3: Sensitivity and specificity of genotype predictions of AMR phenotype
for all 90 Salmonella isolates in the study.
Antimicrobialᵃ | Phenotype resistantᵇ, genotype resistant (n) | Phenotype resistantᵇ, genotype susceptible (n) | Phenotype susceptible, genotype resistant (n) | Phenotype susceptible, genotype susceptible (n) | Sensitivity (%) | Specificity (%)
AMC | 71 | 2 | 6 | 11 | 97.3 | 64.7
AMP | 88 | 0 | 0 | 2 | 100.0 | 100.0
FOX | 67 | 0 | 1 | 22 | 100.0 | 95.7
TIO | 70 | 1 | 0 | 19 | 98.6 | 100.0
CRO | 70 | 1 | 0 | 19 | 98.6 | 100.0
CHL | 72 | 0 | 1 | 17 | 100.0 | 94.4
NAL | 5 | 1 | 0 | 84 | 83.3 | 100.0
STR | 72 | 0 | 17 | 1 | 100.0 | 5.6
SX | 81 | 0 | 1 | 8 | 100.0 | 88.9
SXT | 11 | 1 | 0 | 78 | 91.7 | 100.0
TET | 82 | 0 | 1 | 7 | 100.0 | 87.5
Overall | | | | | 97.2 | 85.2
ᵃAMC, amoxicillin-clavulanic acid; AMP, ampicillin; FOX, cefoxitin; TIO, ceftiofur; CRO, ceftriaxone; CHL, chloramphenicol; NAL,
nalidixic acid; STR, streptomycin; SX, sulfisoxazole; SXT, sulfamethoxazole/trimethoprim; TET, tetracycline
ᵇIsolates that showed intermediate resistance to an antimicrobial are categorized as resistant.
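The sensitivity and specificity values in Table 2.3 follow directly from the four counts reported for each antimicrobial. The short Python sketch below recomputes them from the counts in the table; the variable and function names are illustrative, and the overall values are assumed to be unweighted means across the 11 antimicrobials, which reproduces the reported 97.2% and 85.2%.

```python
# Counts copied from Table 2.3, in the order:
# (phenotype R and genotype R, phenotype R and genotype S,
#  phenotype S and genotype R, phenotype S and genotype S)
TABLE_2_3 = {
    "AMC": (71, 2, 6, 11),
    "AMP": (88, 0, 0, 2),
    "FOX": (67, 0, 1, 22),
    "TIO": (70, 1, 0, 19),
    "CRO": (70, 1, 0, 19),
    "CHL": (72, 0, 1, 17),
    "NAL": (5, 1, 0, 84),
    "STR": (72, 0, 17, 1),
    "SX": (81, 0, 1, 8),
    "SXT": (11, 1, 0, 78),
    "TET": (82, 0, 1, 7),
}


def sensitivity(true_pos, false_neg):
    """Percentage of phenotypically resistant isolates with a resistant genotype."""
    return 100.0 * true_pos / (true_pos + false_neg)


def specificity(false_pos, true_neg):
    """Percentage of phenotypically susceptible isolates with a susceptible genotype."""
    return 100.0 * true_neg / (true_neg + false_pos)


def summarize(table):
    """Return per-drug sensitivity and specificity plus their unweighted means."""
    sens = {drug: sensitivity(tp, fn) for drug, (tp, fn, fp, tn) in table.items()}
    spec = {drug: specificity(fp, tn) for drug, (tp, fn, fp, tn) in table.items()}
    mean_sens = sum(sens.values()) / len(sens)
    mean_spec = sum(spec.values()) / len(spec)
    return sens, spec, mean_sens, mean_spec
```

The low specificity for STR reflects that 17 of the 18 phenotypically susceptible isolates nonetheless carried a gene associated with streptomycin resistance.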
2.4.3 S. Typhimurium phylogeny, AMR genes, AMR phenotypes, and plasmid replicons
A BEAST phylogeny of the 37 S. Typhimurium genomes separated these isolates
into two major clades (Figure 2.3; posterior probability, 1). One of these
clades contained human isolates exclusively (n = 8), while the other major
clade included 12 human and 17 bovine isolates (Figure 2.3). Three isolates
within this “mixed source” clade had highly similar AMR gene sequences:
BOV_TYPH_WA_09_R9_3247 (isolated from a dairy cow in Washington State
in 2009), HUM_TYPH_WA_09_R9_3271 (isolated from a human in Washington
State in 2009), and HUM_TYPH_NY_12_R9_0437 (isolated from a human in New
York State in 2012) (see Figure S2 posted at
https://github.com/lmc297/2017_AEM_Figure_S2). All AMR genes in these three
isolates matched with 100% sequence identity except for tet(RG);
HUM_TYPH_WA_09_R9_3271 tet(RG) differed from the other two isolates at
nucleotide position 73.
[Figure 2.2 plot: bar chart; y axis, percent (0% to 100%); x axis, antimicrobial
(AMC, AMP, FOX, TIO, CRO, CHL, NAL, STR, SX, SXT, TET), with paired phenotype
and genotype bars; legend, serotype-source group.]
AMC, amoxicillin/clavulanic acid; AMP, ampicillin; FOX, cefoxitin; TIO, ceftiofur; CRO, ceftriaxone; CHL, chloramphenicol; STR,
streptomycin; SX, sulfisoxazole; SXT, sulfamethoxazole/trimethoprim; TET, tetracycline.
Figure 2.2: Frequency of different phenotypic and genotypic resistance
determinants for each serotype-source group (e.g., Salmonella Dublin isolates
obtained from humans [S. Dublin Human]). Genotypic resistance was determined
using nucleotide BLAST (blastn) and the ARG-ANNOT database; isolates were
classified as having a resistant genotype if the AMR gene was detected by BLAST
with a minimum coverage of 50% and a minimum sequence identity of 75%.
Phenotypic resistance was tested using Kirby-Bauer disk diffusion. Percentages
were calculated using the ratio of resistant isolates to total isolates in each
serotype-source group (n = 17 for S. Typhimurium Bovine, n = 20 for
S. Typhimurium Human, n = 14 for S. Newport Bovine, n = 18 for S. Newport
Human, n = 10 for S. Dublin Bovine, and n = 11 for S. Dublin Human). Nalidixic
acid (NAL)- and sulfamethoxazole-trimethoprim (SXT)-resistant isolates (6 and
12 of the 90 isolates, respectively) each had one isolate for which genotypic
resistance did not correlate with phenotypic resistance.
Table 2.4: Comparison of mean zone diameters between (i) Salmonella isolates
with at least one AMR gene (ARG) known to confer resistance
to a particular antimicrobial and (ii) isolates with no genes known to confer
resistance to that antimicrobial.ᵃ
Antimicrobial | 95% CI of MZDᵃ (cm), ARG absent | 95% CI of MZDᵃ (cm), ARG present
Aminopenicillins
  Ampicillin | 25.4-25.6 | 0.0-0.02
  Amoxicillin-clavulanic acid | 13.9-18.7 | 9.2-11.0
Chloramphenicol | 24.4-27.6 | 0.02-1.45
Cephalosporins
  Ceftiofur | 25.5-29.5 | 12.7-14.5
  Ceftriaxone | 29.7-34.5 | 13.4-15.5
  Cefoxitin | 23.2-27.5 | 8.4-10.2
Streptomycin | 13.9-21.1 | 3.1-5.3
Sulfonamides
  Sulfisoxazole | 22.4-26.2 | 0.0-0.9
  Sulfamethoxazole-trimethoprim | 23.8-25.8 | 0-3.3
Tetracycline | 19.0-26.5 | 2.0-4.2
ᵃMZD, mean zone diameter; CI, confidence interval. All P values were < 0.0001.
Overall, 41 of the 42 AMR gene groups identified in the 90 isolates in this
study were detected in S. Typhimurium (all except aadB; Figure 2.3). The 37
S. Typhimurium isolates were distributed into 24 different genotypic MDR
profiles, the most common of which was aac(6’)-Iaa floR sul1 tet(RG) tet(G)
blaCARB aadA PBP gene, which was found in 11% of S. Typhimurium genomes. In
addition, between 2 and 7 unique plasmid replicons were detected per genome
(Figure 2.3). When ANOSIM and PERMANOVA were applied as metrics to as-
sess clustering based on either AMR gene sequences or plasmid replicon pres-
ence/absence, there were no significant differences between bovine and human
isolate clusters or between New York and Washington State clusters (P > 0.05 af-
ter a Holm-Bonferroni correction) (Table 2.2). While neither ANOSIM nor PER-
[Figure 2.3 graphic: BEAST phylogeny of S. Typhimurium isolates with panels for AMR genes, phenotypic AMR, and plasmid replicons; tip labels are isolate identifiers.]
HUM_TYPH_NY_08_R8_0784
HUM_TYPH_NY_12_R9_0042 HHUUMM_T_TYYPPHH_W_NAY__1112__RR99__30207452 HUM_TYPH_NY_12_R9_0042 HUHMU_MTY_PTHYP_NHY_N_1Y2__0R89__R080_402784
1 HUM_TYPH_WA_11_R9_3276 HUM_TYPH_NY_08_R8_0764 HUM_TYPH_NY_08_R8_0764
HUM_TYPH_NY_11_R8_8073 HHUUMM_T_TYYPPHH_W_NAY__1121__RR98__38207773 HHUUMM__TTYYPPHH__NWYA__1110__RR89__83027733 HUHMU_MTY_PTHYP_NHY_W_1A1__1R08__R890_733273
HUM_TYPH_WA_08_R9_3269 HUM_TYPH_WA_11_R9_3275
BOV_TYPH_WA_08_R9_3244 BOV_TYPH_WA_08_R9_3244 BOHVU_TMY_PTHY_PWHA__W0A8__1R19__R392_434275
1 HUM_TYPH_WA_08_R9_3270
BHOUVM__TTYYPPHH__WWAA__0181__RR99__33224746 HUM_TYPH_WA_11_R9_3276
BOV_TYPH_WA_08_R9_3243 BOV_TYPH_WA_08_R9_3243 BHOUVM__TTYYPPHH__WWAA__0182__RR99_32770.9964 0.1794 _3243 BOHVU_TMY_PTHY_PWHA__W0A8__1R29__R392_433277
HUM_TYPH_WA_08_R9_3269
BOV_TYPH_WA_09_R9_3246 HUM_TYPH_WA_08_R9_3269BOV_TYPH_WA_09_R9_32460.2501 BHOUVM__TTYYPPHH__WWAA__0098__RR99__33224760 BOV_TYPH_WA_09_R9_3246HUM_TYPH_WA_08_R9_3270
1 C* HUM_TYPH_WA_10_R9_3274 HUM_TYPH_WA_10_R9_3274 HUM_TYPH_WA_10_R9_3274 HUM_TYPH_WA_10_R9_32740.1539
1 BOV_TYPH_NY_11_R8_9118 BOV_TYPH_NY_11_R8_9118 BOV_TYPH_NY_11_R8_9118 BOV_TYPH_NY_11_R8_9118
BOV_TYPH_WA_09_R9_3245 BOV_TYPH_WA_09_R9_3245 BOV_TYPH_WA_09_R9_3245 BOV_TYPH_WA_09_R9_3245
1 BOV_TYPH_WA_10_R9_3248 BOV_TYPH_WA_10_R9_3248 BOV_TYPH_WA_10_R9_3248 BOV_TYPH_WA_10_R9_3248
1
BOV_TYPH_WA_12_R9_3252 BOV_TYPH_WA_12_R9_3252 BOV_TYPH_WA_12_R9_3252 BOV_TYPH_WA_12_R9_3252
HUM_TYPH_NY_08_R8_0784 HUM_TYPH_NY_08_R8_0784 HUM_TYPH_NY_08_R8_0784 HUM_TYPH_NY_08_R8_0784
1
HUM_TYPH_NY_08_R8_0764 HUM_TYPH_NY_08_R8_0764 HUM_TYPH_NY_08_R8_0764 HUM_TYPH_NY_08_R8_0764
HUM_TYPH_WA_10_R9_3273 HUM_TYPH_WA_10_R9_3273 HUM_TYPH_WA_10_R9_3273 HUM_TYPH_WA_10_R9_3273
0.9986
B* 1 HUM_TYPH_WA_11_R9_3275 HUM_TYPH_WA_11_R9_3275 HUM_TYPH_WA_11_R9_3275 HUM_TYPH_WA_11_R9_32751
1 HUM_TYPH_WA_11_R9_3276 HUM_TYPH_WA_11_R9_3276 HUM_TYPH_WA_11_R9_3276 HUM_TYPH_WA_11_R9_3276
1 HUM_TYPH_WA_12_R9_3277 HUM_TYPH_WA_12_R9_3277 HUM_TYPH_WA_12_R9_3277 HUM_TYPH_WA_12_R9_3277
HUM_TYPH_WA_08_R9_3269 HUM_TYPH_WA_08_R9_3269 HUM_TYPH_WA_08_R9_3269 HUM_TYPH_WA_08_R9_3269
1
HUM_TYPH_WA_08_R9_3270 HUM_TYPH_WA_08_R9_3270 HUM_TYPH_WA_08_R9_3270 HUM_TYPH_WA_08_R9_3270
8.0E-6
Figure 2.3: Phylogenetic treeSamoplfe S. Typhimurium isolates constructed using BEAST. Gene groups for AMR genes detected
in each genome sequence at more than 50% coverage and 75% identity using BLAST (blastn) and ARG-ANNOT are
indicated in green. Antimicrobials to which each isolate is resistant are indicated in red, and intermediate resistance to
an antimicrobial is indicated in orange. Plasmid replicons detected in each genome sequence using PlasmidFinder are
indicated in purple. Branch lengths are reported in substitutions per site, while posterior probabilities are reported at
tree nodes.
Figure 4. Phylogenetic tree of S. Typhimurium isolates constructed using BEAST. Gene groups for AMR genes 
Figure 4. dPehtyelcotgeedn einti ce atrcehe  osef qSu. Tenypchei matu mriuomre  itshoalante 5s 0c%on sctoruvcetreadg ues ianngd B 7E5A%S Tid. AenMtiRty g uesniensg d eBteLcAteSd Tin ( ebalcahs tsne)q aunendc Ae RatG m-oAreN tNhaOn T5 0a%re c overage 
and 75%in iddiecnatitteyd u isnin ggr eBeLnA. SATn (tibmlaisctnro) baniadl sA tRoG w-AhNicNh OeaTc ahr ei siondlaictea tiesd  riens gisrteaennt.  Aarnet iimnidcircoabtieadls  itno  rwehdi,c wh eitahc hin itseorlamtee disi aretesi srteasnits taarne cined tioc ated in 
red, wainth a innttiemrmicerdoiabtiea rle isnisdtiacnactee dto  bayn  oanratinmgiec.r oPbliaasl miniddi craetpedli cboyn osr adnegtee.c Ptelads mini de arecphl isceoqnus ednecteec tuesdi ning  ePalcahs smeqiduFenincde eursianrge  PilnadsimciadtFeidn dine r are 
ipnudripcalete. dB irna pnucrhp llee.n Bgrtahnsc ha rlee nrgetphos ratreed r ienp osrutebds tiintu stuibosntsit upteior nssi tpee,r w sihteil, ew phoilset eproisoter rpiorro pbraobbialibtiileitsi easr ear ree rpeoprotretedd a att  ttrreeee  nnooddeess.. 
a a c ( 3 ) − I I a
a a c ( 3 ) − I I a
a a c ( 3 ) - I I a
d f r A
d f r A
d f r A
s u l 3
s u l 3
s u l 3
o q x A
o q x A
o q x A
o q x B g b
o q x B g b
o q x B g b
d f r A 1
d f r A 1
d f r A 1
q n r S
q n r S
q n r S
c a t A 2
c a t A 2
c a t A 2
C T X − M − 1
C T X − M − 1
C T X - M - 1
o x y
o x y
o x y
a a c − a a d
a a c − a a d
a a c - a a d
T e t ( D )
T e t ( D )
T e t ( D )
S H V − O K P − L E N
S H V − O K P − L E N
S H V - O K P - L E N
d f r A 1 9
d f r A 1 9
d f r A 1 9
a a c ( 6 ) − I I c
a a c ( 6 ) − I I c
a a c ( 6 ) - I I c
q n r B
q n r B
q n r B
e r e A
e r e A
e r e A
T e t ( C )
T e t ( C )
T e t ( C )
T e t ( B )
T e t ( B )
T e t ( B )
O X A − 1
O X A − 1
O X A - 1
c a t B x
c a t B x
c a t B x
a a c ( 3 ) − I v a
a a c ( 3 ) − I v a
a a c ( 3 ) - I v a
a r r
a r r
a r r
c m l A
c m l A
c m l A
a p h ( 4 ) − I a
a p h ( 4 ) − I a
a p h ( 4 ) - I a
s u l 2
s u l 2
s u l 2
s t r B
s t r B
s t r B
s t r A
s t r A
s t r A
C M Y
C M Y
C M Y
T e t ( A )
T e t ( A )
T e t ( A )
T e t ( R )
T e t ( R )
T e t ( R )
s u l 1
s u l 1
s u l 1
a a d A
a a d A
a a d A
f l o R
f l o R
f l o R
C A R B
C A R B
C A R B
T e t ( R G )
T e t ( R G )
T e t ( R G )
T e t ( G )
T e t ( G )
T e t ( G )
T E M − 1 D
T E M − 1 D
T E M - 1 D
a p h ( 3 ' ' ) − I a
a p h ( 3 ' ' ) − I a
a p h ( 3 ’ ’ ) - I a
a a c ( 6 ) − I a a
a a c ( 6 ) − I a a
a a c ( 6 ) - I a a
P e n i c i l l i n _ B i n d i n g _ P r o t e i n _ E c o l i
P e n i c i l l i n _ B i n d i n g _ P r o t e i n _ E c o l i
P B P
S X T
S X T
S X T
C I P R O
C I P R O
C I P R O
N A L
N A L
N A L
A M C
A M C
A M C
F O X
F O X
F O X
T I O
T I O
T I O
C R O
C R O
C R O
C H L
C H L
C H L
A M P
A M P
A M P
T E T
T E T
T E T
S T R
S T R
S T R
S X S X
S X
I n c I 1
I n c I 1
I n c I 1
I n c A / C 2
I n c A / C 2
I n c A / C 2
I n c P
I n c P
I n c P
C o l 8 2 8 2
C o l 8 2 8 2
C o l 8 2 8 2
C o l 1 5 6
C o l 1 5 6
C o l 1 5 6
I n c Q 1
I n c Q 1
I n c Q 1
I n c H I 2
I n c H I 2
I n c H I 2
I n c H I 2 A
I n c H I 2 A
I n c H I 2 A
I n c I 2
I n c I 2
I n c I 2
C o l ( B S 5 1 2 )
C o l ( B S 5 1 2 )
C o l ( B S 5 1 2 )
I n c X 1
I n c X 1
I n c X 1
I n c F I B ( K )
I n c F I B ( K )
I n c F I B ( K )
I n c F I B ( A P 0 0 1 9 1 8 )
I n c F I B ( A P 0 0 1 9 1 8 )
F I B ( A P 0 0 1 9 1 8 )
I n c F I A ( H I 1 )
I n c F I A ( H I 1 )
I n c F I A ( H I 1 )
I n c H I 1 B ( R 2 7 )
I n c H I 1 B ( R 2 7 )
I n c H I 1 B ( R 2 7 )
I n c H I 1 A
I n c H I 1 A
I n c H I 1 A
I n c F I B ( S )
I n c F I B ( S )
I n c F I B ( S )
I n c F I I ( S )
I n c F I I ( S )
I n c F I I ( S )
C o l p V C
C o l p V C
C o l p V C
C o l R N A I
C o l R N A I
C o l R N A I
MANOVA found significant associations between AMR genes and either source
or state after correcting for multiple testing (P > 0.05) (Table 2.2), Fisher’s ex-
act test indicated that the IncI1 replicon was more commonly detected in New
York State isolates than in Washington State isolates (Table 2.5) (P < 0.05, after
Holm-Bonferroni correction).
Table 2.5: Odds ratios for association of AMR gene groups, AMR phenotype,
and plasmid replicons with source or location (only associations with P values
of < 0.05 are shown).ᵃ
Characteristic    Serotype      Source/location   OR                                Uncorrected
                                favored by OR                                       P value
Source
  Gene
    aac(3)-IIa    Typhimurium   Human             Infinity (only in humans)         0.009
    floR          Typhimurium   Human             5.42                              0.021
    aph(3'')-Ia   Newport       Bovine            0.0831                            0.019
  Antimicrobial
    CHL           Typhimurium   Human             5.42                              0.021
    NAL           Typhimurium   Human             Infinity (only in humans)         0.022
    SXT           Typhimurium   Human             Infinity (only in humans)         0.004
    TET           Typhimurium   Human             Infinity (all human isolates)     0.005
    STR           Dublin        Human             9.28                              0.030
  Plasmid
    IncA/C2       Typhimurium   Human             8.18                              0.048
    ColpVC        Newport       Bovine            0 (found in all bovine isolates)  0.024
Geographic location
  Gene
    blaTEM-1D     Typhimurium   WA                4.60                              0.045
    aph(3'')-Ia   Newport       NY                0.172                             0.049
    aadB          Dublin        WA                Infinity (found only in WA)       0.005
    cmlA          Dublin        WA                Infinity (found only in WA)       0.005
  Antimicrobial
    NAL           Typhimurium   WA                Infinity (found only in WA)       0.020
    STR           Typhimurium   WA                8.51                              0.042
    SX            Typhimurium   WA                10.8                              0.019
    SXT           Typhimurium   WA                9.36                              0.042
    STR           Dublin        NY                0.052                             0.008
  Plasmid
    IncI1         Typhimurium   NY                0.0602                            0.003
    IncP          Typhimurium   WA                Infinity (found only in WA)       0.046
    IncFII(S)     Dublin        NY                0 (present in all NY isolates)    0.001
ᵃ An odds ratio (OR) of infinity or 0 includes a short statement (in parentheses) that indicates which source or location was the driver for
that OR (e.g., only in humans indicates that the given gene/phenotype/plasmid replicon was found in only human isolates and in none
of the bovine isolates). WA, Washington State; NY, New York State. Values in boldface were significant (P < 0.05) after a Holm-Bonferroni
correction was applied to the respective analysis.
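The odds ratios and P values in Table 2.5 are derived from 2 x 2 contingency tables evaluated with Fisher's exact test. The following is a minimal standard-library sketch of that calculation, not the software used in this study. The function names and the counts are hypothetical; the counts were chosen only to show how an odds ratio of infinity arises when a characteristic is absent from one group. The odds ratio here is the simple cross-product estimate, whereas some statistical packages report a conditional maximum-likelihood estimate, so values can differ slightly.

```python
from math import comb, inf


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the table [[a, b], [c, d]]."""
    if b * c == 0:
        # A zero in the denominator gives an infinite OR (or is undefined).
        return inf if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)


# Hypothetical counts (NOT taken from this study):
# rows = human, bovine; columns = gene present, gene absent.
human_present, human_absent = 5, 10
bovine_present, bovine_absent = 0, 20

example_or = odds_ratio(human_present, human_absent, bovine_present, bovine_absent)
example_p = fisher_exact_two_sided(
    human_present, human_absent, bovine_present, bovine_absent
)
print(example_or, round(example_p, 4))
```

With these hypothetical counts the gene is seen only in human isolates, so the odds ratio is infinite, which corresponds to the "only in humans" annotation style used in the table.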
At the phenotypic level, the number of antimicrobials to which S. Ty-
phimurium isolates were resistant ranged from 1 to 11 (Figure 2.3). The most
common phenotypic resistance profiles for S. Typhimurium were AMC-AMP-
CHL-SX-STR-TET and AMC-AMP-FOX-TIO-CRO, which were found in 27%
and 11% of the isolates, respectively. When ANOSIM and PERMANOVA
were used as metrics to assess clustering, no significant differences between
bovine and human clusters or between New York and Washington State clusters
formed by phenotypic resistance/susceptibility profiles were detected (P > 0.05
after a Holm-Bonferroni correction [Table 2.2]). However, when Fisher’s ex-
act test was used to test for differences at the individual antimicrobial level,
resistance to SXT was seen only in human-associated S. Typhimurium isolates
(P < 0.05 after a Holm-Bonferroni correction [Table 2.5]). In addition, all human-
associated S. Typhimurium isolates were resistant to TET, while only 65% of
bovine isolates were resistant to TET (P < 0.05 after a Holm-Bonferroni correc-
tion [Table 2.5]).
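The Holm-Bonferroni correction applied to these tests can be sketched as follows. This is an illustrative standard-library implementation, not the code used in this study; the function name and the example P values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return (adjusted P values, reject flags) in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Step-down multiplier: m for the smallest P value, then m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw P values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


# Hypothetical uncorrected P values (NOT taken from Table 2.5).
raw = [0.001, 0.04, 0.03, 0.5]
adjusted, reject = holm_bonferroni(raw)
print(adjusted, reject)
```

In this toy example only the smallest P value remains below 0.05 after correction, which illustrates why an association with an uncorrected P value below 0.05 in Table 2.5 is not necessarily significant after correction.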
In addition to possessing the most diverse genotypic and phenotypic AMR
profiles, S. Typhimurium was the only serotype in which resistance to NAL (a
quinolone) and CIP (a fluoroquinolone) was observed. All isolates that were
resistant to NAL and CIP originated from human clinical samples in Wash-
ington State (Figure 2.3). qnr genes, which are plasmid-mediated quinolone
resistance (PMQR) genes, were detected in the sequences of the two S. Ty-
phimurium isolates that showed intermediate resistance to NAL (Table 2.6). For
each of the four NAL-resistant isolates, point mutations were identified in the
quinolone resistance-determining region (QRDR) of gyrA (Table 2.6). These nu-
cleotide changes resulted in non-synonymous amino acid changes (Asp87Asn,
Asp87Tyr, and Ser83Tyr) that have been previously observed in quinolone-
resistant Salmonella isolates (Cloeckaert and Chaslus-Dancla 2001). In addition,
three of the four NAL-resistant isolates possessed oqxA and oqxB (Table 2.6).
These genes encode the OqxAB multidrug efflux pump, which confers resis-
tance to multiple agents, including low-level resistance to quinolones (Andres
et al. 2013; Hansen et al. 2007).
Table 2.6: S. Typhimurium isolates with qnr and/or oqx genes and/or point
mutations in gyrA and/or gyrB and/or parC.ᵃ
                          S/I/R status  qnr and/or      Point mutationᵇ detected in:
Isolate                   NAL    CIP    oqx gene(s)     gyrA                   gyrB          parC
                                        detected
BOV_TYPH_NY_12_R8_9801    S      S      None            1641: T→G              WT            WT
BOV_TYPH_NY_12_R8_9815    S      S      None            1641: T→G              WT            WT
BOV_TYPH_NY_12_R8_9832    S      S      None            1641: T→G              WT            WT
HUM_TYPH_NY_11_R8_8073    S      S      None            WT                     2202: G→A     WT
HUM_TYPH_NY_12_R9_0042    S      S      None            WT                     2202: G→A     WT
HUM_TYPH_WA_08_R9_3269    I      S      qnrS            WT                     WT            1713: C→T
HUM_TYPH_WA_08_R9_3270    R      I      oqxA, oqxB      Asp87Tyr 259: G→T      WT            1713: C→T
HUM_TYPH_WA_09_R9_3271    S      S      None            WT                     759: A→G      WT
HUM_TYPH_WA_10_R9_3273    R      S      oqxA, oqxB      Ser83Tyr 248: C→A      WT            1713: C→T
HUM_TYPH_WA_10_R9_3274    I      S      qnrB            WT                     WT            WT
HUM_TYPH_WA_11_R9_3275    R      S      oqxA, oqxB      Asp87Asn 259: G→A      WT            1713: C→T
HUM_TYPH_WA_11_R9_3276    R      S      None            Asp87Asn 259: G→A      WT            1713: C→T
HUM_TYPH_WA_12_R9_3277    S      S      None            WT                     WT            1713: C→T
ᵃ No point mutations were detected in parE.
ᵇ For gyrA, gyrB, and parC, synonymous point mutations resulting in no amino acid change are shown as position: nt→nt (e.g., 259: G→A);
amino acid substitutions are formatted as "reference amino acid:position:alternate amino acid"; WT, gene with no mutations.
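The relationship between the nucleotide positions and the amino acid changes listed for gyrA in Table 2.6 can be checked with a short sketch. It assumes 1-based positions counted from the first base of the start codon and uses the standard genetic code. The reference codons supplied below for gyrA codons 83 (TCC) and 87 (GAC) are assumptions chosen to be consistent with the substitutions reported in the table, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def describe_substitution(nt_pos, ref_codon, alt_base):
    """Map a 1-based nucleotide substitution onto its codon.

    Returns (codon number, reference amino acid, alternate amino acid,
    is_synonymous), with amino acids as one-letter codes.
    """
    codon_number = (nt_pos - 1) // 3 + 1
    offset = (nt_pos - 1) % 3  # 0, 1, or 2: position within the codon
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return codon_number, ref_aa, alt_aa, ref_aa == alt_aa


# Assumed reference codons: codon 87 = GAC (Asp, D), codon 83 = TCC (Ser, S).
print(describe_substitution(259, "GAC", "A"))  # Asp87Asn in Table 2.6
print(describe_substitution(259, "GAC", "T"))  # Asp87Tyr in Table 2.6
print(describe_substitution(248, "TCC", "A"))  # Ser83Tyr in Table 2.6
```

Under these assumptions, position 259 is the first base of codon 87 and position 248 is the second base of codon 83, which agrees with the codon numbers in the reported amino acid substitutions.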
2.4.4 S. Newport phylogeny, AMR genes, AMR phenotypes,
and plasmid replicons
Among the 19 S. Newport isolates from New York State, 11 clustered into a
single, well-supported clade (posterior probability, 1) (Figure 2.4). The inclusion
of an additional isolate from New York State yielded a 12-isolate clade with a
posterior probability of 0.9574.
[Figure 2.4 image: phylogenetic tree with isolate tip labels and three heatmap panels titled AMR Genes, Phenotypic AMR, and Plasmid Replicons; scale bar 5.0E-7.]
Figure 2.4: Phylogenetic tree of S. Newport isolates constructed using BEAST. Gene groups for AMR genes detected
in each genome sequence at more than 50% coverage and 75% identity using BLAST (blastn) and ARG-ANNOT are
indicated in green. Antimicrobials to which each isolate is resistant are indicated in red, and intermediate resistance to
an antimicrobial is indicated in orange. Plasmid replicons detected in each genome sequence using PlasmidFinder are
indicated in purple. Branch lengths are reported in substitutions per site, while posterior probabilities are reported at
tree nodes.
Heatmap column labels (AMR Genes): dfrA, sul1, aadA, aph(3'')-Ia, floR, PBP (Penicillin_Binding_Protein_Ecoli), Tet(R), strA, strB, Tet(A), sul2, aac(6)-Iaa, CMY.
Heatmap column labels (Phenotypic AMR): SXT, CHL, TIO, CRO, TET, STR, SX, FOX, AMC, AMP.
Heatmap column labels (Plasmid Replicons): Col(BS512), IncI1, ColpVC, IncA/C2, ColRNAI.
The AMR gene profiles of the 32 S. Newport isolates showed a high degree
of similarity, with only 5 different genotypic profiles (Figure 2.4). The two most
common genotypic profiles, i.e., aac(6)-Iaa floR CMY sul2 tet(A) aph(3'')-Ia strB
strA tet(R) PBP gene and aac(6)-Iaa floR CMY sul2 tet(A) strB strA tet(R) PBP
gene, were detected in 66% and 19% of S. Newport genomes, respectively. At
the individual gene level, genes belonging to the aac(6)-Iaa, CMY, strA, strB, sul2,
tet(A), tet(R), and PBP gene groups were detected in the sequences of all 32 isolates
(Table 2.1). All S. Newport isolates had identical copies of each of these
genes except for CMY, as a truncated version of the gene was detected in isolate
BOV_NEWP_WA_10_R9_3240. In addition, the IncA/C2 and ColRNAI replicons
were detected in all S. Newport genomes (Table 2.1). Neither ANOSIM nor
PERMANOVA detected significant associations between AMR genes or plas-
mid replicon presence/absence and source after correcting for multiple testing
(P > 0.05 after a Holm-Bonferroni correction [Table 2.2]). However, the AMR
gene sequences of Washington State and New York State isolates were found to
differ when ANOSIM was used as a metric (P < 0.05 after a Holm-Bonferroni
correction [Table 2.2]). When Fisher’s exact test was used to assess source and
geographic associations at the individual gene level, genes belonging to the
aph(3'')-Ia group were more commonly present in (i) S. Newport bovine isolates
and (ii) isolates from New York State (P < 0.05 after a Holm-Bonferroni correc-
tion [Table 2.5]). Additionally, the ColpVC plasmid replicon was detected in all
bovine S. Newport isolates and only 67% of the human isolates (P < 0.05 after a
Holm-Bonferroni correction [Table 2.5]).
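ANOSIM, used above to compare groups of isolates, contrasts the ranks of between-group and within-group dissimilarities. The following is a minimal standard-library sketch of the R statistic with a permutation test, applied to gene presence/absence profiles. It is not the implementation used in this study: the Jaccard dissimilarity, the function names, and the toy profiles are assumptions made for illustration.

```python
import random
from itertools import combinations


def jaccard_distance(x, y):
    """1 - |intersection| / |union| for two presence/absence vectors."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def average_ranks(values):
    """Rank values from 1, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(ranks, pairs, groups):
    """R = (mean between-group rank - mean within-group rank) / (n(n-1)/4)."""
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    n = len(groups)
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


def anosim(profiles, groups, permutations=999, seed=0):
    """Return (R statistic, permutation P value)."""
    pairs = list(combinations(range(len(profiles)), 2))
    distances = [jaccard_distance(profiles[i], profiles[j]) for i, j in pairs]
    ranks = average_ranks(distances)
    observed = anosim_r(ranks, pairs, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # permute group labels, keep distances fixed
        if anosim_r(ranks, pairs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy presence/absence profiles (NOT data from this study).
profiles = [[1, 1, 0, 0]] * 4 + [[0, 0, 1, 1]] * 4
groups = ["WA"] * 4 + ["NY"] * 4
r_stat, p_value = anosim(profiles, groups)
print(r_stat, p_value)
```

In this toy case the two groups share no genes, so every between-group dissimilarity outranks every within-group dissimilarity and R reaches its maximum of 1. Values of R near 0 indicate no separation between groups.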
S. Newport isolates appeared even more similar at the phenotypic AMR
level than at the genetic level. No significant source or geographic differences in
MDR phenotype were observed when ANOSIM and PERMANOVA were used
to assess clustering (P > 0.05 after a Holm-Bonferroni correction) (Table 2.2). All
32 S. Newport isolates were resistant to AMC, AMP, FOX, TIO, CRO, SX, STR,
and TET, and only 3 different phenotypic profiles were detected (Figure 2.4).
The most common of these, AMC-AMP-FOX-TIO-CRO-CHL-SX-STR-TET, was
carried by 27 of the 32 (84%) S. Newport isolates. Three isolates showed addi-
tional resistance to SXT; hence, the two most common profiles accounted for 30
of the 32 (94%) isolates. The three SXT-resistant isolates possessed aadA, dfrA,
and sul1, which were not detected in any other S. Newport genomes (Figure
2.4).
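The profile percentages quoted above follow from a simple tally, sketched here. The makeup of the third phenotypic profile is not stated in the text, so it is represented by a placeholder label; the label used for the SXT-resistant profile is likewise only illustrative.

```python
from collections import Counter

BASE = "AMC-AMP-FOX-TIO-CRO-CHL-SX-STR-TET"

# 27 isolates with the most common profile, 3 with additional SXT
# resistance, and 2 with a third profile that the text does not spell out.
profiles = (
    [BASE] * 27
    + [BASE + "-SXT"] * 3
    + ["third profile (not specified)"] * 2
)

counts = Counter(profiles)
total = len(profiles)
top_two = counts.most_common(2)

share_most_common = round(100 * top_two[0][1] / total)
share_top_two = round(100 * sum(n for _, n in top_two) / total)
print(total, share_most_common, share_top_two)
```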
2.4.5 S. Dublin phylogeny, AMR genes, AMR phenotypes, and
plasmid replicons
S. Dublin isolates clustered into two separate clades with a posterior probabil-
ity of 1, one of which consisted of 10 isolates exclusively from Washington State
(referred to here as the Washington State clade) (Figure 2.5). The other clade
included all eight S. Dublin isolates from New York State and three isolates
from Washington State (referred to here as the mixed clade) (Figure 2.5). Both
genotypic and phenotypic differences were observed between the two major
clades. AMR genes aadB and cmlA, which were detected in all but 1 Washington
State clade isolate, were not detected in any of the mixed clade isolates
(P < 0.05 after a Holm-Bonferroni correction) (Figure 2.5). Not surprisingly, the
frequencies at which these genes were detected in New York and Washington
States were significantly different when Fisher’s exact test was used (P < 0.05
after a Holm-Bonferroni correction) (Table 2.5). ANOSIM and PERMANOVA
did not identify significant differences between S. Dublin geographic clusters
formed by AMR gene sequences (Table 2.2). However, when ANOSIM and
PERMANOVA were conducted using plasmid replicon presence/absence data,
significant differences between New York and Washington State isolate clus-
ters were observed for S. Dublin (P < 0.05 after a Holm-Bonferroni correction)
(Table 2.2). In addition, when Fisher’s exact test was used to test for possible
geographic associations of individual plasmid replicons, the IncFII(S) replicon
was detected only in mixed clade isolates, making it more commonly associated
with isolates from New York State (P < 0.05 after a Holm-Bonferroni correction)
(Figure 2.5).
[Figure 2.5 image: phylogenetic tree with isolate tip labels and three heatmap panels titled AMR Genes, Phenotypic AMR, and Plasmid Replicons; scale bar 4.0E-7.]
Figure 2.5: Phylogenetic tree of S. Dublin isolates constructed using BEAST. Gene groups for AMR genes detected in each
genome sequence at more than 50% coverage and 75% identity using BLAST (blastn) and ARG-ANNOT are indicated in
green. Antimicrobials to which each isolate is resistant are indicated in red, and intermediate resistance to an antimicrobial
is indicated in orange. Plasmid replicons detected in each genome sequence using PlasmidFinder are indicated in
purple. Branch lengths are reported in substitutions per site, while posterior probabilities are reported at tree nodes.
Heatmap column labels (AMR Genes): cmlA, aadB, TEM-1D, aph(3'')-Ia, floR, Tet(R), strA, Tet(A), strB, PBP (Penicillin_Binding_Protein_Ecoli), sul2, aac(6)-Iaa, CMY.
Heatmap column labels (Phenotypic AMR): SXT, STR, TIO, CRO, AMP, TET, CHL, SX, AMC, FOX.
Heatmap column labels (Plasmid Replicons): IncFII(S), ColpVC, ColRNAI, IncA/C2, IncX1.
Significant differences between New York and Washington State isolate clusters
were observed for S. Dublin when ANOSIM and PERMANOVA were conducted
using phenotypic resistance/susceptibility data (P < 0.05 after a Holm-
Bonferroni correction) (Table 2.2). Despite the detection of both strA and strB
in 20 of the 21 genomes (Table 2.1), STR resistance was observed only in iso-
lates in the mixed clade (P < 0.05 after a Holm-Bonferroni correction) (Figure
2.5). While the strB sequence was the same for the 20 isolates, the strA sequence
showed a strong geographical association: all isolates in the Washington State
clade possessed a truncated form of the gene, with the first 91 bp of the gene
missing. Aside from this 91-bp deletion, the strA sequences were identical in all
isolates. Overall, 11 isolates carried strB and a full-length strA; 10 of these iso-
lates showed phenotypic STR resistance. However, 9 isolates carried strB and
a truncated strA; all of these isolates were sensitive to STR. These data suggest
that the presence of the truncated strA variant found here does not confer STR
resistance and also suggest that the presence of only the strB variant found here,
in the absence of a full-length strA, does not confer STR resistance.
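The counts reported above can be arranged as a 2x2 table of strA allele (full-length or truncated) against STR phenotype. The following Python sketch is an illustration only and is not part of the analyses performed in this study: it computes a two-sided Fisher's exact test from those counts using the standard library. It assumes that the single isolate carrying a full-length strA without phenotypic STR resistance can be grouped as "not resistant"; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Counts taken from the text; columns are (STR resistant, not resistant).
full_length_strA = (10, 1)   # 11 isolates with strB and a full-length strA
truncated_strA = (0, 9)      # 9 isolates with strB and a truncated strA

p_value = fisher_exact_two_sided(*full_length_strA, *truncated_strA)
print(f"two-sided Fisher's exact P = {p_value:.2e}")
```

A test of this kind only describes the association between the strA allele and the STR phenotype among these 20 isolates; it does not by itself establish that the truncation causes the loss of resistance.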
The S. Dublin isolates were distributed into 8 different AMR genotypic pro-
files, with 33% of isolates belonging to the aac(6)-Iaa floR CMY sul2 tet(A)
aph(3'')-Ia blaTEM-1D strB strA tet(R) PBP genotypic profile. The most
common resistance genes in S. Dublin belonged to the aac(6)-Iaa, CMY, and
sul2 groups, all of which were detected in all 21 S. Dublin isolates (Table 2.1).
The sequences of these genes were identical for all S. Dublin isolates, regard-
less of source or geographic location. The PBP gene was also detected in all 21
genomes (Table 2.1). PBP gene sequences for 20 isolates were identical; only the
sequence of isolate BOV_DUBN_WA_09_R9_3239 differed by a single nucleotide
from the 20 other sequences. In addition, the replicon for IncX1, which had
been detected in only 1 S. Typhimurium isolate and no S. Newport isolates in
this study, was detected in all 21 S. Dublin genomes (Figure 2.5). At the phe-
notypic level, 6 different phenotypic profiles were observed. The two most
common, AMC-AMP-FOX-TIO-CRO-CHL-SX-TET and AMC-AMP-FOX-TIO-
CRO-CHL-SX-STR-TET, were observed in 43% and 38% of S. Dublin isolates,
respectively.
2.5 Discussion
Antimicrobial resistance in zoonotic and foodborne pathogens is considered to
be one of the most serious threats to public health today (CDC 2013; WHO
2014). The emergence and dispersal of AMR Salmonella are particularly prob-
lematic, due to (i) the fact that non-typhoidal Salmonella represents one of the
most common causes of foodborne disease cases and associated deaths world-
wide (WHO 2015) and (ii) reports on the emergence and dispersal of differ-
ent multidrug-resistant Salmonella strains (e.g., Salmonella Typhimurium DT104)
(Helms et al. 2005; Leekitcharoenphon et al. 2016; Ribot et al. 2002). Studies
of the relationships between AMR determinants and MDR strains found in
humans and animals are often confounded by the selection of the isolates in-
cluded in a given study, in which human and animal isolates may be of different
serotypes, geographical locations, or temporal intervals. To further our under-
standing of AMR diversity and dispersal in Salmonella, we thus assembled and
characterized a set of Salmonella isolates that (i) represented 3 serotypes associ-
ated with both human and bovine populations, (ii) were isolated over the same
time frame (2008 to 2012), (iii) were matched by source (human or animal) so
that approximately equal numbers of human and bovine isolates were selected
from each serotype, and (iv) were matched by geographical location so that sim-
ilar numbers of human and bovine isolates of the three different serotypes were
obtained from each of the states of Washington and New York. Our data ob-
tained from these isolates suggest that (i) WGS can be used to reliably predict
phenotypic resistance across Salmonella isolates from both human and bovine
sources, (ii) geographical differences can contribute to distinct, location-specific
AMR patterns, and (iii) despite an overlap of AMR geno- and phenotypes, hu-
man and bovine isolates differ significantly based on a number of AMR-related
geno- and phenotypic characteristics.
2.5.1 WGS can be used to predict phenotypic resistance in
bovine and human-associated Salmonella Typhimurium,
Newport, and Dublin with high sensitivity and specificity
The study reported here demonstrates that in silico AMR gene predictions
are highly correlated with phenotypic resistance in Salmonella enterica Ty-
phimurium, Newport, and Dublin, as AMR genotype correlated with AMR phe-
notype with an overall sensitivity and specificity of 97.2 and 85.2%, respectively.
The ability to predict AMR phenotype from WGS data with high sensitivity and
specificity has previously been observed in Salmonella enterica isolated from hu-
mans and retail meats (McDermott et al. 2016) and S. Typhimurium from swine
(Zankari et al. 2012), as well as in other organisms, including Staphylococcus au-
reus (Gordon et al. 2014; Bradley et al. 2015), Campylobacter spp. (Zhao et al.
2016), and Mycobacterium tuberculosis (Bradley et al. 2015). The results of our
study further attest to the robustness of WGS in predicting resistance pheno-
types in Salmonella enterica serotypes Typhimurium, Newport, and Dublin from
both bovine and human sources. Verification of the ability of WGS to predict
phenotypic AMR in bovine isolates is important, as AMR in isolates from dif-
ferent hosts can be facilitated by different mechanisms, as also shown here. Our
data further support that as WGS becomes faster, cheaper, and more accessible,
it may represent a valuable tool that could replace classical phenotypic AMR
testing across human medical, public health, and veterinary fields.
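As an illustration of how sensitivity and specificity of this kind are obtained, the following Python sketch compares genotype-based predictions against phenotypic results, treating the phenotype as the reference. It is a minimal sketch rather than the procedure used in this study: the function name and the toy calls are hypothetical, and how intermediate phenotypes are binned is left to the caller.

```python
def sensitivity_specificity(calls):
    """Compute (sensitivity, specificity) from isolate/antimicrobial calls.

    `calls` is an iterable of (genotype_resistant, phenotype_resistant)
    booleans; the phenotype is treated as the reference standard.
    """
    tp = fp = tn = fn = 0
    for genotype_resistant, phenotype_resistant in calls:
        if phenotype_resistant and genotype_resistant:
            tp += 1          # resistant, and predicted resistant
        elif phenotype_resistant:
            fn += 1          # resistant, but no determinant detected
        elif genotype_resistant:
            fp += 1          # susceptible, but determinant detected (P-:G+)
        else:
            tn += 1          # susceptible, and no determinant detected
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data, not values from this study.
toy_calls = (
    [(True, True)] * 3 + [(False, True)]
    + [(False, False)] * 4 + [(True, False)]
)
sens, spec = sensitivity_specificity(toy_calls)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```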
In this study, the lowest sensitivity of predicting AMR phenotype from geno-
typic data occurred for NAL. This was not surprising, since the AMR pheno-
type prediction approach used here was based on the presence of genes that
confer resistance to a given antibiotic. While AMR gene-based approaches gen-
erally work well, quinolone and fluoroquinolone resistance in particular can
result from point mutations in housekeeping genes (e.g., gyrA) rather than from
the presence of resistance genes, even though the presence of some resistance
genes (e.g., PMQR genes) may also confer low-level resistance to quinolones
and fluoroquinolones (Cloeckaert and Chaslus-Dancla 2001; Hooper and Ja-
coby 2015). In our study, the two isolates that showed intermediate resistance to
NAL possessed PMQR genes, but no mutations in housekeeping genes known
to confer resistance to quinolones. This is consistent with previous findings, in
which isolates possessing PMQR genes have been shown to have reduced sus-
ceptibility to quinolones but were not clinically resistant (Hooper and Jacoby
2015). Of the four NAL-resistant isolates, three concurrently possessed PMQR
genes and non-synonymous mutations in the quinolone resistance-determining
region (QRDR) of gyrA. One isolate that was NAL resistant due to the presence
of only a non-synonymous mutation in gyrA was falsely predicted to be NAL
sensitive, due to an absence of quinolone resistance genes in its genome. This
showcases that relying solely on gene presence/absence to predict AMR can re-
sult in reduced sensitivity. However, this drawback can be easily alleviated by
incorporating SNP-based prediction of AMR (as now has been implemented in
the ARG-ANNOT and CARD bioinformatic tools) (Gupta et al. 2014; Jia et al.
2017).
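The difference between a purely gene-based prediction and one that also considers point mutations can be sketched as follows. This Python example is a simplified illustration under stated assumptions, not the implementation of ARG-ANNOT or CARD: the rule tables, the placeholder determinant names ("PMQR_gene", "QRDR_substitution"), and the function name are all hypothetical.

```python
def predict_resistant_drugs(genes, mutations, gene_rules, mutation_rules):
    """Return the set of antimicrobials predicted resistant for one isolate."""
    predicted = set()
    for gene in genes:
        predicted |= gene_rules.get(gene, set())
    for mutation in mutations:
        predicted |= mutation_rules.get(mutation, set())
    return predicted


# Hypothetical rule tables mapping determinants to antimicrobials.
GENE_RULES = {"PMQR_gene": {"NAL"}}
MUTATION_RULES = {("gyrA", "QRDR_substitution"): {"NAL"}}

# An isolate with a gyrA QRDR mutation but no acquired quinolone resistance gene.
isolate_genes = set()
isolate_mutations = {("gyrA", "QRDR_substitution")}

gene_only = predict_resistant_drugs(isolate_genes, set(), GENE_RULES, MUTATION_RULES)
combined = predict_resistant_drugs(
    isolate_genes, isolate_mutations, GENE_RULES, MUTATION_RULES
)
print("NAL" in gene_only, "NAL" in combined)
```

In this toy case, the gene-only prediction misses NAL resistance, whereas the prediction that also consults the mutation table recovers it, mirroring the false-negative isolate described above.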
In this study, the lowest specificity of WGS-based AMR prediction was
observed for STR, which accounted for more than one-half of all phenotype-
susceptible/genotype-resistant (P-:G+) discrepancies. Here, more than 50% of
these discrepancies were attributed to S. Dublin isolates from the Washington
State clade, which carry a truncated strA that appeared to not confer STR resis-
tance, while still being identified computationally as an STR resistance deter-
minant. Similar discrepancies have been observed in a previous study (Davis,
Besser, Orfe, et al. 2011) of Escherichia coli isolates from dairy calves; in this
study, point mutations in strA were hypothesized to affect its ability to confer
STR resistance. Additionally, a previous study that assessed phenotypic and
genotypic resistance in non-typhoidal Salmonella isolated from retail meat and
human clinical samples also found STR (P-:G+) discrepancies to be the most
common (McDermott et al. 2016). The authors of this previous study suggest
that STR (P-:G+) discrepancies could be due to inaccurate clinical breakpoints
for STR susceptibility in Salmonella, due in part to the fact that STR is not used
to treat enteric infections (McDermott et al. 2016). Overall, these findings sug-
gest that refinement of WGS-based AMR prediction methods could benefit from
the incorporation of tools that also classify specific allelic variants of resistance
genes for their ability (or inability) to confer resistance. In the future, WGS-
based AMR prediction tools that incorporate feedback from clinical use of an-
tibiotics may even further improve the ability of WGS-based tools to predict the
clinical outcome of treatment with a given antimicrobial.
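One simple way in which allelic variants such as the truncated strA could be flagged is to report how much of the reference gene a hit covers, rather than only whether it passes the detection thresholds. The following Python sketch illustrates this idea; the 50% coverage and 75% identity detection thresholds are those used in this study, while the 99% full-length cutoff, the reference length of 800 bp, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneHit:
    gene: str
    aligned_length: int      # number of reference bases covered by the hit
    reference_length: int    # length of the reference gene
    percent_identity: float


def classify_hit(hit, min_coverage=50.0, min_identity=75.0,
                 full_length_coverage=99.0):
    """Label a hit as not detected, partial, or full-length."""
    coverage = 100.0 * hit.aligned_length / hit.reference_length
    # Detection requires MORE than the minimum coverage and identity.
    if coverage <= min_coverage or hit.percent_identity <= min_identity:
        return "not detected"
    if coverage < full_length_coverage:
        return "detected, partial"
    return "detected, full-length"


# Hypothetical reference length; a 91-bp deletion leaves 709 aligned bases.
truncated = GeneHit("strA", aligned_length=709, reference_length=800,
                    percent_identity=100.0)
print(classify_hit(truncated))
```

A hit labeled as partial would still need to be interpreted against phenotypic data, since a truncation does not necessarily abolish function.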
2.5.2 Both phenotypic and genomic data show geographic dif-
ferences in resistance-related characteristics for Salmonella,
suggesting a need for location-specific AMR control
strategies.
Our data show significant differences between New York and Washington State
isolates with regard to AMR-relevant genotypic and phenotypic characteris-
tics. Specifically, when ANOSIM and/or PERMANOVA were used as metrics,
Washington and New York State isolates differed by (i) AMR gene sequences
(in serotype Newport) and (ii) phenotypic resistance/susceptibility and plas-
mid replicon presence/absence (in serotype Dublin) (Table 2.2). In addition,
a number of genes, antimicrobials, and plasmid replicons showed strong ge-
ographical associations, even after corrections for multiple testing (Table 2.5).
For example, the presence of aadB and cmlA was associated with S. Dublin
isolates from Washington State, while STR resistance was associated with S.
Dublin from New York State. In S. Typhimurium, the IncI1 plasmid replicon,
which has been previously associated with extended-spectrum cephalosporin
resistance in S. Typhimurium (Folster et al. 2014; Jean-Yves Madec et al. 2011),
was more commonly detected in isolates from New York State. In S. Dublin, the
IncFII(S) plasmid replicon was also more commonly detected in isolates from
New York State; the IncFII(S) replicon, along with IncFIB(S), is characteristic
of the Salmonella virulence plasmids (Carattoli et al. 2014) found in serotypes
such as S. Typhimurium and S. Dublin, and it has been proposed that some
virulence plasmids previously associated with S. Dublin have evolved from
IncFII-like plasmids (Chu et al. 2008). The geographic differences observed for
MDR-relevant genotypic and phenotypic characteristics suggest that different
ecological factors and selective pressures may contribute to the development
of AMR in different geographical locations (New York State and Washington
State in our study here), suggesting a need for geographically specific inter-
ventions to effectively combat the spread of AMR. Our findings are consistent
with previous studies that have shown that contemporary Salmonella antibiotic
resistance patterns differ, even within a given country. For example, Davis et
al. (Davis, Besser, Eckmann, et al. 2007) showed that a specific MDR Salmonella
Typhimurium strain emerged prior to 2000 in bovine populations in the Pacific
Northwest (which includes Washington State) but was not found among con-
temporary isolates from the Northeast. Similarly, a large-scale WGS study of
Salmonella Typhi isolates from across the world identified a specific MDR clone
that emerged in Asia and Africa with subsequent inter- and intracontinental
transmission events (Wong et al. 2015). Importantly, our findings are also con-
sistent with a WGS-based study (Strachan et al. 2015) of Escherichia coli O157 iso-
lates from different sources (e.g., animals, humans, and the environment/food)
and different countries and continents. This study reported significant genetic
differences among isolates from different geographical regions and hypothe-
sized that a combination of local emergence events and international transmis-
sion leads to a “patchwork” of geographically confined and widely distributed
clades. This is similar to what we have observed, as we have identified cer-
tain geographic location-specific clones (e.g., a Washington State-specific Dublin
clade that carries a truncated strA allele), as well as broadly distributed clonal
groups with similar AMR profiles.
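The geographical associations discussed in this section were assessed after correction for multiple testing. As an illustration of the Holm (1979) step-down procedure referred to in this chapter, the following Python sketch computes Holm-adjusted P values; the function name and the example P values are hypothetical and are not results from this study.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted P values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P value (k = rank + 1) is multiplied by m - k + 1.
        value = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw P values for three tests.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```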
2.5.3 S. enterica isolates from humans contain a more diverse
range of AMR genes and plasmid replicons than those
isolated from bovine populations
The development and spread of AMR have often been attributed to the mis-
use of antimicrobials in agricultural settings. However, the AMR profiles of
Salmonella isolated from human infections cannot be fully explained by AMR
in bovine isolates in this study alone. Here, resistance to CIP, NAL, and SXT
was observed only in isolates from humans with salmonellosis. At the geno-
typic level, over one-half of the total of 42 AMR genes detected in this study
were detected only in human isolates. Similar results were observed for plas-
mid replicons, as nearly one-half of the plasmid replicons detected were found
only in human isolates. These results, along with the phylogenetic relationship
of the isolates, suggest that some AMR genes are associated primarily with a
particular host, with little overlap between species. Mather et al. (A. E. Mather
et al. 2013) observed similar results for human- and animal-associated S. Ty-
phimurium DT104: Salmonella isolates from humans and animals, as well as the
AMR genes associated with them, were found to remain largely within their re-
spective host populations, with little transmission from animals to humans and
vice versa (A. E. Mather et al. 2013).
While many AMR genes and phenotypes were confined to the human iso-
lates in this study, overlaps between the resistomes of bovine and human-
associated Salmonella isolates were observed on numerous occasions, with the
high degree of AMR sequence identity observed for S. Newport isolates serving
as the most prominent example. This also is consistent with previous studies
(Spoor et al. 2013; Ward et al. 2014; J.-Y. Madec et al. 2017) that similarly de-
scribed that certain clonal groups of AMR pathogens can be found in both hu-
mans and animals. However, further studies using WGS data from temporally
sampled Salmonella enterica are needed to assess the spread of AMR Salmonella
and the resistance genes associated with it in New York State and Washington
State.
2.6 Acknowledgments
This material is based on work supported by the National Science Foundation
Graduate Research Fellowship Program under grant no. DGE-1144153. Re-
search reported in this publication was supported by the Agriculture and Food
Research Initiative Competitive Grant no. 2010-51110-21131 from the USDA
National Institute of Food and Agriculture. The content is solely the responsi-
bility of the authors and does not necessarily represent the official views of the
USDA.
2.7 References
Anderson, Marti J. (2001). “A new method for non-parametric multivariate anal-
ysis of variance”. In: Austral Ecology 26.1, pp. 32–46. DOI: 10.1111/j.1442-9993.2001.01070.pp.x.
Anderson, Marti J. and Daniel C. I. Walsh (2013). “PERMANOVA, ANOSIM,
and the Mantel test in the face of heterogeneous dispersions: What null
hypothesis are you testing?” In: Ecological Monographs 83.4, pp. 557–574.
DOI: 10.1890/12-2010.1. eprint: https://esajournals.onlinelibrary.wiley.com/doi/pdf/10.1890/12-2010.1.
Andres, Patricia et al. (2013). “Differential distribution of plasmid-mediated
quinolone resistance genes in clinical enterobacteria with unusual pheno-
types of quinolone susceptibility from Argentina”. In: Antimicrob Agents
Chemother 57.6, pp. 2467–2475. DOI: 10.1128/AAC.01615-12.
Andrews, S. (2014). “FastQC A Quality Control tool for High Throughput
Sequence Data”. In: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
DOI: citeulike-article-id:11583827.
Baele, Guy, Philippe Lemey, et al. (2012). “Improving the accuracy of demo-
graphic and molecular clock model comparison while accommodating phy-
logenetic uncertainty”. In: Mol Biol Evol 29.9, pp. 2157–2167. DOI: 10.1093/
molbev/mss084.
Baele, Guy, Wai Lok Sibon Li, Alexei J. Drummond, Marc A. Suchard, and
Philippe Lemey (2013). “Accurate model selection of relaxed molecular
clocks in bayesian phylogenetics”. In: Mol Biol Evol 30.2, pp. 239–243. DOI:
10.1093/molbev/mss243.
Bankevich, A. et al. (2012). “SPAdes: a new genome assembly algorithm and
its applications to single-cell sequencing”. In: J Comput Biol 19.5, pp. 455–77.
DOI: 10.1089/cmb.2012.0021.
Bolger, A. M., M. Lohse, and B. Usadel (2014). “Trimmomatic: a flexible trimmer
for Illumina sequence data”. In: Bioinformatics 30.15, pp. 2114–20. DOI: 10.
1093/bioinformatics/btu170.
Bradley, Phelim et al. (2015). “Rapid antibiotic-resistance predictions from
genome sequence data for Staphylococcus aureus and Mycobacterium tubercu-
losis”. In: Nat Commun 6, pp. 10063–10063. DOI: 10.1038/ncomms10063.
Bushnell, B. (2015). “BBMap v. 35.49, https://sourceforge.net/projects/bbmap/”.
Camacho, C. et al. (2009). “BLAST+: architecture and applications”. In: BMC
Bioinformatics 10, p. 421. DOI: 10.1186/1471-2105-10-421.
Carattoli, A. et al. (2014). “In silico detection and typing of plasmids using Plas-
midFinder and plasmid multilocus sequence typing”. In: Antimicrob Agents
Chemother 58.7, pp. 3895–903. DOI: 10.1128/AAC.02412-14.
CDC (2013). Antibiotic resistance threats in the United States, 2013. CDC, Atlanta,
GA.
Chase, Jonathan M., Nathan J. B. Kraft, Kevin G. Smith, Mark Vellend, and
Brian D Inouye (2011). “Using null models to disentangle variation in com-
munity dissimilarity from variation in alpha-diversity”. In: Ecosphere 2.2,
art24. DOI: 10.1890/ES10-00117.1. eprint: https://esajournals.
onlinelibrary.wiley.com/doi/pdf/10.1890/ES10-00117.1.
Chu, Chishih et al. (2008). “Evolution of genes on the Salmonella Virulence plas-
mid phylogeny revealed from sequencing of the virulence plasmids of S. en-
terica serotype Dublin and comparative analysis”. In: Genomics 92.5, pp. 339–
343.
Clarke, K. R. (1993). “Non-parametric multivariate analyses of changes in com-
munity structure”. In: Australian Journal of Ecology 18.1, pp. 117–143. DOI:
10.1111/j.1442-9993.1993.tb00438.x. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1442-9993.1993.tb00438.x.
Cloeckaert, Axel and Elisabeth Chaslus-Dancla (2001). “Mechanisms of
quinolone resistance in Salmonella”. In: Vet. Res. 32.3-4, pp. 291–300. DOI:
10.1051/vetres:2001105.
CLSI (2012). Performance standards for antimicrobial susceptibility testing, twenty-
second informational supplement. M100-D22, 22nd ed. Clinical and Laboratory
Standards Institute, Wayne, PA.
— (2013). Performance standards for antimicrobial disk and dilution susceptibility
tests for bacteria isolated from animals approved standard, fourth edition, VET01-
A4, 3rd ed. Clinical and Laboratory Standards Institute, Wayne, PA.
Cody, Sara H. et al. (1999). “Two Outbreaks of Multidrug-Resistant Salmonella
Serotype Typhimurium DT104 Infections Linked to Raw-Milk Cheese in
Northern California”. In: JAMA 281.19, pp. 1805–1810. DOI: 10.1001/jama.281.19.1805.
eprint: https://jamanetwork.com/journals/jama/articlepdf/189982/joc81201.pdf.
Croucher, N. J. et al. (2015). “Rapid phylogenetic analysis of large samples of
recombinant bacterial whole genome sequences using Gubbins”. In: Nucleic
Acids Res 43.3, e15. DOI: 10.1093/nar/gku1196.
Davis, Margaret A., Thomas E. Besser, Kaye Eckmann, et al. (2007). “Multidrug-
resistant Salmonella typhimurium, Pacific Northwest, United States”. In:
Emerg Infect Dis 13.10, pp. 1583–1586. DOI: 10.3201/eid1310.070536.
Davis, Margaret A., Thomas E. Besser, Lisa H. Orfe, et al. (2011). “Genotypic-
Phenotypic Discrepancies between Antibiotic Resistance Characteristics of
Escherichia coli Isolates from Calves in Management Settings with High
and Low Antibiotic Use”. In: Applied and Environmental Microbiology 77.10,
pp. 3293–3299. DOI: 10.1128/AEM.02588-10. eprint: https://aem.
asm.org/content/77/10/3293.full.pdf.
Drummond, A. J., S. Y. Ho, M. J. Phillips, and A. Rambaut (2006). “Relaxed
phylogenetics and dating with confidence”. In: PLoS Biol 4.5, e88. DOI: 10.
1371/journal.pbio.0040088.
Drummond, A. J., A. Rambaut, B. Shapiro, and O. G. Pybus (2005). “Bayesian co-
alescent inference of past population dynamics from molecular sequences”.
In: Mol Biol Evol 22.5, pp. 1185–92. DOI: 10.1093/molbev/msi103.
Drummond, Alexei J., Marc A. Suchard, Dong Xie, and Andrew Rambaut (2012).
“Bayesian phylogenetics with BEAUti and the BEAST 1.7”. In: Mol Biol Evol
29.8, pp. 1969–1973. DOI: 10.1093/molbev/mss075.
Fey, Paul D. et al. (2000). “Ceftriaxone-Resistant Salmonella Infection Acquired
by a Child from Cattle”. In: New England Journal of Medicine 342.17. PMID:
10781620, pp. 1242–1249. DOI: 10.1056/NEJM200004273421703. eprint:
https://doi.org/10.1056/NEJM200004273421703.
Folster, Jason P. et al. (2014). “Characterization of blaCMY plasmids and
their possible role in source attribution of Salmonella enterica serotype Ty-
phimurium infections”. In: Foodborne Pathog Dis 11.4, pp. 301–306. DOI: 10.
1089/fpd.2013.1670.
Fricke, W. Florian et al. (2009). “Comparative genomics of the IncA/C multidrug
resistance plasmid family”. In: J Bacteriol 191.15, pp. 4750–4757. DOI: 10.
1128/JB.00189-09.
Gardner, S. N. and B. G. Hall (2013). “When whole-genome alignments just
won’t work: kSNP v2 software for alignment-free SNP discovery and phylo-
genetics of hundreds of microbial genomes”. In: PLoS One 8.12, e81760. DOI:
10.1371/journal.pone.0081760.
Gordon, N. C. et al. (2014). “Prediction of Staphylococcus aureus Antimicrobial
Resistance by Whole-Genome Sequencing”. In: Journal of Clinical Microbiol-
ogy 52.4. Ed. by K. C. Carroll, pp. 1182–1191. DOI: 10.1128/JCM.03117-
13. eprint: https://jcm.asm.org/content/52/4/1182.full.pdf.
Gupta, S. K. et al. (2014). “ARG-ANNOT, a new bioinformatic tool to dis-
cover antibiotic resistance genes in bacterial genomes”. In: Antimicrob Agents
Chemother 58.1, pp. 212–20. DOI: 10.1128/AAC.01310-13.
Hald, Tine et al. (2016). “World Health Organization Estimates of the Relative
Contributions of Food to the Burden of Disease Due to Selected Foodborne
Hazards: A Structured Expert Elicitation”. In: PLOS ONE 11.1, pp. 1–35. DOI:
10.1371/journal.pone.0145839.
Hansen, Lars Hestbjerg, Lars Bogo Jensen, Heidi Iskou Sorensen, and Soren Jo-
hannes Sorensen (2007). “Substrate specificity of the OqxAB multidrug re-
sistance pump in Escherichia coli and selected enteric bacteria”. In: Journal of
Antimicrobial Chemotherapy 60.1, pp. 145–147. DOI: 10.1093/jac/dkm167.
eprint: http://oup.prod.sis.lan/jac/article-pdf/60/1/145/
2178195/dkm167.pdf.
Helms, M., S. Ethelberg, K. Molbak, and D. T. Study Group (2005). “Interna-
tional Salmonella Typhimurium DT104 infections, 1992-2001”. In: Emerg Infect
Dis 11.6, pp. 859–67. DOI: 10.3201/eid1106.041017.
Hendriksen, Susan W. M., Karin Orsel, Jaap A. Wagenaar, Angelika Miko,
and Engeline van Duijkeren (2004). “Animal-to-human transmission of
Salmonella Typhimurium DT104A variant”. In: Emerg Infect Dis 10.12,
pp. 2225–2227. DOI: 10.3201/eid1012.040286.
Hoelzer, Karin, Andrea Isabel Moreno Switt, and Martin Wiedmann (2011).
“Animal contact as a source of human non-typhoidal salmonellosis”. In: Vet
Res 42.1, pp. 34–34. DOI: 10.1186/1297-9716-42-34.
Holm, Sture (1979). “A Simple Sequentially Rejective Multiple Test Procedure”.
In: Scandinavian Journal of Statistics 6.2, pp. 65–70.
Holmes, A. et al. (2015). “Utility of Whole-Genome Sequencing of Escherichia
coli O157 for Outbreak Detection and Epidemiological Surveillance”. In: J
Clin Microbiol 53.11, pp. 3565–73. DOI: 10.1128/JCM.01066-15.
Hooper, David C. and George A. Jacoby (2015). “Mechanisms of drug resis-
tance: quinolone resistance”. In: Ann N Y Acad Sci 1354.1, pp. 12–31. DOI:
10.1111/nyas.12830.
Inouye, M. et al. (2014). “SRST2: Rapid genomic surveillance for public health
and hospital microbiology labs”. In: Genome Med 6.11, p. 90. DOI: 10.1186/
s13073-014-0090-6.
Iqbal, Zamin, Mario Caccamo, Isaac Turner, Paul Flicek, and Gil McVean (2012).
“De novo assembly and genotyping of variants using colored de Bruijn
graphs”. In: Nature Genetics 44, pp. 226–232.
Jia, Kun et al. (2017). “Preliminary Transcriptome Analysis of Mature Biofilm
and Planktonic Cells of Salmonella Enteritidis Exposure to Acid Stress”. In:
Front Microbiol 8, pp. 1861–1861. DOI: 10.3389/fmicb.2017.01861.
Johnson, James R. et al. (2007). “Antimicrobial drug-resistant Escherichia coli
from humans and poultry products, Minnesota and Wisconsin, 2002-2004”.
In: Emerg Infect Dis 13.6, pp. 838–846. DOI: 10.3201/eid1306.061576.
Kimura, M. (1980). “A simple method for estimating evolutionary rates of base
substitutions through comparative studies of nucleotide sequences”. In: J
Mol Evol 16.2, pp. 111–120.
Kruskal, J. B. (1964a). “Multidimensional scaling by optimizing goodness of fit
to a nonmetric hypothesis”. In: Psychometrika 29.1, pp. 1–27. DOI: 10.1007/
BF02289565.
— (1964b). “Nonmetric multidimensional scaling: A numerical method”. In:
Psychometrika 29.2, pp. 115–129. DOI: 10.1007/BF02289694.
Kwong, J. C. et al. (2016). “Prospective Whole-Genome Sequencing Enhances
National Surveillance of Listeria monocytogenes”. In: J Clin Microbiol 54.2,
pp. 333–42. DOI: 10.1128/JCM.02344-15.
Leekitcharoenphon, P. et al. (2016). “Global Genomic Epidemiology of
Salmonella enterica Serovar Typhimurium DT104”. In: Appl Environ Microbiol
82.8, pp. 2516–26. DOI: 10.1128/AEM.03821-15.
Li, H. et al. (2009). “The Sequence Alignment/Map format and SAMtools”.
In: Bioinformatics 25.16, pp. 2078–9. DOI: 10.1093/bioinformatics/btp352.
Lozupone, Catherine and Rob Knight (2005). “UniFrac: a new phylogenetic
method for comparing microbial communities”. In: Appl Environ Microbiol
71.12, pp. 8228–8235. DOI: 10.1128/AEM.71.12.8228-8235.2005.
Madec, Jean-Yves, Benoit Doublet, Cecile Ponsin, Axel Cloeckaert, and Marisa
Haenni (2011). “Extended-spectrum beta-lactamase blaCTX-M-1 gene car-
ried on an IncI1 plasmid in multidrug-resistant Salmonella enterica serovar
Typhimurium DT104 in cattle in France”. In: Journal of Antimicrobial
Chemotherapy 66.4, pp. 942–944. DOI: 10.1093/jac/dkr014. eprint: http:
//oup.prod.sis.lan/jac/article-pdf/66/4/942/2160001/
dkr014.pdf.
Madec, J.-Y., M. Haenni, P. Nordmann, and L. Poirel (2017). “Extended-
spectrum β-lactamase/AmpC- and carbapenemase-producing Enterobacteri-
aceae in animals: a threat for humans?” In: Clinical Microbiology and Infection
23.11, pp. 826–833. DOI: 10.1016/j.cmi.2017.01.013.
Mather, A. E. et al. (2013). “Distinguishable epidemics of multidrug-resistant
Salmonella Typhimurium DT104 in different hosts”. In: Science 341.6153,
pp. 1514–7. DOI: 10.1126/science.1240578.
Mather, Alison E. et al. (2012). “An ecological approach to assessing the epi-
demiology of antimicrobial resistance in animal and human populations”.
In: Proc Biol Sci 279.1733, pp. 1630–1639. DOI: 10.1098/rspb.2011.1975.
McDermott, Patrick F. et al. (2016). “Whole-Genome Sequencing for Detect-
ing Antimicrobial Resistance in Nontyphoidal Salmonella”. In: Antimicrobial
Agents and Chemotherapy 60.9, pp. 5515–5520. DOI: 10.1128/AAC.01030-
16. eprint: https://aac.asm.org/content/60/9/5515.full.pdf.
Oksanen, Jari et al. (2017). vegan: Community Ecology Package. R package version
2.4-2.
PLINK/Seq (2014). “PLINK/Seq v. 0.10. https://atgu.mgh.harvard.edu/plinkseq/”.
Price, Lance B. et al. (2012). “Staphylococcus aureus CC398: Host Adaptation and
Emergence of Methicillin Resistance in Livestock”. In: mBio 3.1. Ed. by Fer-
nando Baquero. DOI: 10.1128/mBio.00305-11. eprint: https://mbio.
asm.org/content/3/1/e00305-11.full.pdf.
R Core Team (2016). R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing. Vienna, Austria.
Rambaut, A. (2013). Analysis of variable sites only in BEAST or MrBayes.
https://groups.google.com/forum/#!topic/beast-users/V5vRghILMfw.
Rambaut, A., T. T. Lam, L. Max Carvalho, and O. G. Pybus (2016). “Exploring the
temporal structure of heterochronous sequences using TempEst (formerly
Path-O-Gen)”. In: Virus Evol 2.1, vew007. DOI: 10.1093/ve/vew007.
Ribot, Efrain M., Rachel K. Wierzba, Frederick J. Angulo, and Timothy J. Barrett
(2002). “Salmonella enterica serotype Typhimurium DT104 isolated from hu-
mans, United States, 1985, 1990, and 1995”. In: Emerg Infect Dis 8.4, pp. 387–
391. DOI: 10.3201/eid0804.010202.
Scallan, E. et al. (2011). “Foodborne illness acquired in the United States–major
pathogens”. In: Emerg Infect Dis 17.1, pp. 7–15. DOI: 10.3201/eid1701.P11101.
Silbergeld, Ellen K., Jay Graham, and Lance B. Price (2008). “Industrial Food An-
imal Production, Antimicrobial Resistance, and Human Health”. In: Annual
Review of Public Health 29.1. PMID: 18348709, pp. 151–169. DOI: 10.1146/
annurev.publhealth.29.020907.090904. eprint: https://doi.
org/10.1146/annurev.publhealth.29.020907.090904.
Spoor, Laura E. et al. (2013). “Livestock Origin for a Human Pandemic Clone
of Community-Associated Methicillin-Resistant Staphylococcus aureus”. In:
mBio 4.4. Ed. by Fernando Baquero. DOI: 10.1128/mBio.00356-13.
eprint: https://mbio.asm.org/content/4/4/e00356-13.full.pdf.
Strachan, Norval J. C. et al. (2015). “Whole Genome Sequencing demonstrates
that Geographic Variation of Escherichia coli O157 Genotypes Dominates
Host Association”. In: Scientific Reports 5. Article, p. 14145.
Tamura, Koichiro, Glen Stecher, Daniel Peterson, Alan Filipski, and Sudhir Ku-
mar (2013). “MEGA6: Molecular Evolutionary Genetics Analysis version
6.0”. In: Mol Biol Evol 30.12, pp. 2725–2729. DOI: 10.1093/molbev/mst197.
Tavare, Simon. “Some probabilistic and statistical problems in the analysis of
DNA sequences”. In: Lectures on mathematics in the life sciences 17.2, pp. 57–
86.
Taylor, Angela J. et al. (2015). “Characterization of Foodborne Outbreaks of
Salmonella enterica Serovar Enteritidis with Whole-Genome Sequencing Sin-
gle Nucleotide Polymorphism-Based Analysis for Surveillance and Out-
break Detection”. In: Journal of Clinical Microbiology 53.10. Ed. by D. J.
Diekema, pp. 3334–3340. DOI: 10.1128/JCM.01280-15. eprint: https:
//jcm.asm.org/content/53/10/3334.full.pdf.
Van Boeckel, T. P. et al. (2015). “Global trends in antimicrobial use in food an-
imals”. In: Proc Natl Acad Sci U S A 112.18, pp. 5649–54. DOI: 10.1073/
pnas.1503141112.
Ward, M. J. et al. (2014). “Time-Scaled Evolutionary Analysis of the Trans-
mission and Antibiotic Resistance Dynamics of Staphylococcus aureus Clonal
Complex 398”. In: Applied and Environmental Microbiology 80.23. Ed. by C. A.
Elkins, pp. 7275–7282. DOI: 10.1128/AEM.01777-14. eprint: https:
//aem.asm.org/content/80/23/7275.full.pdf.
White, David G. et al. (2001). “The Isolation of Antibiotic-Resistant Salmonella
from Retail Ground Meats”. In: New England Journal of Medicine 345.16.
PMID: 11642230, pp. 1147–1154. DOI: 10.1056/NEJMoa010315. eprint:
https://doi.org/10.1056/NEJMoa010315.
WHO (2014). Antimicrobial resistance: global report on surveillance 2014. WHO,
Geneva, Switzerland.
— (2015). WHO estimates of the global burden of foodborne diseases, 2007-2015.
WHO, Geneva, Switzerland.
Wong, Vanessa K. et al. (2015). “Phylogeographical analysis of the dominant
multidrug-resistant H58 clade of Salmonella Typhi identifies inter- and in-
tracontinental transmission events”. In: Nat Genet 47.6, pp. 632–639. DOI:
10.1038/ng.3281.
Zankari, Ea et al. (2012). “Genotyping using whole-genome sequencing is a re-
alistic alternative to surveillance based on phenotypic antimicrobial suscep-
tibility testing”. In: Journal of Antimicrobial Chemotherapy 68.4, pp. 771–777.
DOI: 10.1093/jac/dks496. eprint: http://oup.prod.sis.lan/jac/
article-pdf/68/4/771/2083079/dks496.pdf.
Zhang, S. et al. (2015). “Salmonella serotype determination utilizing high-
throughput genome sequencing data”. In: J Clin Microbiol 53.5, pp. 1685–92.
DOI: 10.1128/JCM.00323-15.
Zhao, S. et al. (2016). “Whole-Genome Sequencing Analysis Accurately Predicts
Antimicrobial Resistance Phenotypes in Campylobacter spp.” In: Appl Environ
Microbiol 82.2, pp. 459–466. DOI: 10.1128/AEM.02873-15.
CHAPTER 3
IDENTIFICATION OF NOVEL MOBILIZED COLISTIN RESISTANCE
GENE MCR-9 IN A MULTIDRUG-RESISTANT, COLISTIN-SUSCEPTIBLE
SALMONELLA ENTERICA SEROTYPE TYPHIMURIUM ISOLATE1
1FROM CARROLL, LAURA M., AHMED GABALLA, CLAUDIA GULDIMANN,
GENEVIEVE SULLIVAN, LORY O. HENDERSON, AND MARTIN WIEDMANN (2019).
“IDENTIFICATION OF NOVEL MOBILIZED COLISTIN RESISTANCE GENE MCR-9 IN A
MULTIDRUG-RESISTANT, COLISTIN-SUSCEPTIBLE SALMONELLA ENTERICA SEROTYPE
TYPHIMURIUM ISOLATE”. IN: MBIO 10, PP. E00853-19. DOI: 10.1128/MBIO.00853-19.
3.1 Abstract
Mobilized colistin resistance (mcr) genes are plasmid-borne genes that confer re-
sistance to colistin, an antibiotic used to treat severe bacterial infections. To date,
eight known mcr homologues have been described (mcr-1 to -8). Here, we de-
scribe mcr-9, a novel mcr homologue detected during routine in silico screening
of sequenced Salmonella genomes for antimicrobial resistance genes. The amino
acid sequence of mcr-9, detected in a multidrug-resistant (MDR) Salmonella en-
terica serotype Typhimurium (S. Typhimurium) strain isolated from a human
patient in Washington State in 2010, most closely resembled mcr-3, aligning
with 64.5% amino acid identity and 99.5% coverage using Translated Nucleotide
BLAST (tblastn). The S. Typhimurium strain was tested for phenotypic resis-
tance to colistin and was found to be sensitive at the 2-mg/liter European Com-
mittee on Antimicrobial Susceptibility Testing breakpoint under the tested con-
ditions. mcr-9 was cloned in colistin-susceptible Escherichia coli NEB5α under
an IPTG (isopropyl-β-d-thiogalactopyranoside)-induced promoter to determine
whether it was capable of conferring resistance to colistin when expressed in
a heterologous host. Expression of mcr-9 conferred resistance to colistin in E.
coli NEB5α at 1, 2, and 2.5 mg/liter colistin, albeit at a lower level than mcr-3.
Pairwise comparisons of the predicted protein structures associated with all
nine mcr homologues (Mcr-1 to -9) revealed that Mcr-9, Mcr-3, Mcr-4, and Mcr-
7 share a high degree of similarity at the structural level. Our results indicate
that mcr-9 is capable of conferring phenotypic resistance to colistin in Enter-
obacteriaceae and should be immediately considered when monitoring plasmid-
mediated colistin resistance.
IMPORTANCE: Colistin is a last-resort antibiotic that is used to treat se-
vere infections caused by MDR and extensively drug-resistant (XDR) bac-
teria. The World Health Organization (WHO) has designated colistin as
a “highest priority critically important antimicrobial for human medicine”
(WHO, Critically Important Antimicrobials for Human Medicine, 5th re-
vision, 2017, https://www.who.int/foodsafety/publications/antimicrobials-
fifth/en/), as it is often one of the only therapies available for treating serious
bacterial infections in critically ill patients. Plasmid-borne mcr genes that con-
fer resistance to colistin pose a threat to public health at an international scale,
as they can be transmitted via horizontal gene transfer and have the potential
to spread globally. Therefore, the establishment of a complete reference of mcr
genes that can be used to screen for plasmid-mediated colistin resistance is es-
sential for developing effective control strategies.
3.2 Observation
Until recently, bacterial resistance to colistin, a last-resort antibiotic reserved for
treating severe infections, was thought to be acquired solely via chromosomal
point mutations (Liu et al. 2016). However, in 2015, plasmid-mediated colistin
resistance gene mcr-1 was described in Escherichia coli (Liu et al. 2016). Mcr-
1 is a phosphoethanolamine transferase that modifies cell membrane lipid A
head groups with a phosphoethanolamine residue, reducing affinity to colistin
(Anandan et al. 2017). Since then, seven additional mcr homologues (mcr-2 to
-8) have been identified in Enterobacteriaceae (Xavier et al. 2016; Yin et al. 2017;
Carattoli, Villa, et al. 2017; Borowiak et al. 2017; AbuOun et al. 2017; Yang et al.
2018; Wang et al. 2018). Here, we report novel mcr homologue mcr-9, which
was identified in a Salmonella enterica serotype Typhimurium (S. Typhimurium)
genome.
3.2.1 In silico identification of mcr-9 in an MDR S. Ty-
phimurium genome
MDR S. Typhimurium strain HUM_TYPH_WA_10_R9_3274 (NCBI RefSeq accession no.
GCF_002091095.1) was isolated from a patient in Washington State in 2010
(Carroll, Wiedmann, et al. 2017). It had previously been tested for resistance
to a panel of 12 antimicrobials that did not include colistin (Carroll,
Wiedmann, et al. 2017). ABRicate version 0.8
(https://github.com/tseemann/abricate) identified 20 antimicrobial resistance
(AMR) genes in the HUM_TYPH_WA_10_R9_3274 assembly using the ResFinder
database (accessed 11 June 2018) (Zankari et al. 2012) and minimum identity
and coverage thresholds of 75% and 50%, respectively (Carroll, Wiedmann, et al.
2017); none of these genes had previously been described as conferring colistin
resistance (see Table S1 in the supplemental material). Four plasmid replicons,
including IncHI2 and IncHI2A, were detected with at least 80% identity and 60%
coverage using ABRicate and PlasmidFinder (accessed 11 June 2018 [Table S1])
(Carattoli, Zankari, et al. 2014).
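The identity and coverage thresholds described above can be illustrated with a short Python sketch. The record layout, gene names, and the `passes_thresholds` helper are hypothetical and do not reflect ABRicate's output format or implementation; only the 75% identity and 50% coverage cutoffs come from the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One database hit (hypothetical record layout, not ABRicate's format)."""
    gene: str
    identity: float  # percent identity to the reference gene
    coverage: float  # percent of the reference gene covered by the hit


def passes_thresholds(hit: Hit, min_identity: float = 75.0,
                      min_coverage: float = 50.0) -> bool:
    """Return True if the hit meets both minimum thresholds."""
    return hit.identity >= min_identity and hit.coverage >= min_coverage


# Made-up hits for illustration only.
hits = [
    Hit("geneA", identity=99.8, coverage=100.0),
    Hit("geneB", identity=74.9, coverage=100.0),  # fails the identity cutoff
    Hit("geneC", identity=88.0, coverage=49.0),   # fails the coverage cutoff
]
kept = [h.gene for h in hits if passes_thresholds(h)]
print(kept)
```

Applying both cutoffs jointly means that a short, near-identical fragment and a full-length but divergent match are both excluded.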
To detect mcr-9 in the HUM_TYPH_WA_10_R9_3274 assembly, all colistin
resistance-conferring nucleotide sequences available in ResFinder (52
sequences, accessed 22 January 2019 [see Table S2 in the supplemental
material]) were translated into amino acid sequences using EMBOSS Transeq
(reading frame 1 [https://www.ebi.ac.uk/Tools/st/emboss_transeq/]). The
implementation of Translated Nucleotide BLAST (tblastn) (Camacho et al. 2009)
in BTyper version 2.3.2 (Carroll, Kovac, et al. 2017) selected mcr-3.17 as the
highest-scoring mcr allele, which aligned to mcr-9 with 64.5% amino acid
identity and 99.5% coverage (Table S1).
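As an illustration of the translation step, the following Python sketch translates a nucleotide sequence in reading frame 1 using the standard genetic code. It is a minimal stand-in written for this example, not EMBOSS Transeq itself; the function name `translate_frame1`, the handling of trailing bases and ambiguous codons, and the toy input are assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons ordered
# by iterating over the bases T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame1(nucleotides: str) -> str:
    """Translate a DNA sequence in reading frame 1 (starting at base 1).

    Trailing bases that do not form a full codon are ignored, and codons
    containing characters other than A, C, G, or T are written as 'X'.
    """
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Toy sequence for illustration (not an mcr sequence).
protein = translate_frame1("ATGGCCAAATAA")
print(protein)  # MAK*
```

Translating the reference sequences first allows the comparison to be made at the amino acid level, which is less sensitive to synonymous nucleotide differences.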
MUSCLE version 3.8.31 (Edgar 2004) was used to construct alignments of
the amino acid sequence of mcr-9 (NCBI protein accession no. WP_001572373.1)
and the following: (i) the 52 mcr amino acid sequences from ResFinder (53
sequences [Table S2]), (ii) the top 100 hits produced when mcr-9 was queried
against NCBI’s non-redundant protein (nr) database using the Protein BLAST
(blastp) web server (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins
[accessed 22 January 2019]; 152 sequences excluding mcr-9’s self-match [see Ta-
ble S3 in the supplemental material]), and (iii) amino acid sequences of 61 pu-
tative phosphoethanolamine transferases used in other papers describing novel
mcr genes (Yin et al. 2017; Carattoli, Villa, et al. 2017; Yang et al. 2018; Wang
et al. 2018) (213 sequences [see Table S4 in the supplemental material]). For each
alignment, RAxML version 8.2.12 (Stamatakis 2014) was used to construct a
phylogeny using the PROTGAMMAAUTO method and 1,000 bootstrap repli-
cates.
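The idea behind bootstrap replicates can be sketched in Python as follows. This shows only the generic column-resampling step of a nonparametric bootstrap, not RAxML's implementation; the function name and the toy alignment are hypothetical. In a full analysis, a tree would be inferred from each pseudoreplicate, and the support for a branch would be the percentage of replicate trees that contain it.

```python
import random


def bootstrap_alignment(alignment: list[str], rng: random.Random) -> list[str]:
    """Return one bootstrap pseudoreplicate of an alignment.

    Columns are drawn with replacement until the replicate has the same
    number of columns as the input; rows (sequences) stay in order.
    """
    n_columns = len(alignment[0])
    if any(len(row) != n_columns for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    sampled = [rng.randrange(n_columns) for _ in range(n_columns)]
    return ["".join(row[i] for i in sampled) for row in alignment]


# Toy protein alignment (made-up sequences, not Mcr proteins).
toy_alignment = ["MKLV-A", "MKIVGA", "MRLV-A"]
rng = random.Random(0)  # fixed seed so the example is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), replicates[0])
```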
The amino acid sequence of mcr-9 most closely resembled those of mcr-3 and
mcr-7 (Figure 3.1; see Fig. S1 in the supplemental material). However, the S.
Typhimurium isolate in which mcr-9 was detected was not resistant to colistin
at the > 2-mg/liter European Committee on Antimicrobial Susceptibility Test-
ing (EUCAST [http://www.eucast.org]) breakpoint when a broth microdilution
method was used to determine the colistin MIC (see Table S5 in the supplemen-
tal material).
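The interpretation of a MIC against the > 2-mg/liter breakpoint can be written out as a small Python sketch. The function name and the example MIC values are hypothetical; the only value taken from the text is the breakpoint, under which an isolate is called resistant when its MIC exceeds 2 mg/liter.

```python
COLISTIN_BREAKPOINT_MG_PER_L = 2.0  # resistant if MIC > 2 mg/liter


def interpret_colistin_mic(mic_mg_per_l: float,
                           cutoff: float = COLISTIN_BREAKPOINT_MG_PER_L) -> str:
    """Classify an isolate from its colistin MIC (mg/liter)."""
    return "resistant" if mic_mg_per_l > cutoff else "susceptible"


# Hypothetical MIC values from a two-fold dilution series.
calls = {mic: interpret_colistin_mic(mic) for mic in (0.5, 2.0, 4.0)}
print(calls)
```

Because the comparison is strict, a MIC of exactly 2 mg/liter is reported as susceptible in this sketch.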
[Figure 3.1 image: phylogenetic tree with tips labeled by mcr allele (mcr-1 to mcr-9) and bootstrap values on branches; scale bar, 0.2 substitutions per site.]
Figure 3.1: Comparison of mcr-9 to all previously described mcr homologues,
based on amino acid sequence. The maximum likelihood phylogeny was con-
structed using RAxML version 8.2.12 with the amino acid sequences of novel
mobilized colistin resistance gene mcr-9 (in blue) and all previously described
mcr genes (mcr-1 to -8 [in black]). The phylogeny is rooted at the midpoint, with
branch lengths reported in substitutions per site. Branch labels correspond to
bootstrap support percentages out of 1,000 replicates.
3.2.2 mcr-9 confers resistance to colistin when cloned into
colistin-susceptible E. coli NEB5α
Coding regions of mcr-9 and mcr-3 were cloned under the control of an IPTG
(isopropyl-β-d-thiogalactopyranoside)-induced SPAC/lacOid promoter and ex-
pressed in E. coli NEB5α (see Text S1 in the supplemental material). Colistin
killing assays (Figure 3.2; see Figure S2 in the supplemental material) were per-
formed by incubating E. coli harboring the empty pLIV2 vector (negative con-
trol), pLIV2 with mcr-3 (positive control), or pLIV2 with mcr-9 with different
concentrations of colistin (0, 1, 2, 2.5, and 5 mg/liter). E. coli cells harboring the
empty vector failed to survive at all tested colistin concentrations > 0 mg/liter.
While mcr-3 expression conferred clinical levels of colistin resistance (i.e., be-
yond the 2-mg/liter EUCAST breakpoint) in E. coli at all tested concentrations,
mcr-9 expression conferred clinical resistance at 1, 2, and 2.5 mg/liter, but not 5
mg/liter of colistin (Figure 3.2; Figure S2).
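The quantities behind the killing assay can be sketched in Python. The log CFU reduction compares counts after treatment with counts at 0 mg/liter colistin, and a Bonferroni correction multiplies each raw P value by the number of comparisons. All counts and P values below are invented for illustration, the function names are hypothetical, and samples with zero colonies (which would need a detection limit) are not handled.

```python
import math
from statistics import mean


def log_cfu_reduction(cfu_untreated: float, cfu_treated: float) -> float:
    """log10 reduction in CFU relative to the 0 mg/liter colistin control."""
    return math.log10(cfu_untreated) - math.log10(cfu_treated)


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjust P values: multiply by the number of tests, cap at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Made-up CFU counts for three biological replicates (illustration only).
untreated = [2.0e8, 1.5e8, 2.5e8]
treated = [2.0e5, 1.5e6, 2.5e5]
reductions = [log_cfu_reduction(u, t) for u, t in zip(untreated, treated)]
print([round(r, 2) for r in reductions], round(mean(reductions), 2))

# Made-up raw P values, one per comparison against the empty vector.
adjusted = bonferroni([0.004, 0.02, 0.3, 0.6])
print(adjusted)
```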
3.2.3 Mcr-3, Mcr-4, Mcr-7, and Mcr-9 are highly similar at the
structural level
Three-dimensional (3D) structural models of all nine Mcr homologues (Figure
3.3) based on EptA (Anandan et al. 2017) were constructed using the Phyre2
server (Kelley et al. 2015) and visualized using UCSF Chimera (Pettersen et
al. 2004). Congruent with the phylogeny based on their amino acid sequences
(Figure 3.1), comparisons of different Mcr protein models using Dali (Holm and
Laakso 2016) revealed that Mcr-3, Mcr-4, Mcr-7, and Mcr-9 were closely related
at the structural level (Figure 3.4).
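An all-against-all comparison of this kind can be summarized as a matrix of pairwise scores. The Python sketch below builds such a matrix and reports the most similar partner for each model. The scores are invented and are not Dali output, only four of the nine homologues are included, and scores are treated as symmetric for simplicity, which is an assumption of this example.

```python
# Hypothetical pairwise structural similarity scores (higher = more similar).
# These numbers are invented for illustration and are not Dali output.
pair_scores = {
    ("Mcr-3", "Mcr-9"): 66.0,
    ("Mcr-3", "Mcr-1"): 58.0,
    ("Mcr-9", "Mcr-1"): 57.0,
    ("Mcr-1", "Mcr-2"): 68.0,
    ("Mcr-2", "Mcr-3"): 58.5,
    ("Mcr-2", "Mcr-9"): 56.5,
}

names = sorted({name for pair in pair_scores for name in pair})


def score(a: str, b: str) -> float:
    """Look up a score regardless of the order in which the pair was stored."""
    return pair_scores[(a, b)] if (a, b) in pair_scores else pair_scores[(b, a)]


# All-against-all matrix as a nested dict; the diagonal is left out.
matrix = {a: {b: score(a, b) for b in names if b != a} for a in names}

# For each model, report its most similar partner.
closest = {a: max(row, key=row.get) for a, row in matrix.items()}
for name in names:
    print(name, "->", closest[name], matrix[name][closest[name]])
```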
Proteins encoded by mcr-1 to -9 showed high levels of conservation in both
the membrane-anchored domain and the soluble catalytic domain (Figure 3.3).
Interestingly, analyses of structural models of the nine Mcr homologues using
the ESPript 3 server (Robert and Gouet 2014) showed that both amino acids
and structural elements were conserved on the C-terminal catalytic domain,
Figure 3.2: Colistin killing assay of E. coli NEB5α harboring a pLIV2 empty vec-
tor (negative control), mcr-3 (positive control), or mcr-9, expressed under the
control of the IPTG-controlled SPAC/lacOid promoter. Cells were grown in
MH-II (Mueller-Hinton II) medium with IPTG to the mid-exponential phase.
Colistin was added at concentrations of 0, 1, 2, 2.5, or 5 mg/liter, and the bacteria
were incubated at 37°C for 1 h. The samples were diluted in phosphate-buffered
saline (PBS) and plated on LB agar plates for the determination of CFU.
Log CFU reduction was calculated by comparing CFU after each treatment to
CFU levels obtained at 0 mg/liter colistin, using three independent biological
replicates. Asterisks denote significant differences compared to empty vector
treatment (P < 0.05 by Student’s t test relative to the concentration’s respective
negative control after a Bonferroni correction).
[Figure 3.3 image: nine structural models, with panels labeled Mcr-6, Mcr-3, Mcr-9; Mcr-7, Mcr-4, Mcr-1; and Mcr-8, Mcr-5, Mcr-2.]
Figure 3.3: Structural models of all published Mcr proteins (Mcr-1 to -8) and
Mcr-9, based on lipooligosaccharide phosphoethanolamine transferase EptA.
Models were constructed using the Phyre2 server, and structures were viewed
and edited using UCSF Chimera.
[Figure 3.4 image: all-against-all similarity matrix with rows and columns labeled by mcr allele (mcr-1.1 through mcr-9); scale ticks at 55, 60, 65, and 70.]
Figure 3.4: Similarity matrix (composed of Dali Z-scores) of all previously de-
scribed Mcr groups (Mcr-1 to -8) and Mcr-9, based on protein structure. The
Dali server was used to perform all-against-all comparisons of 3D structural
models based on all mcr homologues (Figure 3.3); for this analysis, amino acid
sequences of mcr-5.3 and mcr-8.2, which were not available in ResFinder, were
additionally included from the National Database of Antibiotic Resistant Or-
ganisms (NDARO).
while only structural elements were conserved on the membrane-anchored N-
terminal domain (Figure 3.5).
3.2.4 Numerous genera of Enterobacteriaceae harbor mcr-9 on
IncHI2 plasmids
blastp searches of mcr-9 against NCBI’s nr database revealed that mcr-9 was
present in multiple genera of Enterobacteriaceae (Table S3). The 10 highest-
scoring hits in the nr database matched mcr-9 with at least 99% amino acid
identity (including mcr-9 characterized here [Table S3 and Figure S1A]); the
amino acid identities of the remaining hits with high query coverage (> 90%)
dropped below 88% identity (Table S3 and Figure S1A). mcr-9 was detected in
335 genomes linked to NCBI identical protein groups (IPGs) associated with
the 10 highest-scoring protein accession numbers (accessed 23 January 2019
[see Tables S3 and S6 in the supplemental material]). Analysis of the mcr-9
promoter region in 321 of these genomes (Text S1) showed conserved puta-
tive σ70 family-dependent -35 and -10 regions and an inverted repeat (Figure
3.6). The conserved DNA motif in the mcr-9 promoter is likely a recognition se-
quence for a transcription regulator, suggesting that additional factors or induc-
tion/derepression conditions might be needed for full expression of wild-type
mcr-9. Promoter variation (Huang et al. 2018) and testing conditions (Zhang
et al. 2017; Gwozdzinski et al. 2018) have been shown to influence mcr expres-
sion and the colistin MIC, which may explain why the S. Typhimurium strain
queried here was colistin susceptible under the tested conditions.
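Conservation of a promoter motif across many genomes can be quantified per alignment column, as in the Python sketch below. It reports, for each column, the fraction of sequences sharing the most common base and flags columns above a 95% cutoff. The function names and the four toy sequences are hypothetical; this is not the WebLogo procedure, and the sequences are not mcr-9 upstream regions.

```python
from collections import Counter


def column_conservation(aligned: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common character per column."""
    n_seqs = len(aligned)
    return [Counter(column).most_common(1)[0][1] / n_seqs
            for column in zip(*aligned)]


def conserved_positions(aligned: list[str], threshold: float = 0.95) -> list[int]:
    """0-based indices of columns whose conservation exceeds the threshold."""
    return [i for i, frac in enumerate(column_conservation(aligned))
            if frac > threshold]


# Toy aligned promoter fragments (invented for illustration).
toy = ["TTGACA", "TTGACA", "TTGACT", "TTGCCA"]
fractions = column_conservation(toy)
positions = conserved_positions(toy)
print(fractions)  # [1.0, 1.0, 1.0, 0.75, 1.0, 0.75]
print(positions)  # [0, 1, 2, 4]
```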
Of the 335 genomes in which mcr-9 was detected, 65 had at least one plas-
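[Figure 3.5 image: alignment of Mcr amino acid sequences annotated with Mcr-9 secondary structure elements and numbered cysteine pairs (1 to 5).]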
Figure 3.5: Location of Mcr-9 secondary structure elements within the align-
ment of Mcr amino acid sequences, constructed using the ESPript 3 server. The
top track denotes Mcr-9 secondary structure elements (alpha helixes and beta
sheets). Green digits below the alignment denote cysteine residues forming a
disulfide bridge (e.g., 1 forms a bridge with 1, 2 with 2, etc.). Within the amino
acid sequence alignment itself, a strict identity (i.e., identical amino acid residue
at a site) is denoted by a red box and a white character. A yellow box around
an amino acid residue denotes similarity across groups, where groups were defined
using the default “all” specification in ESPript 3 (ESPript 3 total score [TSc]
> in-group threshold [ThIn]), while a residue in boldface denotes similarity within
a group (ESPript 3 in-group score [ISc] > ThIn).
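[Figure 3.6 image: diagram of the mcr-9 locus with the labels IRR-IS5, IRR-IS6, Tnase, and mcr-9, shown with the upstream sequence and sequence logo.]
aagcCTCGTTAAGGTTAACCTAAGATTTCAGaatgataatctctgctTTGCAG-(17bp)-ATATTA-(25bp)-ATG
(Sequence labels in the figure: -35, -10, and M.)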
Figure 3.6: Organization of the mcr-9 locus in S. Typhimurium. A cupin fold
metalloprotein of unknown function is encoded by the gene downstream of mcr-9
(unlabeled black arrow). The mcr-9 locus is flanked by two different terminal
repeat sequences (IRR) from the IS5 (orange box) and IS6 (red box) families.
The mcr-9 upstream region contains highly conserved putative -35 and -10 σ70-
dependent promoter elements (blue boxes and blue text). Moreover, the mcr-9
promoter region contains an inverted repeat motif (green box, green text, and
sequence logo) that is conserved in more than 95% of 321 mcr-9 genes, as shown
by the sequence logo (constructed using WebLogo) (Crooks et al. 2004).
mid replicon (detected using ABRicate and PlasmidFinder as described above)
present on the same contig as mcr-9; in 59 of these 65 genomes, IncHI2 and/or
IncHI2A replicons were detected on the same contig as mcr-9 (Table S6). In 32
of the 37 closed genomes in which it was detected, mcr-9 was harbored on a
plasmid (Table S6). These results indicate that mcr-9 has the potential to reduce
susceptibility to colistin, up to and beyond the EUCAST breakpoint, and can be
found extrachromosomally in multiple species of Enterobacteriaceae, making it a
relevant threat to public health. Future studies querying the plasmids that har-
bor mcr-9 (e.g., transferability, stability, and copy number variation) will offer
further insight into the potential role that mcr-9 plays in the dissemination of
colistin resistance worldwide.
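The co-location check described above (a plasmid replicon on the same contig as mcr-9) can be sketched in Python. The genome and contig identifiers are invented, the tuple layout is not the output format of ABRicate or PlasmidFinder, and every feature other than mcr-9 is assumed to be a replicon. The final loop recomputes the proportions implied by the counts stated in the text.

```python
from collections import defaultdict

# Hypothetical (genome, contig, feature) hits; identifiers are invented.
hits = [
    ("genome1", "contig_3", "mcr-9"),
    ("genome1", "contig_3", "IncHI2"),
    ("genome1", "contig_3", "IncHI2A"),
    ("genome2", "contig_1", "mcr-9"),
    ("genome2", "contig_7", "IncHI2"),   # replicon on a different contig
    ("genome3", "contig_2", "mcr-9"),    # no replicon detected
]

features_by_contig = defaultdict(set)
for genome, contig, feature in hits:
    features_by_contig[(genome, contig)].add(feature)

# Replicons sharing a contig with mcr-9, keyed by genome.
colocated = {
    genome: sorted(features - {"mcr-9"})
    for (genome, contig), features in features_by_contig.items()
    if "mcr-9" in features and len(features) > 1
}
print(colocated)

# Proportions recomputed from the counts stated in the text.
counts = [("replicon on the mcr-9 contig", 65, 335),
          ("IncHI2 and/or IncHI2A among those", 59, 65),
          ("plasmid-borne in closed genomes", 32, 37)]
for label, k, n in counts:
    print(f"{label}: {k}/{n} = {100 * k / n:.1f}%")
```

Note that a replicon on a different contig does not rule out plasmid carriage in a draft assembly, since a single plasmid can be split across several contigs.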
3.2.5 Accession number(s) and supplemental material
The nucleotide and amino acid sequences of mcr-9 are available under NCBI
reference sequence accession no. NZ_NAAN01000063.1 (NCBI protein accession
no. WP_001572373.1). Supplemental material is available at
https://mbio.asm.org/content/10/3/e00853-19/figures-only.
3.3 Acknowledgments
This material is based on work supported by the National Science Foundation
(NSF) Graduate Research Fellowship Program under grant no. DGE-1650441,
with additional funding provided by an NSF Graduate Research Opportunities
Worldwide (GROW) grant through a partnership with the Swiss National Sci-
ence Foundation (SNF).
We thank Julie Siler (Cornell University) for providing colistin resistance
testing materials.
3.4 References
AbuOun, M. et al. (2017). “mcr-1 and mcr-2 variant genes identified in Moraxella
species isolated from pigs in Great Britain from 2014 to 2015”. In: J Antimicrob
Chemother 72.10, pp. 2745–2749. DOI: 10.1093/jac/dkx286.
Anandan, A. et al. (2017). “Structure of a lipid A phosphoethanolamine trans-
ferase suggests how conformational changes govern substrate binding”. In:
Proc Natl Acad Sci U S A 114.9, pp. 2218–2223. DOI: 10.1073/pnas.1612927114.
Borowiak, M. et al. (2017). “Identification of a novel transposon-associated phos-
phoethanolamine transferase gene, mcr-5, conferring colistin resistance in
d-tartrate fermenting Salmonella enterica subsp. enterica serovar Paratyphi
B”. In: J Antimicrob Chemother 72.12, pp. 3317–3324. DOI: 10.1093/jac/
dkx327.
Camacho, C. et al. (2009). “BLAST+: architecture and applications”. In: BMC
Bioinformatics 10, p. 421. DOI: 10.1186/1471-2105-10-421.
Carattoli, A., L. Villa, et al. (2017). “Novel plasmid-mediated colistin resistance
mcr-4 gene in Salmonella and Escherichia coli, Italy 2013, Spain and Belgium,
2015 to 2016”. In: Euro Surveill 22.31. DOI: 10.2807/1560-7917.ES.2017.22.31.30589.
Carattoli, A., E. Zankari, et al. (2014). “In silico detection and typing of plasmids
using PlasmidFinder and plasmid multilocus sequence typing”. In: Antimi-
crob Agents Chemother 58.7, pp. 3895–903. DOI: 10.1128/AAC.02412-14.
Carroll, L. M., J. Kovac, R. A. Miller, and M. Wiedmann (2017). “Rapid,
high-throughput identification of anthrax-causing and emetic Bacillus cereus
group genome assemblies using BTyper, a computational tool for virulence-
based classification of Bacillus cereus group isolates using nucleotide se-
quencing data”. In: Appl Environ Microbiol. DOI: 10.1128/AEM.01096-
17.
Carroll, L. M., M. Wiedmann, et al. (2017). “Whole-Genome Sequencing of
Drug-Resistant Salmonella enterica Isolates from Dairy Cattle and Humans
in New York and Washington States Reveals Source and Geographic Associ-
ations”. In: Appl Environ Microbiol 83.12. DOI: 10.1128/AEM.00140-17.
Crooks, G. E., G. Hon, J. M. Chandonia, and S. E. Brenner (2004). “WebLogo: a
sequence logo generator”. In: Genome Res 14.6, pp. 1188–90. DOI: 10.1101/
gr.849004.
Edgar, R. C. (2004). “MUSCLE: multiple sequence alignment with high accuracy
and high throughput”. In: Nucleic Acids Res 32.5, pp. 1792–7. DOI: 10.1093/
nar/gkh340.
Gwozdzinski, K., S. Azarderakhsh, C. Imirzalioglu, L. Falgenhauer, and T.
Chakraborty (2018). “An Improved Medium for Colistin Susceptibility Test-
ing”. In: J Clin Microbiol 56.5. DOI: 10.1128/JCM.01950-17.
Holm, L. and L. M. Laakso (2016). “Dali server update”. In: Nucleic Acids Res
44.W1, W351–5. DOI: 10.1093/nar/gkw357.
Huang, B. et al. (2018). “Promoter Variation and Gene Expression of mcr-1-
Harboring Plasmids in Clinical Isolates of Escherichia coli and Klebsiella pneu-
moniae from a Chinese Hospital”. In: Antimicrob Agents Chemother 62.5. DOI:
10.1128/AAC.00018-18.
Kelley, L. A., S. Mezulis, C. M. Yates, M. N. Wass, and M. J. Sternberg (2015).
“The Phyre2 web portal for protein modeling, prediction and analysis”. In:
Nat Protoc 10.6, pp. 845–58. DOI: 10.1038/nprot.2015.053.
Liu, Y. Y. et al. (2016). “Emergence of plasmid-mediated colistin resistance mech-
anism MCR-1 in animals and human beings in China: a microbiological and
molecular biological study”. In: Lancet Infect Dis 16.2, pp. 161–8. DOI: 10.
1016/S1473-3099(15)00424-7.
Pettersen, E. F. et al. (2004). “UCSF Chimera–a visualization system for ex-
ploratory research and analysis”. In: J Comput Chem 25.13, pp. 1605–12. DOI:
10.1002/jcc.20084.
Robert, X. and P. Gouet (2014). “Deciphering key features in protein structures
with the new ENDscript server”. In: Nucleic Acids Res 42.Web Server issue,
W320–4. DOI: 10.1093/nar/gku316.
Stamatakis, A. (2014). “RAxML version 8: a tool for phylogenetic analysis and
post-analysis of large phylogenies”. In: Bioinformatics 30.9, pp. 1312–3. DOI:
10.1093/bioinformatics/btu033.
Wang, X. et al. (2018). “Emergence of a novel mobile colistin resistance gene,
mcr-8, in NDM-producing Klebsiella pneumoniae”. In: Emerg Microbes Infect
7.1, p. 122. DOI: 10.1038/s41426-018-0124-z.
Xavier, B. B. et al. (2016). “Identification of a novel plasmid-mediated colistin-
resistance gene, mcr-2, in Escherichia coli, Belgium, June 2016”. In: Euro
Surveill 21.27. DOI: 10.2807/1560-7917.ES.2016.21.27.30280.
Yang, Y. Q., Y. X. Li, C. W. Lei, A. Y. Zhang, and H. N. Wang (2018). “Novel
plasmid-mediated colistin resistance gene mcr-7.1 in Klebsiella pneumoniae”.
In: J Antimicrob Chemother. DOI: 10.1093/jac/dky111.
Yin, W. et al. (2017). “Novel Plasmid-Mediated Colistin Resistance Gene mcr-3
in Escherichia coli”. In: MBio 8.3. DOI: 10.1128/mBio.00543-17.
Zankari, E. et al. (2012). “Identification of acquired antimicrobial resistance
genes”. In: J Antimicrob Chemother 67.11, pp. 2640–4. DOI: 10.1093/jac/
dks261.
Zhang, H. et al. (2017). “Expression characteristics of the plasmid-borne mcr-1
colistin resistance gene”. In: Oncotarget 8.64, pp. 107596–107602. DOI: 10.
18632/oncotarget.22538.
CHAPTER 4
RAPID, HIGH-THROUGHPUT IDENTIFICATION OF
ANTHRAX-CAUSING AND EMETIC BACILLUS CEREUS GROUP
GENOME ASSEMBLIES VIA BTYPER, A COMPUTATIONAL TOOL FOR
VIRULENCE-BASED CLASSIFICATION OF BACILLUS CEREUS GROUP
ISOLATES BY USING NUCLEOTIDE SEQUENCING DATA1
1 From Carroll, Laura M., Jasna Kovac, Rachel A. Miller, and Martin
Wiedmann (2017). “Rapid, high-throughput identification of anthrax-causing
and emetic Bacillus cereus group genome assemblies via BTyper, a computational
tool for virulence-based classification of Bacillus cereus group isolates by
using nucleotide sequencing data”. In: Applied and Environmental Microbiology
83, pp. e01096-17. DOI: 10.1128/AEM.01096-17.
4.1 Abstract
The Bacillus cereus group comprises nine species, several of which are
pathogenic. Differentiating between isolates that may cause disease and those
that do not is a matter of public health and economic importance, but it can be
particularly challenging due to the high genomic similarity within the group.
To this end, we have developed BTyper, a computational tool that employs a
combination of (i) virulence gene-based typing, (ii) multilocus sequence typ-
ing (MLST), (iii) panC clade typing, and (iv) rpoB allelic typing to rapidly clas-
sify B. cereus group isolates using nucleotide sequencing data. BTyper was ap-
plied to a set of 662 B. cereus group genome assemblies to (i) identify anthrax-
associated genes in non-B. anthracis members of the B. cereus group, and (ii)
identify assemblies from B. cereus group strains with emetic potential. With
BTyper, the anthrax toxin genes cya, lef, and pagA were detected in 8 genomes
classified by the NCBI as B. cereus that clustered into two distinct groups us-
ing k-medoids clustering, while either the B. anthracis poly-γ-D-glutamate cap-
sule biosynthesis genes capABCDE or the hyaluronic acid capsule hasA gene
was detected in an additional 16 assemblies classified as either B. cereus or
Bacillus thuringiensis isolated from clinical, environmental, and food sources.
The emetic toxin genes cesABCD were detected in 24 assemblies belonging
to panC clades III and VI that had been isolated from food, clinical, and en-
vironmental settings. The command line version of BTyper is available at
https://github.com/lmc297/BTyper. In addition, BMiner, a companion appli-
cation for analyzing multiple BTyper output files in aggregate, can be found at
https://github.com/lmc297/BMiner.
IMPORTANCE: Bacillus cereus is a foodborne pathogen that is estimated to
cause tens of thousands of illnesses each year in the United States alone. Even
with molecular methods, it can be difficult to distinguish nonpathogenic B.
cereus group isolates from their pathogenic counterparts, including the human
pathogen Bacillus anthracis, which is responsible for anthrax, as well as the in-
sect pathogen B. thuringiensis. By using the variety of typing schemes employed
by BTyper, users can rapidly classify, characterize, and assess the virulence po-
tential of any isolate using its nucleotide sequencing data.
4.2 Introduction
The Bacillus cereus group, also known as Bacillus cereus sensu lato (s.l.), consists of
nine closely related bacterial species: B. anthracis (Logan 2015), B. cereus sensu
stricto (s.s.), B. cytotoxicus (Guinebretiere, Auger, et al. 2013), B. mycoides (Lech-
ner et al. 1998), B. pseudomycoides (Nakamura 1998), B. thuringiensis, B. toyonen-
sis (G. Jimenez et al. 2013), B. weihenstephanensis (Lechner et al. 1998), and B.
wiedmannii (Miller, Beno, et al. 2016). The pathogenic potentials of members
of the B. cereus group vary widely; while some isolates are capable of causing
anthrax or anthrax-like disease (CDC n.d.), foodborne illness (Stenfors Ar-
nesen, Fagerlund, and Granum 2008), or food spoilage issues (Lucking et al.
2013; Doll, Scherer, and Wenning 2017; Ivy et al. 2012), others are used in in-
dustrial settings as probiotics (G. Jimenez et al. 2013; Hong, Le Hong Duc, and
Cutting 2005; Guillermo Jimenez et al. 2013; Zhu et al. 2016), insecticides and
pest control agents (Jouzani, Valijanian, and Sharafi 2017), agents in environ-
mental pollutant bioremediation (Jouzani, Valijanian, and Sharafi 2017; Aceves-
Diez, Estrada-Castaneda, and Castaneda-Sandoval 2015; Dash, Mangwani, and
Das 2014), plant growth promoters (Jouzani, Valijanian, and Sharafi 2017; Ar-
mada et al. 2015), and even as producers of bacteriocins (Wang et al. 2014; Lee,
Churey, and Worobo 2009) or parasporins with anticancer activities (Jouzani,
Valijanian, and Sharafi 2017; Ohba, Mizuki, and Uemori 2009; Ammons et al.
2016). As the industrial and agricultural applications of these microorganisms
expand, differentiating between isolates that can cause anthrax or gastrointesti-
nal illness and those that can be used as beneficial microbes in industrial or
agricultural settings becomes critical. Relying strictly on taxonomic classifica-
tion at the species level can lead not only to isolate misclassification, but also
to an inaccurate assessment of a given isolate’s virulence potential. There have
been numerous cases in which probiotics containing B. cereus group isolates sold
for human and/or animal consumption were found to possess strains capable
of producing toxins Nhe and/or Hbl (Hong, Le Hong Duc, and Cutting 2005;
Zhu et al. 2016; Le H. Duc et al. 2004), or the species they contained were incor-
rectly identified (Hong, Le Hong Duc, and Cutting 2005; Zhu et al. 2016; Huys
et al. 2013). Additionally, B. thuringiensis, a biopesticide, can possess B. cereus
s.s. toxin genes and potentially infect humans via the food chain (Rosenquist et
al. 2005), a notable example being a foodborne outbreak associated with salad
that was potentially caused by B. thuringiensis serovar aizawai that had been
sprayed on a produce field (EFSA 2016).
Differentiating between pathogenic and nonpathogenic B. cereus group iso-
lates is a matter of public health and economic importance but can be a challeng-
ing task. Phenotypic and biochemical methods (Tallent et al. 2012), as well as
many commonly used molecular methods, such as 16S rRNA gene sequencing,
may not have sufficient discriminatory power to differentiate between members
of the B. cereus group (Liu et al. 2015a; Fox, Wisotzkey, and Jurtshuk 1992). In
addition, the ability of a particular B. cereus group isolate to cause disease in
humans is not species dependent, and taxonomic classification can often be a
poor predictor of an isolate’s virulence potential (Kovac et al. 2016); for exam-
ple, genes encoding diarrheal toxins have been found in B. cereus, B. mycoides,
B. pseudomycoides, B. thuringiensis, and B. weihenstephanensis (Kovac et al. 2016;
Izabela Swiecicka, Van der Auwera, and Mahillon 2006; Pruss et al. 1999). For
these reasons, better tools are needed to classify B. cereus isolates, from both tax-
onomical and food safety risk perspectives (Ehling-Schulz and Messelhausser
2013).
A number of genetic loci have been proposed as markers that can be used
to taxonomically classify and/or differentiate between pathogenic and non-
pathogenic B. cereus group isolates at greater resolution than phenotypic meth-
ods and 16S rRNA gene sequencing (Kovac et al. 2016). Some examples of
taxonomic markers include the housekeeping gene rpoB (Miller, Beno, et al.
2016; Kovac et al. 2016; Caamano-Antelo et al. 2015; Kwan Soo Ko et al. 2004;
K. S. Ko et al. 2003; Martinez, Stratton, and Bianchini 2017; Miller, Kent, et al.
2015), the pantoate-beta-alanine ligase gene panC (Guinebretiere, Thompson, et
al. 2008; Guinebretiere, Velge, et al. 2010; Warda et al. 2016; Schmid et al. 2016;
Sorokin et al. 2006), and multiple loci used in a 7-gene multilocus sequence typ-
ing (MLST) scheme (i.e., glp, gmk, ilv, pta, pur, pyc, and tpi) (Kovac et al. 2016;
Yang, Yu, et al. 2017; Yang, Gu, et al. 2016; Drewnowska and Izabela Swiecicka
2013; Tourasse et al. 2011; A. R. Hoffmaster et al. 2008; Cardazzo et al. 2008)
(https://pubmlst.org/bcereus/). Each of these methods alone provides greater
resolution than its predecessors, and the methods may be implemented in com-
bination with each other and/or with phenotypic methods (Kovac et al. 2016;
Ehling-Schulz and Messelhausser 2013; Guinebretiere, Velge, et al. 2010; Car-
dazzo et al. 2008).
The presence and absence of virulence and toxin genes have also been used to
classify B. cereus group isolates as
pathogenic or nonpathogenic (Liu et al. 2015b; Kovac et al. 2016; Bohm et al.
2015). These methods are beneficial from a clinical perspective, as genes asso-
ciated with many medically relevant phenotypes are plasmid carried (Klee et
al. 2010), including anthrax toxin and capsule genes (Zwick et al. 2012), and
ces genes, which encode cereulide synthetase (Hoton et al. 2009). In contrast,
many genes that encode the phenotypic traits used to distinguish members of
the B. cereus group in biochemical and microbiological tests (motility,
hemolysis, etc.) are located on the chromosome (Klee et al.
2010). As a result, a disease phenotype, such as the ability to cause anthrax-like
symptoms in a particular host (Zwick et al. 2012), may not be confined to a sin-
gle B. cereus group species, making species-level taxonomy a poor indicator of
an isolate’s pathogenic potential.
Molecular typing methods using housekeeping and virulence genes found
in members of the B. cereus group have been essential for classifying isolates
from both a taxonomical and a public health perspective. However, as whole-
genome sequencing (WGS) becomes cheaper, faster, and more accessible, the
ability to perform molecular typing methods in silico becomes even more at-
tractive. With the goal of creating a readily accessible open-source pipeline that
can be easily used by B. cereus researchers and public health officials, we have
created BTyper, a computational tool to perform (i) virulence gene detection,
(ii) MLST, (iii) panC clade typing, and (iv) rpoB allelic typing using B. cereus
group nucleotide sequencing data in either FASTA, SRA, or gzipped FASTQ
format. Additionally, we applied BTyper and BMiner, a companion application
for analyzing BTyper’s output files in aggregate, to a set of 662 B. cereus group
genome assemblies, with the goal of identifying (i) anthrax-associated genes in
non-anthracis Bacillus members of the B. cereus group, and (ii) assemblies from
B. cereus group strains with emetic potential.
4.3 Materials and Methods
4.3.1 Database construction
To construct a virulence gene database specific to B. cereus group isolates, amino
acid sequences from a total of 36 virulence genes (see Table S1 in the supple-
mental material) were collected from the National Center for Biotechnology In-
formation (NCBI) (https://www.ncbi.nlm.nih.gov/). For an MLST database,
the 7-gene MLST database for Bacillus cereus was downloaded from PubMLST
(https://pubmlst.org/bcereus/). For panC typing, chromosomes of 45 B. cereus
group strains were downloaded from the NCBI database (Table S2). panC genes
were extracted from each strain using nucleotide BLAST (BLASTn) (Camacho
et al. 2009) and the panC genes of various B. cereus group type strains, and the
online tool available at https://tools.symprevius.org/Bcereus/english.php was
used to ensure that at least one representative from each of the seven panC
clades was present in the collection (Guinebretiere, Velge, et al. 2010) (Table
S2). For rpoB allelic typing, the rpoB allelic type database created and curated by
Cornell University’s Food Safety Lab and Milk Quality Improvement Program
(CUFSL/MQIP; Ithaca, NY) was used. While 16S rRNA gene typing is not per-
formed by default (see “Construction of BTyper tool,” below), 16S rRNA gene
typing can be performed using reference 16S rRNA gene sequences from nine
different B. cereus group type strain genomes. To obtain these sequences, the 16S
rRNA gene sequence from a cultured B. cereus type strain was downloaded from
the Ribosomal Database Project (RDP) (Cole et al. 2014) and used in conjunc-
tion with BLASTn (Camacho et al. 2009) to extract 16S rRNA genes from
each of nine different B. cereus group species type strain genomes (Table S3). All
database files can be downloaded from https://github.com/lmc297/BTyper.
4.3.2 Construction of BTyper tool
BTyper was created with the following dependencies: Python version 2.7
(https://www.python.org/), Biopython version 1.68 (Cock et al. 2009), BLAST
version 2.4.0 (Camacho et al. 2009), SPAdes version 3.9.0 (Bankevich et al. 2012),
and SRA toolkit version 2.8.0 (Kodama et al. 2012; Leinonen et al. 2011). The
whole-genome sequences of 22 previously characterized B. cereus group isolates
(Kovac et al. 2016) were downloaded from the NCBI and used
to optimize parameters (referred to here as the “training set”; Table S4). For
virulence gene detection using translated nucleotide BLAST (tBLASTn) (Cama-
cho et al. 2009), default minimum coverage and minimum identity thresholds
of 70% and 50%, respectively, were chosen, as they correlated highly with previ-
ously published PCR results (Kovac et al. 2016), and the allele with the highest
corresponding bit score was reported. For MLST, rpoB allelic typing, and panC
clade typing, the highest-scoring allele in the respective database was selected
using its associated BLAST bit score, with no minimum threshold applied (Fig-
ure 4.1). Virulence gene detection, MLST, rpoB allelic typing, and panC clade
typing methods were chosen to be performed by default, as these methods are
valuable for their discriminatory power (Kovac et al. 2016). 16S rRNA gene
typing, although not performed by default due to its inability to discriminate
between phylogenetic clades and species (Caamano-Antelo et al. 2015; Rossi-
Tamisier et al. 2015; Chen and Tsen 2002), was added as an option as well, as
many users may be interested in this locus. For this method, the highest-scoring
16S rRNA gene of the nine type strain 16S rRNA genes was selected using its
BLAST bit score, with no minimum threshold applied.
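The default selection rules described in this section can be sketched in Python as follows. This is a minimal illustration and is not taken from BTyper's source code: the Hit record, the function names, the example hits, and the definition of coverage as aligned length divided by query length are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record for one BLAST hit; the field names are illustrative.
Hit = namedtuple("Hit", ["gene", "allele", "identity", "aln_len", "query_len", "bitscore"])


def coverage(hit):
    """Percent of the query sequence covered by the alignment (assumed definition)."""
    return 100.0 * hit.aln_len / hit.query_len


def detect_virulence_genes(hits, min_cov=70.0, min_id=50.0):
    """Keep hits that pass both thresholds and report the top bit score per gene."""
    best = {}
    for hit in hits:
        if coverage(hit) < min_cov or hit.identity < min_id:
            continue  # below the coverage or identity threshold
        current = best.get(hit.gene)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.gene] = hit
    return best


def best_allele(hits):
    """MLST, rpoB, and panC typing: highest bit score wins, no minimum threshold."""
    return max(hits, key=lambda h: h.bitscore) if hits else None


# Invented hits, for illustration only.
example_hits = [
    Hit("nheA", "nheA_1", 98.0, 380, 386, 700.0),
    Hit("nheA", "nheA_2", 60.0, 300, 386, 350.0),
    Hit("cytK", "cytK_1", 45.0, 330, 336, 120.0),  # fails the identity threshold
    Hit("hblA", "hblA_1", 90.0, 100, 375, 150.0),  # fails the coverage threshold
]
detected = detect_virulence_genes(example_hits)
print(sorted(detected))
```

Keeping the thresholded and unthresholded selections in separate functions mirrors the distinction drawn above between virulence gene detection and the three allele-based typing schemes.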
4.3.3 PCR detection of virulence genes
To assess the accuracy of BTyper’s in silico virulence gene detection, each of the
24 isolates in the validation set was screened for eight virulence genes (hblA,
hblC, hblD, nheA, nheB, nheC, cytK, and entFM) using PCR. Bacterial DNA used
as the template in PCRs was extracted by inoculating single colonies into 100 µl
of sterile water; lysates were then heated at 95°C for 10 min in a thermocycler.
For PCRs, 1 µl of dirty lysate was added to a master mix containing sterile water,
2x GoTaq Green master mix (Promega, Madison, WI), and primers at a concen-
tration of 0.4 µM each (Table S5). The PCRs included an initial denaturation time
of 3 min at 94°C, followed by 30 cycles of amplification; each cycle consisted of
denaturation at 94°C for 30 s, annealing (see Table S5 for annealing tempera-
tures) for 30 s, and elongation for 1 min at 72°C, with a final extension at 72°C
for 7 min. PCR products were electrophoresed in 1% agarose gels, followed by
ethidium bromide staining to confirm specific amplification. For isolates that
did not yield a PCR amplicon for a given gene, the PCR was repeated at least
once in order to confirm the negative PCR result.
[Flowchart. Typing methods (top of chart): virulence gene typing
(-v/--virulence True), multilocus sequence typing (-m/--mlst True), rpoB
allelic typing (-r/--rpoB True), and panC clade typing (-p/--panC True).
Input data (left margin): NCBI Sequence Read Archive (SRA) data (-t/--type
sra-get or sra), for which the corresponding SRA data are downloaded from NCBI
(sra-get) and/or split into zipped FASTQ files (if necessary); Illumina short
reads in zipped FASTQ format (-t/--type pe or se), which are assembled into
contigs using SPAdes using either paired-end or single-end reads; and
assemblies in FASTA format (-t/--type seq). Searches: BLAST against the BTyper
virulence gene database using tblastn and/or blastn, the PubMLST B. cereus
database using blastn, the FSL rpoB database using blastn, and the BTyper panC
collection using blastn. Output: virulence genes above the coverage and
identity thresholds are reported as present; the best-matching allelic type
is reported for each MLST gene and, using the best-matching allelic types, the
corresponding sequence type is reported, if available; the best-matching rpoB
allelic type is reported; and the best-matching panC clade is reported.]
Figure 4.1: BTyper command line workflow for various types of data and de-
fault typing methods. Input data type is listed in the left margin, while
typing methods are listed at the top of the chart. Command line parameters
associated with a particular data type or typing method are shown in
parentheses. FSL, Food Safety Lab.
4.3.4 MLST
Multilocus sequence typing (MLST) was performed for all 24 isolates in the vali-
dation set using a 7-housekeeping-gene scheme available through the PubMLST
website (https://pubmlst.org/bcereus/). The PCRs consisted of 1 µl of dirty
lysate as the DNA template added to a master mix containing sterile water, 2x
GoTaq Green master mix (Promega), and primers at a final concentration of 0.4
µM each. The PCR cycles included an initial denaturation (3 min at 94°C), fol-
lowed by 20 cycles of denaturation (94°C for 30 s), annealing for 30 s with a
touchdown scheme (annealing temperatures that decrease by 0.5°C per cycle,
starting with 55°C and reaching 45°C at the last cycle), and elongation at 72°C
for 45 s. The 20 cycles of touchdown PCR were followed by an additional 20
cycles using an annealing temperature of 45°C. A final extension at 72°C for
5 min was included at the end of the 40 cycles. After amplification, the PCR
products were sequenced at the Biotechnology Resource Center (BRC; Cornell
University, Ithaca, NY), and allelic types (ATs) and sequence types (STs; based on all 7 genes)
were assigned using the PubMLST website. All isolates were submitted to the
B. cereus PubMLST database (Kovac et al. 2016).
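Because a sequence type is defined by the combination of allelic types at all seven loci, ST assignment amounts to a profile lookup. The sketch below illustrates this idea only; the profile table, allele numbers, and ST labels are invented for the example and do not correspond to real PubMLST entries, and the function name is hypothetical.

```python
# The seven loci of the B. cereus MLST scheme named in the Introduction.
LOCI = ("glp", "gmk", "ilv", "pta", "pur", "pyc", "tpi")

# Hypothetical profile table: allele numbers and ST labels are invented.
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 1, 1, 1, 1, 2): "ST-B",
}


def assign_st(allelic_types, profiles=PROFILES):
    """Return the ST for a complete 7-locus profile, or None if undefined."""
    try:
        key = tuple(allelic_types[locus] for locus in LOCI)
    except KeyError:
        return None  # at least one locus has no allele call
    return profiles.get(key)  # None when the combination has no ST


# An isolate that differs from the first profile at tpi only.
isolate = dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 2)))
print(assign_st(isolate))
```

A change at a single locus is enough to yield a different ST, or no ST at all if the resulting combination has not been defined.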
4.3.5 rpoB allelic typing
A 632-nucleotide (nt) internal sequence of rpoB, encoding the β-subunit of the
RNA polymerase, was used for assigning rpoB allelic types (ATs), as described
previously (Ivy et al. 2012). The sequences of all rpoB ATs are available in the
Food Microbe Tracker database (Vangay et al. 2013).
4.3.6 Validation of BTyper using additional B. cereus group
whole-genome sequences
The genomes of 24 additional B. cereus group isolates were sequenced and
assembled as described by Miller, Beno, et al. (2016) (referred to here as the
“validation set”; Table S6). BTyper was used to perform virulence gene
detection, MLST, rpoB allelic typing, and panC clade typing on each draft
genome using the chosen default settings (see “Construction of BTyper tool,” above). The
same analyses were performed using the Illumina paired-end reads associated
with each isolate, again using BTyper’s default settings. To assess the accuracy
of the panC clades assigned by BTyper, clade assignments provided by BTyper
were compared to the isolates’ whole-genome sequence clades provided by Ko-
vac et al. (Kovac et al. 2016) and Miller et al. (Miller, Jian, et al. 2018) for the
training and validation sets, respectively. A current version of the command line
tool, as well as the curated virulence gene and rpoB allelic type databases, can be
found at https://github.com/lmc297/BTyper. A link to a Web-based version of
BTyper will also be made available at https://github.com/lmc297/BTyper at a
later time.
4.3.7 Construction of BMiner companion application
BMiner, a companion application for parsing, viewing, and analyzing mul-
tiple BTyper files in aggregate, was created with the following dependen-
cies: R version 3.3.2 (R Core Team 2016) and R packages shiny version 1.0.1
(Chang et al. 2017), ggplot2 version 2.2.1 (Wickham 2009), readr version
1.1.0 (Wickham, Hester, and Francois 2017), stringr version 1.2.0 (Wickham
2017), vegan version 2.4-2 (Oksanen et al. 2017), plyr version 1.8.4 (Wick-
ham 2011), dplyr version 0.5.0 (Wickham, Francois, et al. 2016), cluster ver-
sion 2.0.6 (Maechler et al. 2017), ggrepel version 0.6.5 (Slowikowski 2016), and
magrittr version 1.5 (Bache and Wickham 2014). BMiner is freely available at
https://github.com/lmc297/BMiner.
4.3.8 Application of BTyper and BMiner to whole-genome se-
quencing data
The latest assembly versions for all (n = 651) B. cereus group genome assemblies
available in GenBank were downloaded on 6 April 2017. Genome assemblies
were assigned to one of nine taxa according to their GenBank classification: B.
anthracis (n = 157), B. cereus s.s. (n = 343), B. cytotoxicus (n = 2), B. mycoides
(n = 19), B. pseudomycoides (n = 2), B. thuringiensis (n = 93), B. toyonensis (n = 3),
B. weihenstephanensis (n = 21), and B. wiedmannii (n = 11). BTyper was used
to perform virulence gene detection, MLST, rpoB allelic typing, and panC clade
typing on all 651 isolates, as well as an additional 11 isolates that were part
of the validation set but did not have assemblies in the NCBI database at the
time (total number of B. cereus group genomes, 662). All available metadata
associated with each assembly’s BioSample were downloaded from the NCBI
(Barrett et al. 2012). Data mining using BTyper results from all 662 B. cereus
group assemblies was conducted using BMiner. The final results files for all 662
B. cereus group genome assemblies, as well as the associated metadata, can be
found at https://github.com/lmc297/BTyper.
4.3.9 Post hoc statistical analyses
Post hoc statistical analyses were conducted in R version 3.3.2 (R Core Team
2016). Fisher’s exact test was used to test for associations between virulence
genes and panC-based phylogenetic clades using the fisher.test function in R’s
stats package (Table S7). Phylogenetic clades I and VII were excluded from this
analysis because both were underrepresented among B. cereus group genomes in
the NCBI database (12 and 2 isolates, respectively). Rare and common virulence
genes, defined as those present in fewer than 20 and in more than n − 20
assemblies, respectively (where n is the total number of assemblies being
tested), were also excluded. A Bonferroni correction was used to correct for
multiple comparisons. To find members of the B. cereus group that clustered with B. anthracis
isolates based on their virulence gene presence-absence profiles, as well as to
assess within-species virulence heterogeneity, k-medoids clustering was per-
formed using the clara function in R’s cluster package (Maechler et al. 2017)
and a Euclidean distance metric. To find an optimum value for k, k-medoids
clustering was performed for each value of k for 2 ≤ k ≤ (n − 1), where n is
662, the total number of assembled genomes. A k value of 31 was selected, as it
corresponded to the largest average silhouette width.
4.4 Results
4.4.1 Construction and validation of BTyper using in vitro
methods
BTyper was used to perform in silico (i) virulence gene detection, (ii) MLST,
(iii) panC clade typing, and (iv) rpoB allelic typing using the default settings
described in Materials and Methods. Both assembled genomes and Illumina
paired-end reads from 46 B. cereus group genomes were used (Figure 4.1).
BTyper correctly predicted rpoB allelic types and whole-genome
phylogenetic clade using panC for all B. cereus group genomes tested (n = 46;
Table 4.1). For in silico MLST, it was successful at predicting the sequence type
in all but one isolate (45 out of 46; Table 4.1); isolate FSL M8-0091 was the only
isolate for which in silico prediction of sequence type did not match the sequence
type obtained by Sanger sequencing. For this isolate, the only allele that differed
between the two methods was the tpi allele: Sanger sequencing yielded a tpi al-
lelic type of 20, while BTyper’s in silico prediction was tpi allelic type 175, which
was a perfect match and differed from tpi 20 by a single nucleotide at position
284. However, SRST2 (Inouye et al. 2014) also obtained a tpi allelic type of 175,
making it likely that (i) the colony selected to undergo WGS had a different tpi
allele than the colony selected to undergo Sanger sequencing, or (ii) there was
an error in either WGS or Sanger sequencing.
For virulence gene detection, the results obtained from BTyper matched the
PCR results for eight selected virulence genes in over 89% of all isolates (n = 46;
Table 4.1). This resulted in an overall sensitivity and specificity of 99.0% and
85.5%, respectively, when the default parameters for assembled genomes were
used, and an overall sensitivity and specificity of 97.0% and 85.5%, respectively,
when default parameters for Illumina paired-end reads were used.

Table 4.1: Percentage of isolates in which BTyper correctly identified the pres-
ence/absence of eight virulence genes, MLST, rpoB AT, and panC clade

                      Virulence gene (%)^a                                      MLST ST  rpoB AT  panC clade
Data set              hblA   hblC   hblD   nheA   nheB   nheC   cytK   entFM   (%)^b    (%)^c    (%)^d
Training (n = 22)
  Assemblies          100    100    100    100    95.5   100    90.9   95.5    100      100      100
  PE reads^e          100    90.9   100    90.9   95.5   95.5   90.9   95.5    100      100      100
Validation (n = 24)
  Assemblies          91.7   100    95.8   87.5   95.8   100    100    91.7    95.8     100      100
  PE reads^e          91.7   100    91.7   87.5   95.8   100    100    91.7    95.8     100      100
Total (n = 46)
  Assemblies          95.7   100    97.8   93.5   95.7   100    95.7   93.5    97.8     100      100
  PE reads^e          95.7   95.7   95.7   89.1   95.7   97.8   95.7   93.5    97.8     100      100

^a Presence/absence of eight virulence genes from previously published WGS data (training set) or PCR (validation set).
^b Multilocus sequence typing (MLST) results from previously published WGS data (training set) or Sanger sequencing (validation set).
^c rpoB allelic typing (AT) results from previously published WGS data (training set) or Sanger sequencing (validation set).
^d panC clade typing results from previously published WGS data.
^e Illumina paired-end (PE) reads.
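For readers less familiar with these metrics, the following Python sketch shows how sensitivity and specificity are conventionally computed from paired in silico and reference calls. It is an illustration under stated assumptions and not the procedure used to generate the values above: the function name is hypothetical and the example calls are invented rather than taken from the training or validation sets.

```python
def sensitivity_specificity(calls):
    """Compute (sensitivity %, specificity %) from (predicted, reference) pairs.

    Each pair holds two booleans: whether the gene was detected in silico and
    whether the reference method (e.g., PCR) detected it.
    """
    tp = fp = tn = fn = 0
    for predicted, reference in calls:
        if reference and predicted:
            tp += 1  # detected by both methods
        elif reference:
            fn += 1  # missed in silico
        elif predicted:
            fp += 1  # detected in silico only
        else:
            tn += 1  # absent by both methods
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else None
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Invented calls: 9 true positives, 1 false negative,
# 3 true negatives, and 1 false positive.
example_calls = (
    [(True, True)] * 9
    + [(False, True)] * 1
    + [(False, False)] * 3
    + [(True, False)] * 1
)
print(sensitivity_specificity(example_calls))
```

Returning None when a denominator is zero avoids reporting a metric for which there are no reference positives or negatives.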
4.4.2 Characteristics associated with B. cereus group phyloge-
netic clade III are most prevalent among genome assem-
blies currently available at NCBI
BTyper was used to perform virulence gene detection, MLST, panC clade typ-
ing, and rpoB allelic typing on 662 B. cereus group genome assemblies (157 as-
semblies labeled as B. anthracis, 353 assemblies as B. cereus s.s., 2 assemblies as
B. cytotoxicus, 19 assemblies as B. mycoides, 2 assemblies as B. pseudomycoides, 94
assemblies as B. thuringiensis, 3 assemblies as B. toyonensis, 21 assemblies as B.
weihenstephanensis, and 11 assemblies as B. wiedmannii). Within the 662 assem-
blies, 13 virulence genes were detected in more than 90% of all genomes when
the default minimum amino acid sequence identity and coverage thresholds of
50 and 70%, respectively, were used (Figure 4.2). The least commonly detected
gene was cytK1 (Figure 4.2), which was detected in both available B. cytotoxicus
genomes and no other WGS assemblies.
Figure 4.2: Percentage (%) of B. cereus group genome assemblies in which a
particular virulence gene was detected. Minimum identity and coverage
thresholds of 50 and 70%, respectively, were used for virulence gene detection.
For in silico MLST, 544 assemblies were assigned to one of 213 B. cereus se-
quence types (STs), the most common of which was ST1 (n = 123 isolates). This
was unsurprising, considering that ST1 is associated with B. anthracis (Helga-
son et al. 2004), and B. anthracis makes up a considerable portion (23.7%) of the
B. cereus group genome assemblies currently in NCBI’s database. In silico rpoB
allelic typing assigned each of the 662 isolates to one of 43 different best-matching
rpoB allelic types (ATs), with 185 isolates matching AT463 most closely. AT463
has been previously associated with clade III isolates (Kovac et al. 2016), the
phylogenetic clade that encompasses B. anthracis.
For panC-based phylogenetic clade typing, a panC locus was detected in 658
out of 662 genomes (Figure 4.3). The most commonly assigned clade was clade
III, a polyphyletic clade which contains B. anthracis, as well as some strains cur-
rently misclassified in the NCBI database as B. cereus s.s. and B. thuringiensis
(Kovac et al. 2016; Guinebretiere, Thompson, et al. 2008; Guinebretiere, Velge,
et al. 2010). Together, clade IV, which consists of some B. cereus s.s. and B.
thuringiensis strains (Kovac et al. 2016; Guinebretiere, Thompson, et al. 2008;
Guinebretiere, Velge, et al. 2010), as well as the type strains of these two species,
and clade III accounted for more than 75% of all B. cereus group WGS assemblies
in the NCBI database (Figure 4.3). Clade VII, which contains the B. cytotoxicus
(Guinebretiere, Auger, et al. 2013) type strain, was the most poorly represented
clade; the two available B. cytotoxicus assemblies were placed here.
4.4.3 Application of BTyper to identify B. anthracis-associated
genes in non-anthracis Bacillus isolates reveals virulence
gene heterogeneity within genome assemblies from an-
thrax toxin-encoding isolates
[Bar chart. x axis: panC clade (III, IV, VI, II, V, I, NA, VII); y axis: total
number of genome assemblies belonging to panC clade.]
Figure 4.3: Closest-matching phylogenetic clade using the panC loci from 662
B. cereus group genome assemblies. A panC locus could not be assigned in 4
genome assemblies, which is denoted by “NA”.
When Fisher’s exact test was used to determine if any virulence genes were sig-
nificantly associated with a phylogenetic clade, virulence genes typically asso-
ciated with B. anthracis were found to be significantly associated with members
of clade III after a Bonferroni correction was applied (P < 0.05; Table 4.2). The B.
anthracis toxin genes cya (edema factor-encoding), lef (lethal factor-encoding),
and pagA (protective antigen-encoding), as well as their regulator gene atxA
(Dai et al. 1995), were found only in clade III isolates (P < 0.05; Table 4.2). In ad-
dition, B. anthracis polyglutamate capsule synthesis genes capABCDE (Candela,
Mock, and Fouet 2005) were more commonly associated with clade III assem-
blies (P < 0.05; Table 4.2) and found primarily in genomes classified in the NCBI
database as B. anthracis. Meanwhile, genes associated with diarrheal disease
(Stenfors Arnesen, Fagerlund, and Granum 2008) were significantly
associated with clades II, IV, V, and VI (P < 0.05; Table 4.2). These included the
diarrheal toxin genes hblCDAB, which were less common in
members of clade III (P < 0.05; Table 4.2), a result driven by the large number of B.
anthracis assemblies in this clade that did not possess these genes.
Table 4.2: Virulence genes significantly associated with 5 B. cereus group phylo-
genetic clades after a Bonferroni correction^a

Clade   Genes
II      hblCDAB
III     atxA^b, capABCDE, cya^b, hasA, hlyII, hlyR, lef^b, pagA^b
IV      bceT, cytK2, hblCDAB
V       bceT, hblCDAB^c
VI      bceT, cesC, hblCDAB^c

^a Significant at a P value of < 0.05. For exact corrected P values, see Table S7.
^b Indicates a virulence gene that was detected only in its respective clade (includes clades I and VII).
^c Indicates a virulence gene that was detected in all members of its respective clade.
Principal-component analysis (PCA) based on the presence/absence of vir-
ulence genes using BMiner revealed several assemblies labeled as B. cereus and
B. thuringiensis that clustered with B. anthracis assemblies (Figure 4.4A). When
k-medoids clustering was performed with an optimum k of 31, isolates classi-
fied in the NCBI database as B. anthracis were placed into clusters 1 through 8
(Figure 4.4B). Additionally, clusters 17, 21, 22, and 29 did not contain any
assemblies labeled in NCBI as B. anthracis, but they contained at least one
assembly in which one or more of the B. anthracis-associated virulence genes
identified using Fisher’s exact test were detected (Figure 4.5).
Figure 4.4: Principal-component analysis (PCA) of 662 B. cereus group genome
assemblies based on presence/absence of virulence genes. Virulence gene typing
was carried out using BTyper, while PCA was performed using BMiner.
Principal components 1 (PC1) and 2 (PC2) are plotted on the x and y axes,
respectively, while principal component 3 (PC3) corresponds to point size. Plots
are colored by isolate species, as found in NCBI (A), and assigned cluster using
k-medoids (B). To view interactive versions of these plots containing isolate
names and metadata, all BTyper final results files and metadata can be
downloaded from https://github.com/lmc297/BTyper/tree/master/sample_data
and viewed in BMiner.
[Figure 4.4 legend: panel A, NCBI species (B. anthracis, B. cereus, B. cytotoxicus,
B. mycoides, B. pseudomycoides, B. thuringiensis, B. toyonensis,
B. weihenstephanensis, B. wiedmannii); panel B, k-medoids clusters 1 through 31.]
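The clustering step can be sketched as follows. This is an illustration only, not BMiner's implementation: the distance used for k-medoids is not stated here, so the Jaccard dissimilarity (the metric reported for the NMDS analysis) is assumed, and the farthest-point initialization, function names, and toy profiles are hypothetical. How the optimum k of 31 was selected is not reproduced.

```python
def jaccard_distance(a, b):
    """Jaccard dissimilarity between two sets of detected genes."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def k_medoids(profiles, k, max_iter=100):
    """Cluster {assembly_name: set_of_genes} into k groups.

    Returns {medoid_name: [member_names]}.
    """
    names = sorted(profiles)
    if not 1 <= k <= len(names):
        raise ValueError("k must be between 1 and the number of assemblies")
    dist = {(i, j): jaccard_distance(profiles[i], profiles[j])
            for i in names for j in names}

    # Deterministic farthest-point initialization (an assumption; other
    # seeding schemes are equally valid).
    medoids = [names[0]]
    while len(medoids) < k:
        candidates = [n for n in names if n not in medoids]
        medoids.append(max(candidates,
                           key=lambda n: min(dist[n, m] for m in medoids)))

    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each assembly joins its nearest medoid; a medoid
        # always stays in its own cluster so that no cluster becomes empty.
        clusters = {m: [] for m in medoids}
        for n in names:
            nearest = n if n in clusters else min(medoids, key=lambda m: dist[n, m])
            clusters[nearest].append(n)
        # Update step: the new medoid minimizes total distance to its members.
        updated = [min(members, key=lambda c: sum(dist[c, o] for o in members))
                   for members in clusters.values()]
        if set(updated) == set(medoids):
            break
        medoids = updated
    return clusters


# Toy virulence gene profiles (invented for illustration)
toy_profiles = {
    "a1": {"cya", "lef", "pagA", "capA"},
    "a2": {"cya", "lef", "pagA"},
    "e1": {"cesA", "cesB", "cesC", "cesD"},
    "e2": {"cesA", "cesB", "cesC"},
    "d1": {"hblA", "hblC", "hblD"},
    "d2": {"hblA", "hblC", "hblD", "cytK2"},
}
toy_clusters = k_medoids(toy_profiles, k=3)
```

A medoid is always one of the input assemblies, which makes each cluster easy to interpret: its representative is a real virulence gene profile rather than an averaged one.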
Cluster 1 (Figure 4.4B), which contained the majority of isolates labeled as
B. anthracis, contained 110 isolates, 107 of which were classified in the NCBI
database as B. anthracis, and all of which belonged to panC clade III (Figure
4.5). Assemblies derived from human and veterinary clinical isolates associated
with anthrax disease populated a large proportion of the cluster, including
assemblies associated with isolates from the 2001 anthrax bioterrorism
attacks (https://www.ncbi.nlm.nih.gov/bioproject/299), European heroin users
and an associated outbreak (Ruckert et al. 2012; Price et al. 2012), and a 2011
outbreak in Swedish cattle (Agren et al. 2014). Three assemblies labeled as B. cereus
clustered among them (Figure 4.4B). Two of these assemblies were labeled as B.
cereus strain 03BB102, an isolate that was thought to cause fatal pneumonia in a
welder in San Antonio, TX (Table 4.3), while the third was labeled as B. cereus
biovar anthracis strain CI, which caused fatal anthrax in a chimpanzee (Table
4.3) (Klee et al. 2010). Consistent with these findings, placement into cluster
1 was driven largely by an assembly’s possession of all, or nearly all,
anthrax-associated genes identified using Fisher’s exact test (Figure 4.6); the anthrax
toxin genes cya, lef, and pagA, toxin regulator gene atxA, hyaluronic acid
capsule gene hasA, and B. anthracis polyglutamate capsule genes capABCDE were
detected in nearly all (> 97%) cluster 1 assemblies (Figure 4.5).
Gene groups by column: B. anthracis-associated genes (cya, lef, pagA, atxA, hasA, capA to capE); emetic toxin genes (cesA to cesD); cytK (cytK1, cytK2); bceT; hbl (hblA to hblD); hly (hlyR, hlyII); clo; plc (plcA, plcB, plcR); nhe (nheA to nheC); ent (entA, entFM); cer (cerA, cerB); inhA (inhA1, inhA2)
Cluster Size panC cya lef pagA atxA hasA capA capB capC capD capE cesA cesB cesC cesD cytK1 cytK2 bceT hblA hblB hblC hblD hlyR hlyII clo plcA plcB plcR nheA nheB nheC entA entFM cerA cerB inhA1 inhA2
1 110 3 1.00 0.99 0.97 0.97 1.00 0.99 1.00 1.00 0.99 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.99 1.00 0.98 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 0.99
2 26 3,4 0.00 0.00 0.00 0.00 0.00 0.04 0.04 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.00 0.00 0.00 0.00 1.00 1.00 0.96 0.96 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
3 6 3 0.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.67 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
4 18 3 0.94 0.94 1.00 0.94 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
5 26 3,4* 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.27 0.00 0.00 0.00 0.00 1.00 0.96 0.96 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
6 10 3,4 0.00 0.00 0.00 0.00 0.00 1.00 0.80 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.20 0.40 1.00 1.00 1.00 1.00 0.80 0.80 1.00 0.80 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
7 28 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.93 1.00 1.00 1.00 1.00
8 40 2,3,4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.98 0.98 0.98 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
9 38 2,3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.21 0.00 0.00 0.00 0.00 0.00 0.00 0.84 0.92 0.95 1.00 1.00 1.00 1.00 1.00 1.00 0.95 1.00 1.00 1.00
10 37 3,4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.11 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.92 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
11 101 2,3,4,5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.00 0.00 1.00 0.97 1.00 1.00 0.97 1.00 0.00 0.00 1.00 0.97 0.99 1.00 1.00 1.00 1.00 1.00 0.99 0.99 1.00 1.00 1.00
12 19 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 0.00 0.11 0.05 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 0.95 0.95 0.95 1.00 1.00 1.00 1.00 1.00 1.00
13 20 2,3,4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.95 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
14 14 2,3,6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.07 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.79 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
15 14 2,6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.79 1.00 1.00 1.00 1.00 0.00 0.00 0.93 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
16 25 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.00 0.00 1.00 1.00 0.96 0.96 1.00 1.00 1.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.96 1.00 1.00 1.00 1.00
17 13 2,3,6 0.00 0.00 0.00 0.00 0.08 0.00 0.08 0.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.85 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.92 1.00 1.00
18 54 2,4,5,6* 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
19 9 5,6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
20 2 * 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00 0.50 0.00 0.00 0.00 1.00 0.50 1.00 0.50 1.00 1.00
21 3 4,5 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 0.67 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 0.67 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
22 5 3 1.00 1.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 1.00 0.80 1.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
23 9 1,3,6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.78 1.00 1.00
24 5 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 0.00 0.00 0.60 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.80 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
25 7 1,5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.71 1.00 1.00 0.86 0.86 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.57 1.00 1.00 0.00 1.00 1.00
26 7 4,6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.86 1.00 1.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
27 5 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 0.60 0.00 1.00 1.00 0.00 0.00 1.00 0.80 1.00 1.00 0.00 1.00 1.00
28 1 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
29 1 3 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
30 7 2,3,4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
31 2 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
Figure 4.5: k-medoids clusters based on presence/absence of virulence genes
detected using BTyper. Size corresponds to the number of assemblies assigned
to a given cluster, while panC corresponds to panC clades found in the cluster,
with an asterisk denoting one or more assemblies that could not be placed into
a panC clade. Numbers within cells correspond to the proportion of assemblies
in a given cluster in which the corresponding virulence gene was detected.
Green shading corresponds to a virulence gene detected in more than 90% of all
assemblies in a cluster, while red shading corresponds to a virulence gene
detected in fewer than 10% of all assemblies in a cluster. Yellow shading
corresponds to B. anthracis-associated genes detected in fewer than 90% but
greater than 0% of assemblies in a cluster.
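The per-cluster proportions and shading rules reported for Figure 4.5 can be expressed as a short sketch. The function names and toy profiles are hypothetical. One assumption is needed because the caption's rules overlap: a B. anthracis-associated gene detected in, say, 4% of a cluster satisfies both the red and the yellow criteria, and yellow is assumed to take precedence here.

```python
# B. anthracis-associated genes as grouped in Figure 4.5
B_ANTHRACIS_ASSOCIATED = {"cya", "lef", "pagA", "atxA", "hasA",
                          "capA", "capB", "capC", "capD", "capE"}


def gene_proportions(cluster_profiles, genes):
    """Proportion of assemblies in one cluster in which each gene was detected.

    cluster_profiles: list of sets of detected genes, one set per assembly.
    """
    n = len(cluster_profiles)
    if n == 0:
        raise ValueError("a cluster must contain at least one assembly")
    return {g: sum(1 for p in cluster_profiles if g in p) / n for g in genes}


def shading(gene, proportion):
    """Map a proportion to the shading categories described in the caption."""
    if proportion > 0.90:
        return "green"
    # Assumed precedence: yellow is checked before red for these genes.
    if gene in B_ANTHRACIS_ASSOCIATED and 0.0 < proportion < 0.90:
        return "yellow"
    if proportion < 0.10:
        return "red"
    return "none"


# Toy cluster of four assemblies (invented for illustration)
toy_cluster = [
    {"cya", "nheA"},
    {"cya", "nheA"},
    {"cya", "nheA", "capA"},
    {"cya", "nheA", "hblC"},
]
toy_props = gene_proportions(toy_cluster, ["cya", "capA", "cesA", "nheA", "hblC"])
toy_shades = {g: shading(g, p) for g, p in toy_props.items()}
```

Reading the figure this way also explains small values such as 0.04 in cluster 2 (n = 26), which corresponds to a gene detected in a single assembly.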
Despite the fact that all assemblies classified in NCBI as B. anthracis were
assigned to clusters 1 through 8, the only other clusters in addition to clus-
ter 1 in which anthrax toxin genes were detected were clusters 4 and 22.
Like cluster 1, all isolates in clusters 4 and 22 belonged to panC clade III,
and nearly all possessed the anthrax toxin genes cya, lef, and pagA, regula-
tor gene atxA, and hyaluronic acid capsule gene hasA (Figure 4.5). How-
ever, the B. anthracis polyglutamate capsule genes capABCDE were not de-
tected in any of the cluster 4 or cluster 22 assemblies at the default iden-
tity and coverage thresholds (Figure 4.5). While cluster 4 (n = 18; Fig-
ure 4.4B) contained only isolates classified in the NCBI database as B. an-
thracis, it contained assemblies from several strains with attenuated virulence,
including several vaccine strains (Lekota et al. 2015; Okinaka et al. 2014)
(https://www.ncbi.nlm.nih.gov/biosample/SAMN06270273/). Cluster 22 (n =
5; Figure 4.4B), however, contained 5 anthrax-associated assemblies, all of which
were classified in the NCBI database as B. cereus (Table 4.3). All assemblies in
cluster 22 originated from human clinical isolates in which the isolate was
classified as B. cereus, but the patient presented anthrax-like symptoms; two
assemblies were of B. cereus strain G9241, a strain of Bacillus isolated from the
sputum and blood of a patient with pneumonia, nausea, and vomiting (Alex
R. Hoffmaster et al. 2004). The isolate, which had been classified as B. cereus
via biochemical tests and 16S rRNA gene sequencing, was found to possess the
anthrax toxin gene pagA but not the polyglutamate capsule genes capABCDE
(Alex R. Hoffmaster et al. 2004), which is consistent with its classification using
BTyper (Table 4.3). BTyper’s classification of the three other assemblies in this
cluster also aligned with their previously published descriptions and included
the following: (i) a B. cereus assembly associated with an isolate from a patient
in Florida possessing an anthrax-like skin lesion (Gee et al. 2014), which was
found to possess anthrax toxin genes cya, lef, and pagA and the hyaluronic acid
capsule gene hasA and belong to ST78 (Gee et al. 2014), (ii) a B. cereus isolate
from a patient with a fatal case of pneumonia in Lubbock, TX, that was also
found to possess B. anthracis virulence genes (Johnson et al. 2015), and (iii) an
assembly associated with a B. cereus isolate that was found to possess anthrax
toxin genes and hasA and was isolated from a patient in Galliano, LA, who had
a fatal case of pneumonia and septic shock (Table 4.3) (Pena-Gonzalez et al.
2017).
Figure 4.6: Nonmetric multidimensional scaling (NMDS) plot of Bacillus cereus
group clusters that (i) possessed at least one assembly that was classified as
Bacillus anthracis in NCBI, and/or (ii) possessed at least one assembly in which
at least one B. anthracis-associated virulence gene (cya, lef, pagA, atxA, hasA,
capABCDE) was detected using BTyper. NMDS was performed in BMiner using
virulence gene presence/absence data and a Jaccard dissimilarity metric.
Isolates are represented by points, and convex hulls and shading correspond to
the assigned k-medoids cluster. Virulence genes are plotted in dark gray.
[Figure 4.6 axes: NMDS1 (x) and NMDS2 (y); legend: k-medoids clusters 1
through 8, 17, 21, 22, and 29.]
Table 4.3: Non-anthracis Bacillus assemblies in which anthrax toxin genes cya,
lef, and/or pagA were detected using BTyper
Cluster^a | NCBI species classification | panC clade^b | GenBank accession no.^c | Strain | Isolate source (reference) | cya | lef | pagA | atxA | hasA | capABCDE
1 | B. cereus | III | GCA_000022505.1, GCA_000832405.1 | 03BB102 | Human with fatal pneumonia, San Antonio, TX, USA^d | + | + | + | - | + | +
1 | B. cereus | III | GCA_000143605.1 | Biovar anthracis strain CI | Chimpanzee with fatal anthrax, Ivory Coast^e | + | + | + | + | + | +
22 | B. cereus | III | GCA_000167215.1, GCA_000832805.1 | G9241 | Human with pneumonia, nausea, and vomiting, LA, USA^f | + | + | + | + | + | -
22 | B. cereus | III | GCA_000688755.1 | BcFL2013 | Human with anthrax-like skin lesion, FL, USA^g | + | + | + | + | + | -
22 | B. cereus | III | GCA_000789315.1 | 03BB87 | Human with fatal pneumonia, Lubbock, TX, USA^h | + | + | + | + | + | -
22 | B. cereus | III | GCA_002007005.1 | LA2007 | Human with fatal pneumonia and septic shock, Galliano, LA, USA^i | + | + | + | + | + | -
^a Clusters were assigned using a k-medoids approach (k = 31).
^b panC clades were assigned using BTyper.
^c Multiple accession numbers are given for strains associated with multiple assemblies.
^d https://www.ncbi.nlm.nih.gov/bioproject/31307
^e (Klee et al. 2010)
^f (Alex R. Hoffmaster et al. 2004)
^g (Gee et al. 2014)
^h (Johnson et al. 2015)
^i (Pena-Gonzalez et al. 2017)
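Gene detection above is reported "at the default identity and coverage thresholds", and a footnote to Table 4.4 gives the default amino acid identity threshold as 50%. A minimal sketch of such a filter follows; it is not BTyper's code. The Hit structure, the function name, and the 70% coverage default are hypothetical, and the coverage and identity values attached to the example hits other than the 47.7% capE identity are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One best hit of a virulence gene query against an assembly (hypothetical layout)."""
    gene: str
    identity: float  # percent amino acid identity
    coverage: float  # percent of the reference sequence covered


def detected_genes(hits, min_identity=50.0, min_coverage=70.0):
    """Return the genes whose hits meet both thresholds.

    The 50% identity default follows the footnote to Table 4.4; the 70%
    coverage default is an assumed placeholder.
    """
    return {h.gene for h in hits
            if h.identity >= min_identity and h.coverage >= min_coverage}


# Example hits: only the capE identity (47.7%) comes from the text.
example_hits = [
    Hit("capD", 98.0, 100.0),
    Hit("capE", 47.7, 100.0),
    Hit("cya", 99.0, 40.0),
]
default_calls = detected_genes(example_hits)
relaxed_calls = detected_genes(example_hits, min_identity=45.0)
```

Under these assumptions capE is not called at the default threshold but is called when the identity threshold is relaxed, which mirrors the situation described for strain BGSC 4BT1 in Table 4.4.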
While no anthrax toxin genes were detected outside clusters 1, 4, and 22,
other B. anthracis-associated genes identified using Fisher’s exact test were de-
tected in several other clusters and assemblies. Cluster 3 (n = 6; Figure 4.4B)
contained 6 B. anthracis assemblies belonging to panC clade III in which the B.
anthracis toxin regulator gene atxA and polyglutamate capsule genes capABCDE
were detected (Figure 4.5). Other assemblies in this cluster included B. anthracis
strain Smith 1013, described as “Pasteur-like” in that it possessed plasmid pXO2
(the plasmid associated with cap genes) but not plasmid pXO1 (the plasmid as-
sociated with B. anthracis toxin genes) (Rasko et al. 2005; Terzi et al. 2014), as
well as B. anthracis strain Pasteur itself (Table 4.4).
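The distinction drawn above between toxin-associated and capsule-associated genes can be summarized with a small rule-based sketch. It is an illustration rather than a method used in this study: the labels and function name are hypothetical, and requiring every gene in a set to be detected is an assumption. Detecting marker genes does not by itself demonstrate that plasmid pXO1 or pXO2 is present.

```python
# Gene sets follow the text: toxin genes are associated with pXO1 and the
# polyglutamate capsule (cap) genes with pXO2.
TOXIN_GENES = frozenset({"cya", "lef", "pagA"})
CAPSULE_GENES = frozenset({"capA", "capB", "capC", "capD", "capE"})


def marker_profile(detected):
    """Label an assembly by which complete marker gene sets were detected.

    detected: set of virulence genes detected in one assembly.
    Assumption: a set counts as present only if all of its genes are detected.
    """
    has_toxin = TOXIN_GENES <= detected
    has_capsule = CAPSULE_GENES <= detected
    if has_toxin and has_capsule:
        return "toxin and capsule genes"
    if has_toxin:
        return "toxin genes only"
    if has_capsule:
        return "capsule genes only"
    return "neither complete set"


# Profiles modeled on the patterns described for clusters 1, 4, and 3
cluster1_like = {"cya", "lef", "pagA", "atxA", "hasA",
                 "capA", "capB", "capC", "capD", "capE"}
cluster4_like = {"cya", "lef", "pagA", "atxA", "hasA"}
cluster3_like = {"atxA", "capA", "capB", "capC", "capD", "capE"}
labels = [marker_profile(p) for p in (cluster1_like, cluster4_like, cluster3_like)]
```

The "capsule genes only" label corresponds to the Pasteur-like pattern described for cluster 3, although that cluster also carried the regulator gene atxA.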
The polyglutamate capsule genes capABCDE were also detected in assem-
blies assigned to clusters 6, 21, and 29 (Table 4.4). Cluster 6 (n = 10; Figure
4.4B) contained 10 assemblies: 1 assembly classified in NCBI as B. anthracis, 7
assemblies classified as B. cereus, and 2 assemblies classified as B. thuringiensis.
Members of this cluster belonged to panC clades III and IV, and consistent with
the detection of cap genes in this cluster, one of the B. thuringiensis assemblies in
this group had been shown to produce a polyglutamate capsule (Cachat et al.
2008). Cluster 21 (n = 3; Figure 4.4B) contained 2 assemblies labeled as B. cereus
and 1 assembly labeled as B. thuringiensis. One of the B. cereus assemblies came
from B. cereus strain F65185, which was confirmed to belong to ST168 and was
isolated from a patient in New York with an open fracture wound (Table 4.4).
Members of this group belonged to either panC clade IV or V. Cluster 29 (n = 1;
Figure 4.4B) consisted of a single B. cereus assembly belonging to panC clade
III and associated with a strain isolated from whole black pepper in the United
States in 2015 (Table 4.4).
Table 4.4: Non-anthracis Bacillus assemblies in which B. anthracis-associated
genes were detected, excluding anthrax toxin genes cya, lef, and pagA and
regulator atxA
Cluster | NCBI species classification | panC clade | GenBank accession no.^a | Strain | Isolate source (reference) | hasA | capA | capB | capC | capD | capE
2 | B. cereus | III | GCA_001286905.1 | JRS1 | Rhazya stricta rhizosphere, Jeddah, Saudi Arabia^b | - | + | + | + | - | -
6 | B. cereus | III | GCA_000003955.1 | AH1273 | Human blood, Iceland^c | - | + | + | + | + | +
6 | B. cereus | III | GCA_000161395.1 | AH1272 | Amniotic fluid, Iceland^c | - | + | - | + | + | +
6 | B. cereus | III | GCA_000181655.1, GCA_000832865.1 | 03BB108 | Dust containing pneumonia-causing B. cereus strain 03BB102^d | - | + | + | + | + | +
6 | B. cereus | IV | GCA_000398945.1 | Schrouff | Food^e | - | + | + | + | + | +
6 | B. cereus | IV | GCA_000399185.1 | K-5975c | Food^e | - | + | + | + | + | +
6 | B. cereus | IV | GCA_000399305.1 | HuB4-4 | Soil, Belgium^e | - | + | - | + | + | +
6 | B. thuringiensis | III | GCA_000161595.1 | Serovar Monterrey strain BGSC 4AJ1 | Mexico^f | - | + | + | + | + | +
6 | B. thuringiensis | IV | GCA_001640965.1 | BGSC 4C1 | Bombyx mori, Czechoslovakia^g | - | + | + | + | + | +
17 | B. cereus | VI | GCA_002014585.1 | FSL H8-0485 | Soil, USA^h | + | - | - | - | - | -
17 | B. thuringiensis | III | GCA_000948155.1 | Et10/1 | Geothermal spring, Lirima thermal springs, Chile^i | - | - | + | + | - | -
21 | B. cereus | IV | GCA_000161315.1 | F65185 | Open fracture, NY, USA^j | - | + | + | + | + | +
21 | B. cereus | V | GCA_000290835.1 | VD115 | Soil, Guadeloupe^e | - | + | + | + | + | +
21 | B. thuringiensis | IV | GCA_001677055.1 | BGSC 4BT1 | Red soil, China^k | - | + | + | + | + | -
29 | B. cereus | III | GCA_001913295.1 | MOD1_Bc119 | Whole black pepper, USA^l | - | + | + | + | + | +
^a Multiple accession numbers are given for strains associated with multiple assemblies.
^b https://www.ncbi.nlm.nih.gov/bioproject/290051
^c (Zwick et al. 2012)
^d https://www.ncbi.nlm.nih.gov/bioproject/19959
^e (Van der Auwera et al. 2013)
^f https://www.ncbi.nlm.nih.gov/bioproject/29709
^g https://www.ncbi.nlm.nih.gov/biosample/SAMN04628222/
^h https://www.ncbi.nlm.nih.gov/biosample/SAMN06242081
^i https://www.ncbi.nlm.nih.gov/biosample/SAMN03025783
^j https://www.ncbi.nlm.nih.gov/bioproject/29689
^k https://www.ncbi.nlm.nih.gov/biosample/SAMN04000100; capE was detected at a lower amino acid identity (47.7%, compared to the default threshold of 50%)
^l https://www.ncbi.nlm.nih.gov/biosample/SAMN05608051
Additionally, cap genes were detected in a single isolate in each of clusters 2
and 17 (n = 26 and 13, respectively; Figure 4.4B). However, B. anthracis-associated
genes were not detected in any other assemblies in cluster 2, despite this
cluster being composed primarily of assemblies classified as B. anthracis (21, 4,
and 1 assemblies labeled in NCBI as B. anthracis, B. cereus, and B. thuringiensis,
respectively). Consistent with a lack of virulence genes, this cluster contained
the genome of the avirulent strain B. anthracis Ames, which is commonly
used in laboratory settings and does not possess B. anthracis plasmid pXO1
or pXO2 (https://www.ncbi.nlm.nih.gov/bioproject/57909). All non-anthracis
Bacillus assemblies in this group were isolated from either food or environmental
sources, and all belonged to either panC clade III or IV.
4.4.4 Application of BTyper to identify assemblies associated
with emetic B. cereus group isolates
Assemblies possessing emetic toxin genes cesABCD were grouped into two
clusters using k-medoids. Cluster 12 (n = 19; Figure 4.4B) consisted of 19
assemblies classified as B. cereus in NCBI. All belonged to panC clade III,
cesABCD were detected in all assemblies, and hblCDAB were not detected in
any assemblies (Figure 4.5). Included in this cluster was strain AH187, an
isolate from the United Kingdom that was responsible for a 1972 emetic
outbreak (Table 4.5). This isolate tested positive for emetic toxin (cereulide)
formation and nonhemolytic enterotoxin (Nhe) and negative for Hbl hemolytic
enterotoxin and cytotoxin K, and it belonged to MLST ST26 (Table 4.5)
(https://www.ncbi.nlm.nih.gov/bioproject/17715); these findings were con-
firmed using BTyper. Other notable strains in this cluster included (i) emetic
strain B. cereus H3081.97, a B. cereus strain of sequence type 144 (ST144) which is
closely related to strain AH187, and (ii) emetic strain B. cereus NC7401 (Takeno
et al. 2012).
Table 4.5: B. cereus group assemblies in which emetic toxin genes cesABCD were
detected.
Cluster | NCBI species classification | panC clade | GenBank accession no. | Strain | Isolate source (reference)
12 | B. cereus | III | GCA_000021225.1 | AH187 | Vomit of a person who ate cooked rice; isolate was associated with an emetic outbreak in 1972 (https://www.ncbi.nlm.nih.gov/bioproject/17715)
12 | B. cereus | III | GCA_000161075.1 | BDRD-ST26 | BDRD stock strain (Zwick et al. 2012)^a
12 | B. cereus | III | GCA_000171035.2 | H3081.97 | Food; emetic toxin-producing isolate from 1997 outbreak linked to rice, TX, USA
12 | B. cereus | III | GCA_000283675.1 | NC7401 | Emetic isolate (Takeno et al. 2012)
12 | B. cereus | III | GCA_000290935.2 | IS075 | Wild mammal (vole) (Ladeuze et al. 2011)
12 | B. cereus | III | GCA_000290995.1 | AND1407 | Black currant (Hoton et al. 2009)
12 | B. cereus | III | GCA_000291235.1 | MSX-A12 | Not available (Van der Auwera et al. 2013)
12 | B. cereus | III | GCA_000399205.1 | IS845/00 | Bank vole, Poland (Van der Auwera et al. 2013; I. Swiecicka and De Vos 2003)
12 | B. cereus | III | GCA_000399225.1 | IS195 | Bank vole, Poland (Van der Auwera et al. 2013; I. Swiecicka and De Vos 2003)
12 | B. cereus | III | GCA_000743195.1 | F1-15 | Foodborne source (Zhong et al. 2007)
12 | B. cereus | III | GCA_001566375.1 | MB.15 | Food, Munich, Germany (Crovadore et al. 2016)
12 | B. cereus | III | GCA_001566385.1 | MB.18 | Food, Munich, Germany (Crovadore et al. 2016)
12 | B. cereus | III | GCA_001566435.1 | MB.16 | Food, Munich, Germany (Crovadore et al. 2016)
12 | B. cereus | III | GCA_001566445.1 | MB.17 | Food, Munich, Germany (Crovadore et al. 2016)
12 | B. cereus | III | GCA_001566455.1 | MB.21 | Food, Munich, Germany (Crovadore et al. 2016)
12 | B. cereus | III | GCA_001566465.1 | MB.8 | Food, Munich, Germany (Crovadore et al. 2016)
12 | B. cereus | III | GCA_001566515.1 | MB.8-1 | Food, Munich, Germany (Crovadore et al. 2016)
12 | B. cereus | III | GCA_001566525.1 | MB.20 | Food, Munich, Germany (Crovadore et al. 2016)
12 | B. cereus | III | GCA_001566535.1 | MB.22 | Food, Munich, Germany (Crovadore et al. 2016)
24 | B. cereus | VI | GCA_000291155.1 | MC67 | Sandy loam, Denmark (Thorsen et al. 2006; Van der Auwera et al. 2013; Hendriksen, Hansen, and Johansen 2006)
24 | B. cereus | VI | GCA_000291315.1 | CER074 | Raw milk (Hoton et al. 2009)
24 | B. cereus | VI | GCA_000291335.1 | CER057 | Parsley (Hoton et al. 2009)
24 | B. cereus | VI | GCA_000293605.1 | BtB2-4 | Forest soil (Hoton et al. 2009)
24 | B. cereus | VI | GCA_000399245.1 | MC118 | Sandy loam, Denmark (Thorsen et al. 2006; Van der Auwera et al. 2013; Hendriksen, Hansen, and Johansen 2006)
^a BDRD, Biological Defense Research Directorate
The other cluster in which all cesABCD genes were detected in all assemblies
was cluster 24 (n = 5; Figure 4.4B). This cluster contained 5 assemblies classified
as B. cereus, all of which belonged to panC clade VI (Table 4.5). Unlike cluster
12, hblCDAB genes were detected in all assemblies in this cluster (Figure 4.5).
The assemblies in this cluster originated from food and environmental isolates
(Table 4.5). Despite their assemblies being classified in the NCBI database as B.
cereus, all 5 strains in this cluster were classified as emetic B. weihenstephanensis
in their respective manuscripts, and all were capable of growth at 8°C (Hoton
et al. 2009; Thorsen et al. 2006).
4.5 Discussion
4.5.1 Accessible whole-genome sequence analysis tools can
facilitate improved taxonomic classification and characterization
of B. cereus group isolate virulence potential
As whole-genome sequencing becomes more widely used in the realms of pub-
lic health and food safety, the ability to classify potential pathogenic microor-
ganisms quickly and effectively becomes increasingly important. A number of
bioinformatics tools already exist for this purpose, including SRST2, which can
be used to perform MLST and detect antimicrobial resistance genes using Illu-
mina reads (Inouye et al. 2014); SeqSero, which performs in silico serotyping
using Illumina reads or nucleotide assemblies from Salmonella enterica isolates
(Zhang et al. 2015); PlasmidFinder, which can be used to detect plasmids in
isolates using Illumina reads or nucleotide assemblies (Carattoli et al. 2014); and
VirulenceFinder, which can be used to detect virulence genes in Listeria mono-
cytogenes, Staphylococcus aureus, Escherichia coli, and Enterococcus (Joensen et al.
2014). Recently, methods such as in silico MLST and virulence gene detection
have been combined into single computational pipelines that can be used to
characterize numerous bacterial species (Thomsen et al. 2016). Here, we have
created a bioinformatics tool specific to the Bacillus cereus group that combines
virulence gene detection using a curated database of B. cereus virulence factors
with in silico manifestations of established molecular and virulence typing meth-
ods to phylogenetically classify and rapidly assess the virulence potential of any
B. cereus group isolate. Additionally, we have provided a companion applica-
tion, BMiner, that allows users to interact with data from hundreds of genomes
at once, which we anticipate will become increasingly valuable as more B. cereus
group genomes are sequenced.
The in silico typing methods employed by BTyper and other bioinformat-
ics tools are valuable from a public health and food safety perspective, due to
their (i) speed, as BTyper and similar tools can be used to perform gene detec-
tion and typing tasks in seconds using assembled genomes (Zhang et al. 2015;
Carattoli et al. 2014); (ii) scalability, with the ability to provide users with in-
formation about a single isolate or hundreds from the command line (Inouye
et al. 2014; Zhang et al. 2015); and (iii) ability to output concise and easily in-
terpretable summaries of large amounts of data (Inouye et al. 2014), making it
easy for a user to understand their results, share data with colleagues, and make
informed decisions about an isolate in question (i.e., is it pathogenic or not). Ad-
ditionally, the use of virulence gene-based typing as employed by BTyper offers
the advantage that isolates can be classified according to their virulence
potential, which means that one does not have to make any prior assumptions about
the taxonomic classification of an isolate in question. This marks a valuable
step forward in distinguishing pathogenic B. cereus group isolates from their
nonpathogenic counterparts; however, marked improvements could be made
to BTyper and similar tools through the integration of phenotypic data. By as-
sociating genotypic characteristics of B. cereus group isolates with phenotypic
data, such as host illness and symptoms and growth temperature, BTyper and
other tools used to genotype foodborne pathogens may become more valuable
from a risk assessment perspective.
4.5.2 Analysis of publicly available B. cereus group assemblies
using BTyper and BMiner identifies virulence gene-based
clusters that capture phylogenetic heterogeneity in
isolates with similar phenotypes
Using the output of BTyper and BMiner, virulence gene profiles of 662 B. cereus
group genomes were assigned to one of 31 clusters by employing a k-medoids
approach, without making unnecessary prior assumptions about an assembly’s
taxonomic classification in the public domain. This allowed for the identifica-
tion of several well-defined clusters with clinical or taxonomic relevance, in-
cluding (i) fully virulent B. anthracis and B. anthracis-like B. cereus (cluster 1),
(ii) capABCDE-negative anthrax-causing B. cereus strains (cluster 22), (iii) B. an-
thracis with attenuated virulence (clusters 3 and 4), (iv) 2 emetic clusters (clus-
ters 12 and 24), and (v) B. cytotoxicus (cluster 31). The clustering of the emetic
assemblies into 2 separate clusters reflected the observed heterogeneity among
emetic strains of B. cereus and B. weihenstephanensis: Hoton et al. (Hoton et
al. 2009) described two distinct clusters formed by emetic toxin-producing B.
cereus group strains, with psychrotolerant B. weihenstephanensis strains belong-
ing to a distinct emetic cluster (referred to in its respective manuscript as cluster
II) (Hoton et al. 2009; Castiaux et al. 2014). Assemblies from these strains were
placed into a single cluster (k-medoids cluster 24) consisting of B. weihenstepha-
nensis assemblies belonging to panC clade VI, while members of Hoton et al.’s
emetic cluster I were placed into a second cluster (k-medoids cluster 12) contain-
ing assemblies belonging to panC clade III. For B. cytotoxicus, the two available
assemblies, both of which were the only panC clade VII representatives, were
placed into a single cluster composed of only themselves (k-medoids cluster
31), driven largely by their possession of cytK1, as described by Guinebretiere et
al. (2010). For B. anthracis, strains possessing both
anthrax virulence plasmids (pXO1 and pXO2) were assigned to cluster 1, dis-
tinguishing them from attenuated strains in which one or neither plasmid was
detected, as well as B. cereus strains that caused anthrax-like disease (cluster 22).
Despite lacking the polyglutamate capsule genes capABCDE, B. cereus strains
in cluster 22 were able to cause anthrax-like symptoms using a second capsule
encoded by B. cereus exopolysaccharide genes bpsXABCDEFGH (bpsX-H) on a
different plasmid, pBC218 (Oh et al. 2011). The bpsX-H operon in its entirety
was detected in 4 of the 5 anthrax-causing, capABCDE-negative B. cereus assemblies
in cluster 22 (all but strain BcFL2013) and in no other cluster. Findings such as
these from additional studies will likely help to further resolve clade assignments
and disease phenotypes with BTyper; recently, Bazinet identified numerous genes
associated with phenotypic traits, such as anthrax and food
poisoning (Bazinet 2017). Here, we found associations between B. cereus group
virulence genes and the panC clade, and virulence gene heterogeneity within
disease phenotypes was identified. As more B. cereus group WGS data and associated
metadata become available, BTyper could increasingly be used to identify new
virulence alleles or phylogenetic markers that are associated not only with a
particular disease, but also with specific symptoms or clinical outcomes. For
example, future work
will be needed to better define specific genetic markers that can classify B. cereus
group strains and clusters that are likely to cause diarrheal illnesses. Future epi-
demiological studies that assess the associations between different clusters and
disease outcomes and symptoms will also provide an opportunity to further de-
fine and refine the types of disease outcomes and public health risks associated
with different B. cereus group strains.
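The k-medoids assignment of virulence gene profiles discussed above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the Jaccard distance, the greedy farthest-point initialization, the alternating medoid update, the function names, and the toy gene profiles are all assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of detected genes."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def k_medoids(profiles, k, max_iter=100):
    """Cluster gene profiles around k medoids (illustrative sketch).

    profiles: dict mapping assembly name -> set of detected genes.
    Returns (medoids, assignment); assignment maps each assembly name
    to the name of the medoid of its cluster.
    """
    names = sorted(profiles)
    dist = {(x, y): jaccard_distance(profiles[x], profiles[y])
            for x in names for y in names}

    # Deterministic start (assumption): greedy farthest-point initialization.
    medoids = [names[0]]
    while len(medoids) < k:
        candidates = [n for n in names if n not in medoids]
        medoids.append(max(candidates,
                           key=lambda n: min(dist[n, m] for m in medoids)))

    for _ in range(max_iter):
        # Assignment step: attach every profile to its nearest medoid.
        assignment = {n: min(medoids, key=lambda m: dist[n, m]) for n in names}
        # Update step: in each cluster, pick the member with the smallest
        # total distance to the other members.
        new_medoids = []
        for m in medoids:
            members = [n for n in names if assignment[n] == m]
            if not members:  # keep the medoid if its cluster emptied
                new_medoids.append(m)
                continue
            new_medoids.append(
                min(members, key=lambda c: sum(dist[c, n] for n in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids

    # Recompute so the assignment always matches the returned medoids.
    assignment = {n: min(medoids, key=lambda m: dist[n, m]) for n in names}
    return medoids, assignment


# Toy presence/absence profiles (hypothetical, not data from this study).
profiles = {
    "anthracis_1": {"cya", "lef", "pagA", "capA", "nheA"},
    "anthracis_2": {"cya", "lef", "pagA", "capA", "nheA", "nheB"},
    "emetic_1": {"cesA", "cesB", "nheA"},
    "emetic_2": {"cesA", "cesB", "nheA", "nheB"},
    "cytotoxicus_1": {"cytK1", "nheA"},
}
medoids, assignment = k_medoids(profiles, k=3)
```

With these toy profiles, the two anthrax-like profiles share a medoid, as do the two emetic profiles, while the cytK1-positive profile forms a cluster of its own, mirroring in miniature the singleton B. cytotoxicus cluster described above.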
4.6 Acknowledgments
This material is based on work supported by the National Science Foundation
Graduate Research Fellowship Program under grant no. DGE-1144153. Partial
funding for this project was provided by the New York State Dairy Promotion
Advisory Board through the New York State Department of Agriculture and
Markets.
4.7 References
Aceves-Diez, Angel E., Kelly J. Estrada-Castaneda, and Laura M. Castaneda-
Sandoval (2015). “Use of Bacillus thuringiensis supernatant from a fermen-
tation process to improve bioremediation of chlorpyrifos in contaminated
soils”. In: Journal of Environmental Management 157, pp. 213–219.
Agren, Joakim, Maria Finn, Bjorn Bengtsson, and Bo Segerman (2014). “Mi-
croevolution during an Anthrax Outbreak Leading to Clonal Heterogene-
ity and Penicillin Resistance”. In: PLOS ONE 9.2, pp. 1–7. DOI: 10.1371/
journal.pone.0089112.
Ammons, David R. et al. (2016). “Anti-cancer Parasporin Toxins are Associ-
ated with Different Environments: Discovery of Two Novel Parasporin 5-like
Genes”. In: Current Microbiology 72, pp. 184–189. DOI: 10.1007/s00284-
015-0934-3.
Armada, Elisabeth, Rosario Azcon, Olga M. Lopez-Castillo, Monica Calvo-
Polanco, and Juan Manuel Ruiz-Lozano (2015). “Autochthonous arbuscular
mycorrhizal fungi and Bacillus thuringiensis from a degraded Mediterranean
area can be used to improve physiological traits and performance of a plant
of agronomic interest under drought conditions”. In: Plant Physiology and
Biochemistry 90, pp. 64–74.
Bache, Stefan Milton and Hadley Wickham (2014). magrittr: A Forward-Pipe Op-
erator for R. R package version 1.5.
Bankevich, A. et al. (2012). “SPAdes: a new genome assembly algorithm and
its applications to single-cell sequencing”. In: J Comput Biol 19.5, pp. 455–77.
DOI: 10.1089/cmb.2012.0021.
Barrett, T. et al. (2012). “BioProject and BioSample databases at NCBI: facilitat-
ing capture and organization of metadata”. In: Nucleic Acids Res 40.Database
issue, pp. D57–63. DOI: 10.1093/nar/gkr1163.
Bazinet, Adam L. (2017). “Pan-genome and phylogeny of Bacillus cereus sensu
lato”. In: BMC evolutionary biology 17.1, p. 176. DOI:
10.1186/s12862-017-1020-1.
Bohm, M. E., C. Huptas, V. M. Krey, and S. Scherer (2015). “Massive horizontal
gene transfer, strictly vertical inheritance and ancient duplications differen-
tially shape the evolution of Bacillus cereus enterotoxin operons hbl, cytK and
nhe”. In: BMC Evol Biol 15, p. 246. DOI: 10.1186/s12862-015-0529-4.
Caamano-Antelo, S. et al. (2015). “Genetic discrimination of foodborne
pathogenic and spoilage Bacillus spp. based on three housekeeping genes”.
In: Food Microbiology 46, pp. 288–298.
Cachat, Elise, Margaret Barker, Timothy D. Read, and Fergus G. Priest (2008).
“A Bacillus thuringiensis strain producing a polyglutamate capsule resem-
bling that of Bacillus anthracis”. In: FEMS Microbiology Letters 285.2, pp. 220–
226. DOI: 10.1111/j.1574-6968.2008.01231.x. eprint: https:
//onlinelibrary.wiley.com/doi/pdf/10.1111/j.1574-6968.
2008.01231.x.
Camacho, C. et al. (2009). “BLAST+: architecture and applications”. In: BMC
Bioinformatics 10, p. 421. DOI: 10.1186/1471-2105-10-421.
Candela, Thomas, Michele Mock, and Agnes Fouet (2005). “CapE, a 47-amino-
acid peptide, is necessary for Bacillus anthracis polyglutamate capsule syn-
thesis”. In: Journal of bacteriology 187.22, pp. 7765–7772. DOI: 10.1128/JB.
187.22.7765-7772.2005.
Carattoli, A. et al. (2014). “In silico detection and typing of plasmids using Plas-
midFinder and plasmid multilocus sequence typing”. In: Antimicrob Agents
Chemother 58.7, pp. 3895–903. DOI: 10.1128/AAC.02412-14.
Cardazzo, B. et al. (2008). “Multiple-locus sequence typing and analysis of toxin
genes in Bacillus cereus food-borne isolates”. In: Appl Environ Microbiol 74.3,
pp. 850–60. DOI: 10.1128/AEM.01495-07.
Castiaux, V. et al. (2014). “Diversity of pulsed-field gel electrophoresis patterns
of cereulide-producing isolates of Bacillus cereus and Bacillus weihenstepha-
nensis”. In: FEMS Microbiol Lett 353.2, pp. 124–31. DOI: 10.1111/1574-
6968.12423.
CDC. Anthrax. https://www.cdc.gov/anthrax/index.html.
Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson
(2017). shiny: Web Application Framework for R. R package version 1.0.1.
Chen, M.L. and H.Y. Tsen (2002). “Discrimination of Bacillus cereus and Bacil-
lus thuringiensis with 16S rRNA and gyrB gene based PCR primers and se-
quencing of their annealing sites”. In: Journal of Applied Microbiology 92.5,
pp. 912–919. DOI: 10.1046/j.1365-2672.2002.01606.x. eprint:
https://onlinelibrary.wiley.com/doi/pdf/10.1046/j.1365-
2672.2002.01606.x.
Cock, P. J. et al. (2009). “Biopython: freely available Python tools for com-
putational molecular biology and bioinformatics”. In: Bioinformatics 25.11,
pp. 1422–3. DOI: 10.1093/bioinformatics/btp163.
Cole, James R. et al. (2014). “Ribosomal Database Project: data and tools for
high throughput rRNA analysis”. In: Nucleic acids research 42.Database issue,
pp. D633–D642. DOI: 10.1093/nar/gkt1244.
Crovadore, Julien et al. (2016). “Whole-Genome Sequences of Seven Strains of
Bacillus cereus Isolated from Foodstuff or Poisoning Incidents”. In: Genome
announcements 4.3, e00435–16. DOI: 10.1128/genomeA.00435-16.
Dai, Zhihao, Jean-Claude Sirard, Michele Mock, and Theresa M. Koehler (1995).
“The atxA gene product activates transcription of the anthrax toxin genes
and is essential for virulence”. In: Molecular Microbiology 16.6, pp. 1171–1181.
DOI: 10.1111/j.1365-2958.1995.tb02340.x. eprint: https://
onlinelibrary.wiley.com/doi/pdf/10.1111/j.1365-2958.
1995.tb02340.x.
Dash, H.R., N. Mangwani, and S. Das (2014). “Characterization and poten-
tial application in mercury bioremediation of highly mercury-resistant ma-
rine bacterium Bacillus thuringiensis PW-05”. In: Environ Sci Pollut Res 21,
pp. 2642–2653. DOI: 10.1007/s11356-013-2206-8.
Doll, Etienne V., Siegfried Scherer, and Mareike Wenning (2017). “Spoilage of
Microfiltered and Pasteurized Extended Shelf Life Milk Is Mainly Induced
by Psychrotolerant Spore-Forming Bacteria that often Originate from Re-
contamination”. In: Frontiers in microbiology 8, pp. 135–135. DOI: 10.3389/
fmicb.2017.00135.
Drewnowska, Justyna M. and Izabela Swiecicka (2013). “Eco-Genetic Struc-
ture of Bacillus cereus sensu lato Populations from Different Environments
in Northeastern Poland”. In: PLOS ONE 8.12, pp. 1–11. DOI: 10.1371/
journal.pone.0080175.
Duc, Le H., Huynh A. Hong, Teresa M. Barbosa, Adriano O. Henriques, and
Simon M. Cutting (2004). “Characterization of Bacillus probiotics available
for human use”. In: Applied and environmental microbiology 70.4, pp. 2161–
2171. DOI: 10.1128/aem.70.4.2161-2171.2004.
EFSA (2016). “Risks for public health related to the presence of Bacillus cereus
and other Bacillus spp. including Bacillus thuringiensis in foodstuffs”. In:
EFSA Journal 14.7, e04524. DOI: 10.2903/j.efsa.2016.4524. eprint:
https://efsa.onlinelibrary.wiley.com/doi/pdf/10.2903/j.
efsa.2016.4524.
Ehling-Schulz, M. and U. Messelhausser (2013). “Bacillus next generation di-
agnostics: moving from detection toward subtyping and risk-related strain
profiling”. In: Front Microbiol 4, p. 32. DOI: 10.3389/fmicb.2013.00032.
Fox, G. E., J. D. Wisotzkey, and P. Jurtshuk Jr. (1992). “How close is close: 16S
rRNA sequence identity may not be sufficient to guarantee species identity”.
In: Int J Syst Bacteriol 42.1, pp. 166–70. DOI: 10.1099/00207713-42-1-
166.
Gee, Jay E., Chung K. Marston, Scott A. Sammons, Mark A. Burroughs, and
Alex R. Hoffmaster (2014). “Draft Genome Sequence of Bacillus cereus Strain
BcFL2013, a Clinical Isolate Similar to G9241”. In: Genome announcements 2.3,
e00469–14. DOI: 10.1128/genomeA.00469-14.
Guinebretiere, M. H., S. Auger, et al. (2013). “Bacillus cytotoxicus sp. nov. is a
novel thermotolerant species of the Bacillus cereus Group occasionally asso-
ciated with food poisoning”. In: Int J Syst Evol Microbiol 63.Pt 1, pp. 31–40.
DOI: 10.1099/ijs.0.030627-0.
Guinebretiere, M. H., F. L. Thompson, et al. (2008). “Ecological diversification
in the Bacillus cereus Group”. In: Environ Microbiol 10.4, pp. 851–65. DOI: 10.
1111/j.1462-2920.2007.01495.x.
Guinebretiere, M. H., P. Velge, et al. (2010). “Ability of Bacillus cereus group
strains to cause food poisoning varies according to phylogenetic affiliation
(groups I to VII) rather than species affiliation”. In: J Clin Microbiol 48.9,
pp. 3388–91. DOI: 10.1128/JCM.00921-10.
Helgason, Erlendur, Nicolas J. Tourasse, Roger Meisal, Dominique A. Caugant,
and Anne-Brit Kolsto (2004). “Multilocus sequence typing scheme for bac-
teria of the Bacillus cereus group”. In: Applied and environmental microbiology
70.1, pp. 191–201. DOI: 10.1128/aem.70.1.191-201.2004.
Hendriksen, Niels Bohse, Bjarne Munk Hansen, and Jens Efsen Johansen (2006).
“Occurrence and pathogenic potential of Bacillus cereus group bacteria in a
sandy loam”. In: 89, pp. 239–249. DOI: 10.1007/s10482-005-9025-y.
Hoffmaster, A. R. et al. (2008). “Genetic diversity of clinical isolates of Bacillus
cereus using multilocus sequence typing”. In: BMC Microbiol 8, p. 191. DOI:
10.1186/1471-2180-8-191.
Hoffmaster, Alex R. et al. (2004). “Identification of anthrax toxin genes in a
Bacillus cereus associated with an illness resembling inhalation anthrax”. In:
Proceedings of the National Academy of Sciences of the United States of America
101.22, pp. 8449–8454. DOI: 10.1073/pnas.0402414101.
Hong, Huynh A., Le Hong Duc, and Simon M. Cutting (2005). “The use of
bacterial spore formers as probiotics”. In: FEMS Microbiology Reviews 29.4,
pp. 813–835. DOI: 10.1016/j.femsre.2004.12.001. eprint: https:
//onlinelibrary.wiley.com/doi/pdf/10.1016/j.femsre.
2004.12.001.
Hoton, F. M. et al. (2009). “Family portrait of Bacillus cereus and Bacillus wei-
henstephanensis cereulide-producing strains”. In: Environ Microbiol Rep 1.3,
pp. 177–83. DOI: 10.1111/j.1758-2229.2009.00028.x.
Huys, Geert et al. (2013). “Microbial characterization of probiotics–advisory re-
port of the Working Group ”8651 Probiotics” of the Belgian Superior Health
Council (SHC)”. In: Molecular nutrition and food research 57.8, pp. 1479–1504.
DOI: 10.1002/mnfr.201300065.
Inouye, M. et al. (2014). “SRST2: Rapid genomic surveillance for public health
and hospital microbiology labs”. In: Genome Med 6.11, p. 90. DOI: 10.1186/
s13073-014-0090-6.
Ivy, R. A. et al. (2012). “Identification and characterization of psychrotolerant
sporeformers associated with fluid milk production and processing”. In:
Appl Environ Microbiol 78.6, pp. 1853–64. DOI: 10.1128/AEM.06536-11.
Jimenez, Guillermo, Anicet R. Blanch, Javier Tamames, and Ramon Rossello-
Mora (2013). “Complete Genome Sequence of Bacillus toyonensis BCT-7112T,
the Active Ingredient of the Feed Additive Preparation Toyocerin”. In:
Genome announcements 1.6, e01080–13. DOI: 10.1128/genomeA.01080-
13.
Jimenez, G. et al. (2013). “Description of Bacillus toyonensis sp. nov., a novel
species of the Bacillus cereus group, and pairwise genome comparisons of the
species of the group by means of ANI calculations”. In: Syst Appl Microbiol
36.6, pp. 383–91. DOI: 10.1016/j.syapm.2013.04.008.
Joensen, K. G. et al. (2014). “Real-time whole-genome sequencing for routine
typing, surveillance, and outbreak detection of verotoxigenic Escherichia
coli”. In: J Clin Microbiol 52.5, pp. 1501–10. DOI: 10.1128/JCM.03617-13.
Johnson, Shannon L. et al. (2015). “Finished Genome Sequence of Bacillus cereus
Strain 03BB87, a Clinical Isolate with B. anthracis Virulence Genes”. In:
Genome announcements 3.1, e01446–14. DOI: 10.1128/genomeA.01446-
14.
Jouzani, G.S., E. Valijanian, and R. Sharafi (2017). “Bacillus thuringiensis: a suc-
cessful insecticide with new environmental features and tidings”. In: Appl
Microbiol Biotechnol 101, pp. 2691–2711. DOI: 10.1007/s00253-017-8175-y.
Klee, S. R. et al. (2010). “The genome of a Bacillus isolate causing anthrax in
chimpanzees combines chromosomal properties of B. cereus with B. anthracis
virulence plasmids”. In: PLoS One 5.7, e10986. DOI: 10.1371/journal.
pone.0010986.
Ko, K. S. et al. (2003). “Identification of Bacillus anthracis by rpoB sequence anal-
ysis and multiplex PCR”. In: J Clin Microbiol 41.7, pp. 2908–14.
Ko, Kwan Soo et al. (2004). “Population structure of the Bacillus cereus group
as determined by sequence analysis of six housekeeping genes and the plcR
Gene”. In: Infection and immunity 72.9, pp. 5253–5261. DOI: 10.1128/IAI.
72.9.5253-5261.2004.
Kodama, Y., M. Shumway, R. Leinonen, and International Nucleotide Sequence
Database Collaboration (2012). “The Sequence Read Archive: explosive
growth of sequencing data”. In: Nucleic Acids Res 40.Database issue, pp. D54–
6. DOI: 10.1093/nar/gkr854.
Kovac, J. et al. (2016). “Production of hemolysin BL by Bacillus cereus group iso-
lates of dairy origin is associated with whole-genome phylogenetic clade”.
In: BMC Genomics 17, p. 581. DOI: 10.1186/s12864-016-2883-z.
Ladeuze, Sandy, Nathalie Lentz, Laurence Delbrassinne, Xiaomin Hu, and
Jacques Mahillon (2011). “Antifungal Activity Displayed by Cereulide, the
Emetic Toxin Produced by Bacillus cereus”. In: Applied and Environmental
Microbiology 77.7, pp. 2555–2558. DOI: 10.1128/AEM.02519-10. eprint:
https://aem.asm.org/content/77/7/2555.full.pdf.
Lechner, S. et al. (1998). “Bacillus weihenstephanensis sp. nov. is a new psychro-
tolerant species of the Bacillus cereus group”. In: Int J Syst Bacteriol 48 Pt 4,
pp. 1373–82. DOI: 10.1099/00207713-48-4-1373.
Lee, Hyungjae, John J. Churey, and Randy W. Worobo (2009). “Biosynthesis and
transcriptional analysis of thurincin H, a tandem repeated bacteriocin ge-
netic locus, produced by Bacillus thuringiensis SF361”. In: FEMS Microbiology
Letters 299.2, pp. 205–213. DOI: 10.1111/j.1574-6968.2009.01749.x.
eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.
1574-6968.2009.01749.x.
Leinonen, R., H. Sugawara, M. Shumway, and International Nucleotide Sequence
Database Collaboration (2011). “The sequence read archive”. In: Nucleic
Acids Res 39.Database issue, pp. D19–21. DOI: 10.1093/nar/gkq1019.
Lekota, Kgaugelo E. et al. (2015). “Draft Genome Sequences of Two South
African Bacillus anthracis Strains”. In: Genome announcements 3.6, e01313–15.
DOI: 10.1128/genomeA.01313-15.
Liu, Y. et al. (2015a). “Genomic insights into the taxonomic status of the Bacillus
cereus group”. In: Sci Rep 5, p. 14082. DOI: 10.1038/srep14082.
— (2015b). “Genomic insights into the taxonomic status of the Bacillus cereus
group”. In: Sci Rep 5, p. 14082. DOI: 10.1038/srep14082.
Logan, Niall A. and Paul De Vos (2015). “Bacillus”. In: Bergey’s Manual of Systematics
of Archaea and Bacteria. John Wiley and Sons, Inc., pp. 1–163. DOI:
10.1002/9781118960608.gbm00530.
Lucking, Genia, Marina Stoeckel, Zeynep Atamer, Jorg Hinrichs, and Monika
Ehling-Schulz (2013). “Characterization of aerobic spore-forming bacte-
ria associated with industrial dairy processing environments and product
spoilage”. In: International Journal of Food Microbiology 166.2, pp. 270–279.
Maechler, Martin, Peter Rousseeuw, Anja Struyf, Mia Hubert, and Kurt Hornik
(2017). cluster: Cluster Analysis Basics and Extensions. R package version 2.0.6.
Martinez, Bismarck A., Jayne Stratton, and Andreia Bianchini (2017). “Isola-
tion and genetic identification of spore-forming bacteria associated with
concentrated-milk processing in Nebraska”. In: Journal of Dairy Science 100.2,
pp. 919–932. DOI: 10.3168/jds.2016-11660.
Miller, R. A., S. M. Beno, et al. (2016). “Bacillus wiedmannii sp. nov., a psychro-
tolerant and cytotoxic Bacillus cereus group species isolated from dairy foods
and dairy environments”. In: Int J Syst Evol Microbiol 66.11, pp. 4744–4753.
DOI: 10.1099/ijsem.0.001421.
Miller, R. A., J. Jian, S. M. Beno, M. Wiedmann, and J. Kovac (2018). “Intraclade
Variability in Toxin Production and Cytotoxicity of Bacillus cereus Group
Type Strains and Dairy-Associated Isolates”. In: Appl Environ Microbiol 84.6.
DOI: 10.1128/AEM.02479-17.
Miller, R. A., D. J. Kent, et al. (2015). “Spore populations among bulk tank raw
milk and dairy powders are significantly different”. In: J Dairy Sci 98.12,
pp. 8492–504. DOI: 10.3168/jds.2015-9943.
Nakamura, L. K. (1998). “Bacillus pseudomycoides sp. nov”. In: Int J Syst Bacteriol
48 Pt 3, pp. 1031–5. DOI: 10.1099/00207713-48-3-1031.
Oh, So-Young, Jonathan M. Budzik, Gabriella Garufi, and Olaf Schneewind
(2011). “Two capsular polysaccharides enable Bacillus cereus G9241 to cause
anthrax-like disease”. In: Molecular microbiology 80.2, pp. 455–470. DOI: 10.
1111/j.1365-2958.2011.07582.x.
Ohba, Michio, Eiichi Mizuki, and Akiko Uemori (2009). “Parasporin, a New An-
ticancer Protein Group from Bacillus thuringiensis”. In: Anticancer Research
29.1, pp. 427–433. eprint: http://ar.iiarjournals.org/content/
29/1/427.full.pdf+html.
Okinaka, Richard T. et al. (2014). “Genome Sequence of Bacillus anthracis STI, a
Sterne-Like Georgian/Soviet Vaccine Strain”. In: Genome announcements 2.5,
e00853–14. DOI: 10.1128/genomeA.00853-14.
Oksanen, Jari et al. (2017). vegan: Community Ecology Package. R package version
2.4-2.
Pena-Gonzalez, Angela et al. (2017). “Draft Genome Sequence of Bacillus cereus
LA2007, a Human-Pathogenic Isolate Harboring Anthrax-Like Plasmids”.
In: Genome announcements 5.16, e00181–17. DOI: 10.1128/genomeA.00181-17.
Price, Lance B. et al. (2012). “Staphylococcus aureus CC398: Host Adaptation and
Emergence of Methicillin Resistance in Livestock”. In: mBio 3.1. Ed. by Fer-
nando Baquero. DOI: 10.1128/mBio.00305-11. eprint: https://mbio.
asm.org/content/3/1/e00305-11.full.pdf.
Pruss, B. M., R. Dietrich, B. Nibler, E. Martlbauer, and S. Scherer (1999). “The
hemolytic enterotoxin HBL is broadly distributed among species of the Bacil-
lus cereus group”. In: Appl Environ Microbiol 65.12, pp. 5436–42.
R Core Team (2016). R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing. Vienna, Austria.
Rasko, David A., Michael R. Altherr, Cliff S. Han, and Jacques Ravel (2005).
“Genomics of the Bacillus cereus group of organisms”. In: FEMS Microbiol-
ogy Reviews 29.2, pp. 303–329. DOI: 10.1016/j.fmrre.2004.12.005.
eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1016/j.
fmrre.2004.12.005.
Rosenquist, H., L. Smidt, S. R. Andersen, G. B. Jensen, and A. Wilcks (2005).
“Occurrence and significance of Bacillus cereus and Bacillus thuringiensis in
ready-to-eat food”. In: FEMS Microbiol Lett 250.1, pp. 129–36. DOI: 10.1016/
j.femsle.2005.06.054.
Rossi-Tamisier, M., S. Benamar, D. Raoult, and P. E. Fournier (2015). “Caution-
ary tale of using 16S rRNA gene sequence similarity values in identification
of human-associated bacterial species”. In: Int J Syst Evol Microbiol 65.Pt 6,
pp. 1929–34. DOI: 10.1099/ijs.0.000161.
Ruckert, Christian et al. (2012). “Draft Genome Sequence of Bacillus anthracis
UR-1, Isolated from a German Heroin User”. In: Journal of Bacteriology 194.21,
pp. 5997–5998. DOI: 10.1128/JB.01410-12. eprint: https://jb.asm.
org/content/194/21/5997.full.pdf.
Schmid, Daniela et al. (2016). “Elucidation of enterotoxigenic Bacillus cereus out-
breaks in Austria by complementary epidemiological and microbiological
investigations, 2013”. In: International Journal of Food Microbiology 232, pp. 80–
86.
Slowikowski, Kamil (2016). ggrepel: Automatically Position Non-Overlapping Text
Labels with ’ggplot2’. R package version 0.6.5.
Sorokin, Alexei et al. (2006). “Multiple-locus sequence typing analysis of Bacil-
lus cereus and Bacillus thuringiensis reveals separate clustering and a distinct
population structure of psychrotrophic strains”. In: Applied and environmen-
tal microbiology 72.2, pp. 1569–1578. DOI: 10.1128/AEM.72.2.1569-
1578.2006.
Stenfors Arnesen, L. P., A. Fagerlund, and P. E. Granum (2008). “From soil to
gut: Bacillus cereus and its food poisoning toxins”. In: FEMS Microbiol Rev
32.4, pp. 579–606. DOI: 10.1111/j.1574-6976.2008.00112.x.
Swiecicka, I. and P. De Vos (2003). “Properties of Bacillus thuringiensis iso-
lated from bank voles”. In: Journal of Applied Microbiology 94.1, pp. 60–64.
DOI: 10.1046/j.1365-2672.2003.01790.x. eprint: https://
onlinelibrary.wiley.com/doi/pdf/10.1046/j.1365-2672.
2003.01790.x.
Swiecicka, Izabela, Geraldine A. Van der Auwera, and Jacques Mahillon (2006).
“Hemolytic and Nonhemolytic Enterotoxin Genes are Broadly Distributed
among Bacillus thuringiensis Isolated from Wild Mammals”. In: Microbial
Ecology 52, pp. 544–551. DOI: 10.1007/s00248-006-9122-0.
Takeno, Akira et al. (2012). “Complete genome sequence of Bacillus cereus
NC7401, which produces high levels of the emetic toxin cereulide”. In: Jour-
nal of bacteriology 194.17, pp. 4767–4768. DOI: 10.1128/JB.01015-12.
Tallent, S. M., K. M. Kotewicz, E. A. Strain, and R. W. Bennett (2012). “Efficient
Isolation and Identification of Bacillus cereus Group”. In: Journal of AOAC
International 95.2, pp. 446–451. DOI: 10.5740/jaoacint.11-251.
Terzi, Britta von, Peter C. B. Turnbull, Steve E. Bellan, and Wolfgang Beyer
(2014). “Failure of Sterne- and Pasteur-Like Strains of Bacillus anthracis to
Replicate and Survive in the Urban Bluebottle Blow Fly Calliphora vicina un-
der Laboratory Conditions”. In: PLOS ONE 9.1, pp. 1–7. DOI: 10.1371/
journal.pone.0083860.
Thomsen, M. C. et al. (2016). “A Bacterial Analysis Platform: An Integrated Sys-
tem for Analysing Bacterial Whole Genome Sequencing Data for Clinical
Diagnostics and Surveillance”. In: PLoS One 11.6, e0157718. DOI: 10.1371/
journal.pone.0157718.
Thorsen, L. et al. (2006). “Characterization of emetic Bacillus weihenstephanen-
sis, a new cereulide-producing bacterium”. In: Appl Environ Microbiol 72.7,
pp. 5118–21. DOI: 10.1128/AEM.00170-06.
Tourasse, Nicolas J. et al. (2011). “Extended and global phylogenetic view of the
Bacillus cereus group population by combination of MLST, AFLP, and MLEE
genotyping data”. In: Food Microbiology 28.2, pp. 236–244.
Van der Auwera, Geraldine A., Michael Feldgarden, Roberto Kolter, and
Jacques Mahillon (2013). “Whole-Genome Sequences of 94 Environmental
Isolates of Bacillus cereus Sensu Lato”. In: Genome announcements 1.5, e00380–
13. DOI: 10.1128/genomeA.00380-13.
Vangay, P., E. B. Fugett, Q. Sun, and M. Wiedmann (2013). “Food microbe
tracker: a web-based tool for storage and comparison of food-associated mi-
crobes”. In: J Food Prot 76.2, pp. 283–94. DOI: 10.4315/0362-028X.JFP-
12-276.
Wang, Gaoyan et al. (2014). “Bactericidal thurincin H causes unique morpholog-
ical changes in Bacillus cereus F4552 without affecting membrane permeabil-
ity”. In: FEMS Microbiology Letters 357.1, pp. 69–76. DOI: 10.1111/1574-
6968.12486. eprint: https://onlinelibrary.wiley.com/doi/
pdf/10.1111/1574-6968.12486.
Warda, Alicja K. et al. (2016). “Linking Bacillus cereus Genotypes and Carbohy-
drate Utilization Capacity”. In: PloS one 11.6, e0156796–e0156796. DOI: 10.
1371/journal.pone.0156796.
Wickham, Hadley (2009). ggplot2: Elegant Graphics for Data Analysis. Use R! New
York: Springer, viii, 212 p.
— (2011). “The Split-Apply-Combine Strategy for Data Analysis”. In: 2011 40.1,
p. 29. DOI: 10.18637/jss.v040.i01.
— (2017). stringr: Simple, Consistent Wrappers for Common String Operations. R
package version 1.2.0.
Wickham, Hadley, Romain Francois, Lionel Henry, and Kirill Muller (2016).
dplyr: A Grammar of Data Manipulation. R package version 0.5.0.
Wickham, Hadley, Jim Hester, and Romain Francois (2017). readr: Read Rectan-
gular Text Data. R package version 1.1.0.
Yang, Yong, Hua Gu, et al. (2016). “Genotypic heterogeneity of emetic toxin
producing Bacillus cereus isolates from China”. In: FEMS Microbiology Letters
364.1. DOI: 10.1093/femsle/fnw237. eprint: http://oup.prod.sis.
lan/femsle/article-pdf/364/1/fnw237/23928498/fnw237.pdf.
Yang, Yong, Xiaofeng Yu, et al. (2017). “Multilocus sequence type profiles of
Bacillus cereus isolates from infant formula in China”. In: Food Microbiology
62, pp. 46–50.
Zhang, S. et al. (2015). “Salmonella serotype determination utilizing high-
throughput genome sequencing data”. In: J Clin Microbiol 53.5, pp. 1685–92.
DOI: 10.1128/JCM.00323-15.
Zhong, Wenwan, Yulin Shou, Thomas M. Yoshida, and Babetta L. Marrone
(2007). “Differentiation of Bacillus anthracis, B. cereus, and B. thuringiensis by
Using Pulsed-Field Gel Electrophoresis”. In: Applied and Environmental
Microbiology 73.10, pp. 3446–3449. DOI: 10.1128/AEM.02478-06. eprint:
https://aem.asm.org/content/73/10/3446.full.pdf.
Zhu, Kui et al. (2016). “Probiotic Bacillus cereus Strains, a Potential Risk for Public
Health in China”. In: Frontiers in microbiology 7, pp. 718–718. DOI: 10.3389/
fmicb.2016.00718.
Zwick, M. E. et al. (2012). “Genomic characterization of the Bacillus cereus sensu
lato species: backdrop to the evolution of Bacillus anthracis”. In: Genome Res
22.8, pp. 1512–24. DOI: 10.1101/gr.134437.111.
CHAPTER 5
CHARACTERIZATION OF EMETIC AND DIARRHEAL BACILLUS
CEREUS STRAINS FROM A 2016 FOODBORNE OUTBREAK USING
WHOLE-GENOME SEQUENCING: ADDRESSING THE
MICROBIOLOGICAL, EPIDEMIOLOGICAL, AND BIOINFORMATIC
CHALLENGES 1
1 From Carroll, Laura M., Martin Wiedmann, Manjari Mukherjee, David
C. Nicholas, Lisa A. Mingle, Nellie B. Dumas, Jocelyn A. Cole, and Jasna
Kovac (2019). “Characterization of Emetic and Diarrheal Bacillus cereus
Strains From a 2016 Foodborne Outbreak Using Whole-Genome Sequencing:
Addressing the Microbiological, Epidemiological, and Bioinformatic
Challenges”. In: Frontiers in Microbiology 10, p. 144. DOI:
10.3389/fmicb.2019.00144.
5.1 Abstract
The Bacillus cereus group comprises multiple species capable of causing emetic
or diarrheal foodborne illness. Although members of the group are responsible for
tens of thousands of illnesses each year in the U.S. alone, whole-genome sequencing (WGS) is not
yet routinely employed to characterize B. cereus group isolates from foodborne
outbreaks. Here, we describe the first WGS-based characterization of isolates
linked to an outbreak caused by members of the B. cereus group. In conjunc-
tion with a 2016 outbreak traced to a supplier of refried beans served by a fast
food restaurant chain in upstate New York, a total of 33 B. cereus group isolates
were obtained from human cases (n = 7) and food samples (n = 26). Emetic
(n = 30) and diarrheal (n = 3) isolates were most closely related to B. paranthracis
(group III) and B. cereus sensu stricto (group IV), respectively. WGS indicated
that the 30 emetic isolates (24 and 6 from food and humans, respectively) were
closely related and formed a well-supported clade distinct from publicly avail-
able emetic group III genomes with an identical sequence type (ST 26). The 30
emetic group III isolates from this outbreak differed from each other by a mean
of 8.3 to 11.9 core single nucleotide polymorphisms (SNPs), while differing from
publicly available emetic group III ST 26 B. cereus group genomes by a mean of
301.7 to 528.0 core SNPs, depending on the SNP calling methodology used. Us-
ing a WST-1 cell proliferation assay, the strains isolated from this outbreak had
only mild detrimental effects on HeLa cell metabolic activity compared to ref-
erence diarrheal strain B. cereus ATCC 14579. We hypothesize that the outbreak
was a single source outbreak caused by emetic group III B. cereus belonging to
the B. paranthracis species, although food samples were not tested for presence
of the emetic toxin cereulide. In addition to showcasing how WGS can be used
to characterize B. cereus group strains linked to a foodborne outbreak, we also
discuss potential microbiological and epidemiological challenges presented by
B. cereus group outbreaks, and we offer recommendations for analyzing WGS
data from the isolates associated with them.
5.2 Introduction
The Bacillus cereus (B. cereus) group, also known as B. cereus sensu lato (s.l.), is
a complex of closely related species that vary in their ability to cause disease
in humans. Foodborne illness caused by members of the group primarily mani-
fests itself in one of two forms: (i) emetic intoxication that is caused by cereulide,
a heat-stable toxin produced by B. cereus within a food matrix prior to consump-
tion, or (ii) a diarrheal toxicoinfection, caused by enterotoxins produced by bac-
teria in the small intestine of the host (Ehling-Schulz, Fricker, and Scherer 2004;
Schoeni and Wong 2005; Stenfors Arnesen, Fagerlund, and Granum 2008). Here
we refer to isolates that carry ces genes encoding the cereulide biosynthetic path-
way as emetic isolates, and isolates that lack ces genes but carry either hbl or
cytK-2 genes that encode diarrheal enterotoxins as diarrheal isolates. The gene
variant cytK-2 was included in this definition, as it was previously found in
non-B. cytotoxicus isolates associated with diarrheal illness (Castiaux et al. 2015;
Miller, Jian, et al. 2018). The presence of nhe genes was not included in our
present definition of diarrheal isolates, because nhe genes are found in
the majority of the B. cereus group population (Carroll et al.
2017; Miller, Jian, et al. 2018), including all isolates in the present study, and
their contribution to diarrheal toxicoinfection is not yet fully understood (Doll,
Ehling-Schulz, and Vogelmann 2013).
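The working definitions above amount to a simple decision rule, sketched here in Python. The function name, the gene-name prefixes used for matching, and the "unclassified" label are hypothetical; in particular, treating the presence of any single ces or hbl gene as sufficient is an assumption, not a criterion stated here.

```python
def classify_isolate(detected_genes):
    """Label an isolate from its detected toxin genes (illustrative sketch).

    ces-positive isolates are labeled emetic; ces-negative isolates that
    carry hbl or cytK-2 genes are labeled diarrheal. nhe genes are
    deliberately ignored, following the definition in the text.
    """
    genes = set(detected_genes)
    has_ces = any(g.startswith("ces") for g in genes)   # assumption: any ces gene
    has_hbl = any(g.startswith("hbl") for g in genes)   # assumption: any hbl gene
    has_cytk2 = "cytK-2" in genes

    if has_ces:
        return "emetic"
    if has_hbl or has_cytk2:
        return "diarrheal"
    return "unclassified"  # hypothetical label for isolates meeting neither rule


# Toy inputs (hypothetical gene lists, not isolates from this study).
examples = {
    "isolate_a": ["cesA", "cesB", "nheA"],
    "isolate_b": ["hblA", "hblC", "nheA"],
    "isolate_c": ["nheA", "nheB", "nheC"],
}
labels = {name: classify_isolate(genes) for name, genes in examples.items()}
```

Because the ces check runs first, an isolate carrying both ces and hbl genes is labeled emetic, which matches the requirement that diarrheal isolates lack ces genes.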
As foodborne pathogens, members of the B. cereus group are estimated to
cause 63,400 foodborne disease cases per year in the U.S. (Scallan et al. 2011) and
are confirmed or suspected to have been responsible for 235 outbreaks reported
in the U.S. between 1998 and 2008 (Bennett, K. A. Walsh, and Gould 2013). Due
in part to its typically self-limiting nature, foodborne illness caused by mem-
bers of the B. cereus group is under-reported (Granum and Lund 1997; Stenfors
Arnesen, Fagerlund, and Granum 2008), although severe infections resulting in
patient death have been reported (Naranjo et al. 2011; Sanaei-Zadeh 2012; Lotte
et al. 2017). Furthermore, B. cereus group isolates that have been linked to hu-
man clinical cases of foodborne disease rarely undergo whole-genome sequenc-
ing (WGS), as is becoming the norm for other foodborne pathogens (Joensen
et al. 2014; Ashton et al. 2015; Moura et al. 2017).
Here, we describe a foodborne outbreak caused by members of the B. cereus
group in which WGS was implemented to characterize isolates from human
clinical cases and food. To our knowledge, this is the first description of a B.
cereus outbreak in which WGS was employed to characterize isolates. By testing
various combinations of variant calling methodologies, we showcase how dif-
ferent bioinformatics pipelines can yield vastly different results when pairwise
SNP differences are the desired metric for determining whether an isolate is part
of an outbreak or not. In addition to discussing the bioinformatic challenges, we
examine potential microbiological and epidemiological obstacles that can hin-
der characterization of B. cereus group isolates from suspected foodborne out-
breaks, and we offer recommendations to guide the characterization of future
B. cereus group outbreaks using WGS.
5.3 Materials and Methods
5.3.1 Collection of Epidemiological Data
Epidemiological investigations were coordinated by the New York State De-
partment of Health (NYSDOH), and the outbreak was reported to the U.S. Cen-
ters for Disease Control and Prevention (CDC). Investigation methods included
(i) a cohort study, (ii) food preparation review, (iii) an investigation at a fac-
tory/production/treatment plant, (iv) food product traceback, and (v) environ-
ment/food/water sample testing.
5.3.2 Isolation and Initial Characterization of B. cereus Strains
Stool specimens were plated directly onto mannitol-egg yolk-polymyxin (MYP)
agar and incubated aerobically at 37 °C for 24 h. Food samples were diluted
1:10 in 1× PBS, pH 7.4, in a filter bag for homogenizer blenders and homogenized
for 2 min. One hundred µl of each homogenized sample were plated onto
MYP agar and incubated aerobically at 37 °C for 24 h. The MYP agar plates for
both the stool specimens and food samples were observed after the 24 h incubation
period. Individual B. cereus-like colonies (i.e., pink colored and lecithinase
positive) were subcultured on trypticase soy agar (TSA) plates supplemented
with 5% sheep blood and incubated aerobically at 37 °C for 18-24 h. These isolates
were identified as B. cereus using the following conventional microbiological
techniques: Gram stain, colony morphology, hemolysis, motility, and spore
stain. To test for the presence of parasporal crystals often associated with B.
thuringiensis, isolates were cultured for 48 h at 37 °C on sporulation agar slants.
Smears were prepared, and slides were heat fixed and then stained using malachite
green and counterstained with carbol fuchsin (Tallent, Rhodehamel, et
al. 1998). Slides were then observed for the presence or absence of parasporal
crystals.
5.3.3 rpoB Allelic Typing
The 33 outbreak isolates were streaked onto brain heart infusion (BHI) agar
from their respective cryo stocks stored at −80 °C and incubated overnight at
37 °C. Single isolated colonies were inoculated into 5 ml BHI broth, incubated
overnight at 32 °C, and used for genomic DNA extraction using Qiagen DNeasy
blood and tissue kits (Qiagen). Extracted DNA was used as a template in a
PCR reaction using primers targeting a 750 bp sequence of the rpoB gene
(RzrpoBF: AARYTIGGMCCTGAAGAAAT and RZrpoBR: TGIARTTRTCATCAACCATGTG)
(Ivy et al. 2012). PCR was carried out in 25 µl reactions using GoTaq
Green Master Mix (Promega Corporation) under the following thermal cycling
conditions: 3 min at 94 °C, followed by 40 cycles of 30 s at 94 °C, 30 s at 55 to 45 °C
(in the first 20 cycles, the temperature was reduced by 0.5 °C per cycle and then
kept at 45 °C in the following 20 cycles), followed by 1 min at 72 °C, and a final
hold at 4 °C. The resulting PCR product was used for genotyping and preliminary
species identification using the rpoB allele type database available in Food
Microbe Tracker (Ivy et al. 2012; Vangay et al. 2013).
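The stepped annealing temperature described above can be written out as a per-cycle schedule. This is an illustrative sketch only. It assumes that the first cycle anneals at 55 °C and that each subsequent cycle is 0.5 °C lower until 45 °C is reached; the text does not say whether the first reduction applies to cycle 1 or cycle 2. The function name is hypothetical.

```python
def annealing_temperature(cycle, start=55.0, floor=45.0, step=0.5):
    """Annealing temperature (°C) for a 1-indexed PCR cycle.

    Assumption: cycle 1 anneals at `start`; the temperature then drops
    by `step` each cycle and is held at `floor` once reached.
    """
    if cycle < 1:
        raise ValueError("cycle numbers start at 1")
    return max(floor, start - step * (cycle - 1))


# Full 40-cycle schedule as described in the thermal cycling conditions
schedule = [annealing_temperature(c) for c in range(1, 41)]
```

Under this assumption the temperature reaches 45 °C at cycle 21 and stays there for the remaining 20 cycles.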
5.3.4 Bacterial Growth Conditions and Collection of Bacterial
Supernatants
The 33 outbreak isolates, as well as B. cereus s.s. type strain ATCC 14579 and
B. cereus emetic reference strain DSM 4312 (Food Microbe Tracker ID FSL M8-0547)
(Vangay et al. 2013), were streaked onto BHI agar from their respective
cryo stocks stored at −80 °C. For immunoassays and cytotoxicity assays (see sections
"Hemolysin BL and Non-hemolytic Enterotoxin Detection" and "WST-1
Metabolic Activity Assay"), cultures grown from single isolated colonies for 18
h at 37 °C without shaking were used for inoculation of fresh BHI broth. Fresh
cultures were grown to early stationary phase as determined by an OD600 of
∼1.5, which corresponds to ∼10^8 CFU/ml. After incubation, growth was quenched by
placing cultures on ice. The cultures were then centrifuged at 16,000 × g for 2 min,
and the supernatants were collected, aliquoted in duplicate, and stored at −80 °C
until further use in cytotoxicity assays.
5.3.5 Hemolysin BL and Non-hemolytic Enterotoxin Detection
Diarrheal strains grown as described above were used for qualitative detec-
tion of hemolysin BL (Hbl) and non-hemolytic enterotoxins (Nhe) with the
Duopath Cereus Enterotoxins immunoassay (Merck). Only select representa-
tives of emetic outbreak strains were tested (i.e., FSL R9-6381, FSL R9-6382, FSL
R9-6384, FSL R9-6389, FSL R9-6395, and FSL R9-6399), as they did not carry
genes encoding Hbl and were therefore not expected to produce Hbl. Briefly,
the cultures and immunoassay kits were brought to room
temperature. Then, 150 µl of each isolate culture was added to the immunoassay port,
following the manufacturer’s instructions. The results were read as positive if
a red test line was visible after a 30-min incubation at room temperature. Tests
were considered valid only when control lines were visible.
5.3.6 WST-1 Metabolic Activity Assay
HeLa cells were seeded in 96-well plates at a seeding density of 8 × 10^4 cells/cm^2
(Fisichella et al. 2009) in Eagle’s minimum essential medium (EMEM) supplemented
with 10% fetal bovine serum (FBS) and allowed to grow for 18-24 h at
37 °C, 5% CO2. After incubation, the medium in each well was replaced with
100 µl of fresh medium containing 5% v/v of bacterial supernatants (prepared
as described in section "Bacterial Growth Conditions and Collection of Bacterial
Supernatants") that were thawed and pre-warmed to 37 °C. The combined
medium and supernatants were added to the cells using a multichannel pipettor
to minimize the variability in the duration of cell exposure to the toxin amongst
wells of a 96-well plate. Medium containing 5% BHI was used as a negative
control. Medium containing 5% v/v of 1% Triton X-100 dissolved in BHI (final
concentration in the test well was 0.05%) was used as a positive control expected
to significantly reduce the viability of HeLa cells. After 15 min of intoxication at
37 °C, 5% CO2 (Miller, Jian, et al. 2018), 10 µl of WST-1 dye solution (Roche) was
added to each well of the plate, and the plate was incubated for 25 min at 37 °C,
5% CO2, resulting in a total of 40 min exposure of cells to the supernatants. After
30 s of orbital shaking at 600 rpm, the absorbances were read by a microplate
reader (Thermo Scientific Multiskan GO, Thermo Fisher Scientific) in
precision mode at 450 and 690 nm, the latter being subtracted from the former to
account for the background signal (i.e., corrected absorbances) (Fisichella et al.
2009). Each test, including 0.05% Triton X-100, was conducted with six technical
replicates and on two different HeLa passages using supernatants from single
biological replicates, resulting in a total of 12 technical replicates per isolate. The
viability of cells was determined by calculating a ratio of corrected absorbances
to that of BHI, converting to percentages, and calculating the mean of the tech-
nical replicates for each isolate. The results were compared to the results for
cells treated with (i) 0.05% Triton X-100, (ii) B. cereus s.s. type strain ATCC 14579
supernatant (i.e., reference for diarrheal strains), and (iii) B. cereus group strain
DSM 4312 supernatant (i.e., reference for emetic strains).
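The viability calculation described above (background-corrected absorbance, ratio to the BHI control, conversion to a percentage, mean across technical replicates) can be sketched as follows. This is a minimal illustration rather than the code used in the study. It assumes each replicate is normalized to the mean corrected absorbance of the BHI control wells, which the text does not state explicitly, and the function names are hypothetical.

```python
from statistics import mean


def corrected_absorbance(a450, a690):
    """Subtract the 690 nm background reading from the 450 nm reading."""
    return a450 - a690


def percent_viability(sample_wells, bhi_wells):
    """Mean viability (%) of cells treated with one supernatant.

    sample_wells, bhi_wells: lists of (A450, A690) tuples, one per well.
    Assumption: each sample well is divided by the mean corrected
    absorbance of the BHI negative-control wells.
    """
    bhi_mean = mean(corrected_absorbance(a, b) for a, b in bhi_wells)
    per_well = [
        corrected_absorbance(a, b) / bhi_mean * 100.0 for a, b in sample_wells
    ]
    return mean(per_well)
```

In the design described here, `sample_wells` would hold the 12 technical replicates for one isolate (six wells on each of two HeLa passages).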
5.3.7 Statistical Analysis of Cytotoxicity Data
Welch’s test and the Games-Howell post-hoc test, both of which are appropriate
for analyses of data with non-homogeneous variances, were performed using the results of
all 12 technical replicates of each outbreak-associated isolate, as well as the
reference strains and the positive control. For the Games-Howell test, a Bonferroni
correction was applied to correct for multiple comparisons. Statistical analyses
were carried out in R version 3.4.3 (R Core Team 2018).
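To illustrate why Welch's test suits groups with unequal variances, the sketch below computes the Welch one-way ANOVA statistic and its degrees of freedom from raw replicate values. The analyses in this study were run in R; this Python version is only an illustration, and it assumes that "Welch's test" refers to Welch's heteroscedastic one-way ANOVA. The function name is hypothetical, and no P-value is computed because that would require an F distribution implementation.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's one-way ANOVA statistic for groups with unequal variances.

    groups: list of lists of observations (each with >= 2 values and
    non-zero variance). Returns (F, df1, df2).
    """
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    v = [variance(g) for g in groups]          # sample variances

    w = [ni / vi for ni, vi in zip(n, v)]      # precision weights n_i / s_i^2
    w_total = sum(w)
    weighted_mean = sum(wi * mi for wi, mi in zip(w, m)) / w_total

    lam = sum((1 - wi / w_total) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    numerator = sum(wi * (mi - weighted_mean) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2
```

With exactly two groups, the statistic reduces to the square of Welch's t statistic, which provides a simple sanity check.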
5.3.8 Whole-Genome Sequencing
Genomic DNA was extracted from overnight cultures (∼18 h) grown in BHI
at 32 °C using Qiagen DNeasy blood and tissue kits (Qiagen) or the Omega
E.Z.N.A. Bacterial DNA kit (Omega) following the manufacturers’ instructions.
For the E.Z.N.A. Bacterial DNA kit, the additional steps recommended for
difficult-to-lyse bacteria were taken to obtain sufficient DNA yield. Briefly, one
ml of an overnight culture was additionally treated with glass beads provided
in the E.Z.N.A. kit. DNA was quantified using Qubit 3 and used for Nextera
XT library preparation (Illumina). Pooled libraries were sequenced in two Illumina
sequencing runs with either 2 × 250 or 2 × 300 bp reads at the Penn State
Genomics Core Facility and at the Cornell Animal Health Diagnostic Center.
5.3.9 Initial Data Processing and Genome Assembly
Illumina adapters and low-quality bases were trimmed using Trimmo-
matic version 0.36 (Bolger, Lohse, and Usadel 2014) and the de-
fault parameters for Nextera paired-end reads, and FastQC version 0.11.5
(https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used to
confirm that read quality was adequate (e.g., no reads flagged as poor qual-
ity, no Illumina adapters present). Genomes listed in Supplementary Table S1
were assembled de novo using SPAdes version 3.11.0 (Bankevich et al. 2012),
and average per-base coverage was calculated using Samtools version 1.6 (H.
Li, Handsaker, et al. 2009) after mapping reads to their respective de novo assem-
blies using BWA MEM version 0.7.13 (default parameters) (H. Li and Durbin
2010).
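As an illustration of the coverage calculation, the sketch below averages per-base depth over all reported positions. The text does not say which Samtools subcommand was used; this example assumes the three-column, tab-separated output of `samtools depth` (contig, position, depth), and the function name is hypothetical.

```python
def mean_depth(depth_lines):
    """Average per-base depth from `samtools depth`-style lines.

    depth_lines: iterable of strings "contig<TAB>position<TAB>depth".
    Note: positions absent from the input are not counted, so zero-depth
    positions only contribute if they were written to the output.
    """
    total = 0
    positions = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _contig, _position, depth = line.split("\t")
        total += int(depth)
        positions += 1
    if positions == 0:
        raise ValueError("no depth records found")
    return total / positions
```

Because the reads are mapped back to their own de novo assembly, this value reflects how deeply each assembled base is supported by the trimmed reads.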
5.3.10 In silico Typing and Virulence Gene Detection
BTyper version 2.2.0 (Carroll et al. 2017) was used to perform in silico virulence
gene detection, multi-locus sequence typing (MLST), panC group assignment (as
defined by Guinebretiere et al., 2010), and rpoB allelic typing, as well as to extract
the gene sequences for all detected loci (Guinebretiere, Velge, et al. 2010). For
virulence gene detection, the default settings were used (i.e., 50% amino acid
sequence identity, 70% query coverage), as these cut-offs have been shown to
correlate with PCR-based detection of virulence genes in B. cereus group isolates
(J. Kovac et al. 2016; Carroll et al. 2017). BMiner version 2.0.2 (Carroll et al. 2017)
was used to aggregate the output files from BTyper and create a virulence gene
presence/absence matrix.
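The thresholding and aggregation steps can be pictured with a short sketch. This is not BTyper or BMiner code, and the input structure (a dictionary of per-isolate hit records with `gene`, `identity`, and `coverage` fields) is a hypothetical stand-in for their output files. Only the cut-offs (50% amino acid identity, 70% query coverage) come from the text.

```python
def presence_absence(hits_by_isolate, min_identity=50.0, min_coverage=70.0):
    """Build a gene presence/absence matrix from per-isolate hits.

    hits_by_isolate: {isolate: [{"gene": str, "identity": float,
                                 "coverage": float}, ...]}
    Returns {isolate: {gene: 0 or 1}} with the same gene columns
    for every isolate.
    """
    all_genes = sorted(
        {hit["gene"] for hits in hits_by_isolate.values() for hit in hits}
    )
    matrix = {}
    for isolate, hits in hits_by_isolate.items():
        passed = {
            hit["gene"]
            for hit in hits
            if hit["identity"] >= min_identity and hit["coverage"] >= min_coverage
        }
        matrix[isolate] = {gene: int(gene in passed) for gene in all_genes}
    return matrix
```

A gene that fails either threshold in a given isolate is recorded as absent (0) for that isolate but still appears as a column.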
5.3.11 Construction of k-mer Based Phylogeny Using Outbreak
Strains and Genomes of 18 B. cereus Group Species
kSNP version 3.1 (Gardner and Hall 2013; Gardner, Slezak, and Hall 2015) was
used to produce a set of core SNPs among the 33 outbreak genomes, plus a
type strain or RefSeq reference genome assembly from each of the 18 B. cereus
group species listed in Supplementary Table S2 (Stenfors Arnesen, Fagerlund,
and Granum 2008; Guinebretiere, Auger, et al. 2013; Jimenez et al. 2013; Miller,
Beno, et al. 2016; Liu et al. 2017), using the optimal k-mer size as determined
by Kchooser (k = 21). The resulting core SNPs were used in conjunction with
RAxML version 8.2.11 (Stamatakis 2014) to construct a maximum likelihood
(ML) phylogeny using the GTRCAT model with a Lewis ascertainment bias
correction (Lewis 2001) to account for the use of solely variant sites, and 500
bootstrap replicates. The resulting phylogenetic tree was formatted using the
phylobase (R Hackathon et al. 2019), ggtree (Yu et al. 2017), phytools (Rev-
ell 2012), and ape (Paradis, Claude, and Strimmer 2004) packages in R version
3.4.3.
5.3.12 Variant Calling and Phylogeny Construction Using Out-
break Isolates
Combinations of five reference-based variant calling pipelines (Table 5.1) and
reference genomes (Table 5.2), as well as one reference-free SNP calling pipeline
(Table 5.1), were used to separately identify core and total SNPs among (i) all
33 outbreak-related isolates (30 emetic group III isolates and three group IV
isolates) and (ii) the subset of 30 emetic group III isolates. For the subset of
30 emetic group III isolates, all reference-based variant calling pipelines de-
scribed below were additionally run with dustmasked versions of the refer-
ence genomes listed in Table 5.2, in which DustMasker version 1.0.0 (part of
BLAST version 2.6.0) (Morgulis et al. 2006) was used to mask low-complexity
portions (i.e., intervals with highly biased nucleotide distributions which can
bias sequence similarity searches) in each reference genome (Ye, McGinnis, and
Madden 2006).
Table 5.1: Description of variant calling pipelines and associated input data
formats tested in this study.
Pipeline^a | Approach | Reference-based | Input data (file format)^b | Read mapper | Variant caller | Reference(s) and in-depth pipeline descriptions
CFSAN | Read mapping | Yes | PE reads (fastq) | Bowtie2 | Varscan | https://snp-pipeline.readthedocs.io/en/latest/
Freebayes | Read mapping | Yes | PE reads (fastq) | BWA MEM | Freebayes | https://github.com/lmc297/SNPBac
kSNP3 | k-mer based | No | Contigs (fasta) | Not applicable | kSNP3 | https://sourceforge.net/projects/ksnp/files/
LYVE-SET | Read mapping | Yes | PE reads (fastq) | SMALT | Varscan | https://github.com/lskatz/lyve-SET
Parsnp | Core genome alignment | Yes | Contigs (fasta) | Not applicable | Parsnp | https://harvest.readthedocs.io/en/latest/content/parsnp.html
Samtools | Read mapping | Yes | PE reads (fastq) | BWA MEM | Samtools/Bcftools | https://github.com/lmc297/SNPBac
^a CFSAN, U.S. Food and Drug Administration (FDA) Center for Food Safety and Applied Nutrition SNP pipeline; LYVE-SET, U.S. Centers
for Disease Control and Prevention (CDC) Listeria, Yersinia, Vibrio, and Enterobacteriaceae SNP Extraction Tool
^b PE reads, Illumina paired-end reads
Table 5.2: Reference genomes used for reference-based variant calling in this
study.
Reference genome | Phylogenetic group^a | Data set(s)^b | ANI range^c | NCBI accession number | Assembly level | Rationale for selection
B. cereus strain ATCC 14579 chromosome | IV | All 33 isolates from two clades (clades III and IV) | 98.8-98.9 (clade IV); 91.8-92.3 (clade III) | NC_004722.1 | Complete Genome | B. cereus s.s. type strain; RefSeq reference genome; member of panC clade IV, the same clade as the three non-emetic outbreak-associated isolates sequenced in this study
B. cereus strain AH187 chromosome | III | All 33 isolates from two clades (clades III and IV); 30 emetic clade III isolates | 92.0-92.2 (clade IV); 99.8-99.9 (clade III) | NC_011658.1 | Complete Genome | Human clinical isolate associated with an emetic outbreak in 1972 (cooked rice, United Kingdom); identical virulotype, MLST sequence type, rpoB allelic type, and panC clade as 30 emetic outbreak isolates sequenced in this study
B. cytotoxicus strain NVH 391-98 chromosome | VII | All 33 isolates from two clades (clades III and IV) | 82.6-82.7 (clade IV); 82.5-82.9 (clade III) | NC_009674.1 | Complete Genome | Type strain of B. cytotoxicus, the most distant member of the B. cereus group as currently defined; shares a common ancestor with all isolates sequenced in this study
FOOD 10 19 16 RSNT1 2H R9-6393 | III | 30 emetic clade III isolates | 92.0-92.2 (clade IV); 100^d-100 (clade III) | SRR6825038 | Contigs | Emetic isolate from the outbreak reported here; assembly had high per-base coverage, as well as the fewest contigs of all genome assemblies from isolates in this outbreak
^a Group determined via panC clade assignment function in BTyper version 2.2.0
^b Data set(s) in this study for which a given genome was used as a reference genome for reference-based SNP calling
^c Minimum and maximum average nucleotide identity (ANI) values of reference strain relative to clade IV and clade III genomes
sequenced in this outbreak (n = 3 and 30, respectively) calculated using FastANI
^d Minimum ANI value was less than 100 prior to rounding
For the Samtools and Freebayes pipelines (Table 5.1), trimmed Illumina
paired-end reads from the queried isolates were mapped to the appropriate
reference genome using BWA MEM version 0.7.13 (Heng Li 2013), and either
Samtools/Bcftools version 1.6 (H. Li, Handsaker, et al. 2009) or Freebayes
version 1.1.0 (Garrison and Marth 2012), respectively, were used to call vari-
ants. Vcftools version 0.1.14 (Danecek et al. 2011) was used to remove in-
dels and SNPs with a SNP quality score < 20, as well as to construct con-
sensus sequences. For both variant calling pipelines, Gubbins version 2.2.0
(Croucher et al. 2015) was used to remove recombination events from the con-
sensus sequences, and the Neighbor Similarity Score (NSS) (Jakobsen and East-
eal 1996), Maximum Chi-Squared (Smith 1992), and Pairwise Homoplasy In-
dex (PHI) (Bruen, Philippe, and Bryant 2006) tests implemented in PhiPack
version 1.0 (Bruen, Philippe, and Bryant 2006) were used to assess whether re-
combination and homoplasies were present in sequence alignments before and
after recombination was removed, using 1,000 permutations each and a win-
dow size of 100 (Supplementary Table S3). Both of these pipelines are pub-
licly available and can be reproduced in their entirety (SNPBac version 1.0.0;
https://github.com/lmc297/SNPBac).
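The variant filtering step (removing indels and SNPs with a quality score below 20) can be illustrated with a short stand-alone filter over VCF data lines. This sketch mimics the effect of the filter described above and is not the Vcftools or SNPBac implementation. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, ...), and the function name is hypothetical.

```python
BASES = {"A", "C", "G", "T"}


def keep_record(vcf_line, min_qual=20.0):
    """Return True for SNP records with QUAL >= min_qual.

    vcf_line: one tab-separated VCF data line (not a '#' header line).
    Records with a missing QUAL ('.') or with any non-single-base
    REF/ALT allele (i.e., indels and symbolic alleles) are dropped.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, qual = fields[3].upper(), fields[4].upper(), fields[5]
    if qual == ".":
        return False
    is_snp = ref in BASES and all(a in BASES for a in alt.split(","))
    return is_snp and float(qual) >= min_qual
```

Applied line by line to a VCF body, this leaves only the high-confidence single-nucleotide differences that feed into the consensus sequences.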
For the CFSAN (Davis et al. 2015) and LYVE-SET (Katz et al. 2017) pipelines
(versions 1.0.1 and 1.1.4g, respectively; Table 5.1), trimmed Illumina paired-end
reads were used as input, and all default pipeline steps were run as outlined in
the manuals. For the Parsnp pipeline (Treangen et al. 2014) (Table 5.1), as-
sembled genomes of the outbreak isolates were used as input, and Parsnp’s
implementation of PhiPack (Bruen, Philippe, and Bryant 2006) was used to fil-
ter out recombination events. For kSNP3 (Table 5.1), assembled genomes of the
outbreak isolates were used as input, and Kchooser was used to determine the
optimum k-mer size for the full 33-isolate data set and the 30 emetic group III
isolate set (k = 21 and 23, respectively).
For all variant calling and filtering pipelines, RAxML version 8.2.10
was used to construct ML phylogenies using the resulting SNPs under the
GTRGAMMA model with a Lewis ascertainment bias correction and 1,000 boot-
strap replicates. Phylogenetic trees were annotated using FigTree version 1.4.3
(http://tree.bio.ed.ac.uk/software/figtree/).
5.3.13 Variant Calling and Statistical Comparison of Emetic
Outbreak Isolates to Publicly Available Genomes
To compare emetic group III isolates from this outbreak to other emetic group
III isolates, BTyper version 2.2.1 was used to query all 2,156 B. cereus group
genome assemblies available in NCBI’s RefSeq database (downloaded March
2018) (Pruitt, Tatusova, and Maglott 2007) and identify all genome assemblies
that (i) belonged to group III based on panC sequence, (ii) belonged to ST 26
based on in silico MLST, and (iii) were found to possess the ces operon in its
entirety (cesABCD) at the default coverage and identity thresholds. This search
produced 25 genome assemblies in addition to the 30 emetic group III genomes
sequenced here. Only three of the 25 RefSeq genome assemblies had Sequence
Read Archive (SRA) data linked to their BioSample accession numbers, making
short read data readily available only for these three isolates. Consequently,
only Parsnp version 1.2 and kSNP version 3.1 were used to identify SNPs in all
55 group III emetic genomes (25 from NCBI RefSeq and 30 sequenced here), as
these approaches can be used with assembled genomes and do not require short
reads as input. For Parsnp, the chromosome of B. cereus AH187 was used as a
reference genome. For kSNP3, Kchooser was used to select the optimal k-mer
size (k = 21), and the chromosome of B. cereus AH187 was included for k-mer
based SNP calling.
RAxML version 8.2.10 was used to construct ML phylogenies using the re-
sulting core SNPs for each of the Parsnp and kSNP3 pipelines under the GTR-
CAT model with a Lewis ascertainment bias correction and 1,000 bootstrap
replicates. Pairwise core SNP differences between all 55 isolates were obtained
using the dist.gene function in R’s ape package. The permutest and betadisper
functions in R’s vegan package (Oksanen et al. 2017) were used to conduct an
ANOVA-like permutation test to test if publicly available genomes were more
variable than isolates from this outbreak based on pairwise core SNP differences
and five independent trials using 100,000 permutations each. Analysis of similarity
(ANOSIM) using the anosim function in the vegan package in R was used
to determine if the average of the ranks of within-group distances was greater
than or equal to the average of the ranks of between-group distances (Clarke
1993; Anderson and D. C. I. Walsh 2013), where groups were defined as (i) the 30
emetic isolates from this outbreak, and (ii) the 25 external emetic ST 26 isolates
(downloaded from RefSeq). ANOSIM tests were conducted using pairwise core
SNP differences and five independent runs of 10,000 permutations each. For
both the ANOVA-like permutation tests and the ANOSIM tests, Bonferroni cor-
rections were used to correct for multiple comparisons at the α = 0.05 level.
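The pairwise core SNP differences that underlie these tests are simple counts of mismatched sites between aligned sequences. The study obtained them with the dist.gene function in R's ape package; the Python sketch below is only an illustration of the same idea, with a hypothetical function name and toy input.

```python
from itertools import combinations


def pairwise_snp_differences(alignment):
    """Count differing sites for every pair of aligned sequences.

    alignment: {isolate_name: sequence}, all sequences the same length
    (e.g., concatenated core SNP sites).
    Returns {(name_a, name_b): number_of_differences}, names sorted.
    """
    names = sorted(alignment)
    if len({len(alignment[name]) for name in names}) > 1:
        raise ValueError("all sequences must have the same length")
    distances = {}
    for a, b in combinations(names, 2):
        distances[(a, b)] = sum(
            1 for x, y in zip(alignment[a], alignment[b]) if x != y
        )
    return distances
```

For 55 genomes this yields 1,485 pairwise values, which can then be split into within-group and between-group distances for the permutation tests.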
5.3.14 Statistical Comparison of Phylogenetic Trees
The Kendall-Colijn (Kendall and Colijn 2015; Kendall and Colijn 2016) test de-
scribed by Katz et al. (Katz et al. 2017) was used to compare the topologies
of trees, using the treespace (Jombart et al. 2017), ips (Heibl 2008 onwards),
phangorn (Schliep et al. 2017), docopt (de Jonge 2018), and stringr (Wick-
ham 2017) packages in R version 3.4.3. The phylogenies that underwent pair-
wise testing were constructed using (i) either core or total SNPs identified in
30 emetic group III genomes via all six SNP calling pipelines (Table 5.1), using
either an unmasked or dustmasked closed reference genome (B. cereus AH187;
Table 5.2), and (ii) SNPs identified in 55 emetic ST 26 genomes (25 publicly avail-
able genomes and the 30 emetic isolates sequenced here) using the kSNP3 (core
and total SNPs) and Parsnp (core SNPs, as Parsnp queries the core genome by
definition) pipelines. For all pairwise tree comparisons, a lambda value of 0 (to
give weight to tree topology rather than branch lengths) (Katz et al. 2017) was
used along with 100,000 random trees as a background distribution, and a Bon-
ferroni correction was used to correct for multiple comparisons. Pairs of trees
were considered to be more topologically similar than would be expected by
chance if a significant P-value (P < 0.05) resulted after correcting for multiple
testing (Katz et al. 2017).
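The Bonferroni correction used for the pairwise tree comparisons multiplies each raw P-value by the number of comparisons (capped at 1) before comparing it with the significance threshold. A minimal sketch, with a hypothetical function name:

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust P-values and flag significant comparisons.

    Returns (adjusted, significant), where adjusted[i] is
    min(1, p_values[i] * number_of_tests) and significant[i] is True
    when the adjusted value is below alpha.
    """
    n_tests = len(p_values)
    adjusted = [min(1.0, p * n_tests) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant
```

Under this correction, a pair of trees is called more similar than expected by chance only if its adjusted P-value remains below 0.05.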
5.3.15 Calculation of Average Nucleotide Identity Values
FastANI version 1.0 (Jain et al. 2018) was used to calculate average nucleotide
identity (ANI) values between assembled genomes of isolates sequenced in this
study and selected reference genomes (Table 5.2), as well as the genomes of 18
currently published B. cereus group species (Supplementary Table S2).
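ANI ranges such as those reported in Table 5.2 can be summarized from FastANI output with a few lines of code. The sketch below assumes FastANI's tab-separated output, in which the third column holds the ANI estimate (the first two columns are the query and reference genome paths). The function name and the example file names in the usage note are hypothetical.

```python
def ani_range(fastani_lines):
    """Minimum and maximum ANI (rounded to one decimal place).

    fastani_lines: iterable of tab-separated lines whose third column
    is the ANI value for one query/reference comparison.
    """
    values = [
        float(line.split("\t")[2]) for line in fastani_lines if line.strip()
    ]
    if not values:
        raise ValueError("no ANI records found")
    return round(min(values), 1), round(max(values), 1)
```

For example, passing the lines for all clade IV genomes compared against one reference (read from a file such as a hypothetical `cladeIV_vs_reference.txt`) would return the range reported for that reference and clade.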
5.3.16 Supplementary Material and Availability of Data
Trimmed Illumina reads for all 33 isolates sequenced in this study have
been made publicly available (NCBI BioProject Accession PRJNA437714), with
NCBI BioSample and SRA accession numbers for all isolates listed in Sup-
plementary Table S1. All figures have been deposited in FigShare (DOI
https://doi.org/10.6084/m9.figshare.7001525.v1), and records of all isolates are
available in Food Microbe Tracker (Vangay et al. 2013).
5.4 Results
5.4.1 Both Emetic and Diarrheal Symptoms Were Reported
Among Cases Associated With the B. cereus Foodborne
Outbreak
Between September 30th and October 6th, 2016, local health departments in up-
state New York’s Niagara and Erie counties reported a total of 179 estimated
foodborne illness cases among customers of a Mexican fast-food restaurant
chain in eight towns/cities. Among these cases, laboratory results were avail-
able for ten cases. For seven of these cases, B. cereus group species were isolated
from patient stool samples. While no deaths, hospitalizations, or emergency
room visits were reported among the 169 cases from which information was obtained,
four cases resulted in a visit to a health care provider (not including emergency room
visits). More than two-thirds of the 179 cases were female (69%), and 61% of cases fell within
the 20-74 age group. In 156 of 179 total cases (87%), refried beans had been
consumed.
Of 169 cases from which information was obtained, 88% reported vomiting,
and more than half reported nausea and abdominal cramps (95 and 65%, respec-
tively). However, in addition to vomiting, 38% of cases also reported diarrhea.
Additional symptoms reported included (i) weakness (43%), (ii) chills (40%),
(iii) dehydration (35%), (iv) headache (28%), (v) myalgia (muscle ache/pain;
16%), (vi) fever (16%), (vii) sweating (16%), and (viii) sore throat (3%). The in-
cubation period observed for all cases ranged from 0.25 to 24 h, with a median
of 2 h. The duration of illness ranged from 0.25 to 144 h, with a median estimate
of 6 h.
A traceback was conducted, with the source of the outbreak determined to be
a processing plant in Pennsylvania. The distributor in Pennsylvania packaged
the refried beans specifically for the chain establishment where the outbreak oc-
curred. The establishments where the outbreak occurred received 5 lb trays of
pre-cooked, sealed, and frozen refried beans from the production/packaging
facility. At these establishments, the refried beans were cooked and held hot prior to
consumption. It was determined
that the refried beans were contaminated prior to preparation at the chain estab-
lishment.
Stool samples from suspect cases were cultured on MYP agar and B. cereus-
like colonies were isolated from seven stool samples. Additionally, B. cereus-like
colonies were isolated from nine food samples that were collected from five
restaurants. In total, seven isolates from stool samples and 26 isolates from
foods were confirmed to belong to the B. cereus group using standard microbi-
ological methods. Isolates that were large Gram-positive rods, beta-hemolytic,
and motile were presumptively identified as B. cereus-like. Additionally, spore
staining was performed to test for the presence of parasporal crystals associated
with B. thuringiensis, for which all isolates were negative. All 33 B. cereus group
isolates underwent preliminary molecular characterization by Sanger sequenc-
ing of rpoB, which revealed two distinct allelic types belonging to phylogenetic
groups III (rpoB allelic type AT 125) and IV (AT 92).
5.4.2 WGS Confirms Presence of Multiple B. cereus Group
Species Represented Among Outbreak Strains
rpoB allelic types (ATs) assigned in silico were identical to those obtained using
Sanger sequencing for all 33 isolates (Table 5.3). panC group assignment con-
firmed the presence of B. cereus s.l. isolates from multiple phylogenetic groups
(Table 5.3), with panC group III (n = 30) and panC group IV (n = 3) represented
among the 33 isolates. In silico MLST further resolved the group IV isolates into
two sequence types (STs): the two strains isolated from refried beans served at
two different restaurants had identical STs, while the single human isolate be-
longing to group IV had a unique ST (Table 5.3). All 30 panC group III isolates
belonged to ST 26, including the remaining six human clinical isolates (Table
5.3).
The presence of isolates from multiple B. cereus s.l. phylogenetic groups,
as suggested by the rpoB, panC, and MLST loci among isolates sequenced in
conjunction with this outbreak, was confirmed using core SNPs detected in all
outbreak isolates, as well as the genomes of 18 currently recognized B. cereus
group species (Figure 5.1). The three isolates assigned to panC group IV using
a 7-group scheme (Guinebretiere, Thompson, et al. 2008) were most closely
related to the B. cereus s.s. type strain (Figure 5.1). All three group IV B. cereus
isolates possessed diarrheal toxin genes hblABCD and cytK-2 at high identity
and coverage (Figure 5.1), which code for enterotoxins hemolysin BL (Hbl) and
cytotoxin K variant 2 (CytK-2), respectively. The 30 isolates assigned to panC
group III, however, were most closely related to the type strain of B. paranthracis
(Figure 5.1). Unlike the B. paranthracis type strain, all of the group III isolates
investigated here were motile and possessed the cesABCD operon (Figure 5.1),
Table 5.3: List of outbreak isolates and corresponding metadata, single- and
multi-locus sequence types, and species.
Isolate name Source Source (Spe- Isolation Production panC MLST rpoB Closest Type Strain (ANI)e
(Gen- cific) date Date/Batcha Groupb STc ATd
eral)
FOOD 10 18 16 LFTOV NA R9-6400 Food Leftovers 18-Oct Unknown III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 18 16 LFTOV NA R9-6401 Food Leftovers 18-Oct Unknown III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 18 16 LFTOV NA R9-6402 Food Leftovers 18-Oct Unknown III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT1 1B R9-6388 Food Restaurant 1 19-Oct 1/B III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT1 1B R9-6389 Food Restaurant 1 19-Oct 1/B III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT1 1B R9-6390 Food Restaurant 1 19-Oct 1/B III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT1 1B R9-6391 Food Restaurant 1 19-Oct 1/B III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT1 2A R9-6386 Food Restaurant 1 19-Oct 2/A III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT1 2A R9-6387 Food Restaurant 1 19-Oct 2/A III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT1 2H R9-6392 Food Restaurant 1 19-Oct 2/H III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT1 2H R9-6393 Food Restaurant 1 19-Oct 2/H III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT1 2H R9-6394 Food Restaurant 1 19-Oct 2/H III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT1 2H R9-6395 Food Restaurant 1 19-Oct 2/H III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT1 2H R9-6396 Food Restaurant 1 19-Oct 2/H III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT2 2A R9-6397 Food Restaurant 2 19-Oct 2/A III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT2 2A R9-6398 Food Restaurant 2 19-Oct 2/A III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT2 2A R9-6399 Food Restaurant 2 19-Oct 2/A III 26 125 B. paranthracis MN5 (97.6)
FOOD 10 19 16 RSNT3 1E R9-6407 Food Restaurant 3 19-Oct 1/E III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT3 2A R9-6403 Food Restaurant 3 19-Oct 2/A III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT3 2A R9-6404 Food Restaurant 3 19-Oct 2/A III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT3 2A R9-6405 Food Restaurant 3 19-Oct 2/A III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT4 2B R9-6408 Food Restaurant 4 19-Oct 2/B III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT4 2B R9-6409 Food Restaurant 4 19-Oct 2/B III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT5 1C R9-6411 Food Restaurant 5 19-Oct 1/C III 26 125 B. paranthracis MN5 (97.5)
HUMN 10 18 16 FECAL NA R9-6384 Human Feces 18-Oct NA III 26 125 B. paranthracis MN5 (97.6)
HUMN 10 18 16 FECAL NA R9-6385 Human Feces 18-Oct NA III 26 125 B. paranthracis MN5 (97.5)
HUMN 10 18 16 FECAL NA R9-6412 Human Feces 18-Oct NA III 26 125 B. paranthracis MN5 (97.5)
HUMN 10 19 16 FECAL NA R9-6381 Human Feces 19-Oct NA III 26 125 B. paranthracis MN5 (97.5)
HUMN 10 19 16 FECAL NA R9-6382 Human Feces 19-Oct NA III 26 125 B. paranthracis MN5 (97.5)
HUMN 10 19 16 FECAL NA R9-6383 Human Feces 19-Oct NA III 26 125 B. paranthracis MN5 (97.5)
FOOD 10 19 16 RSNT3 1E R9-6406 Food Restaurant 3 19-Oct 1/E IV 24 92 B. cereus ATCC 14579 (98.9)
FOOD 10 19 16 RSNT5 1C R9-6410 Food Restaurant 5 19-Oct 1/C IV 24 92 B. cereus ATCC 14579 (98.9)
HUMN 10 26 16 FECAL NA R9-6413 Human Feces 26-Oct NA IV 142 92 B. cereus ATCC 14579 (98.8)
(a) Production date is designated by either 1 or 2; batch is one of A through H
(b) panC clade assigned in silico using BTyper 2.2.0
(c) Multi-locus sequence typing (MLST) sequence type (ST) assigned in silico using BTyper 2.2.0
(d) rpoB allelic type (AT) determined using Sanger sequencing and verified in silico using BTyper 2.2.0
(e) ANI, average nucleotide identity calculated using FastANI
which codes for emetic toxin-producing cereulide synthetase. In the case of
isolate HUMN 10 18 16 FECAL NA R9-6384, cesD was split onto two contigs.
Based on average nucleotide identity (ANI) values, the three diarrheal group
IV isolates were classified as B. cereus s.s. (ANI > 95; Table 5.3). The 30 emetic
group III isolates from this outbreak, however, most closely resembled the type
strain of B. paranthracis (ANI > 95; Table 5.3), indicating that the emetic group III
and diarrheal group IV isolates from this outbreak are different B. cereus group
species.
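The ANI-based species assignment described above can be illustrated with a minimal Python sketch. It assumes that ANI values against type-strain genomes have already been computed (e.g., with FastANI) and applies the ANI > 95 threshold used in the text. The function name and the dictionary layout are hypothetical and are not part of BTyper or FastANI.

```python
# Minimal sketch (hypothetical helper, not part of any published tool):
# assign a species label from precomputed ANI values against type strains.
ANI_SPECIES_THRESHOLD = 95.0  # threshold used in the text (ANI > 95)


def classify_by_ani(ani_to_type_strains, threshold=ANI_SPECIES_THRESHOLD):
    """Return (best_match, best_ani); best_match is None if no ANI exceeds the threshold."""
    best_match = max(ani_to_type_strains, key=ani_to_type_strains.get)
    best_ani = ani_to_type_strains[best_match]
    if best_ani > threshold:
        return best_match, best_ani
    return None, best_ani


if __name__ == "__main__":
    # Top hits as listed in Table 5.3 for one emetic and one diarrheal isolate
    print(classify_by_ani({"B. paranthracis MN5": 97.5}))
    print(classify_by_ani({"B. cereus ATCC 14579": 98.9}))
    # Hypothetical value below the threshold, for illustration only
    print(classify_by_ani({"hypothetical type strain": 91.0}))
```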
Figure 5.1: Maximum likelihood phylogeny of core SNPs identified in 33 iso-
lates sequenced in conjunction with a B. cereus outbreak, as well as genomes
of the 18 currently recognized B. cereus group species (shown in gray). Core
SNPs were identified in all genomes using kSNP3. Heatmap corresponds to
presence/absence of B. cereus group virulence genes detected in each sequence
using BTyper. Tip labels in maroon and teal correspond to the seven human
clinical isolates and 26 isolates from food sequenced in conjunction with this
outbreak, respectively. Phylogeny is rooted at the midpoint, and branch labels
correspond to bootstrap support percentages out of 500 replicates. Due to the
short lengths and low bootstrap support (all values < 10) of branches within
the outbreak clade, bootstrap support percentages are not shown on branches
within the outbreak clade.
5.4.3 Emetic and Diarrheal B. cereus Isolates Associated With
the Foodborne Outbreak Do Not Differ in Cytotoxicity
All three diarrheal strains isolated in conjunction with the outbreak (FSL R9-
6406, FSL R9-6410, and FSL R9-6413) were found to produce Hbl, as well as
non-hemolytic enterotoxin (Nhe). Characterization of six representatives of the
emetic isolates tested (i.e., FSL R9-6381, FSL R9-6382, FSL R9-6384, FSL R9-6389,
FSL R9-6395, and FSL R9-6399) revealed that they produced Nhe, but not Hbl.
The supernatant of diarrheal B. cereus s.s. ATCC 14579 showed a stronger in-
hibitory effect on the viability of HeLa cells compared to supernatants of the
33 outbreak-associated isolates (Games-Howell P < 0.05; Figure 5.2). Further-
more, the viability of HeLa cells treated with 0.05% Triton X-100, the positive
control, was significantly lower compared to viability of HeLa cells treated with
bacterial supernatants (Games-Howell P < 0.05; Figure 5.2). Among all pairs
of emetic isolates, only the viabilities of HeLa cells exposed to the supernatants
of isolates FSL R9-6409 and FSL R9-6387 were found to differ (Games-Howell
P < 0.05; Figure 5.2). The differences in HeLa cell viability after treatment with
supernatants of these two emetic outbreak-associated strains are likely due to
biological variability among replicates, as outbreak-associated emetic isolates
were shown to be clonal (Figure 5.1). Taken together, the emetic group
(represented by 30 emetic outbreak-associated isolates) had a mean cell
viability of 97.5 ± 5.1%, while the diarrheal group (represented by three
diarrheal outbreak-associated isolates) had a mean cell viability of
101.4 ± 7.9%, relative to HeLa cells treated with BHI (i.e., the negative
control).
Figure 5.2: Percentage viability of HeLa cells when treated with supernatants
of each isolate as determined by the WST-1 assay. Viability was calculated as
the ratio of the corrected absorbance of solution when HeLa cells were treated
with supernatants to the corrected absorbance of solution when HeLa cells
were treated with BHI (i.e., negative control), converted to a percentage. The
columns represent the mean viabilities, while the error bars represent standard
deviations for 12 technical replicates. Any two bars that do not share a common
alphabetic character had significantly different percentage viability values (P <
0.05).
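The viability calculation in the caption above can be sketched in Python as follows. The caption does not state how absorbances were corrected, so subtraction of a blank reading is an assumption here; the function names and the example absorbance values are hypothetical.

```python
# Minimal sketch of the percentage-viability calculation described in the
# Figure 5.2 caption. ASSUMPTION: "corrected absorbance" is taken to mean the
# raw absorbance minus a blank reading; the source does not specify this.
from statistics import mean, stdev


def percent_viability(abs_treated, abs_negative_control, abs_blank=0.0):
    """Viability (%) of treated cells relative to the BHI negative control."""
    corrected_treated = abs_treated - abs_blank
    corrected_control = abs_negative_control - abs_blank
    if corrected_control <= 0:
        raise ValueError("corrected negative-control absorbance must be positive")
    return 100.0 * corrected_treated / corrected_control


def summarize_replicates(viabilities):
    """Mean and sample standard deviation across technical replicates."""
    return mean(viabilities), stdev(viabilities)


if __name__ == "__main__":
    # Hypothetical absorbance readings, for illustration only
    replicates = [percent_viability(a, 1.20, abs_blank=0.10) for a in (1.15, 1.22, 1.18)]
    print(summarize_replicates(replicates))
```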
5.4.4 Core SNPs Identified Among B. cereus Group Outbreak
Isolates From Two Phylogenetic Groups Are Dependent
on Variant Calling Pipeline and Reference Genome Se-
lection
To simulate a scenario in which genomes from a B. cereus outbreak spanning
multiple phylogenetic groups were analyzed in aggregate, core SNPs were iden-
tified in all 33 outbreak isolates from groups III and IV (n = 30 and three iso-
lates, respectively) using (i) combinations of five reference-based variant call-
ing pipelines (Table 5.1) and three different reference genomes (Table 5.2) and
(ii) a reference-free SNP calling method (Table 5.1). When genomes from all
33 isolates were analyzed together, the number of core SNPs identified by each
pipeline and reference combination varied by up to several orders of magnitude
(Figure 5.3), often with little agreement between pipelines in terms of the core
SNPs they reported (Figure 5.4). Independent of reference genome, the CFSAN
pipeline was the most conservative, consistently identifying the fewest
core SNPs when all 33 isolates were queried in aggregate (50, 27, and 0
core SNPs using reference genomes from groups III, IV, and VII, respectively)
(Figure 5.3). This can be contrasted with the Samtools, Freebayes, and Parsnp
pipelines, which produced upwards of 100,000 core SNPs when the selected
reference genome was a member of one of the groups being queried in the
outbreak isolate set (groups III and IV; Figure 5.3). In cases where a distant genome
was used as the reference (group VII B. cytotoxicus type strain chromosome), all
reference-based pipelines reported fewer core SNPs than kSNP3’s reference-free
k-mer based SNP calling approach (Figure 5.3).
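The set of pipeline and reference genome combinations compared in this section can be enumerated with a short Python sketch. Pipeline and reference names are those used in this chapter (Table 5.4 footnotes and Figure 5.4); the data structures and function name are hypothetical and do not reflect how the analyses were actually scripted.

```python
# Minimal sketch: enumerate the pipeline/reference combinations described in
# the text (five reference-based pipelines x three references, plus one
# reference-free method). Structure and names of variables are illustrative.
from itertools import product

REFERENCE_BASED_PIPELINES = ["CFSAN", "Freebayes", "LYVE-SET", "Parsnp", "Samtools"]
REFERENCE_GENOMES = {
    "B. cereus AH187": "III",
    "B. cereus s.s. ATCC 14579": "IV",
    "B. cytotoxicus NVH 391-98": "VII",
}
REFERENCE_FREE_PIPELINES = ["kSNP3"]


def enumerate_runs():
    """Return (pipeline, reference) pairs; reference is None for reference-free methods."""
    runs = list(product(REFERENCE_BASED_PIPELINES, REFERENCE_GENOMES))
    runs += [(pipeline, None) for pipeline in REFERENCE_FREE_PIPELINES]
    return runs


if __name__ == "__main__":
    for pipeline, reference in enumerate_runs():
        print(pipeline, reference or "reference-free")
```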
5.4.5 Choice of Variant Calling Pipeline Has Greater Influ-
ence on Core SNP Identification Than Choice of Closely
Related Closed or Draft Reference Genome for Emetic
Group III B. cereus Group Isolates
The 30 emetic group III isolates were queried in the absence of their group
IV counterparts using combinations of five reference-based variant calling
pipelines (Table 5.1) and two reference genomes (the closed chromosome of B.
Figure 5.3: Number of core SNPs identified in 33 B. cereus group isolates from
two phylogenetic groups (30 and 3 isolates from groups III and IV, respectively),
sequenced in conjunction with a foodborne outbreak. Combinations of five
reference-based variant calling pipelines and three reference genomes, as well
as one reference-free SNP calling method (kSNP3), were tested.
cereus AH187, with and without dustmasking, and contigs of one of the iso-
lates identified in this outbreak, with and without dustmasking; Table 5.2) and
one reference-free SNP calling method (Table 5.1). In this scenario, the choice
of variant calling pipeline had a greater effect on the number of core SNPs
obtained than the choice of reference genome, as both reference genomes pos-
sessed the same virulence gene profile (virulotype), rpoB AT, panC group, MLST
sequence type, and were of the same species (B. paranthracis; ANI > 95) as the
30 emetic isolates (Figure 5.5A). Congruent with this, the number of pairwise
core SNP differences between emetic isolates sequenced in this outbreak varied
Figure 5.4: Comparison of core SNP positions reported by five reference-based variant-calling pipelines for 33 B. cereus
group strains isolated in association with a foodborne outbreak, with the chromosomes of (A) B. cereus AH187 (group
III), (B) B. cereus s.s. ATCC 14579 (group IV), and (C) B. cytotoxicus NVH 391-98 (group VII) used as reference genomes.
Ellipses represent each pipeline.
more with the selection of variant calling pipeline than with reference genome
(Figure 5.6A). When the unmasked closed chromosome of B. cereus AH187 was
used as a reference, pairwise core SNP differences among emetic isolates from
this outbreak ranged from 0 to 8 (mean of 2.9; CFSAN), 7 to 29 (mean of 16.1;
Freebayes), 0 to 8 (mean of 2.8; LYVE-SET), 0 to 64 (mean of 23.6; Parsnp), and
1 to 16 SNPs (mean of 8.2; Samtools) (Figure 5.5A). Using the reference-free
kSNP3 pipeline, this range was 1 to 46 SNPs (mean of 16.7; Figure 5.5A). The
CFSAN and LYVE-SET pipelines produced nearly identical results in terms of the
number and identity of the core SNPs called (23 and 22 SNPs, respectively, 20 of
which were detected by both pipelines; Figure 5.7), as well as the topologies of
the phylogenies those SNPs produced: all CFSAN and LYVE-SET phylogenies
were more similar to each other than would be expected by chance (Table
5.4 and Supplementary Table S4). Additionally, the two methods that relied on
assembled genomes rather than short reads for SNP calling (kSNP3 and Parsnp)
produced the greatest numbers of core SNPs (Figure 5.5A).
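The ranges and means of pairwise core SNP differences reported above can be computed from a SNP alignment along the lines of the following Python sketch. It assumes an alignment of equal-length sequences keyed by isolate name and skips positions where either sequence has a missing or gap character; that handling, the function names, and the toy sequences are assumptions for illustration, not the procedure used in this study.

```python
# Minimal sketch: pairwise SNP differences from an alignment of SNP sites.
# ASSUMPTION: positions with "N" or "-" in either sequence are not counted.
from itertools import combinations


def snp_distance(seq_a, seq_b, missing="N-"):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in missing and b not in missing
    )


def pairwise_summary(alignment):
    """Return (min, max, mean) of pairwise SNP differences among all isolates."""
    distances = [
        snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    ]
    return min(distances), max(distances), sum(distances) / len(distances)


if __name__ == "__main__":
    # Toy alignment with hypothetical isolate names and sequences
    toy = {"iso1": "ACGT", "iso2": "ACGA", "iso3": "TCGA"}
    print(pairwise_summary(toy))
```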
Within the emetic group III isolates associated with this outbreak, a to-
tal of 32 core SNPs were identified by two or more of the reference-based
variant calling pipelines when the unmasked B. cereus AH187 genome was
used as a reference, half of which were identified by all five pipelines
(Figure 5.7). Out of these 32 SNPs, 23 were identified in protein coding genes,
14 of which produced non-synonymous amino acid changes (Supplementary
Table S5). Genes with non-synonymous changes were involved in
molybdopterin biosynthesis (WP_000544623.1), proteolysis (WP_000215096.1 and
WP_000857793.1), chitin binding (WP_000795732.1), iron-hydroxamate
transport (WP_000728195.1), DNA repair (WP_000947749.1 and WP_000867556.1),
DNA replication (WP_000867556.1 and WP_000435993.1), protein transport and
Figure 5.5: (A) Number of core SNPs and (B) total number of SNPs identified
in 30 emetic B. cereus group III strains isolated in association with a foodborne
outbreak. Combinations of (A) five and (B) four reference-based variant calling
pipelines and two reference genomes (either dustmasked or unmasked) were
tested, along with one reference-free SNP calling method (kSNP3). Because the
Parsnp pipeline reports core SNPs by definition, it was excluded from Figure
5.5B (total SNPs). For quantification of the total number of SNPs (Figure 5.5B),
all sites with more than one unique character were counted.
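The distinction between core and total SNPs in the caption above can be made concrete with a small Python sketch. It assumes that a core SNP is a variable site with a called base in every genome, and that a total SNP is any site with more than one unique called base; whether missing or gap characters were treated as "unique characters" in the original analysis is not stated, so excluding them here is an assumption.

```python
# Minimal sketch: count core and total SNP sites in a multi-sequence alignment.
# ASSUMPTIONS: "N" and "-" denote missing data and are ignored when deciding
# whether a site is variable; a core site has no missing data in any genome.


def count_snp_sites(alignment_rows, missing="N-"):
    """Return (core_snp_sites, total_snp_sites) for equal-length aligned rows."""
    core = 0
    total = 0
    for column in zip(*alignment_rows):
        observed = {base for base in column if base not in missing}
        if len(observed) > 1:  # more than one unique called base at this site
            total += 1
            if all(base not in missing for base in column):
                core += 1
    return core, total


if __name__ == "__main__":
    # Toy alignment (hypothetical): the last site is variable but has missing data
    rows = ["AAGTA", "ACG-C", "ACTTN"]
    print(count_snp_sites(rows))
```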
Table 5.4: Maximum likelihood phylogenies of 30 emetic group III outbreak
isolates considered to be more topologically similar than would be expected by
chance (P < 0.05). (a)
Reference Phylogeny (b)         Query Phylogeny (b)             Corrected P-Value (c)
AH187_CFSAN_NOdust_all          AH187_CFSAN_NOdust_core         0
AH187_CFSAN_NOdust_all          AH187_LYVE-SET_NOdust_all       0
AH187_CFSAN_NOdust_all          AH187_LYVE-SET_NOdust_core      0.0171
AH187_CFSAN_NOdust_all          AH187_LYVE-SET_YESdust_all      0
AH187_CFSAN_NOdust_all          AH187_LYVE-SET_YESdust_core     0.0171
AH187_CFSAN_NOdust_core         AH187_LYVE-SET_NOdust_all       0
AH187_CFSAN_NOdust_core         AH187_LYVE-SET_NOdust_core      0.0171
AH187_CFSAN_NOdust_core         AH187_LYVE-SET_YESdust_all      0
AH187_CFSAN_NOdust_core         AH187_LYVE-SET_YESdust_core     0.0171
AH187_Freebayes_NOdust_core     AH187_Freebayes_YESdust_core    0.0342
AH187_LYVE-SET_NOdust_all       AH187_LYVE-SET_NOdust_core      0.0171
AH187_LYVE-SET_NOdust_all       AH187_LYVE-SET_YESdust_all      0
AH187_LYVE-SET_NOdust_all       AH187_LYVE-SET_YESdust_core     0.0171
AH187_LYVE-SET_NOdust_core      AH187_LYVE-SET_YESdust_core     0
AH187_LYVE-SET_YESdust_all      AH187_LYVE-SET_YESdust_core     0.0171
AH187_Parsnp_NOdust_core        AH187_Parsnp_YESdust_core       0.0171
(a) Obtained from pairwise tests of tree topologies using a Z test based on the Kendall-Colijn metric; see Supplementary Table S4 for full
table of comparisons
(b) Names of reference and query phylogenies denote reference genome ("AH187" for reference-based pipelines, "NOREF" for reference-free
kSNP pipeline), pipeline ("CFSAN", "Freebayes", "kSNP", "LYVE-SET", "Parsnp", or "Samtools"), reference genome masking
("NOdust" for an unmasked reference genome, "YESdust" for a dustmasked reference genome, or "NAdust" for reference-free kSNP
pipeline, for which dustmasking is not applicable), and SNPs used to construct the phylogeny ("core" for core SNPs, or "all" for core and
accessory SNPs), separated by an underscore ("_")
(c) Bonferroni-corrected P-values for all tests that were significant at the α = 0.05 level
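For readers unfamiliar with the correction named in footnote (c), the following Python sketch shows a standard Bonferroni adjustment: each raw P-value is multiplied by the number of tests and capped at 1. The number of topology comparisons is not restated here (see Supplementary Table S4), so the function takes the full list of raw P-values as input; the function names and example values are hypothetical.

```python
# Minimal sketch of a standard Bonferroni correction. The raw P-values below
# are hypothetical and are not taken from Table 5.4 or Supplementary Table S4.


def bonferroni(p_values):
    """Multiply each raw P-value by the number of tests, capping at 1.0."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def significant_after_correction(p_values, alpha=0.05):
    """Indices of tests that remain significant after Bonferroni correction."""
    return [i for i, p in enumerate(bonferroni(p_values)) if p < alpha]


if __name__ == "__main__":
    raw = [0.001, 0.02, 0.5]  # hypothetical raw P-values
    print(bonferroni(raw))
    print(significant_after_correction(raw))
```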
Figure 5.6: Ranges of pairwise (A) core SNP differences and (B) total SNP differ-
ences between 30 emetic group III B. cereus group strains isolated in conjunction
with a foodborne outbreak. Combinations of (A) five and (B) four reference-
based variant calling pipelines and two reference genomes (either dustmasked
or unmasked), as well as one reference-free SNP calling method (kSNP3) were
tested. Lower and upper box hinges correspond to the first and third quartiles,
respectively. Lower and upper whiskers extend from the hinge to the smallest
and largest values no more distant than 1.5 times the interquartile range from
the hinge, respectively. Points represent pairwise distances that fall beyond the
ends of the whiskers. Because the Parsnp pipeline reports core SNPs by defini-
tion, it was excluded from Figure 5.6B (pairwise differences in total SNPs). For
quantification of pairwise differences in the total number of SNPs (Figure 5.6B),
all sites with more than one unique character were included.
insertion into the membrane (WP_000727745.1), and glyoxylase/bleomycin
resistance (WP_000800664.1).
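The tally of core SNPs reported by two or more pipelines, or by all pipelines, can be reproduced from per-pipeline position lists with a sketch such as the one below. It assumes each pipeline's calls have been reduced to a set of reference positions; the function names and toy positions are hypothetical.

```python
# Minimal sketch: count how many pipelines report each SNP position, then
# select positions supported by at least k pipelines. Toy data only.
from collections import Counter


def positions_by_support(calls):
    """Map each position to the number of pipelines that reported it."""
    support = Counter()
    for positions in calls.values():
        support.update(set(positions))
    return support


def shared_by_at_least(calls, k):
    """Set of positions reported by at least k pipelines."""
    return {pos for pos, n in positions_by_support(calls).items() if n >= k}


if __name__ == "__main__":
    # Hypothetical positions for three hypothetical pipelines
    toy_calls = {"pipelineA": {101, 202, 303}, "pipelineB": {202, 303}, "pipelineC": {303, 404}}
    print(sorted(shared_by_at_least(toy_calls, 2)))
    print(sorted(shared_by_at_least(toy_calls, len(toy_calls))))
```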
In addition to detecting core SNPs in the genomes of the 30 emetic group III
isolates, total (core and accessory) SNPs were detected in the 30 emetic group III
genomes using combinations of four reference-based variant calling pipelines
(Parsnp, which only reports core SNPs, was excluded; Table 5.1) and two ref-
erence genomes (the closed chromosome of B. cereus AH187 and contigs of one
of the isolates identified in this outbreak, with and without dustmasking; Ta-
Figure 5.7: Comparison of core SNP positions reported by five variant-calling
pipelines for 30 emetic group III B. cereus group outbreak isolates. Ellipses rep-
resent each pipeline, all of which used the chromosome of emetic group III B.
cereus AH187 as a reference for variant calling.
ble 5.2) and one reference-free SNP calling method (Table 5.1). When total SNPs
were accounted for, rather than solely core SNPs, all pipeline/reference genome
combinations showed increases in the number of SNPs detected and the range
of pairwise SNP differences between genomes (Figures 5.5B, 5.6B). Whether
the addition of accessory SNPs translated into a significant difference in phy-
logenetic topology, however, depended on the variant calling pipeline used.
When the B. cereus AH187 closed chromosome was used as a reference, SNPs
detected using the LYVE-SET pipeline produced phylogenies considered to be
more topologically similar than would be expected by chance (Kendall-Colijn
test P < 0.05), regardless of whether core SNPs or total SNPs were used to con-
struct the phylogeny, and regardless of whether the B. cereus AH187 reference
genome was dustmasked or not (Table 5.4 and Supplementary Table S4). Addi-
tionally, all phylogenies produced using the LYVE-SET pipeline and the B. cereus
AH187 reference genome (i.e., each combination of core SNPs, total SNPs, dust-
masked reference, and unmasked reference) were topologically similar to those
produced using the CFSAN pipeline and the unmasked B. cereus AH187 refer-
ence genome, regardless of whether all SNPs were included or solely core SNPs
(Table 5.4 and Supplementary Table S4). Other topologically similar phylogeny
pairs included phylogenies constructed using (i) core SNPs identified with Free-
bayes, regardless of whether a dustmasked reference genome was used or not,
and (ii) core SNPs identified with Parsnp, regardless of whether a dustmasked
reference was used or not (Kendall-Colijn test P < 0.05; Table 5.4 and Supple-
mentary Table S4).
5.4.6 Phylogenies Constructed Using Core SNPs Identified in
55 Emetic ST 26 B. cereus Genomes by kSNP3 and Parsnp
Yield Similar Topologies
To compare the 30 emetic strains from this outbreak to other emetic group III
isolates, all emetic group III assembled genomes with ST 26 were downloaded
from NCBI. This produced a total of 55 emetic group III isolates with ST 26 (30
isolates from this outbreak and 25 from NCBI RefSeq). Among the 55 emetic ST
26 genomes, Parsnp identified almost twice as many core SNPs as kSNP3 (4,597
Figure 5.8: Maximum likelihood phylogenies of 30 emetic group III isolates (ST
26) sequenced in conjunction with a B. cereus outbreak, as well as all other emetic
group III ST 26 genomes available in NCBI (n = 25; shown in black). Trees were
constructed using core SNPs identified using (A) kSNP3 or (B) Parsnp. Tip la-
bels in maroon and teal correspond to the six human clinical isolates and 24
isolates from food sequenced in conjunction with this outbreak, respectively.
Branch labels correspond to bootstrap support percentages out of 1,000 repli-
cates. Due to the short lengths and low bootstrap support of branches within
the outbreak clade, bootstrap support percentages are not shown on branches
within the outbreak clade.
and 2,593 core SNPs, respectively). However, the topologies of phylogenies pro-
duced using the core SNPs identified by each pipeline were found to be more
similar than would be expected by chance (Kendall-Colijn test P < 0.05; Figure
5.8).
Based on pairwise core SNP differences, the publicly available genomes
showed greater variability than the outbreak isolates described here, regard-
less of whether kSNP3 or Parsnp was used for variant calling (ANOVA-like
permutation test P < 0.05; Supplementary Figure S1). Pairwise core SNP dif-
ferences of the 30 emetic group III isolates from this outbreak ranged from 0 to
25 SNPs (mean of 8.3) and 0 to 44 SNPs (mean of 11.9) when the kSNP3 and
Parsnp pipelines were used, respectively (Supplementary Figure S1). For exter-
nal ST 26 isolates not associated with this outbreak, pairwise core SNP differ-
ences ranged from 0 to 1,474 SNPs (mean of 425.7) and 0 to 3,111 SNPs (mean of
828.3) when kSNP3 and Parsnp were used, respectively (Supplementary Figure
S1). Between these two groups (the 30 emetic isolates from this outbreak and the
25 external emetic ST 26 isolates), pairwise core SNP differences ranged from 73
to 1,258 SNPs (mean of 301.7; kSNP3) and 74 to 2,709 SNPs (mean of 528.0;
Parsnp) (Supplementary Figure S1). Reflecting this, the average of the ranks of
pairwise SNP distances within emetic isolates from this outbreak was less than
the average of the ranks of pairwise SNP distances between the emetic isolates
from this outbreak and the external ST 26 isolates (ANOSIM P < 0.05). This is
likely a result of the differences in variance between the outbreak and external
ST 26 isolates, as supported by the results of the ANOVA-like permutation test
(Anderson and Walsh 2013).
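The within-group and between-group comparisons summarized above can be organized as in the following Python sketch, which splits pairwise SNP distances into within-outbreak, within-external, and between-group sets and reports their range and mean. It only illustrates the bookkeeping; it does not implement ANOSIM or the ANOVA-like permutation test. The function names, group labels, and distances are hypothetical.

```python
# Minimal sketch: partition pairwise SNP distances by group membership and
# summarize each partition. This does NOT perform ANOSIM or permutation tests.
from itertools import combinations


def split_pairwise(distance, groups):
    """Group pairwise distances under each group label, or 'between' for mixed pairs.

    distance: dict mapping frozenset({isolate_a, isolate_b}) -> SNP difference
    groups:   dict mapping isolate name -> group label
    """
    partitions = {}
    for a, b in combinations(sorted(groups), 2):
        label = groups[a] if groups[a] == groups[b] else "between"
        partitions.setdefault(label, []).append(distance[frozenset((a, b))])
    return partitions


def describe(values):
    """Return (min, max, mean) of a list of pairwise distances."""
    return min(values), max(values), sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical isolates and distances, for illustration only
    groups = {"o1": "outbreak", "o2": "outbreak", "e1": "external", "e2": "external"}
    distance = {
        frozenset(("o1", "o2")): 2,
        frozenset(("e1", "e2")): 400,
        frozenset(("o1", "e1")): 100,
        frozenset(("o1", "e2")): 120,
        frozenset(("o2", "e1")): 101,
        frozenset(("o2", "e2")): 119,
    }
    for label, values in split_pairwise(distance, groups).items():
        print(label, describe(values))
```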
5.5 Discussion
While B. cereus causes a considerable number of foodborne illness cases annu-
ally, outbreaks are rarely investigated with the methodological rigor (e.g., use
of WGS) that is increasingly used for surveillance and outbreak investigations
targeting other foodborne pathogens. A specific challenge in the U.S. is that,
unlike for some other diseases, disease cases caused by B. cereus are typically
not reportable, even though foodborne illnesses, regardless of etiology, are re-
portable in some states, including NY. This, combined with the typically mild
course of B. cereus infection, means that human B. cereus isolates are rarely avail-
able for WGS. Furthermore, even if clinical B. cereus group isolates are available,
WGS may not be used for isolate characterization in cases where infections are
mild. Due to the availability of B. cereus isolates for seven human cases, the out-
break reported here presented a unique opportunity to pilot the use of WGS for
investigation of B. cereus outbreaks. The data and approaches presented here
will not only facilitate future investigation of other B. cereus outbreaks but will
also help with application of WGS for investigation of other foodborne disease
outbreaks where limited reference WGS data and information on genomic di-
versity are available.
5.5.1 Addressing the Microbiological and Epidemiological
Challenges Associated With Determining the Causative
Agent of an Emetic Foodborne Outbreak
MYP agar, which was used to isolate strains from food and human clinical
samples in the outbreak reported here, is one of the two selective differential agars
recommended in the FDA BAM protocol for the isolation of B. cereus group
strains (Tallent, Rhodehamel, et al. 1998). The second recommended agar,
Bacara, has been shown to be more selective and more effective in suppressing
the growth of other Gram-positive microorganisms that may be present in tested
samples (e.g., other Bacillus species, Listeria, Staphylococcus) (Tallent, Kotewicz,
et al. 2012; Kabir et al. 2017). Since Bacara medium has a proprietary formula
and cannot be purchased in a dehydrated powder form (Tallent, Rhodehamel,
et al. 1998), it is less likely to be readily available for use in labs that do not
routinely test for B. cereus group species. Use of both types of media may in-
crease the success of B. cereus group isolation from food and clinical samples,
especially isolation of emetic strains (Ehling-Schulz, Svensson, et al. 2005; Ce-
uppens, Boon, and Uyttendaele 2013). Furthermore, the isolation of B. cereus
group strains associated with this outbreak was carried out at 37 °C, which is
higher than the temperature of 30 °C that is recommended by the FDA BAM
(Tallent, Rhodehamel, et al. 1998). Nevertheless, while incubation at this tem-
perature may inhibit the growth of psychrotolerant species of the B. cereus group
(e.g., B. weihenstephanensis), it is not expected to interfere with the isolation of B.
cereus group strains that are able to grow at human body temperature and cause
toxicoinfection. It is also not expected to compromise isolation of emetic isolates
with the capacity to cause intoxication, as emetic strains have been previously
found primarily in phylogenetic group III, which does not contain psychrotoler-
ant strains (Carroll et al. 2017). Overall, use of both types of isolation media and
a moderate incubation temperature of 30 °C may minimize the isolation bias.
While the isolation of B. cereus group strains from food and clinical samples
is essential for linking them to a potential foodborne outbreak, further informa-
tion is needed to definitively prove that an outbreak was caused by B. cereus.
Emetic disease caused by members of the B. cereus group can be attributed to
the production of the highly heat- and pH-resistant toxin cereulide in food prior
to ingestion (Ehling-Schulz, Fricker, and Scherer 2004; Ehling-Schulz, Frenzel,
and Gohar 2015; Stenfors Arnesen, Fagerlund, and Granum 2008). Because
cereulide is produced within the food matrix itself, prior to consumption, the
mere presence of emetic B. cereus group strains in food or human clinical sam-
ples cannot definitively prove that an outbreak was caused by a member of
the B. cereus group; rather, the presence of cereulide itself is essential for link-
ing food and clinical samples to an outbreak with high confidence (Anders-
son et al. 2004; Stenfors Arnesen, Fagerlund, and Granum 2008). For this out-
break, the presence of cereulide in food and human clinical samples linked to
the outbreak was not assessed, as testing for cereulide is not currently included
in the BAM protocol as a routine method for the detection and enumeration of
B. cereus in food. Therefore, there is no definitive proof that the outbreak was
caused by cereulide-producing emetic group III B. cereus and not by another
foodborne pathogen or toxin (e.g., enterotoxins produced by Staphylococcus
aureus, which cause symptoms similar to those associated with cereulide)
(Messelhausser et al.
2014). However, due to the presence of highly clonal, ces-positive group III ST
26 B. cereus group isolates among food and clinical samples linked to the out-
break, as well as epidemiological data that support this, the emetic strain is the
most probable causative agent. While it is not currently included in the BAM
protocol for B. cereus isolation (Tallent, Rhodehamel, et al. 1998), testing for the
presence of cereulide in food and clinical samples linked to potential outbreaks
caused by emetic B. cereus can aid in providing a definitive link between illness
and causative agent.
5.5.2 Considerations for Addressing the Unique Challenges
Associated With Characterization of Foodborne Out-
breaks Linked to the B. cereus Group Using WGS
In B. cereus outbreaks, interpretation of WGS data can be challenging, especially
in cases where strains of multiple closely related species or subtypes appear to
be associated with an outbreak. B. cereus outbreaks, particularly emetic out-
breaks caused by cereulide-producing B. cereus group isolates, are often associ-
ated with improper handling of food (e.g., temperature abuse) (Ehling-Schulz,
Fricker, and Scherer 2004; Stenfors Arnesen, Fagerlund, and Granum 2008).
This, and their ubiquitous presence in the environment, make it important to
consider the possibility of a multi-strain or multi-species outbreak in addition
to a single-source outbreak caused by a single strain. In the outbreak charac-
terized here, B. cereus group strains from two phylogenetic groups, III and IV,
were isolated from both human clinical stool samples and refried beans
linked to the outbreak. The separation of outbreak-related isolates into three di-
arrheal group IV isolates (representing two distinct STs) and 30 emetic isolates
may be explained by one of the following scenarios: (i) the outbreak was caused
by refried beans contaminated with multiple B. cereus group species (isolates
from groups III and IV), both of which caused illness in humans, (ii) in addi-
tion to housing emetic outbreak strains that belonged to group III, samples of
refried beans and patient stool samples harbored group IV B. cereus s.l. isolates
that were not part of the outbreak but were incidentally isolated from stool and
food samples, or (iii) a subset of patient stool samples and food samples did
not harbor B. cereus s.l. group III isolates belonging to the outbreak, but did
harbor group IV strains that were isolated and sequenced. In order to deter-
mine which of these scenarios explains the presence of multiple B. cereus group
species among isolates sequenced in conjunction with a foodborne outbreak,
additional epidemiological and microbiological data are needed.
Valuable metrics for inclusion/exclusion of B. cereus group cases in a food-
borne outbreak include patient exposure, patient symptoms (e.g., vomiting, di-
arrhea, onset and duration of illness), levels of B. cereus present in implicated
food and patient samples (CFU/g or CFU/ml), cytotoxicity of isolates, and the
approach used to select bacterial colonies to undergo WGS (Glasset et al. 2016).
However, some of these data may be more valuable than others. In their
characterization of 564 B. cereus group strains associated with 140
"strong-evidence" foodborne outbreaks in France between 2007 and 2014,
Glasset et al. (2016) found that patient symptoms could not be associated with the presence
of emetic and diarrheal strains. More than half (57%) of the B. cereus outbreaks
queried in their study included patients exhibiting both emetic and diarrheal
symptoms. Similar results were observed here, as emetic and diarrheal
symptoms were reported in 88% and 38% of cases, respectively, with both vomiting
and diarrhea reported by multiple patients. All emetic isolates associated with
this outbreak carried nhe genes and also produced Nhe enterotoxin, as deter-
mined using the immunoassay. While it has been proposed that a combination
of emetic and diarrheal symptoms may be due to the fact that emetic group
III isolates have been shown to produce diarrheal enterotoxin Nhe at high lev-
els (Glasset et al. 2016), incongruences between isolate virulotype and patient
symptoms may still exist. Importantly, this indicates the need for further inves-
tigation of factors affecting the expression of B. cereus group virulence genes,
as well as their potential synergistic activities (Doll, Ehling-Schulz, and Vogel-
mann 2013).
Another metric that can be used for determining whether B. cereus group iso-
lates are part of an outbreak or not is the level of B. cereus present in the impli-
cated food. Like patient symptoms, B. cereus counts from implicated foods may
aid in an outbreak investigation, but likely cannot definitively prove whether
an isolate is part of an outbreak or not. For example, outbreaks caused by
implicated foods with B. cereus counts of < 10³ CFU/g and as low as 400 CFU/g
for diarrheal and emetic diseases, respectively, have been described (Glasset
et al. 2016), despite levels of at least 10⁵ CFU/g often being detected in
implicated foods (Stenfors Arnesen, Fagerlund, and Granum 2008). The levels of B.
cereus present in refried beans in the outbreak described here were not deter-
mined. However, like patient symptoms, B. cereus count data may be a useful
supplemental metric for investigating B. cereus group outbreaks in the future.
In addition to patient symptoms and pathogen load in the food, incubation
period can be used to determine whether an isolate is part of an outbreak or not,
as it is significantly shorter for emetic strains than diarrheal strains (Ehling-
Schulz, Fricker, and Scherer 2004; Stenfors Arnesen, Fagerlund, and Granum
2008; Glasset et al. 2016). In the outbreak described here, the patient from whom
a non-emetic group IV B. cereus group strain was isolated reported an incubation
time of 1 h, the shortest incubation time of all seven confirmed human clinical
cases. However, this is still within the observed range of incubation times for
emetic B. cereus disease (0.5-6 h) (Stenfors Arnesen, Fagerlund, and Granum
2008). Although no emetic group III B. cereus s.l. strain was isolated from the
clinical sample, it is possible that the patient could have been intoxicated with
cereulide produced in the food by the emetic B. cereus strain that caused the
outbreak. However, it is also possible that a pathogen which causes similar
symptoms to foodborne illness caused by emetic B. cereus was responsible for
the patient’s illness (e.g., Staphylococcus aureus).
Lastly, cytotoxicity data may also be leveraged to include/exclude outbreak-
associated B. cereus group isolates. In the outbreak described here, the patient
from which a non-emetic group IV B. cereus group strain was isolated reported
vomiting and nausea and no diarrheal symptoms, despite the clinical isolate’s
possession of multiple diarrheal toxin genes and no emetic toxin genes. This
could suggest that the patient was intoxicated with cereulide, but that the emetic strain
itself did not survive passage through the patient’s gastrointestinal tract,
or that it survived at a concentration too low to permit isolation on
MYP. It is also possible that our understanding of the specific virulence genes re-
sponsible for different B. cereus-associated disease symptoms is still incomplete
and that the diarrheal isolate obtained from the clinical sample was in fact re-
sponsible for symptoms of vomiting and nausea. To further investigate this, we
carried out immunoassay-based detection of Hbl and Nhe enterotoxins, as well
as a WST-1 proliferation assay with HeLa cells exposed to bacterial supernatants
presumably containing toxins. The results of Hbl and Nhe immunodetection
and cytotoxicity assays revealed that the diarrheal isolates had only mild detrimental ef-
fects on HeLa cell viability, even though they produced both hemolysin
BL and non-hemolytic enterotoxins. This can be contrasted with the B. cereus s.s.
type strain, which substantially reduced the viability of the HeLa cells.
For the outbreak described here, results obtained using a combination of
microbiological, epidemiological, and bioinformatic methods indicate that hy-
pothesis (i), in which the diarrheal strains were part of a multi-species outbreak,
can likely be excluded. Evidence supporting the conclusion that the human
clinical diarrheal isolate was not part of the outbreak described here includes:
(i) the emetic symptoms reported by the patient were incongruent with the vir-
ulotype of the isolate, (ii) the incubation time was typical for intoxication, (iii)
the human clinical diarrheal isolate had a different MLST ST compared to all
other isolates sequenced in this outbreak, and (iv) the human diarrheal isolate
did not exhibit substantial cytotoxicity against HeLa cells (Figure 5.2). These
findings may indicate that this case was not part of the outbreak and was instead caused by
an infection or intoxication with another pathogen that produces disease
symptoms similar to those of B. cereus (e.g., Staphylococcus aureus), or that a group IV B.
cereus strain was isolated and sequenced in lieu of the group III emetic outbreak
isolate. There is limited evidence as to whether humans can be asymptomatic
carriers of group IV B. cereus (Ghosh 1978; Turnbull and Kramer 1985), making
it likely that isolation and sequencing of a group IV B. cereus strain could be due
to the use of MYP agar as the sole selective agar, which has been shown to hin-
der detection of emetic B. cereus group isolates (Ehling-Schulz, Svensson, et al.
2005; Ceuppens, Boon, and Uyttendaele 2013). In future outbreaks, the use of
additional selective media (e.g., Bacara agar), enrichment media, and isolation
temperatures may aid in isolation of the causative B. cereus group strain.
While we have shown here that WGS data can be a valuable tool for char-
acterizing B. cereus group isolates from a foodborne outbreak, our results also
showcase the importance of supplementing WGS data with epidemiological
and microbiological metadata to draw meaningful conclusions from B. cereus
group genomic data. Furthermore, the availability of WGS and cytotoxicity data
from a larger set of B. cereus isolates from symptomatic patients may also pro-
vide an opportunity to use comparative genomics approaches to further explore
virulence genes that are linked to different disease outcomes in the future.
5.5.3 Recommendations for Analyzing Illumina WGS Data
From B. cereus Group Isolates Potentially Linked to a
Foodborne Outbreak
WGS is being used increasingly to characterize isolates associated with food-
borne disease cases and outbreaks, and rightfully so, as it offers the ability
to characterize foodborne pathogens at unprecedented resolution, and it has
been able to improve outbreak and cluster detection for numerous foodborne
pathogens (Allard et al. 2017; Kovac et al. 2017; Moran-Gilad 2017;
Taboada et al. 2017), including Salmonella enterica (Taylor et al. 2015; Hoffmann
et al. 2016; Gymoese et al. 2017), Escherichia coli (Grad et al. 2012; Holmes et
al. 2015; Rusconi et al. 2016), and Listeria monocytogenes (Jackson et al. 2016;
Kwong et al. 2016; Chen, Luo, Pettengill, et al. 2017; Chen, Luo, Curry, et al.
2017; Moura et al. 2017). However, as demonstrated here and elsewhere, variant
calling pipelines and the various mapping/alignment, SNP calling, and SNP fil-
tering practices that they employ (e.g., removal of recombination and clustered
SNPs) can influence the identification of SNPs in WGS data and, thus, the topol-
ogy of a resulting phylogeny (Pightling, Petronella, and Pagotto 2014; Pightling,
Petronella, and Pagotto 2015; Croucher et al. 2015; Hwang et al. 2015; Katz et al.
2017; Sandmann et al. 2017). This can be particularly problematic for outbreak
and cluster detection in bacterial pathogen surveillance: pairwise SNP thresh-
olds are currently widely used to make initial decisions regarding the inclusion
or exclusion of isolates in a given outbreak (Taylor et al. 2015; Gymoese et al.
2017; Mair-Jenkins et al. 2017; McCloskey and Poon 2017; Walker et al. 2018).
In such scenarios, just a few SNPs can be the deciding factor in whether a bac-
terial pathogen is included or excluded as part of an outbreak or cluster (Katz
et al. 2017), rendering the choice of variant calling method non-trivial. Fur-
thermore, choosing an appropriate variant calling pipeline can be particularly
challenging for pathogens where there are limited data and expertise with WGS,
as is currently the case with B. cereus.
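To illustrate how sensitive threshold-based decisions can be, the following minimal Python sketch counts pairwise SNP differences in a core SNP alignment and applies a cutoff. The isolate names, sequences, and the threshold of 2 SNPs are invented for illustration and are not taken from this study; skipping gaps and ambiguous bases is only one of several possible conventions.

```python
from itertools import combinations


def pairwise_snp_distances(alignment):
    """Count differing positions between every pair of equal-length sequences."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all sequences in a core SNP alignment must have equal length")
    distances = {}
    for a, b in combinations(sorted(alignment), 2):
        # Assumption: only unambiguous bases are compared; gaps and Ns are skipped.
        distances[(a, b)] = sum(
            1
            for x, y in zip(alignment[a], alignment[b])
            if x != y and x in "ACGT" and y in "ACGT"
        )
    return distances


def within_threshold(distances, threshold):
    """Flag each pair as within (True) or beyond (False) the SNP cutoff."""
    return {pair: d <= threshold for pair, d in distances.items()}


# Hypothetical toy alignment (not data from this outbreak).
toy_alignment = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "TCGAACGNAT",
}
toy_distances = pairwise_snp_distances(toy_alignment)
toy_calls = within_threshold(toy_distances, threshold=2)  # hypothetical cutoff
```

In this toy example, isolate_A and isolate_C differ by three SNPs and fall outside a cutoff of two, so a pipeline that called one fewer SNP for that pair would reverse the inclusion decision.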
As demonstrated here, the choice of variant calling pipeline can greatly in-
fluence the number of core SNPs identified in B. cereus group isolates associ-
ated with a foodborne outbreak. In the case of a multi-group outbreak, this
effect can be magnified. Naively calling variants in isolates that span multi-
ple B. cereus s.l. phylogenetic groups in aggregate can lead to orders of magni-
tude of difference in the number of core SNPs identified by different variant
calling pipeline/reference genome combinations. In a multi-group outbreak
scenario, it is essential to note that one is effectively dealing with genomic data
from multiple species (i.e., ANI < 95), making it impossible to find a reference
genome that is closely related to all isolates in a putative outbreak. In the case of
some reference-based pipelines that are specifically tailored to identify variants
in bacterial isolates from outbreaks (e.g., CFSAN, which is not suited for bacteria
differing by more than a few hundred SNPs), calling variants in multiple groups
or against a distant reference genome is inappropriate (Davis et al. 2015). Thus,
querying outbreak isolates from multiple groups in aggregate using reference-
based variant calling methods should be avoided. Furthermore, the results pre-
sented here showcase the value of employing single- and/or multi-locus typ-
ing approaches prior to variant calling, either via Sanger sequencing or in silico
using tools such as BTyper, as they can aid the design of downstream bioinfor-
matics analyses, including reference genome selection and data partitioning by
phylogenetic group.
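The partitioning step recommended above can be sketched in a few lines of Python. The isolate names and group assignments below are hypothetical, and the input is assumed to be a simple mapping from isolate to phylogenetic group (e.g., parsed from typing output); this is an illustration of the idea rather than the workflow used in this study.

```python
from collections import defaultdict


def partition_by_group(typing_results):
    """Map each phylogenetic group label to a sorted list of isolate names."""
    partitions = defaultdict(list)
    for isolate, group in typing_results.items():
        partitions[group].append(isolate)
    return {group: sorted(isolates) for group, isolates in sorted(partitions.items())}


# Hypothetical typing results: isolate name -> phylogenetic group.
toy_typing = {
    "emetic_1": "III",
    "emetic_2": "III",
    "diarrheal_1": "IV",
    "emetic_3": "III",
    "diarrheal_2": "IV",
}
toy_partitions = partition_by_group(toy_typing)

# Each partition would then be analyzed separately, with its own reference genome.
for group, isolates in toy_partitions.items():
    print(f"group {group}: call variants among {len(isolates)} isolates")
```

Keeping each group in its own partition means that a reference genome only has to be closely related to the isolates within that group.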
When the three phylogenetic group IV isolates were excluded from analy-
ses, leaving only the emetic group III isolates, the selection of reference genome
caused fewer core SNP discrepancies than choice of variant calling pipeline, pro-
vided the reference genome was “similar” to the genomes analyzed. While the
selection of a reference genome for reference-based variant calling is not trivial
(Pightling, Petronella, and Pagotto 2014; Olson et al. 2015), reference-based vari-
ant calling using a closed chromosome (B. cereus AH187) and a draft genome
(FOOD 10 19 16 RSNT1 2H R9-6393) from two isolates that were closely re-
lated to, or among, the emetic group III isolates sequenced in this outbreak
produced nearly identical results in terms of the number and identity of core
SNPs detected. Both reference genomes were identical to the emetic group III
outbreak isolates sequenced here in terms of panC group, rpoB AT, MLST ST,
and virulotype. Additionally, the closed chromosome and draft genome had
ANI values of > 99.8 and 99.9, respectively, relative to all emetic group III out-
break isolates in this study, which can be considered highly similar. Comparable
findings have been observed in analyses of Salmonella enterica serovar Heidel-
berg WGS data (Usongo et al. 2018), suggesting that either closed genomes or
high-quality draft genomes are adequate for reference-based SNP calling, pro-
vided both are similar enough to the outbreak strains being queried. While the
thresholds at which reference genomes become “similar enough” and of suffi-
cient quality for reference-based SNP calling for outbreak detection warrant fur-
ther investigation, we have demonstrated here that, for emetic group III ST 26
B. cereus group genomes, the publicly available closed chromosome of B. cereus
AH187 can serve as an adequate standard.
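One way to make the notion of a “similar enough” reference operational is to screen candidate references by their lowest ANI to any isolate being queried. The Python sketch below does this for invented ANI values; the candidate names, the numbers, and the cutoff of 99.8 are placeholders for illustration, not a validated threshold.

```python
def screen_references(ani_table, min_ani):
    """Return candidate references whose lowest ANI to any isolate meets min_ani."""
    passing = {}
    for reference, ani_by_isolate in ani_table.items():
        lowest = min(ani_by_isolate.values())
        if lowest >= min_ani:
            passing[reference] = lowest
    return passing


# Hypothetical ANI values: candidate reference -> {isolate: ANI}.
toy_ani = {
    "closed_chromosome": {"iso_1": 99.85, "iso_2": 99.82},
    "draft_genome": {"iso_1": 99.95, "iso_2": 99.93},
    "distant_genome": {"iso_1": 91.2, "iso_2": 91.4},
}
toy_passing = screen_references(toy_ani, min_ani=99.8)  # placeholder cutoff
```

Using the minimum rather than the mean ANI ensures that a reference is rejected if even one isolate in the set is only distantly related to it.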
With regard to differences in the number of core SNPs identified in the 30
emetic group III isolates using different variant calling pipelines, the pipelines
that used assembled genomes as input (kSNP3 and Parsnp) produced higher
numbers of core SNPs than their counterparts that relied on short Illumina
reads. Additionally, when used to query core SNPs in 55 emetic group III ST
26 B. cereus group genomes, both kSNP3 and Parsnp produced core SNPs that
yielded topologically similar phylogenies. kSNP3 employs a reference-free k-
mer-based SNP calling approach (Gardner and Hall 2013; Gardner, Slezak,
and Hall 2015), while Parsnp uses a reference-based core genome alignment
approach (Treangen et al. 2014), and both are useful for calling variants in large
data sets. These approaches are also valuable when reads are not available for
SNP calling (Olson et al. 2015), as demonstrated here by the comparison of
outbreak genomes with publicly available genomes: core SNPs obtained using
both kSNP3 and Parsnp consistently produced phylogenies in which
the 30 emetic isolates from this outbreak formed a well-supported clade among
all emetic group III ST 26 B. cereus group genomes. However, kSNP3 has been
shown to lack specificity relative to other pipelines (i.e., CFSAN, LYVE-SET)
when differentiating outbreak isolates from non-outbreak isolates for L. monocy-
togenes, E. coli, and S. enterica (Katz et al. 2017). Here, the CFSAN and LYVE-SET
pipelines identified similar SNPs that produced highly congruent phylogenies.
This is unsurprising, considering that both the CFSAN and LYVE-SET pipelines
were designed specifically for identifying SNPs in closely related strains from
outbreaks (Katz et al. 2017), and both employ the most stringent filtering crite-
ria of all pipelines tested here.
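Topological congruence between phylogenies from different pipelines can be quantified in several ways. As one simple illustration, the Python sketch below computes the Robinson-Foulds distance, i.e., the number of splits found in exactly one of two trees. This is not necessarily the metric used in this study, and the nested-tuple tree encoding and leaf names are hypothetical conveniences.

```python
def bipartitions(tree):
    """Return the non-trivial splits of a tree given as nested tuples of leaf names."""

    def leaves_of(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves_of(child) for child in node))

    all_leaves = leaves_of(tree)
    anchor = min(all_leaves)  # fixed leaf used to pick one side of each split
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        above = all_leaves - below
        # Trivial splits (one side has fewer than two leaves) carry no information.
        if len(below) > 1 and len(above) > 1:
            splits.add(above if anchor in below else below)
        return below

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one tree (both must share the same leaves)."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical toy trees, as if produced by three different pipelines.
tree_1 = (("A", "B"), ("C", "D"), "E")
tree_2 = ("E", ("D", "C"), ("B", "A"))  # same topology, different ordering
tree_3 = (("A", "C"), ("B", "D"), "E")  # different groupings

rf_same = robinson_foulds(tree_1, tree_2)
rf_diff = robinson_foulds(tree_1, tree_3)
```

A distance of zero indicates identical unrooted topologies, while larger values indicate more conflicting groupings; branch lengths are ignored in this sketch.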
5.5.4 As WGS Becomes Routinely Integrated Into Food Safety,
Clinical, and Epidemiological Realms, It Is Likely That
the Number of Illnesses Attributed to B. cereus Will In-
crease
Here, we offer the first description of a foodborne outbreak caused by B. cereus
group species to be characterized using WGS, and we provide a glimpse into
the genomic variation one might expect within an emetic group III B. cereus out-
break using several different variant calling pipelines. However, our ability to
query emetic group III genomes outside of this outbreak is limited by the lack of
publicly available genomic data and metadata from emetic isolates. Of the 2,156
B. cereus group genomes available in NCBI’s RefSeq database as of March 2018,
only 29 were from group III and possessed the cesABCD operon, 25 of which
belonged to MLST ST 26. While not ideal, this is an improvement, as there
were only 19 emetic group III genomes available in NCBI’s GenBank database
in April 2017 (Carroll et al. 2017). As more B. cereus group WGS data, particu-
larly data from emetic B. cereus group isolates, become publicly available, more
outbreaks and clusters are likely to be resolved in tandem, a phenomenon that
has been observed for L. monocytogenes (Jackson et al. 2016). Additionally, vari-
ant calling and cluster/outbreak detection methods for characterizing B. cereus
group isolates from foodborne outbreaks can be further refined and optimized
as more WGS data, metadata, and epidemiological data become available for clinical
and non-clinical isolates.
5.6 Acknowledgments
This material is based on work supported by the National Science Foundation
Graduate Research Fellowship Program under grant no. DGE-1144153. This
work was also supported by the USDA National Institute of Food and Agricul-
ture Hatch Appropriations under Project #PEN04646 and Accession #1015787,
and by the Penn State Huck Institutes of the Life Sciences, which supported the whole-
genome sequencing through the Penn State Genomics Core Facility. The authors
would like to acknowledge the Wadsworth Center Tissue Culture & Media Core
for providing the media used in this work, and Dr. Joshua Lambert from The
Pennsylvania State University for providing tissue culture laboratory facilities
and advice.
5.7 References
Allard, M. W. et al. (2017). “Genomics of foodborne pathogens for microbial
food safety”. In: Curr Opin Biotechnol 49, pp. 224–229. DOI: 10.1016/j.
copbio.2017.11.002.
Anderson, M. J. and D. C. I. Walsh (2013). “PERMANOVA, ANOSIM, and the
Mantel test in the face of heterogeneous dispersions: What null hypothesis
are you testing?” In: Ecological Monographs 83.4, pp. 557–574. DOI: 10.1890/
12-2010.1.
Andersson, M. A. et al. (2004). “Sperm bioassay for rapid detection of cereulide-
producing Bacillus cereus in food and related environments”. In: Int J Food
Microbiol 94.2, pp. 175–83. DOI: 10.1016/j.ijfoodmicro.2004.01.
018.
Ashton, Philip et al. (2015). “Revolutionising Public Health Reference Micro-
biology using Whole Genome Sequencing: Salmonella as an exemplar”. In:
bioRxiv. DOI: 10.1101/033225.
Bankevich, A. et al. (2012). “SPAdes: a new genome assembly algorithm and
its applications to single-cell sequencing”. In: J Comput Biol 19.5, pp. 455–77.
DOI: 10.1089/cmb.2012.0021.
Bennett, S. D., K. A. Walsh, and L. H. Gould (2013). “Foodborne disease out-
breaks caused by Bacillus cereus, Clostridium perfringens, and Staphylococcus
aureus–United States, 1998-2008”. In: Clin Infect Dis 57.3, pp. 425–33. DOI:
10.1093/cid/cit244.
Bolger, A. M., M. Lohse, and B. Usadel (2014). “Trimmomatic: a flexible trimmer
for Illumina sequence data”. In: Bioinformatics 30.15, pp. 2114–20. DOI: 10.
1093/bioinformatics/btu170.
Bruen, T. C., H. Philippe, and D. Bryant (2006). “A simple and robust statistical
test for detecting the presence of recombination”. In: Genetics 172.4, pp. 2665–
81. DOI: 10.1534/genetics.105.048975.
Carroll, L. M., J. Kovac, R. A. Miller, and M. Wiedmann (2017). “Rapid,
high-throughput identification of anthrax-causing and emetic Bacillus cereus
group genome assemblies using BTyper, a computational tool for virulence-
based classification of Bacillus cereus group isolates using nucleotide se-
quencing data”. In: Appl Environ Microbiol. DOI: 10.1128/AEM.01096-
17.
Castiaux, V., X. Liu, L. Delbrassinne, and J. Mahillon (2015). “Is Cytotoxin K
from Bacillus cereus a bona fide enterotoxin?” In: Int J Food Microbiol 211,
pp. 79–85. DOI: 10.1016/j.ijfoodmicro.2015.06.020.
Ceuppens, S., N. Boon, and M. Uyttendaele (2013). “Diversity of Bacillus cereus
group strains is reflected in their broad range of pathogenicity and diverse
ecological lifestyles”. In: FEMS Microbiol Ecol 84.3, pp. 433–50. DOI:
10.1111/1574-6941.12110.
Chen, Y., Y. Luo, P. Curry, et al. (2017). “Assessing the genome level diversity
of Listeria monocytogenes from contaminated ice cream and environmental
samples linked to a listeriosis outbreak in the United States”. In: PLoS One
12.2, e0171389. DOI: 10.1371/journal.pone.0171389.
Chen, Y., Y. Luo, J. Pettengill, et al. (2017). “Singleton Sequence Type 382, an
Emerging Clonal Group of Listeria monocytogenes Associated with Three
Multistate Outbreaks Linked to Contaminated Stone Fruit, Caramel Apples,
and Leafy Green Salad”. In: J Clin Microbiol 55.3, pp. 931–941. DOI: 10.1128/
JCM.02140-16.
Clarke, K. R. (1993). “Non-parametric multivariate analyses of changes in com-
munity structure”. In: Australian Journal of Ecology 18.1, pp. 117–143. DOI:
10.1111/j.1442-9993.1993.tb00438.x.
Croucher, N. J. et al. (2015). “Rapid phylogenetic analysis of large samples of
recombinant bacterial whole genome sequences using Gubbins”. In: Nucleic
Acids Res 43.3, e15. DOI: 10.1093/nar/gku1196.
Danecek, P. et al. (2011). “The variant call format and VCFtools”. In: Bioinformat-
ics 27.15, pp. 2156–8. DOI: 10.1093/bioinformatics/btr330.
Davis, Steve et al. (2015). “CFSAN SNP Pipeline: an automated method for con-
structing SNP matrices from next-generation sequence data”. In: PeerJ Com-
puter Science 1, e20. DOI: 10.7717/peerj-cs.20.
de Jonge, Edwin (2018). docopt: Command-Line Interface Specification Language. R
package version 0.6.1.
Doll, V. M., M. Ehling-Schulz, and R. Vogelmann (2013). “Concerted action
of sphingomyelinase and non-hemolytic enterotoxin in pathogenic Bacillus
cereus”. In: PLoS One 8.4, e61404. DOI: 10.1371/journal.pone.0061404.
Ehling-Schulz, M., E. Frenzel, and M. Gohar (2015). “Food-bacteria interplay:
pathometabolism of emetic Bacillus cereus”. In: Front Microbiol 6, p. 704. DOI:
10.3389/fmicb.2015.00704.
Ehling-Schulz, M., M. Fricker, and S. Scherer (2004). “Bacillus cereus, the
causative agent of an emetic type of food-borne illness”. In: Mol Nutr Food
Res 48.7, pp. 479–87. DOI: 10.1002/mnfr.200400055.
Ehling-Schulz, M., B. Svensson, et al. (2005). “Emetic toxin formation of Bacillus
cereus is restricted to a single evolutionary lineage of closely related strains”.
In: Microbiology 151.Pt 1, pp. 183–97. DOI: 10.1099/mic.0.27607-0.
Fisichella, M. et al. (2009). “Mesoporous silica nanoparticles enhance MTT for-
mazan exocytosis in HeLa cells and astrocytes”. In: Toxicol In Vitro 23.4,
pp. 697–703. DOI: 10.1016/j.tiv.2009.02.007.
Gardner, S. N. and B. G. Hall (2013). “When whole-genome alignments just
won’t work: kSNP v2 software for alignment-free SNP discovery and phylo-
genetics of hundreds of microbial genomes”. In: PLoS One 8.12, e81760. DOI:
10.1371/journal.pone.0081760.
Gardner, S. N., T. Slezak, and B. G. Hall (2015). “kSNP3.0: SNP detection
and phylogenetic analysis of genomes without genome alignment or ref-
erence genome”. In: Bioinformatics 31.17, pp. 2877–8. DOI:
10.1093/bioinformatics/btv271.
Garrison, Erik and Gabor Marth (2012). “Haplotype-based variant detection
from short-read sequencing”. In: arXiv 1207.3907v2.
Ghosh, A. C. (1978). “Prevalence of Bacillus cereus in the faeces of healthy
adults”. In: J Hyg (Lond) 80.2, pp. 233–6.
Glasset, B. et al. (2016). “Bacillus cereus-induced food-borne outbreaks in France,
2007 to 2014: epidemiology and genetic characterisation”. In: Euro Surveill
21.48. DOI: 10.2807/1560-7917.ES.2016.21.48.30413.
Grad, Y. H. et al. (2012). “Genomic epidemiology of the Escherichia coli O104:H4
outbreaks in Europe, 2011”. In: Proc Natl Acad Sci U S A 109.8, pp. 3065–70.
DOI: 10.1073/pnas.1121491109.
Granum, P. E. and T. Lund (1997). “Bacillus cereus and its food poisoning toxins”.
In: FEMS Microbiol Lett 157.2, pp. 223–8.
Guinebretiere, M. H., S. Auger, et al. (2013). “Bacillus cytotoxicus sp. nov. is a
novel thermotolerant species of the Bacillus cereus Group occasionally asso-
ciated with food poisoning”. In: Int J Syst Evol Microbiol 63.Pt 1, pp. 31–40.
DOI: 10.1099/ijs.0.030627-0.
Guinebretiere, M. H., F. L. Thompson, et al. (2008). “Ecological diversification
in the Bacillus cereus Group”. In: Environ Microbiol 10.4, pp. 851–65. DOI: 10.
1111/j.1462-2920.2007.01495.x.
Guinebretiere, M. H., P. Velge, et al. (2010). “Ability of Bacillus cereus group
strains to cause food poisoning varies according to phylogenetic affiliation
(groups I to VII) rather than species affiliation”. In: J Clin Microbiol 48.9,
pp. 3388–91. DOI: 10.1128/JCM.00921-10.
Gymoese, P. et al. (2017). “Investigation of Outbreaks of Salmonella enterica
Serovar Typhimurium and Its Monophasic Variants Using Whole-Genome
Sequencing, Denmark”. In: Emerg Infect Dis 23.10, pp. 1631–1639. DOI: 10.
3201/eid2310.161248.
Heibl, C. (2008 onwards). PHYLOCH: R language tree plotting tools and interfaces to
diverse phylogenetic software packages. http://www.christophheibl.de/Rpackages.html.
Hoffmann, M. et al. (2016). “Tracing Origins of the Salmonella Bareilly Strain
Causing a Food-borne Outbreak in the United States”. In: J Infect Dis 213.4,
pp. 502–8. DOI: 10.1093/infdis/jiv297.
Holmes, A. et al. (2015). “Utility of Whole-Genome Sequencing of Escherichia
coli O157 for Outbreak Detection and Epidemiological Surveillance”. In: J
Clin Microbiol 53.11, pp. 3565–73. DOI: 10.1128/JCM.01066-15.
Hwang, S., E. Kim, I. Lee, and E. M. Marcotte (2015). “Systematic comparison of
variant calling pipelines using gold standard personal exome variants”. In:
Sci Rep 5, p. 17875. DOI: 10.1038/srep17875.
Ivy, R. A. et al. (2012). “Identification and characterization of psychrotolerant
sporeformers associated with fluid milk production and processing”. In:
Appl Environ Microbiol 78.6, pp. 1853–64. DOI: 10.1128/AEM.06536-11.
Jackson, Brendan R. et al. (2016). “Implementation of Nationwide Real-time
Whole-genome Sequencing to Enhance Listeriosis Outbreak Detection and
Investigation”. In: Clinical Infectious Diseases 63.3, pp. 380–386. DOI:
10.1093/cid/ciw242.
Jain, C., L. M. Rodriguez-R, A. M. Phillippy, K. T. Konstantinidis, and S. Aluru
(2018). “High throughput ANI analysis of 90K prokaryotic genomes reveals
clear species boundaries”. In: Nat Commun 9.1, p. 5114. DOI: 10.1038/
s41467-018-07641-9.
Jakobsen, I. B. and S. Easteal (1996). “A program for calculating and display-
ing compatibility matrices as an aid in determining reticulate evolution in
molecular sequences”. In: Comput Appl Biosci 12.4, pp. 291–5.
Jimenez, Guillermo, Anicet R. Blanch, Javier Tamames, and Ramon Rossello-
Mora (2013). “Complete Genome Sequence of Bacillus toyonensis BCT-7112T,
the Active Ingredient of the Feed Additive Preparation Toyocerin”. In:
Genome announcements 1.6, e01080–13. DOI: 10.1128/genomeA.01080-
13.
Joensen, K. G. et al. (2014). “Real-time whole-genome sequencing for routine
typing, surveillance, and outbreak detection of verotoxigenic Escherichia
coli”. In: J Clin Microbiol 52.5, pp. 1501–10. DOI: 10.1128/JCM.03617-13.
Jombart, Thibaut, Michelle Kendall, Jacob Almagro-Garcia, and Caroline Col-
ijn (2017). “treespace: Statistical Exploration of Landscapes of Phylogenetic
Trees”. In: Molecular Ecology Resources 17 (6), pp. 1385–1392.
Kabir, M. Shahjahan, Ying-Hsin Hsieh, Steven Simpson, Khalil Kerdahi, and
Irshad M. Sulaiman (2017). “Evaluation of Two Standard and Two Chro-
mogenic Selective Media for Optimal Growth and Enumeration of Isolates
of 16 Unique Bacillus Species”. In: Journal of Food Protection 80.6. PMID:
28467187, pp. 952–962. DOI: 10.4315/0362-028X.JFP-16-441.
Katz, L. S. et al. (2017). “A Comparative Analysis of the Lyve-SET Phyloge-
nomics Pipeline for Genomic Epidemiology of Foodborne Pathogens”. In:
Front Microbiol 8, p. 375. DOI: 10.3389/fmicb.2017.00375.
Kendall, Michelle and Caroline Colijn (2015). “A tree metric using structure and
length to capture distinct phylogenetic signals”. In: arXiv 1507.05211v3. DOI:
10.1093/molbev/msw124.
Kendall, Michelle and Caroline Colijn (2016). “Mapping Phylogenetic Trees to
Reveal Distinct Patterns of Evolution”. In: Molecular Biology and Evolution
33.10, pp. 2735–2743. DOI: 10.1093/molbev/msw124.
Kovac, Jasna, Henk den Bakker, Laura M. Carroll, and Martin Wiedmann (2017).
“Precision food safety: A systems approach to food safety facilitated by
genomics tools”. In: TrAC Trends in Analytical Chemistry 96.Supplement C,
pp. 52–61.
Kovac, J. et al. (2016). “Production of hemolysin BL by Bacillus cereus group iso-
lates of dairy origin is associated with whole-genome phylogenetic clade”.
In: BMC Genomics 17, p. 581. DOI: 10.1186/s12864-016-2883-z.
Kwong, J. C. et al. (2016). “Prospective Whole-Genome Sequencing Enhances
National Surveillance of Listeria monocytogenes”. In: J Clin Microbiol 54.2,
pp. 333–42. DOI: 10.1128/JCM.02344-15.
Lewis, P. O. (2001). “A likelihood approach to estimating phylogeny from dis-
crete morphological character data”. In: Syst Biol 50.6, pp. 913–25.
Li, H. and R. Durbin (2010). “Fast and accurate long-read alignment with
Burrows-Wheeler transform”. In: Bioinformatics 26.5, pp. 589–95. DOI: 10.
1093/bioinformatics/btp698.
Li, H., B. Handsaker, et al. (2009). “The Sequence Alignment/Map format
and SAMtools”. In: Bioinformatics 25.16, pp. 2078–9. DOI:
10.1093/bioinformatics/btp352.
Li, Heng (2013). “Aligning sequence reads, clone sequences and assembly con-
tigs with BWA-MEM”. In: arXiv:1303.3997v1 [q-bio.GN].
Liu, Y. et al. (2017). “Proposal of nine novel species of the Bacillus cereus group”.
In: Int J Syst Evol Microbiol 67.8, pp. 2499–2508. DOI: 10.1099/ijsem.0.
001821.
Lotte, R. et al. (2017). “Virulence Analysis of Bacillus cereus Isolated after Death
of Preterm Neonates, Nice, France, 2013”. In: Emerg Infect Dis 23.5, pp. 845–
848. DOI: 10.3201/eid2305.161788.
Mair-Jenkins, J. et al. (2017). “Investigation using whole genome sequencing of
a prolonged restaurant outbreak of Salmonella Typhimurium linked to the
building drainage system, England, February 2015 to March 2016”. In: Euro
Surveill 22.49. DOI: 10.2807/1560-7917.ES.2017.22.49.17-00037.
McCloskey, R. M. and A. F. Y. Poon (2017). “A model-based clustering method to
detect infectious disease transmission outbreaks from sequence variation”.
In: PLoS Comput Biol 13.11, e1005868. DOI: 10.1371/journal.pcbi.
1005868.
Messelhausser, U. et al. (2014). “Emetic Bacillus cereus are more volatile than
thought: recent foodborne outbreaks and prevalence studies in Bavaria
(2007-2013)”. In: Biomed Res Int 2014, p. 465603. DOI:
10.1155/2014/465603.
Miller, R. A., S. M. Beno, et al. (2016). “Bacillus wiedmannii sp. nov., a psychro-
tolerant and cytotoxic Bacillus cereus group species isolated from dairy foods
and dairy environments”. In: Int J Syst Evol Microbiol 66.11, pp. 4744–4753.
DOI: 10.1099/ijsem.0.001421.
Miller, R. A., J. Jian, S. M. Beno, M. Wiedmann, and J. Kovac (2018). “Intraclade
Variability in Toxin Production and Cytotoxicity of Bacillus cereus Group
Type Strains and Dairy-Associated Isolates”. In: Appl Environ Microbiol 84.6.
DOI: 10.1128/AEM.02479-17.
Moran-Gilad, J. (2017). “Whole genome sequencing (WGS) for food-borne
pathogen surveillance and control - taking the pulse”. In: Euro Surveill 22.23.
DOI: 10.2807/1560-7917.ES.2017.22.23.30547.
Morgulis, A., E. M. Gertz, A. A. Schaffer, and R. Agarwala (2006). “A fast
and symmetric DUST implementation to mask low-complexity DNA se-
quences”. In: J Comput Biol 13.5, pp. 1028–40. DOI: 10.1089/cmb.2006.
13.1028.
Moura, A. et al. (2017). “Real-Time Whole-Genome Sequencing for Surveillance
of Listeria monocytogenes, France”. In: Emerg Infect Dis 23.9, pp. 1462–1470.
DOI: 10.3201/eid2309.170336.
Naranjo, M. et al. (2011). “Sudden death of a young adult associated with Bacil-
lus cereus food poisoning”. In: J Clin Microbiol 49.12, pp. 4379–81. DOI: 10.
1128/JCM.05129-11.
Oksanen, Jari et al. (2017). vegan: Community Ecology Package. R package version
2.4-2.
Olson, N. D. et al. (2015). “Best practices for evaluating single nucleotide variant
calling methods for microbial genomics”. In: Front Genet 6, p. 235. DOI: 10.
3389/fgene.2015.00235.
Paradis, E., J. Claude, and K. Strimmer (2004). “APE: Analyses of Phylogenetics
and Evolution in R language”. In: Bioinformatics 20.2, pp. 289–90.
Pightling, A. W., N. Petronella, and F. Pagotto (2014). “Choice of reference se-
quence and assembler for alignment of Listeria monocytogenes short-read se-
quence data greatly influences rates of error in SNP analyses”. In: PLoS One
9.8, e104579. DOI: 10.1371/journal.pone.0104579.
Pightling, A. W., N. Petronella, and F. Pagotto (2015). “Choice of reference-guided
sequence assembler and SNP caller for analysis of Listeria monocytogenes short-read
sequence data greatly influences rates of error”. In: BMC Res Notes 8, p. 748.
DOI: 10.1186/s13104-015-1689-4.
Pruitt, K. D., T. Tatusova, and D. R. Maglott (2007). “NCBI reference sequences
(RefSeq): a curated non-redundant sequence database of genomes, tran-
scripts and proteins”. In: Nucleic Acids Res 35.Database issue, pp. D61–5. DOI:
10.1093/nar/gkl842.
R Core Team (2018). R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing. Vienna, Austria.
R Hackathon et al. (2019). phylobase: Base Package for Phylogenetic Structures and
Comparative Data. R package version 0.8.6.
Revell, Liam J. (2012). “phytools: An R package for phylogenetic comparative
biology (and other things).” In: Methods in Ecology and Evolution 3, pp. 217–
223.
Rusconi, B. et al. (2016). “Whole Genome Sequencing for Genomics-Guided
Investigations of Escherichia coli O157:H7 Outbreaks”. In: Front Microbiol 7,
p. 985. DOI: 10.3389/fmicb.2016.00985.
Sanaei-Zadeh, H. (2012). “Can Bacillus cereus food poisoning cause sudden
death?” In: J Clin Microbiol 50.11, 3816, author reply 3817. DOI: 10.1128/
JCM.00059-12.
Sandmann, S. et al. (2017). “Evaluating Variant Calling Tools for Non-Matched
Next-Generation Sequencing Data”. In: Sci Rep 7, p. 43169. DOI: 10.1038/
srep43169.
Scallan, E. et al. (2011). “Foodborne illness acquired in the United States–major
pathogens”. In: Emerg Infect Dis 17.1, pp. 7–15. DOI: 10.3201/eid1701.
P11101.
Schliep, Klaus, Alastair J. Potts, David A. Morrison, and Guido W. Grimm
(2017). “Intertwining phylogenetic trees and networks”. In: Methods in Ecol-
ogy and Evolution 8.10, pp. 1212–1220. DOI: 10.1111/2041-210X.12760.
Schoeni, J. L. and A. C. Wong (2005). “Bacillus cereus food poisoning and its
toxins”. In: J Food Prot 68.3, pp. 636–48.
Smith, J. M. (1992). “Analyzing the mosaic structure of genes”. In: J Mol Evol
34.2, pp. 126–9.
Stamatakis, A. (2014). “RAxML version 8: a tool for phylogenetic analysis and
post-analysis of large phylogenies”. In: Bioinformatics 30.9, pp. 1312–3. DOI:
10.1093/bioinformatics/btu033.
Stenfors Arnesen, L. P., A. Fagerlund, and P. E. Granum (2008). “From soil to
gut: Bacillus cereus and its food poisoning toxins”. In: FEMS Microbiol Rev
32.4, pp. 579–606. DOI: 10.1111/j.1574-6976.2008.00112.x.
Taboada, E. N., M. R. Graham, J. A. Carrico, and G. Van Domselaar (2017). “Food
Safety in the Age of Next Generation Sequencing, Bioinformatics, and Open
Data Access”. In: Front Microbiol 8, p. 909. DOI: 10.3389/fmicb.2017.
00909.
Tallent, S. M., K. M. Kotewicz, E. A. Strain, and R. W. Bennett (2012). “Efficient
Isolation and Identification of Bacillus cereus Group”. In: Journal of Aoac Inter-
national 95.2, pp. 446–451. DOI: 10.5740/jaoacint.11-251.
Tallent, S. M., E. J. Rhodehamel, S. M. Harmon, and R. W. Bennett (1998). “Bacil-
lus cereus”. In: Bacteriological analytical manual, 8th edition, 1998 and Food-
borne pathogenic microorganisms and natural toxins handbook, 1998. Ed. by FDA.
Gaithersburg, MD: AOAC International. Chap. 14.
Taylor, Angela J. et al. (2015). “Characterization of Foodborne Outbreaks of
Salmonella enterica Serovar Enteritidis with Whole-Genome Sequencing Sin-
gle Nucleotide Polymorphism-Based Analysis for Surveillance and Out-
break Detection”. In: Journal of Clinical Microbiology 53.10. Ed. by D. J.
Diekema, pp. 3334–3340. DOI: 10.1128/JCM.01280-15. eprint: https:
//jcm.asm.org/content/53/10/3334.full.pdf.
Treangen, T. J., B. D. Ondov, S. Koren, and A. M. Phillippy (2014). “The Harvest
suite for rapid core-genome alignment and visualization of thousands of in-
traspecific microbial genomes”. In: Genome Biol 15.11, p. 524. DOI: 10.1186/
PREACCEPT-2573980311437212.
Turnbull, P. C. and J. M. Kramer (1985). “Intestinal carriage of Bacillus cereus:
faecal isolation studies in three population groups”. In: J Hyg (Lond) 95.3,
pp. 629–38.
Usongo, V. et al. (2018). “Impact of the choice of reference genome on the ability
of the core genome SNV methodology to distinguish strains of Salmonella
enterica serovar Heidelberg”. In: PLoS One 13.2, e0192233. DOI: 10.1371/
journal.pone.0192233.
Vangay, P., E. B. Fugett, Q. Sun, and M. Wiedmann (2013). “Food microbe
tracker: a web-based tool for storage and comparison of food-associated mi-
crobes”. In: J Food Prot 76.2, pp. 283–94. DOI: 10.4315/0362-028X.JFP-
12-276.
Walker, T. M. et al. (2018). “A cluster of multidrug-resistant Mycobacterium tuber-
culosis among patients arriving in Europe from the Horn of Africa: a molec-
ular epidemiological study”. In: Lancet Infect Dis 18.4, pp. 431–440. DOI: 10.
1016/S1473-3099(18)30004-5.
Wickham, Hadley (2017). stringr: Simple, Consistent Wrappers for Common String
Operations. R package version 1.2.0.
Ye, J., S. McGinnis, and T. L. Madden (2006). “BLAST: improvements for better
sequence analysis”. In: Nucleic Acids Res 34.Web Server issue, W6–9. DOI:
10.1093/nar/gkl164.
Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk
Lam (2017). “ggtree: an R package for visualization and annotation of
phylogenetic trees with their covariates and other associated data”. In: Methods
in Ecology and Evolution 8.1, pp. 28–36. DOI: 10.1111/2041-210X.12628.
CHAPTER 6
CONCLUSION
Foodborne disease-causing agents have been estimated to cause more than
600 million illnesses and more than 400,000 deaths worldwide annually (WHO
2015). Due to their profound human and economic impact, there is incentive
to query bacterial disease agents responsible for a significant proportion of ill-
nesses, deaths, and disease burden using whole-genome sequencing (WGS);
congruent with this, the amount of publicly available sequencing data derived
from microbes has doubled in size every two years and will likely continue
to grow (Bradley, Bakker, et al. 2019). The previous chapters detail
how Illumina sequencing data from thousands of bacterial isolates can be
leveraged to draw meaningful biological conclusions relevant to food safety and
quality.
6.1 NGS can be used to replicate many microbiological assays
in silico with high accuracy, speed, and throughput
As demonstrated in Chapters 2 (L. M. Carroll, M. Wiedmann, et al. 2017) and 4
(L. M. Carroll, Kovac, et al. 2017), numerous assays used to characterize food-
associated microorganisms can be replicated in silico using NGS, often with the
advantage of increased speed and throughput. In Chapter 2, whole-genome
sequencing (WGS) was used to query Salmonella enterica serotypes capable of
infecting both bovine and human hosts (i.e., serotypes Dublin, Newport, and
Typhimurium) from bovine and human sources in different geographic regions
of the United States (New York State on the east coast, and Washington State
on the west coast). In silico detection of antimicrobial resistance (AMR) determi-
nants was able to predict phenotypic resistance to antimicrobials used in human
and veterinary medicine with high accuracy (L. M. Carroll, M. Wiedmann, et
al. 2017). Additionally, in silico Salmonella serotype designations were consistent
with (and sometimes even more accurate than) those assigned using traditional
serotyping (L. M. Carroll, M. Wiedmann, et al. 2017). These results further support
the conclusion that WGS can be used to reliably predict AMR phenotypes and Salmonella
serotype (Bradley, Gordon, et al. 2015; McDermott et al. 2016; S. Zhang et al.
2015; Yoshida et al. 2016) and attest to the robustness of these in silico assays in
not only human clinical isolates, but those of animal (i.e., bovine) origin as well
(L. M. Carroll, M. Wiedmann, et al. 2017).
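As a minimal illustration of how agreement between in silico AMR predictions and phenotypic susceptibility testing can be summarized, the following Python sketch computes sensitivity and specificity from paired resistant/susceptible calls. The function name and the toy calls are hypothetical and are not taken from Chapter 2; this is a sketch of the standard definitions, not the analysis pipeline used in that study.

```python
import math


def sensitivity_specificity(predicted, observed):
    """Return (sensitivity, specificity) for paired boolean calls.

    predicted: in silico calls (True = resistance determinant detected)
    observed:  phenotypic calls (True = phenotypically resistant)
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")

    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # resistant, predicted resistant
    tn = sum(1 for p, o in pairs if not p and not o)  # susceptible, predicted susceptible
    fp = sum(1 for p, o in pairs if p and not o)      # susceptible, predicted resistant
    fn = sum(1 for p, o in pairs if not p and o)      # resistant, predicted susceptible

    # Guard against empty classes rather than dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Hypothetical calls for five isolates and a single antimicrobial.
toy_predicted = [True, True, False, False, True]
toy_observed = [True, False, False, False, True]
sens, spec = sensitivity_specificity(toy_predicted, toy_observed)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In practice, such values would be computed per antimicrobial and then averaged across the panel.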
In Chapter 4 (L. M. Carroll, Kovac, et al. 2017), PCR-based detection of vir-
ulence factors, as well as single- (i.e., panC and rpoB) and multi-locus sequence
typing for multiple species in the Bacillus cereus group, were shown to be read-
ily replicated in silico with high accuracy. Additionally, when implemented in a
freely available and open-source pipeline, these in silico assays could be scaled
to hundreds of genomes to gain insight into the population structure and vir-
ulence capacity of all known members of the B. cereus group. While efforts to
sequence B. cereus strains are not as well-established as those for other food-
borne pathogens (e.g., Salmonella enterica, Listeria monocytogenes), the number of
publicly available B. cereus group genomes is increasing (Laura M. Carroll, Mar-
tin Wiedmann, et al. 2019). As such, scalable, rapid in silico typing methods will
become increasingly valuable and will offer further insight into the genomics of
the group, with the potential to explore novel lineages important to food safety,
quality, and human health (e.g., as was done for proposed novel B. cereus group
species “Bacillus clarus”) (Acevedo et al. 2019).
6.2 NGS can be used to identify novel genomic elements asso-
ciated with clinically relevant phenotypes
In addition to replicating existing microbiological assays, NGS can be used to
identify novel associations between genomic elements and phenotypes of inter-
est, as was demonstrated in Chapter 3 (Laura M. Carroll, Gaballa, et al. 2019):
during routine in silico screening of sequenced Salmonella enterica genomes, a
novel mobilized colistin resistance gene, mcr-9, was identified based on its simi-
larity to existing mcr homologues (Laura M. Carroll, Gaballa, et al. 2019). While
mcr-9 was confirmed to confer resistance to colistin up to and beyond the clin-
ical breakpoint when cloned into Escherichia coli, the Salmonella Typhimurium
isolate in which it was initially detected was not itself clinically resistant (Laura
M. Carroll, Gaballa, et al. 2019). This approach can be contrasted with the
“traditional” approach to mcr identification, in which a colistin-resistant bacterial
isolate is used to identify mcr homologues, as was done to identify mcr-1, -2,
-3, -4, -5, -7, and -8 (Liu et al. 2016; Xavier et al. 2016; Yin et al. 2017; Carattoli
et al. 2017; Borowiak et al. 2017; Yang et al. 2018; Wang et al. 2018) (in the case
of mcr-6, a colistin-sensitive Moraxella strain was screened for mcr-1 and mcr-
2 and was found to harbor a mcr-2-like gene, which was later renamed mcr-6)
(AbuOun et al. 2017; Partridge et al. 2018). In the case of mcr-9, the traditional
route of mcr identification (i.e., testing for bacterial resistance to colistin, and
then identifying mcr-like genes if the isolate is colistin-resistant at the clinical
breakpoint under standard testing conditions) would have left it undetected.
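The difference between the two discovery routes can be sketched in a few lines of Python. Everything below is hypothetical: the isolate names, the MIC values, the boolean flag standing in for a homology search result, and the 2 mg/L breakpoint (used purely as a placeholder, since no value is stated here). The sketch only shows why a phenotype-first filter cannot recover a gene carried by a colistin-susceptible isolate.

```python
from dataclasses import dataclass

# Placeholder breakpoint in mg/L; an assumption for illustration only.
ASSUMED_BREAKPOINT_MG_L = 2.0


@dataclass
class Isolate:
    name: str
    colistin_mic: float      # hypothetical MIC under standard conditions (mg/L)
    has_mcr_like_gene: bool  # stand-in for the result of a homology search


def phenotype_first(isolates, threshold=ASSUMED_BREAKPOINT_MG_L):
    """Traditional route: screen for genes only in phenotypically resistant isolates."""
    resistant = [i for i in isolates if i.colistin_mic > threshold]
    return [i.name for i in resistant if i.has_mcr_like_gene]


def genotype_first(isolates):
    """Routine in silico screening: query every genome, regardless of phenotype."""
    return [i.name for i in isolates if i.has_mcr_like_gene]


toy_isolates = [
    Isolate("isolate_A", 8.0, True),   # resistant, carries an mcr-like gene
    Isolate("isolate_B", 1.0, True),   # susceptible, carries an mcr-like gene
    Isolate("isolate_C", 0.5, False),  # susceptible, no mcr-like gene
]

print("phenotype-first:", phenotype_first(toy_isolates))
print("genotype-first: ", genotype_first(toy_isolates))
```

Here the susceptible carrier (isolate_B) is reported only by the genotype-first route, which mirrors the situation described for mcr-9.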
It is likely that routine in silico screening of Enterobacteriaceae genomes will yield
other mcr genes capable of conferring resistance to colistin. However, as was the
case with mcr-9, future studies to determine the conditions under which differ-
ent mcr homologues are transcribed and expressed are warranted. Furthermore,
the current view of colistin resistance (and antimicrobial resistance as a whole),
strictly through the lens of a susceptible-resistant dichotomy, warrants critique,
as testing conditions have been shown to influence mcr expression and colistin
minimum inhibitory concentration (MIC) (H. Zhang et al. 2017; Gwozdzinski
et al. 2018).
6.3 NGS can be used to query pathogens associated with food-
borne outbreaks at higher resolution than its predecessors
NGS technologies have been implemented in public health settings to routinely
sequence numerous foodborne pathogens, including Salmonella enterica, Liste-
ria monocytogenes, and Escherichia coli (Taylor et al. 2015; Hoffmann et al. 2016;
Gymoese et al. 2017; Grad et al. 2012; Holmes et al. 2015; Rusconi et al. 2016;
Jackson et al. 2016; Kwong et al. 2016; Chen, Luo, Pettengill, et al. 2017; Chen,
Luo, Curry, et al. 2017; Moura et al. 2017). Chapter 5 offered the first descrip-
tion of a foodborne outbreak caused by members of the Bacillus cereus group
in which WGS was used to characterize isolates (Laura M. Carroll, Martin
Wiedmann, et al. 2019). In addition to providing the level of expected diver-
sity among emetic Bacillus cereus outbreak isolates obtained via different vari-
ant calling methodologies, the study presented in Chapter 5 showcases that
WGS can reliably differentiate emetic B. cereus strains from a single-source out-
break from publicly available genomes of the same sequence type and virulo-
type, even in the absence of large amounts of genomic data from B. cereus group
genomes (Laura M. Carroll, Martin Wiedmann, et al. 2019). Additionally, the
value (or lack thereof) of various metrics which might serve as supplemental
metadata (e.g., patient symptoms, bacterial counts) was discussed; in the outbreak
presented here, cytotoxicity data proved to be particularly useful for excluding
non-emetic Bacillus cereus group isolates from the outbreak, and, thus,
the possibility of a multi-source outbreak caused by multiple species (Laura
M. Carroll, Martin Wiedmann, et al. 2019). The computational, microbiologi-
cal, and epidemiological methods presented in this study will benefit not only
Bacillus cereus researchers, but also those in public health who are working with
under-studied and under-reported pathogens, particularly those which may be
ubiquitous in the environment or varying in their virulence capacity.
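One simple way to express the idea of differentiating outbreak isolates from other genomes of the same sequence type is to compare pairwise SNP distances computed from an alignment of variant sites. The Python sketch below assumes a pre-computed, equal-length alignment; the sequences and isolate names are toy values, and the approach shown (a Hamming distance that skips ambiguous or gapped positions) is a generic illustration rather than any of the variant calling methodologies evaluated in Chapter 5.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count differing positions, skipping sites where either sequence is ambiguous or gapped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_distances(alignment):
    """Return {(name_1, name_2): distance} for every unordered pair of isolates."""
    return {
        (n1, n2): snp_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Hypothetical alignment of variant sites for three isolates.
toy_alignment = {
    "outbreak_1": "ACGTACGTAC",
    "outbreak_2": "ACGTACGTAT",
    "public_1": "ATGTTCGNAT",
}

for pair, dist in pairwise_distances(toy_alignment).items():
    print(pair, dist)
```

In a real analysis, the distances observed among confirmed outbreak isolates would be interpreted alongside phylogeny and epidemiological metadata rather than against a fixed threshold.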
Overall, NGS technologies are being used increasingly in food safety and
public health settings, with the advantage of not only replicating microbiologi-
cal assays in silico, but providing opportunities to develop novel bacterial char-
acterization schemes which query the genomes of bacterial pathogens in their
entirety. Decreasing sequencing costs and increasingly available genomic data
from food-associated microbes and communities will allow for improved bio-
logical inference from farm to fork.
6.4 References
AbuOun, M. et al. (2017). “mcr-1 and mcr-2 variant genes identified in Moraxella
species isolated from pigs in Great Britain from 2014 to 2015”. In: J Antimicrob
Chemother 72.10, pp. 2745–2749. DOI: 10.1093/jac/dkx286.
Acevedo, Marysabel Mendez et al. (2019). “Bacillus clarus sp. nov. is a new Bacillus
cereus group species isolated from soil”. In: bioRxiv. DOI: 10.1101/508077.
eprint: https://www.biorxiv.org/content/early/2019/01/02/508077.full.pdf.
Borowiak, M. et al. (2017). “Identification of a novel transposon-associated phos-
phoethanolamine transferase gene, mcr-5, conferring colistin resistance in
d-tartrate fermenting Salmonella enterica subsp. enterica serovar Paratyphi
B”. In: J Antimicrob Chemother 72.12, pp. 3317–3324. DOI: 10.1093/jac/
dkx327.
Bradley, Phelim, Henk C. den Bakker, Eduardo P. C. Rocha, Gil McVean, and
Zamin Iqbal (2019). “Ultrafast search of all deposited bacterial and viral
genomic data”. In: Nature Biotechnology 37.2, pp. 152–159. DOI: 10.1038/
s41587-018-0010-1.
Bradley, Phelim, N. Claire Gordon, et al. (2015). “Rapid antibiotic-resistance pre-
dictions from genome sequence data for Staphylococcus aureus and Mycobac-
terium tuberculosis”. In: Nat Commun 6, pp. 10063–10063. DOI: 10.1038/
ncomms10063.
Carattoli, A. et al. (2017). “Novel plasmid-mediated colistin resistance mcr-4
gene in Salmonella and Escherichia coli, Italy 2013, Spain and Belgium, 2015
to 2016”. In: Euro Surveill 22.31. DOI: 10.2807/1560-7917.ES.2017.22.
31.30589.
Carroll, L. M., J. Kovac, R. A. Miller, and M. Wiedmann (2017). “Rapid,
high-throughput identification of anthrax-causing and emetic Bacillus cereus
group genome assemblies using BTyper, a computational tool for virulence-
based classification of Bacillus cereus group isolates using nucleotide se-
quencing data”. In: Appl Environ Microbiol. DOI: 10.1128/AEM.01096-
17.
Carroll, L. M., M. Wiedmann, et al. (2017). “Whole-Genome Sequencing of
Drug-Resistant Salmonella enterica Isolates from Dairy Cattle and Humans
in New York and Washington States Reveals Source and Geographic Associ-
ations”. In: Appl Environ Microbiol 83.12. DOI: 10.1128/AEM.00140-17.
Carroll, Laura M., Ahmed Gaballa, et al. (2019). “Identification of Novel Mo-
bilized Colistin Resistance Gene mcr-9 in a Multidrug-Resistant, Colistin-
Susceptible Salmonella enterica Serotype Typhimurium Isolate”. In: mBio 10.3.
Ed. by Mark S. Turner, Gregory Siragusa, and David White. DOI: 10.1128/
mBio.00853-19. eprint: https://mbio.asm.org/content/10/3/
e00853-19.full.pdf.
Carroll, Laura M., Martin Wiedmann, et al. (2019). “Characterization of Emetic
and Diarrheal Bacillus cereus Strains From a 2016 Foodborne Outbreak Using
Whole-Genome Sequencing: Addressing the Microbiological, Epidemiolog-
ical, and Bioinformatic Challenges”. In: Frontiers in Microbiology 10.144. DOI:
10.3389/fmicb.2019.00144.
Chen, Y., Y. Luo, P. Curry, et al. (2017). “Assessing the genome level diversity
of Listeria monocytogenes from contaminated ice cream and environmental
samples linked to a listeriosis outbreak in the United States”. In: PLoS One
12.2, e0171389. DOI: 10.1371/journal.pone.0171389.
Chen, Y., Y. Luo, J. Pettengill, et al. (2017). “Singleton Sequence Type 382, an
Emerging Clonal Group of Listeria monocytogenes Associated with Three
Multistate Outbreaks Linked to Contaminated Stone Fruit, Caramel Apples,
and Leafy Green Salad”. In: J Clin Microbiol 55.3, pp. 931–941. DOI: 10.1128/
JCM.02140-16.
Grad, Y. H. et al. (2012). “Genomic epidemiology of the Escherichia coli O104:H4
outbreaks in Europe, 2011”. In: Proc Natl Acad Sci U S A 109.8, pp. 3065–70.
DOI: 10.1073/pnas.1121491109.
Gwozdzinski, K., S. Azarderakhsh, C. Imirzalioglu, L. Falgenhauer, and T.
Chakraborty (2018). “An Improved Medium for Colistin Susceptibility Test-
ing”. In: J Clin Microbiol 56.5. DOI: 10.1128/JCM.01950-17.
Gymoese, P. et al. (2017). “Investigation of Outbreaks of Salmonella enterica
Serovar Typhimurium and Its Monophasic Variants Using Whole-Genome
Sequencing, Denmark”. In: Emerg Infect Dis 23.10, pp. 1631–1639. DOI: 10.
3201/eid2310.161248.
Hoffmann, M. et al. (2016). “Tracing Origins of the Salmonella Bareilly Strain
Causing a Food-borne Outbreak in the United States”. In: J Infect Dis 213.4,
pp. 502–8. DOI: 10.1093/infdis/jiv297.
Holmes, A. et al. (2015). “Utility of Whole-Genome Sequencing of Escherichia
coli O157 for Outbreak Detection and Epidemiological Surveillance”. In: J
Clin Microbiol 53.11, pp. 3565–73. DOI: 10.1128/JCM.01066-15.
Jackson, Brendan R. et al. (2016). “Implementation of Nationwide Real-time
Whole-genome Sequencing to Enhance Listeriosis Outbreak Detection and
Investigation”. In: Clinical Infectious Diseases 63.3, pp. 380–386. DOI: 10.1093/cid/ciw242.
eprint: http://oup.prod.sis.lan/cid/article-pdf/63/3/380/8039807/ciw242.pdf.
Kwong, J. C. et al. (2016). “Prospective Whole-Genome Sequencing Enhances
National Surveillance of Listeria monocytogenes”. In: J Clin Microbiol 54.2,
pp. 333–42. DOI: 10.1128/JCM.02344-15.
Liu, Y. Y. et al. (2016). “Emergence of plasmid-mediated colistin resistance mech-
anism MCR-1 in animals and human beings in China: a microbiological and
molecular biological study”. In: Lancet Infect Dis 16.2, pp. 161–8. DOI: 10.
1016/S1473-3099(15)00424-7.
McDermott, Patrick F. et al. (2016). “Whole-Genome Sequencing for Detect-
ing Antimicrobial Resistance in Nontyphoidal Salmonella”. In: Antimicrobial
Agents and Chemotherapy 60.9, pp. 5515–5520. DOI: 10.1128/AAC.01030-
16. eprint: https://aac.asm.org/content/60/9/5515.full.pdf.
Moura, A. et al. (2017). “Real-Time Whole-Genome Sequencing for Surveillance
of Listeria monocytogenes, France”. In: Emerg Infect Dis 23.9, pp. 1462–1470.
DOI: 10.3201/eid2309.170336.
Partridge, S. R. et al. (2018). “Proposal for assignment of allele numbers for
mobile colistin resistance (mcr) genes”. In: J Antimicrob Chemother 73.10,
pp. 2625–2630. DOI: 10.1093/jac/dky262.
Rusconi, B. et al. (2016). “Whole Genome Sequencing for Genomics-Guided
Investigations of Escherichia coli O157:H7 Outbreaks”. In: Front Microbiol 7,
p. 985. DOI: 10.3389/fmicb.2016.00985.
Taylor, Angela J. et al. (2015). “Characterization of Foodborne Outbreaks of
Salmonella enterica Serovar Enteritidis with Whole-Genome Sequencing Sin-
gle Nucleotide Polymorphism-Based Analysis for Surveillance and Out-
break Detection”. In: Journal of Clinical Microbiology 53.10. Ed. by D. J.
Diekema, pp. 3334–3340. DOI: 10.1128/JCM.01280-15. eprint: https:
//jcm.asm.org/content/53/10/3334.full.pdf.
Wang, X. et al. (2018). “Emergence of a novel mobile colistin resistance gene,
mcr-8, in NDM-producing Klebsiella pneumoniae”. In: Emerg Microbes Infect
7.1, p. 122. DOI: 10.1038/s41426-018-0124-z.
WHO (2015). WHO estimates of the global burden of foodborne diseases, 2007-2015.
WHO, Geneva, Switzerland.
Xavier, B. B. et al. (2016). “Identification of a novel plasmid-mediated colistin-
resistance gene, mcr-2, in Escherichia coli, Belgium, June 2016”. In: Euro
Surveill 21.27. DOI: 10.2807/1560-7917.ES.2016.21.27.30280.
Yang, Y. Q., Y. X. Li, C. W. Lei, A. Y. Zhang, and H. N. Wang (2018). “Novel
plasmid-mediated colistin resistance gene mcr-7.1 in Klebsiella pneumoniae”.
In: J Antimicrob Chemother. DOI: 10.1093/jac/dky111.
Yin, W. et al. (2017). “Novel Plasmid-Mediated Colistin Resistance Gene mcr-3
in Escherichia coli”. In: mBio 8.3. DOI: 10.1128/mBio.00543-17.
Yoshida, C. E. et al. (2016). “The Salmonella In Silico Typing Resource (SISTR):
An Open Web-Accessible Tool for Rapidly Typing and Subtyping Draft
Salmonella Genome Assemblies”. In: PLoS One 11.1, e0147101. DOI:
10.1371/journal.pone.0147101.
Zhang, H. et al. (2017). “Expression characteristics of the plasmid-borne mcr-1
colistin resistance gene”. In: Oncotarget 8.64, pp. 107596–107602. DOI: 10.
18632/oncotarget.22538.
Zhang, S. et al. (2015). “Salmonella serotype determination utilizing high-
throughput genome sequencing data”. In: J Clin Microbiol 53.5, pp. 1685–92.
DOI: 10.1128/JCM.00323-15.