Food Safety And Data-Driven Science: Developments Using Machine Learning And Databases
Increasing evidence suggests that persistence of Listeria monocytogenes in food processing plants has been the underlying cause of a number of human listeriosis outbreaks. The first part of this research study extracts the criteria that food safety experts use to determine bacterial persistence in the environment, using retail delicatessen operations as a model. Using the Delphi Method, we conducted an expert elicitation with 10 food safety experts from academia, industry, and government to classify L. monocytogenes persistence based on environmental sampling results collected over six months for 30 retail delicatessen stores. The results were modeled using variations of random forest, support vector machine, logistic regression, and linear regression; variable importance values from the random forest and support vector machine models were consolidated to rank the variables that mattered most in the experts' classifications. The duration of subtype isolation ranked most important across all expert categories. Sampling site category also ranked high in importance, and validation errors doubled when this covariate was removed. Support vector machine and random forest models successfully classified the data, with average validation errors of 3.8% and 2.8% (n=144), respectively. Our findings indicate that (i) the frequency of isolations over time and sampling site information are critical factors for experts determining subtype persistence, (ii) food safety experts from different sectors may not use the same criteria in determining persistence, and (iii) machine learning models have potential for future use in environmental surveillance and risk management programs. Further work involving larger data sets is necessary to validate the accuracy of expert and machine classification against biological measurement of L. monocytogenes persistence.
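The abstract does not state how the variable importance values from the two models were consolidated. One plausible approach, sketched here purely as an assumption, is to rank the variables within each model and then average those ranks. The variable names and importance scores below are hypothetical placeholders, not values from the study.

```python
from statistics import mean


def ranks(importance):
    """Map each variable to its rank within one model (1 = most important)."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def consolidate(*importance_tables):
    """Average the per-model ranks; return variables ordered by mean rank.

    Assumes every table scores the same set of variables.
    Ties in mean rank are broken alphabetically.
    """
    per_model = [ranks(table) for table in importance_tables]
    variables = per_model[0].keys()
    mean_rank = {v: mean(r[v] for r in per_model) for v in variables}
    return sorted(mean_rank, key=lambda v: (mean_rank[v], v))


# Placeholder scores for illustration only (NOT results from the study).
rf_importance = {"isolation_duration": 0.50, "site_category": 0.30, "isolation_count": 0.20}
svm_importance = {"isolation_duration": 0.45, "site_category": 0.35, "isolation_count": 0.20}

consolidated = consolidate(rf_importance, svm_importance)
print(consolidated)
```

Averaging ranks rather than raw scores sidesteps the fact that importance values from different model families are not on a common scale; other consolidation schemes are equally conceivable.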
To address this need for access to larger biological datasets, we developed Food Microbe Tracker, a public web-based database that allows for archiving and exchange of a variety of molecular subtype data that can be cross-referenced with isolate source data, genetic data, and phenotypic characteristics. Data can be queried with a variety of search criteria, including DNA sequences and banding pattern data (e.g., ribotype, PFGE type). Food Microbe Tracker allows for the deposition of data on any bacterial genus and species, as well as bacteriophages and other viruses. The bacterial genera and species that currently have the most entries in this database include Listeria monocytogenes, Salmonella, Streptococcus spp., Pseudomonas spp., Bacillus spp., and Paenibacillus spp., with over 40,000 isolates present in total. The combination of pathogen and spoilage microorganism data in the database will facilitate source tracking and outbreak detection, improved discovery of emerging subtypes, and an increased understanding of the transmission and ecology of these microbes. Continued addition of subtyping, genetic, or phenotypic data for a variety of microbial species will broaden the database and facilitate large-scale studies on the diversity of food-associated microbes, with much potential to be extended towards the control of any organism of interest in any environment.
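To illustrate what cross-referencing subtype data with isolate source data can look like, the following is a minimal sketch using an in-memory SQLite database. The two-table schema, the column names, the `find_sources` helper, and all rows are hypothetical; they are not the actual Food Microbe Tracker schema or data.

```python
import sqlite3

# Hypothetical schema: one row per isolate, one row per subtype result.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE isolate (
        id INTEGER PRIMARY KEY,
        organism TEXT NOT NULL,
        source TEXT NOT NULL
    );
    CREATE TABLE subtype (
        isolate_id INTEGER NOT NULL REFERENCES isolate(id),
        method TEXT NOT NULL,   -- e.g. 'ribotype' or 'PFGE'
        pattern TEXT NOT NULL   -- banding pattern label
    );
""")

# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO isolate VALUES (?, ?, ?)",
    [
        (1, "Listeria monocytogenes", "retail deli"),
        (2, "Pseudomonas spp.", "dairy plant"),
        (3, "Listeria monocytogenes", "processing plant"),
    ],
)
conn.executemany(
    "INSERT INTO subtype VALUES (?, ?, ?)",
    [
        (1, "ribotype", "pattern-A"),
        (2, "ribotype", "pattern-B"),
        (3, "ribotype", "pattern-A"),
    ],
)


def find_sources(connection, method, pattern):
    """Return (id, organism, source) for isolates sharing a subtype pattern."""
    query = """
        SELECT i.id, i.organism, i.source
        FROM isolate AS i
        JOIN subtype AS s ON s.isolate_id = i.id
        WHERE s.method = ? AND s.pattern = ?
        ORDER BY i.id
    """
    return connection.execute(query, (method, pattern)).fetchall()


matches = find_sources(conn, "ribotype", "pattern-A")
for row in matches:
    print(row)
```

A query of this shape, in which one banding pattern turns up in isolates from different sources, is the kind of lookup that supports source tracking.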