This RenatoOrsi_RNA-SEQ_coverage_readme.txt file was generated on 2021-01-19 by Renato Hohl Orsi


GENERAL INFORMATION

1. Title of Dataset: Data from: Alternative Sigma factors regulate overlapping as well as distinct stress response and metabolic functions in Listeria monocytogenes stationary phase cells

2. Author Information
    A. Principal Investigator Contact Information
        Name: Martin Wiedmann
        Institution: Cornell University
        Address:
        Email: mw16@cornell.edu

    B. Associate or Co-investigator Contact Information
        Name: Renato H. Orsi
        Institution: Cornell University
        Address:
        Email: rho2@cornell.edu

    C. Alternate Contact Information
        Name:
        Institution:
        Address:
        Email:

3. Date of data collection (range): 2010-06:2011-05

4. Geographic location of data collection: USA


SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data:

2. Links to publications that cite or use the data: Renato H. Orsi; Soraya Chaturongakul; Haley F. Oliver; Lalit Ponnala; Ahmed Gaballa; Martin Wiedmann. 2021. Alternative Sigma factors regulate overlapping as well as distinct stress response and metabolic functions in Listeria monocytogenes stationary phase cells. Pathogens. 10(4):411. https://doi.org/10.3390/pathogens10040411

3. Recommended citation for this dataset: Renato H. Orsi, Soraya Chaturongakul, Haley F. Oliver, Lalit Ponnala, Ahmed Gaballa, Martin Wiedmann. (2020) Data from: Alternative sigma factors regulate overlapping as well as distinct stress response and metabolic functions in Listeria monocytogenes stationary phase cells. [dataset] Cornell University eCommons Repository. https://doi.org/10.7298/0mjr-6c90


DATA & FILE OVERVIEW

1. File List:

Coverage_files:
Short description: This folder contains 21 coverage files and one assembly file (10403S.fasta). Each line in the coverage files represents one nucleotide position in the genome assembly of the strain 10403S.
Each line in each coverage file has a number, which represents the number of raw sequence reads that aligned to that specific nucleotide position.

Coverage_table:
Short description: This folder contains 4 Excel spreadsheets with the coverage of each genetic feature (e.g., gene, transcription unit [TU]). Each genetic feature is in a distinct row in the spreadsheet. Columns represent each sample's raw data and processed data.
Column names ending with "sense_COUNT" have the raw coverage in the Forward strand (according to the assembly file) for directional data.
Column names ending with "sense_RPKM" have the normalized RPKM coverage in the Forward strand (according to the assembly file) for directional data.
Column names ending with "antisense_COUNT" have the raw coverage in the Reverse strand (according to the assembly file) for directional data.
Column names ending with "antisense_RPKM" have the normalized RPKM coverage in the Reverse strand (according to the assembly file) for directional data.
Column names with "pseudosense" or "pseudoantisense" denote RNA-SEQ samples that were originally non-directional. The sense and antisense counts for these samples were estimated based on the proportion of reads matching the "sense" and "antisense" strands in the RNA-SEQ directional samples for the same strains.
The "RNA-Seq_whole_data.xlsx" file has all the coverage data in it. The other 3 files contain data specifically for (i) genes (genes_whole_data.xlsx), (ii) noncoding RNA (ncRNA_whole_data.xlsx), and (iii) transcription units (TU_whole_data.xlsx).


METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data:

Strains and growth conditions. RNA-Seq was performed on (i) the L.
monocytogenes parent strain 10403S, (ii) four isogenic triple mutants with internal non-polar deletions in three out of four alternative Sigma factors (i.e., ΔCHL [FSL C3-139], ΔBHL [FSL C3-138], ΔBCL [FSL C3-137], ΔBCH [FSL C3-128]), thus expressing only a single alternative Sigma factor (i.e., SigmaB, SigmaC, SigmaH, and SigmaL, respectively), and (iii) an isogenic quadruple mutant (i.e., ΔBCHL [FSL C3-135]), which expresses none of the four alternative Sigma factors (Table 1). Prior to RNA isolation, bacteria were grown to stationary phase (OD600 = 1.0 + 3 hours) in BHI media, as previously described [Oliver, H. F., Orsi, R. H., Ponnala, L., Keich, U., Wang, W., Sun, Q., Cartinhour, S. W., Filiatrault, M. J., Wiedmann, M., Boor, K. J. (2009) Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs. BMC Genomics 10:641].

RNA isolation, integrity and quality assessment. Cultures grown to stationary phase were treated with RNAProtect bacterial reagent (Qiagen, Valencia, CA) according to the manufacturer’s instructions. Cell pellets were suspended in 5 ml of TRI Reagent (Life Technologies, Grand Island, NY), followed by mechanical disruption (bead-beating with 0.1 mm acid-washed zirconium beads), and RNA extraction using TRI Reagent per the manufacturer’s protocol (Life Technologies). Total RNA was incubated with RQ1 DNase (Promega, Madison, WI) in the presence of RNasin (Promega) to remove remaining DNA. Subsequently, RNA was purified using two phenol-chloroform extractions and one chloroform extraction, followed by RNA precipitation and re-suspension of the RNA in RNase-free water. UV spectrophotometry (Nanodrop, Wilmington, DE) was used to quantify and assess purity of the RNA.
Efficacy of the DNase treatment was assessed by TaqMan qPCR analysis of DNA levels of the housekeeping gene rpoB; all samples had DNA log copy numbers ≤ 1.5 copies per 10 ng of RNA and Ct values > 35 cycles, indicating negligible levels of DNA contamination. RNA integrity was assessed using the 2100 Bioanalyzer (Agilent, Foster City, CA).

mRNA enrichment. Removal of 16S and 23S rRNA from total RNA was performed using the MICROBExpress™ Bacterial mRNA Purification Kit (Life Technologies) according to the manufacturer’s protocol with the exception that no more than 5 µg total RNA was treated per enrichment reaction. Each RNA sample was divided into multiple aliquots of ≤ 5 µg RNA, which were used for separate enrichment reactions. Enriched mRNA samples were pooled and run on the 2100 Bioanalyzer (Agilent) to confirm reduction of 16S and 23S rRNA prior to preparation of cDNA fragment libraries.

Preparation of cDNA fragment libraries and RNA-Seq. For the non-directional runs and the directional runs, the Illumina Genomic DNA Sample Prep kit and the Illumina Directional mRNA-Seq Library Prep kit (Illumina, Inc., San Diego, CA) were used, respectively, according to the manufacturer’s protocol, to fragment RNA followed by phosphatase and polynucleotide kinase (PNK) treatment, ligation of 3’ and 5’ adapters, and reverse transcription of adapter-ligated RNA. Purified libraries were loaded onto independent flow cells; sequencing was carried out by running 32 cycles on the Illumina Genome Analyzer.

2. Methods for processing the data:

RNA-Seq alignment and coverage. The 10403S finished genome was used to align Illumina RNA-Seq reads. These alignments were performed using the Burrows-Wheeler Aligner (BWA), which allowed up to 2 mismatches. Coverage at each base position along the chromosome was calculated by enumerating the number of reads that align to a given base.

RNA-Seq normalization.
The RNA-Seq raw output mapped to each annotated gene was normalized for gene length and for the total number of sequenced reads in each run. The normalized coverage is expressed as RPKM (reads per kilobase of gene length per million reads). In order to determine the gene coverages in the replicates where the directional protocol was not used, the proportion of reads mapping to the sense strand in the directional-run replicate was applied to the total coverage in the non-directional replicate. Fold changes (FC) were calculated for each gene as the average normalized coverage between the two replicates for a given triple mutant (i.e., ΔCHL, ΔBHL, ΔBCL, and ΔBCH strains) divided by the average normalized coverage between the two replicates for the ΔBCHL strain.

3. Instrument- or software-specific information needed to interpret the data: Spreadsheets can be viewed using Microsoft Office Excel, while the assembly file and the coverage files can be viewed as text or using a genome browser, such as Artemis (https://www.sanger.ac.uk/tool/artemis/).

4. Environmental/experimental conditions: Prior to RNA isolation, bacteria were grown to stationary phase (OD600 = 1.0 + 3 hours) in BHI media.

5. Describe any quality-assurance procedures performed on the data: UV spectrophotometry (Nanodrop, Wilmington, DE) was used to quantify and assess purity of the RNA. Efficacy of the DNase treatment was assessed by TaqMan qPCR analysis of DNA levels of the housekeeping gene rpoB; all samples had DNA log copy numbers ≤ 1.5 copies per 10 ng of RNA and Ct values > 35 cycles, indicating negligible levels of DNA contamination. RNA integrity was assessed using the 2100 Bioanalyzer (Agilent, Foster City, CA). After 16S and 23S rRNA removal, enriched mRNA samples were pooled and run on the 2100 Bioanalyzer (Agilent) to confirm reduction of 16S and 23S rRNA prior to preparation of cDNA fragment libraries.

6.
People involved with sample collection, processing, analysis and/or submission: Renato H. Orsi; Soraya Chaturongakul; Haley F. Oliver; Lalit Ponnala


DATA-SPECIFIC INFORMATION FOR: *.cov files

1. Number of variables: 1

2. Number of rows: 2,903,106

3. Variable List: The only variable in each file represents the number of reads matching to each nucleotide position (rows) in the 10403S assembly.

4. Missing data codes: No missing data.

5. Specialized formats or other abbreviations used: None


DATA-SPECIFIC INFORMATION FOR: *.csv files

1. Number of variables: 59

2. Number of rows:
RNA-Seq_whole_data.csv: 5001 rows
gene_whole_data.csv: 2825 rows
ncRNA_whole_data.csv: 96 rows
TU_whole_data.csv: 1398 rows

3. Variable List:
FEATURE: Genetic feature, which can be a protein-coding gene, noncoding RNA (ncRNA), operon, rRNA, tRNA, or a transcription unit (TU)
START: first nucleotide position of the genetic feature (left-most edge) according to the 10403S assembly
STOP: last nucleotide position of the genetic feature (right-most edge) according to the 10403S assembly
LENGTH: length of the genetic feature
STRAND: Either "+" for genetic features encoded in the Forward strand, or "-" for genetic features encoded in the Reverse strand
10403S_rep1_sense_COUNT: Number of reads matching each genetic feature in the Forward strand in the Wildtype (WT) strain (replicate 1 - directional run)
10403S_rep1_sense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Forward strand in the Wildtype (WT) strain (replicate 1 - directional run)
10403S_rep1_antisense_COUNT: Number of reads matching each genetic feature in the Reverse strand in the Wildtype (WT) strain (replicate 1 - directional run)
10403S_rep1_antisense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Reverse strand in the Wildtype (WT) strain (replicate 1 - directional run)
10403S_rep2_sense_COUNT: Number of reads matching each genetic feature in the Forward strand in the Wildtype (WT)
strain (replicate 2 - directional run)
10403S_rep2_sense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Forward strand in the Wildtype (WT) strain (replicate 2 - directional run)
10403S_rep2_antisense_COUNT: Number of reads matching each genetic feature in the Reverse strand in the Wildtype (WT) strain (replicate 2 - directional run)
10403S_rep2_antisense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Reverse strand in the Wildtype (WT) strain (replicate 2 - directional run)
DeltaBCHL_rep1_sense_COUNT: Number of reads matching each genetic feature in the Forward strand in the DeltaBCHL strain (replicate 1 - directional run)
DeltaBCHL_rep1_sense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Forward strand in the DeltaBCHL strain (replicate 1 - directional run)
DeltaBCHL_rep1_antisense_COUNT: Number of reads matching each genetic feature in the Reverse strand in the DeltaBCHL strain (replicate 1 - directional run)
DeltaBCHL_rep1_antisense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Reverse strand in the DeltaBCHL strain (replicate 1 - directional run)
DeltaBCHL_rep2_COUNT: Number of reads matching each genetic feature in the DeltaBCHL strain (replicate 2 - non-directional run)
DeltaBCHL_rep2_RPKM: RPKM-normalized number of reads matching each genetic feature in the DeltaBCHL strain (replicate 2 - non-directional run)
DeltaBCHL_rep2_pseudosense_COUNT: Number of reads estimated to match each genetic feature in the Forward strand in the DeltaBCHL strain (replicate 2 - non-directional run)
DeltaBCHL_rep2_pseudosense_RPKM: RPKM-normalized number of reads estimated to match each genetic feature in the Forward strand in the DeltaBCHL strain (replicate 2 - non-directional run)
DeltaBCHL_rep2_pseudoantisense_COUNT: Number of reads estimated to match each genetic feature in the Reverse strand in the DeltaBCHL strain (replicate 2 - non-directional run)
DeltaBCHL_rep2_pseudoantisense_RPKM: RPKM-normalized number of reads estimated to match each genetic feature in the Reverse strand in the DeltaBCHL strain (replicate 2 - non-directional run)
DeltaBCH_rep1_sense_COUNT: Number of reads matching each genetic feature in the Forward strand in the DeltaBCH strain (replicate 1 - directional run)
DeltaBCH_rep1_sense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Forward strand in the DeltaBCH strain (replicate 1 - directional run)
DeltaBCH_rep1_antisense_COUNT: Number of reads matching each genetic feature in the Reverse strand in the DeltaBCH strain (replicate 1 - directional run)
DeltaBCH_rep1_antisense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Reverse strand in the DeltaBCH strain (replicate 1 - directional run)
DeltaBCH_rep2_sense_COUNT: Number of reads matching each genetic feature in the Forward strand in the DeltaBCH strain (replicate 2 - directional run)
DeltaBCH_rep2_sense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Forward strand in the DeltaBCH strain (replicate 2 - directional run)
DeltaBCH_rep2_antisense_COUNT: Number of reads matching each genetic feature in the Reverse strand in the DeltaBCH strain (replicate 2 - directional run)
DeltaBCH_rep2_antisense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Reverse strand in the DeltaBCH strain (replicate 2 - directional run)
DeltaBCL_rep1_sense_COUNT: Number of reads matching each genetic feature in the Forward strand in the DeltaBCL strain (replicate 1 - directional run)
DeltaBCL_rep1_sense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Forward strand in the DeltaBCL strain (replicate 1 - directional run)
DeltaBCL_rep1_antisense_COUNT: Number of reads matching each genetic feature in the Reverse strand in the DeltaBCL strain (replicate 1 - directional run)
DeltaBCL_rep1_antisense_RPKM: RPKM-normalized number of reads
matching each genetic feature in the Reverse strand in the DeltaBCL strain (replicate 1 - directional run)
DeltaBCL_rep2_sense_COUNT: Number of reads matching each genetic feature in the Forward strand in the DeltaBCL strain (replicate 2 - directional run)
DeltaBCL_rep2_sense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Forward strand in the DeltaBCL strain (replicate 2 - directional run)
DeltaBCL_rep2_antisense_COUNT: Number of reads matching each genetic feature in the Reverse strand in the DeltaBCL strain (replicate 2 - directional run)
DeltaBCL_rep2_antisense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Reverse strand in the DeltaBCL strain (replicate 2 - directional run)
DeltaBHL_rep1_sense_COUNT: Number of reads matching each genetic feature in the Forward strand in the DeltaBHL strain (replicate 1 - directional run)
DeltaBHL_rep1_sense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Forward strand in the DeltaBHL strain (replicate 1 - directional run)
DeltaBHL_rep1_antisense_COUNT: Number of reads matching each genetic feature in the Reverse strand in the DeltaBHL strain (replicate 1 - directional run)
DeltaBHL_rep1_antisense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Reverse strand in the DeltaBHL strain (replicate 1 - directional run)
DeltaBHL_rep2_COUNT: Number of reads matching each genetic feature in the DeltaBHL strain (replicate 2 - non-directional run)
DeltaBHL_rep2_RPKM: RPKM-normalized number of reads matching each genetic feature in the DeltaBHL strain (replicate 2 - non-directional run)
DeltaBHL_rep2_pseudosense_COUNT: Number of reads estimated to match each genetic feature in the Forward strand in the DeltaBHL strain (replicate 2 - non-directional run)
DeltaBHL_rep2_pseudosense_RPKM: RPKM-normalized number of reads estimated to match each genetic feature in the Forward strand in the DeltaBHL strain (replicate 2 - non-directional
run)
DeltaBHL_rep2_pseudoantisense_COUNT: Number of reads estimated to match each genetic feature in the Reverse strand in the DeltaBHL strain (replicate 2 - non-directional run)
DeltaBHL_rep2_pseudoantisense_RPKM: RPKM-normalized number of reads estimated to match each genetic feature in the Reverse strand in the DeltaBHL strain (replicate 2 - non-directional run)
DeltaCHL_rep1_sense_COUNT: Number of reads matching each genetic feature in the Forward strand in the DeltaCHL strain (replicate 1 - directional run)
DeltaCHL_rep1_sense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Forward strand in the DeltaCHL strain (replicate 1 - directional run)
DeltaCHL_rep1_antisense_COUNT: Number of reads matching each genetic feature in the Reverse strand in the DeltaCHL strain (replicate 1 - directional run)
DeltaCHL_rep1_antisense_RPKM: RPKM-normalized number of reads matching each genetic feature in the Reverse strand in the DeltaCHL strain (replicate 1 - directional run)
DeltaCHL_rep2_COUNT: Number of reads matching each genetic feature in the DeltaCHL strain (replicate 2 - non-directional run)
DeltaCHL_rep2_RPKM: RPKM-normalized number of reads matching each genetic feature in the DeltaCHL strain (replicate 2 - non-directional run)
DeltaCHL_rep2_pseudosense_COUNT: Number of reads estimated to match each genetic feature in the Forward strand in the DeltaCHL strain (replicate 2 - non-directional run)
DeltaCHL_rep2_pseudosense_RPKM: RPKM-normalized number of reads estimated to match each genetic feature in the Forward strand in the DeltaCHL strain (replicate 2 - non-directional run)
DeltaCHL_rep2_pseudoantisense_COUNT: Number of reads estimated to match each genetic feature in the Reverse strand in the DeltaCHL strain (replicate 2 - non-directional run)
DeltaCHL_rep2_pseudoantisense_RPKM: RPKM-normalized number of reads estimated to match each genetic feature in the Reverse strand in the DeltaCHL strain (replicate 2 - non-directional run)

4.
Missing data codes: No missing data.

5. Specialized formats or other abbreviations used: None
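

ILLUSTRATIVE EXAMPLE (not part of the original processing pipeline)

The RPKM normalization, the pseudosense/pseudoantisense estimation for non-directional runs, and the fold-change calculation described under "Methods for processing the data" can be sketched in Python as shown below. This is a minimal sketch under stated assumptions, not the authors' code. The function names are hypothetical. It assumes that the sense proportion is taken per genetic feature from the directional replicate, that "total reads" is whatever per-run read total was used for normalization, and that a feature with no directional reads is split evenly between strands; none of these details are specified in this readme.

```python
def rpkm(count, length_bp, total_reads):
    """Reads per kilobase of feature length per million reads.

    total_reads is the per-run read total used for normalization
    (assumption: the readme does not say whether this is all
    sequenced reads or only aligned reads).
    """
    return count * 1e9 / (length_bp * total_reads)


def split_by_sense_fraction(total_count, sense_count, antisense_count):
    """Estimate (pseudosense, pseudoantisense) counts for one feature.

    total_count comes from the non-directional replicate; sense_count
    and antisense_count come from the directional replicate of the
    same strain (assumption: the proportion is applied per feature).
    """
    directional_total = sense_count + antisense_count
    if directional_total == 0:
        # Assumption: no directional evidence, so split evenly.
        fraction = 0.5
    else:
        fraction = sense_count / directional_total
    pseudo_sense = total_count * fraction
    return pseudo_sense, total_count - pseudo_sense


def fold_change(mutant_rpkms, reference_rpkms):
    """Mean RPKM of the triple-mutant replicates divided by the mean
    RPKM of the DeltaBCHL replicates.

    Raises ZeroDivisionError if the reference mean is zero.
    """
    mutant_mean = sum(mutant_rpkms) / len(mutant_rpkms)
    reference_mean = sum(reference_rpkms) / len(reference_rpkms)
    return mutant_mean / reference_mean


# Made-up numbers, for illustration only (not taken from the dataset).
print(rpkm(500, 1000, 2_000_000))                    # 250.0
print(split_by_sense_fraction(400, 75, 25))          # (300.0, 100.0)
print(fold_change([300.0, 100.0], [50.0, 50.0]))     # 4.0
```

Features with zero coverage in the DeltaBCHL strain would need separate handling (for example, a pseudocount) before computing fold changes; this readme does not state how such cases were treated.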