This readme file was generated on 2023-08-07 by Shanthanu Krishna Kumar

GENERAL INFORMATION

Title of Dataset: Krishna Kumar 2023 PhD Thesis Supplementary Tables

Author/Principal Investigator Information
Name: Shanthanu Krishna Kumar
ORCID: 0000-0001-5439-4621
Institution: Cornell University
Address: 155 Plant Science
Email: sk3256@cornell.edu

Author/Associate or Co-investigator Information
Name: Gregory Peck
ORCID: 0000-0003-2131-8950
Institution: Cornell University
Address: 132 Plant Science
Email: gmp32@cornell.edu

Date of data collection: The field experiment was conducted in 2021 and the data were processed in 2022.

Geographic location of data collection: Cornell Orchards, Ithaca, NY, 14850.

Information about funding sources that supported the collection of the data: NY Hatch Grant, Cornell Atkinson Grant.

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data: No restrictions

Recommended citation for this dataset: Krishna Kumar, S., Fei, Z., Peck, G.M. Supplementary files for the crop load experiment. Cornell University Library eCommons Digital Repository. https://doi.org/10.7298/kksr-7928

DATA & FILE OVERVIEW

File List:
1. KKumar_2023PhD_SuppTab1_DEGs.xlsx
2. KKumar_2023PhD_SuppTab_2_EnrichedGenes.xlsx
3. KKumar_2023PhD_SupplementaryFiles_ArchivalBundle.zip

METHODOLOGICAL INFORMATION

Description of methods used for collection/generation of data: Please refer to the methods section of the crop load chapter in the dissertation submitted by Shanthanu Krishna Kumar in 2023. These data files are a supplement in support of a thesis with the following abstract:

Methods for processing the data: Pooled libraries were sequenced using HiSeq X 150 bp paired-end sequencing (Psomagen Inc, Rockville MD). Raw RNA-Seq reads were processed to remove adaptors and low-quality sequences using Trimmomatic (version 0.36; Bolger et al. 2014) with parameters ‘SLIDINGWINDOW:4:20 LEADING:3 TRAILING:3 MINLEN:40’ and to remove polyA/T tails using PRINSEQ++ [v1.2; (Cantu et al.
2019) with parameters ‘-min_len 40 -trim_tail_left 10 -trim_tail_right 10’]. The remaining cleaned reads were aligned to the ribosomal RNA database (Quast et al. 2013) using Bowtie (version 1.1.2; Langmead 2010) allowing up to three mismatches, and reads that aligned were discarded. The final cleaned reads were aligned to the ‘Golden Delicious’ doubled haploid (GDDH13) genome (v1.1; Daccord et al. 2017) using HISAT2 (version 2.1.0; Kim et al. 2019) with default parameters. Based on the alignments, raw read counts for each gene were calculated and then normalized to fragments per kilobase of exon model per million mapped fragments (FPKM). Raw read counts were then fed to DESeq2 to identify differentially expressed genes (DEGs) using a cutoff of adjusted P value < 0.05 and fold change ≥ 2. Gene ontology terms enriched in the lists of genes were identified using Blast2GO (Conesa et al. 2005) with a cutoff of adjusted P value < 0.05. Transcription factors were identified using the BLAST tool in the Genome Database for Rosaceae and homolog comparisons with the Arabidopsis genome (Sook et al. 2018).

DATA-SPECIFIC INFORMATION FOR: KKumar_2023PhD_SuppTab1_DEGs.xlsx

This file provides the differentially expressed genes in the experiment at the time points measured (27, 81, and 160 days after full bloom) for flesh and peel tissue separately. There are two treatments: unthinned control (UTC) and low crop load. The columns are the Gene ID, Name of Gene, Mean of UTC gene expression, Mean of Low crop load gene expression, the ratio, and the adjusted P value.

Specialized formats or other abbreviations used: DAFB - Days after full bloom

DATA-SPECIFIC INFORMATION FOR: KKumar_2023PhD_SuppTab_2_Enrichedgenes.xlsx

This file provides the enriched genes in the experiment at the time points measured (27, 81, and 160 days after full bloom).
It provides separate up-regulated, down-regulated, and complete lists for each time point, for peel and flesh tissue separately. There are two treatments: unthinned control (UTC) and low crop load. The columns are the GO ID, GO Name, GO Category, FDR, P-Value, Nr Test, Nr Reference, Non Annot Test, Non Annot Reference, and TestSet Sequences.

DATA-SPECIFIC INFORMATION FOR: KKumar_2023PhD_SupplementaryFiles_ArchivalBundle.zip

This bundle contains comma-separated values (CSV) versions of both of the above .xlsx data files.

Specialized formats or other abbreviations used: DAFB - Days after full bloom, GO - Gene Ontology, FDR - False Discovery Rate, Annot - Annotated.
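
ILLUSTRATIVE EXAMPLE (not part of the original analysis)

The FPKM normalization and the DEG cutoffs described under "Methods for processing the data" can be illustrated with a short Python sketch. It is not the code used for the thesis: the function names and the example numbers are hypothetical, and it assumes that the "fold change ≥ 2" cutoff applies in both directions (a ratio of at least 2 or at most 0.5). In the actual analysis, adjusted P values came from DESeq2.

```python
def fpkm(read_count, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon model per million mapped fragments."""
    # count / (length in kb) / (mapped fragments in millions)
    return read_count * 1e9 / (exon_length_bp * total_mapped_fragments)


def is_deg(ratio, padj, min_fold=2.0, max_padj=0.05):
    """Apply the cutoffs from the methods: adjusted P < 0.05 and fold change >= 2.

    Assumption: the fold-change cutoff is symmetric, so a ratio <= 0.5
    counts as a two-fold change in the opposite direction.
    """
    if ratio <= 0:
        return False  # a fold change is undefined for a non-positive ratio
    fold = max(ratio, 1.0 / ratio)
    return padj < max_padj and fold >= min_fold


# Made-up values, for illustration only
print(fpkm(500, 2000, 10_000_000))
print(is_deg(ratio=0.5, padj=0.01))
```

The same checks can be applied row by row to the ratio and adjusted P value columns of the CSV files in the archival bundle, after confirming the exact column headers in those files.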