This readme file was generated on 2024-07-01 by Connor Mitchell

GENERAL INFORMATION

Title of Dataset: Analyzing Americans with Disabilities Act Judicial Opinions Using Latent Dirichlet Allocation

Author Information
Name: Connor Mitchell
ORCID: 0009-0005-9145-129X
Email: connormitchell80@gmail.com

Thesis Advisor
Name: Matt Saleh
Email: mcs378@cornell.edu

Date of data collection: 2023-09-16

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the thesis: CC BY 4.0 - https://creativecommons.org/licenses/by/4.0/legalcode.en
License/restrictions placed on this dataset: GNU GPLv3 - https://choosealicense.com/licenses/gpl-3.0/

SUGGESTED CITATIONS

Thesis: Mitchell, Connor. “Analyzing Americans with Disabilities Act Judicial Opinions Using Latent Dirichlet Allocation.” Undergraduate Honors Thesis, Cornell University, 2024.
Dataset: Mitchell, Connor. “Analyzing Americans with Disabilities Act Judicial Opinions Using Latent Dirichlet Allocation.” Cornell University Library eCommons, 2024. doi:10.7298/831p-dx97.

DATA & FILE OVERVIEW

File List:
CaseCitations.txt - Complete list of cases analyzed as part of this project.
/code/Mitchell_ADALDA24_OpinionDocToText.py - Converts data received as a .doc file into separate .txt files for further processing.
/code/Mitchell_ADALDA24_StopwordTFIDF.R - Creates a list of stopwords using TF-IDF.
/code/Mitchell_ADALDA24_TTests.R - Runs Student's t-tests to determine the relationship between topic probabilities and the court level (district or appellate).
/code/Mitchell_ADALDA24_NestedMixedEffects.R - Runs mixed-effects regressions between the date of publication and the venue where each case was heard, then creates a violin plot for each regression.
/code/Mitchell_ADALDA24_ExtractMetadata.py - Extracts venue and date information from court opinion .txt files and exports it to a .csv file.

Each folder in /data/ contains the statistical results for each n-gram test described in the thesis.
These results include the t-test results, the mixed-effects regression results, the topic words, and the document-topic probabilities.

/data/Mitchell_ADALDA24_ExcelWork.csv is the .csv file used in the scripts listed above. It is an output file from MALLET. Each row is a court case processed by MALLET. The columns starting at Column H give the most probable topic for that document followed by its probability. These columns have no headings because of how MALLET outputs the data. The columns before H are appropriately titled.

METHODOLOGICAL INFORMATION

Cases were downloaded as .doc files, each containing one hundred cases. The cases were split into individual .txt files for further processing with MALLET. The cases themselves are not included due to copyright restrictions. MALLET returned topic probabilities for each .txt file, which were compiled into Mitchell_ADALDA24_ExcelWork.csv.

File system information was redacted from the data and code using "[...]". Edit the code locally to match your filesystem as needed.
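
EXAMPLE: READING THE UNTITLED TOPIC COLUMNS

Because the columns from Column H onward in Mitchell_ADALDA24_ExcelWork.csv have no headings, some CSV readers may reject or misalign the rows. The Python sketch below is not part of the deposited code; it only illustrates one way the file could be read. It assumes that Columns A-G (seven columns) are the titled ones and that the untitled columns alternate between a topic number and its probability. The function name read_doc_topics, the sample header names (col_a to col_g), and the sample values are hypothetical and were made up for the illustration.

```python
import csv
import io

# Assumption: Columns A-G are titled, so Column H is index 7 (zero-based).
N_TITLED_COLUMNS = 7


def read_doc_topics(handle, n_titled=N_TITLED_COLUMNS):
    """Yield (metadata, topic_pairs) for each court case row.

    metadata    -- dict of the titled columns, keyed by their heading
    topic_pairs -- list of (topic_number, probability) tuples built from the
                   untitled columns (assumed to alternate topic, probability)
    """
    reader = csv.reader(handle)
    header = next(reader)[:n_titled]  # ignore any blank heading cells after G
    for row in reader:
        if not row:
            continue  # skip blank lines
        metadata = dict(zip(header, row[:n_titled]))
        # Drop empty padding cells, then read the rest two at a time.
        tail = [cell for cell in row[n_titled:] if cell.strip()]
        pairs = [
            (int(tail[i]), float(tail[i + 1]))
            for i in range(0, len(tail) - 1, 2)
        ]
        yield metadata, pairs


# Made-up sample data; the real heading names and values will differ.
SAMPLE = (
    "col_a,col_b,col_c,col_d,col_e,col_f,col_g\n"
    "x1,x2,x3,x4,x5,x6,x7,3,0.62,0,0.25\n"
)

for metadata, pairs in read_doc_topics(io.StringIO(SAMPLE)):
    print(metadata["col_a"], pairs)
```

To use this on the real file, pass an open file handle instead of the sample string, for example open("[...]/Mitchell_ADALDA24_ExcelWork.csv", newline=""), replacing "[...]" with the local path. If the untitled columns turn out to be laid out differently, the pairing step is the part to adjust.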