INTEGRATING GRAPH AND LANGUAGE DATA IN MACHINE LEARNING MODELS: APPLICATIONS FOR COMPUTATIONAL SOCIAL SCIENCE
Ruch, Alexander Martin
This dissertation presents three papers demonstrating how integrating graph (network) and language (text) data in machine learning models can enhance computational social science models. These two types of data are ubiquitous across many contexts in which computational social scientists work (e.g., social media platforms, online markets, and the Web as a whole). Relatively little research has analyzed how to model network and text data together at scale. This is partly because models for these data are often computationally expensive, and partly because statistical models for them require expert-driven decisions about feature engineering and about how the two data types should be related within a model. The first paper in this dissertation combines node and text embeddings in a downstream classification model to study mental health dynamics on Reddit. The second paper cascades knowledge graph classifications into a text clustering model to study how demographic confounding causes extreme instances of lifestyle politics, using aggregated Facebook interest data. Finally, the third paper uses graph and language data from Amazon to study the spread of political and lifestyle polarization in that large online market, and it tests how network and morality features explain the presence of lifestyle polarization. Together, the three studies show that integrating graph and language data in machine learning models can facilitate computational social science not only by improving such models’ power, efficiency, and ease of use but also by allowing us to test new hypotheses and explain black-box models. The conclusion situates these findings in the contexts of academia and industry.
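To illustrate the general idea of combining node and text embeddings in a downstream classifier, here is a minimal early-fusion sketch. It is not the dissertation's implementation: the toy embeddings, user IDs, and labels are hypothetical, and the choice of plain concatenation followed by logistic regression is an assumption made for clarity. In practice the node vectors would come from a graph-embedding method and the text vectors from a text encoder.

```python
import math

def fuse(node_vec, text_vec):
    """Early fusion (assumed here): concatenate a node embedding and a text embedding."""
    return list(node_vec) + list(text_vec)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient-descent logistic regression, no regularization."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        # Average the gradients and take one descent step.
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: 2-d node embeddings and 2-d text embeddings per user.
node = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0], "u4": [0.1, 0.9]}
text = {"u1": [0.8, 0.2], "u2": [0.7, 0.3], "u3": [0.2, 0.8], "u4": [0.3, 0.7]}
labels = {"u1": 1, "u2": 1, "u3": 0, "u4": 0}

users = sorted(labels)
X = [fuse(node[u], text[u]) for u in users]
y = [labels[u] for u in users]
w, b = train_logreg(X, y)
```

Concatenation keeps the two signals side by side so the classifier can weight structural and linguistic features separately; more elaborate designs (e.g., learned joint encoders) are possible but beyond this sketch.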
Computational Social Science; Graphs; Machine Learning; Natural Language Processing; Networks; Text Analysis
Macy, Michael W.
Mimno, David; Gilovich, Tom
Ph. D., Sociology
Doctor of Philosophy
Attribution 4.0 International
dissertation or thesis
Except where otherwise noted, this item's license is described as Attribution 4.0 International