Cornell University
Library
eCommons

INTEGRATING GRAPH AND LANGUAGE DATA IN MACHINE LEARNING MODELS: APPLICATIONS FOR COMPUTATIONAL SOCIAL SCIENCE

File(s)
Ruch_cornellgrad_0058F_12453.pdf (3.8 MB)
Permanent Link(s)
https://doi.org/10.7298/pyqe-xp82
https://hdl.handle.net/1813/109793
Collections
Cornell Theses and Dissertations
Author
Ruch, Alexander Martin
Abstract

This dissertation presents three papers demonstrating how integrating graph (network) and language (text) data in machine learning models can enhance computational social science models. These two types of data are ubiquitous across many contexts in which computational social scientists work (e.g., social media platforms, online markets, and the Web as a whole). Relatively little research has analyzed how to model network and text data together at scale, partly because models for these data are often computationally expensive and partly because statistical models for them require expert-driven decisions about feature engineering and about how the two data types relate within a model. The first paper in this dissertation combines node and text embeddings in a downstream classification model to study mental health dynamics on Reddit. The second paper cascades knowledge graph classifications to a text clustering model to study how demographic confounding causes extreme instances of lifestyle politics using aggregated Facebook interest data. Finally, the third paper uses graph and language data from Amazon to study the spread of political and lifestyle polarization in the large online market and tests how network and morality features explain the presence of lifestyle polarization. Together, the three studies show how integrating graph and language data in machine learning models can facilitate computational social science not only by improving such models’ power, efficiency, and ease of use but also by allowing us to test new hypotheses and explain black box models. The conclusion contextualizes findings for academia and industry.

Description
173 pages
Date Issued
2021-05
Keywords
Computational Social Science
Graphs
Machine Learning
Natural Language Processing
Networks
Text Analysis
Committee Chair
Macy, Michael W.
Committee Member
Mimno, David
Gilovich, Tom
Degree Discipline
Sociology
Degree Name
Ph. D., Sociology
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
https://creativecommons.org/licenses/by/4.0/
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/15049513
