Cornell University
Library

eCommons


Contributions to Fairness and Transparency

File(s)
Baer_cornellgrad_0058F_12623.pdf (1.44 MB)
Permanent Link(s)
https://doi.org/10.7298/c6tm-bc44
https://hdl.handle.net/1813/110502
Collections
Cornell Theses and Dissertations
Author
Baer, Benjamin R.
Abstract

This dissertation presents three varied topics. The first topic concerns text mining. The GloVe and Skip-gram word embedding methods learn word vectors by decomposing a denoised matrix of word co-occurrences into a low-rank matrix. In this work, we propose an iterative algorithm for computing word vectors based on modeling word co-occurrence matrices with Symmetric Generalized Low Rank Models. Our algorithm generalizes both Skip-gram and GloVe and gives rise to other embedding methods determined by the specified co-occurrence matrix, the distribution of co-occurrences, and the number of iterations of the iterative algorithm. For example, a Tweedie distribution with one iteration yields GloVe, and a Multinomial distribution run to full convergence yields Skip-gram.

The second topic concerns algorithmic fairness. A substantial portion of the literature on fairness in algorithms proposes, analyzes, and operationalizes simple formulaic criteria for assessing fairness. Two of these criteria, Equalized Odds and Calibration by Group, have gained significant attention for their simplicity and intuitive appeal, but also for their incompatibility. This chapter provides a perspective on the meaning and consequences of these and other fairness criteria using graphical models, which reveals Equalized Odds and related criteria to be ultimately misleading. An assessment of various graphical models suggests that fairness criteria should be case-specific and sensitive to the nature of the information the algorithm processes.

The third topic concerns the fragility index. In recent years there has been a renewed conversation about interpretable and proper techniques for statistical hypothesis testing. In the medical literature on clinical trials, the count of patients who must have a different outcome to reverse statistical significance in a 2×2 contingency table (the fragility index) has been proposed as a more interpretable supplement to classical p-value-based testing. We studied the sampling distribution of the fragility index and created a sample size calculation strategy that simultaneously designs for p-values and fragility indices. We then extended the fragility index to incorporate only sufficiently likely outcome modifications. Next, we redefined what it means for an outcome modification to be sufficiently likely and studied a variant of the fragility index tailored to patients who are lost to follow-up. Finally, we generalized the fragility index to any data type and any statistical test.
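The dissertation's algorithm is not reproduced here, but the GloVe-flavoured objective the first topic generalizes — fitting a low-rank symmetric factorization to weighted log co-occurrence counts — can be sketched in a few lines. The function name, learning rate, and weight cap below are illustrative assumptions, not the dissertation's implementation:

```python
import math
import random

def glove_like_embeddings(C, rank=2, steps=2000, lr=0.05, seed=0):
    """Fit one shared embedding matrix U so that u_i . u_j ~ log C[i][j],
    by gradient descent on a squared error weighted by the co-occurrence
    count (a GloVe-style objective for a symmetric co-occurrence matrix)."""
    rng = random.Random(seed)
    n = len(C)
    # small random initialization, one row of length `rank` per word
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            for j in range(n):
                if C[i][j] <= 0:
                    continue  # zero counts carry no loss term in this sketch
                pred = sum(U[i][k] * U[j][k] for k in range(rank))
                err = pred - math.log(C[i][j])
                w = min(1.0, C[i][j] / 10.0)  # capped count weight, as in GloVe
                for k in range(rank):
                    gi = w * err * U[j][k]
                    gj = w * err * U[i][k]
                    U[i][k] -= lr * gi
                    U[j][k] -= lr * gj
    return U
```

In the abstract's framing, swapping the squared-error loss for another exponential-family likelihood (e.g. Tweedie or Multinomial) and varying the number of iterations is what moves between GloVe, Skip-gram, and other members of the family.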
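Equalized Odds, one of the two criteria the fairness chapter scrutinizes, asks that a classifier's true-positive and false-positive rates coincide across groups. A minimal check of how far a set of predictions is from satisfying it (the helper below is a hypothetical illustration, not code from the dissertation; it assumes each group contains both positive and negative cases):

```python
def equalized_odds_gaps(y_true, y_pred, group):
    """Largest cross-group gap in TPR and in FPR; (0, 0) means the
    predictions satisfy Equalized Odds exactly on this sample."""
    rates = {}
    for g in set(group):
        idx = [i for i, gi in enumerate(group) if gi == g]
        tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
        fn = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 0)
        fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
        tn = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 0)
        # assumes tp+fn > 0 and fp+tn > 0 within every group
        rates[g] = (tp / (tp + fn), fp / (fp + tn))
    tprs = [r[0] for r in rates.values()]
    fprs = [r[1] for r in rates.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)
```

The chapter's graphical-model argument is precisely that a zero gap here can still be misleading, depending on what information the algorithm processes.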
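The fragility index described above can be computed by flipping outcomes one patient at a time until statistical significance is lost. A simplified sketch, using a Pearson chi-square test in place of the Fisher's exact test customary in this literature, and assuming group 1 is the arm with fewer events (function names are illustrative):

```python
import math

def chi2_pvalue(a, b, c, d):
    """Pearson chi-square p-value (1 df, no continuity correction) for the
    2x2 table [[a, b], [c, d]] of (events, non-events) per group."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    if den == 0:
        return 1.0  # a degenerate margin: no evidence either way
    x = n * (a * d - b * c) ** 2 / den
    # survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(x / 2))

def fragility_index(a, b, c, d, alpha=0.05):
    """Count of outcome flips in group 1 (non-event -> event) needed to
    push the p-value above alpha; 0 if the table is already non-significant."""
    flips = 0
    while chi2_pvalue(a, b, c, d) < alpha and b > 0:
        a, b = a + 1, b - 1  # one patient in group 1 changes outcome
        flips += 1
    return flips
```

A small fragility index flags a "significant" result that hinges on very few patients, which is the interpretability argument the abstract refers to; the dissertation's later chapters then weight such flips by how likely they are and extend the idea beyond 2×2 tables.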

Description
249 pages
Date Issued
2021-08
Keywords
algorithmic fairness • fragility index • hypothesis testing • interpretability • natural language processing • word embeddings
Committee Chair
Wells, Martin Timothy
Committee Member
Basu, Sumanta
Booth, James
Degree Discipline
Statistics
Degree Name
Ph.D., Statistics
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
https://creativecommons.org/licenses/by/4.0/
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/15160084
