Probabilistic and Generative Models for Causality and Decision-Making
Probabilistic models produce estimates of uncertainty over the value of a predicted outcome. Accurate estimation of this uncertainty is important when the output of a probabilistic model informs critical decisions. For example, if we use a probabilistic model to predict an individual's likelihood of developing heart disease from their personal medical information, an accurately predicted likelihood can be used to decide whether they should undergo a certain medical investigation. Since the investigation may be expensive and painful for the patient, we would like to ensure that we select all the patients who are truly at higher risk of developing heart disease. The reliability of the uncertainty predicted by a probabilistic model is therefore crucial in safety-critical applications. In this thesis, we introduce simple techniques that ensure different kinds of calibration guarantees on the uncertainty in discrete and continuous outcomes predicted by deep probabilistic models, thus improving their reliability and accuracy. We show improved sequential decision-making under uncertainty by enforcing simple calibration guarantees on the uncertainty of the probabilistic outcome model that guides the decision-making process. In the case of Bayesian optimization, calibrated uncertainty estimates let us balance exploration and exploitation decisions and discover the unknown optima of black-box functions faster. Additionally, we explore calibrated uncertainty estimation in the more challenging online setting, where data does not follow a fixed distribution, necessitating novel techniques that account for worst-case deviations in the distribution of data arriving sequentially. The estimation of genetic risk for a given disease likewise depends on accurate identification of the causal effects of genetic variants.
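To make the notion of calibration concrete, the following is a minimal sketch (not taken from the thesis) of one common diagnostic for binary outcomes, a binned expected calibration error; the function name and the toy data are purely illustrative. A calibrated model's predicted probabilities should match empirical outcome frequencies, so its binned gap should be close to zero.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between predicted confidence and
    empirical outcome frequency over equal-width probability bins."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # include the right edge only in the final bin
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of samples
    return ece

# Toy example: outcomes drawn exactly with the predicted probability,
# so the predictor is perfectly calibrated by construction.
rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)
y = rng.uniform(size=10_000) < p
print(expected_calibration_error(p, y))  # close to 0
```

A systematically over- or under-confident predictor (e.g. reporting `p**2` when outcomes follow `p`) would yield a markedly larger value on the same data.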
Hidden confounding between a potentially causal factor and the outcome of interest makes causal effect estimation challenging. For example, we could assess the impact of smoking on the risk of heart disease if we observed all factors that affect both the decision to take up smoking and the risk of heart disease. However, it is difficult to guarantee that all confounders have been observed. Modern health records and biobanks contain large amounts of multimodal, unstructured information, including genomic sequences, images, and text, from which potential confounders can be extracted. These data go unused in classical causal inference pipelines because of the nonlinear, complex relationships among high-dimensional covariates and the possibility of missing modalities. We propose deep generative models that incorporate these unstructured modalities into causal inference, so that the useful signal contained in this rich source of information can be used to estimate the causal impact of actions such as smoking on the risk of heart disease. If all confounders are observed, the individual probability of receiving treatment given those confounders (the propensity score) can be used to estimate causal effects from observational datasets. We propose uncertainty calibration of the models that predict this probability and demonstrate the effectiveness of the resulting causal effect estimation techniques on high-dimensional Genome-Wide Association Studies.
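As a sketch of how the predicted treatment probability enters causal effect estimation, the following shows a standard inverse-propensity-weighting estimator on simulated confounded data. The function name and toy setup are illustrative assumptions, not the thesis's method; the point is that with the correct propensities the estimator recovers the true effect, whereas a naive difference of means does not.

```python
import numpy as np

def ipw_ate(treatment, outcome, propensity, clip=0.01):
    """Inverse-propensity-weighted estimate of the average treatment effect.

    `propensity` is the predicted probability of receiving treatment given
    the confounders; clipping guards against extreme weights."""
    t = np.asarray(treatment, dtype=float)
    y = np.asarray(outcome, dtype=float)
    e = np.clip(np.asarray(propensity, dtype=float), clip, 1.0 - clip)
    # E[tY/e] estimates E[Y(1)]; E[(1-t)Y/(1-e)] estimates E[Y(0)]
    return np.mean(t * y / e - (1.0 - t) * y / (1.0 - e))

# Toy confounded data: x drives both treatment assignment and outcome.
rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-x))      # true propensity score
t = rng.uniform(size=n) < e_true
y = 2.0 * t + x + rng.normal(size=n)   # true treatment effect is 2.0
print(ipw_ate(t, y, e_true))           # recovers approximately 2.0
```

The naive contrast `y[t].mean() - y[~t].mean()` overstates the effect here, because treated individuals also have larger `x`; the weighting removes that confounding bias.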