Combining Reasoning and Learning for Multi-stage Inference in Computational Sustainability and Scientific Discovery
Problems at the intersection of reasoning, optimization, and learning often involve multi-stage inference. For example, making decisions based on machine learning models often leads to multi-stage inference problems where probabilistic models learned from data are embedded as first-stage subproblems within a global second-stage problem for decision-making. Multi-agent reasoning also involves multi-stage inference, since the reasoning of any given agent has to incorporate the goals of the other agents. As a result, the decision processes of the other agents are embedded as first-stage subproblems within the overall decision-making problem of that agent. With formal complexities beyond NP, multi-stage inference problems are often highly intractable. In this thesis, I introduce a novel computational framework, based on embeddings, to tackle multi-stage inference problems. Our embedding technique approximates the intractable sub-problems of a multi-stage inference problem through a series of novel representations. We then embed these representations into the global problem, effectively reducing a multi-stage inference to a single-stage inference. As one example, I present a novel way to encode the reward allocation problem for a two-stage organizer--agent game as a single-stage optimization. The encoding embeds an approximation of the agents' decision-making process into the organizer's problem in the form of linear constraints. We apply this methodology to eBird, a well-established citizen-science program for collecting bird observations, in a game called Avicaching. Our AI-based reward allocation was shown to be highly effective, surpassing the expectations of the eBird organizers and bird conservation experts. As another example, I present a novel constant approximation algorithm to solve stochastic optimization problems which identify the optimal policy that maximizes the expectation of a stochastic objective. To tackle this problem, I propose the embedding of its intractable counting subproblems as queries to NP oracles subject to additional XOR constraints. As a result, the entire problem is encoded as a single NP-equivalent optimization. This approach outperforms state-of-the-art solvers based on variational inference and MCMC sampling, on probabilistic inference benchmarks, deep learning applications, and a novel decision-making application in network design for wildlife conservation. I also apply the embedding technique to automated reasoning and machine learning for dimensionality reduction in scientific discovery. As one example, I propose the use of embeddings based on Fourier analysis as a compact representation of high-dimensional probability distributions. As a second example, I show how human computation, crowdsourcing, and parallel computation can identify key backdoor information, thereby drastically reducing the computation time from days to minutes in a dimensionality reduction application with complex physical constraints. Our novel integration of reasoning and learning has led to the discovery of new solar light absorbers by solving a dimensionality reduction problem to characterize the crystal structure of metal oxide materials using X-ray diffraction data.
Gomes, Carla P.; Joachims, Thorsten; Hopcroft, John E.
Ph. D., Computer Science
Doctor of Philosophy
Attribution-NonCommercial 4.0 International
dissertation or thesis
Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International