Getting the Most Out of Your Data: Multitask Bayesian Network Structure Learning, Predicting Good Probabilities and Ensemble Selection

Author
Niculescu-Mizil, Alexandru
Abstract
First, I consider the problem of simultaneously learning the
structures of multiple Bayesian networks from multiple related
datasets. I present a multitask Bayes net structure learning
algorithm that is able to learn more accurate network structures by
transferring useful information between the datasets. The algorithm
extends the score-and-search techniques used in traditional structure
learning to the multitask case by defining a scoring function for
sets of structures (one structure for each task) and an
efficient procedure for searching for a high-scoring set of
structures. I also address the task selection problem in the context
of multitask Bayes net structure learning. Unlike in other multitask
learning scenarios, in the Bayes net structure learning setting there
is a clear definition of task relatedness: two tasks are related if
they have similar structures. This allows one to automatically select
a set of related tasks to be used by multitask structure learning.
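As a rough illustration of the idea (not the dissertation's exact formulation), the sketch below scores a set of structures by combining a standard per-task score with a penalty on structural disagreement between tasks; the penalty form, function names, and data representation are assumptions made for illustration only.

```python
from itertools import combinations

def structure_edges(structure):
    """Return the set of directed edges (parent, child) in a DAG,
    where `structure` maps each node to a list of its parents."""
    return {(p, c) for c, parents in structure.items() for p in parents}

def multitask_score(structures, datasets, single_task_score, penalty=1.0):
    """Illustrative joint score for one structure per task.

    structures: list of dicts mapping node -> list of parent nodes.
    datasets: list of datasets, one per task.
    single_task_score: function(structure, data) -> float, e.g. BIC.
    penalty: weight on structural disagreement between task pairs.
    """
    # Fit each structure to its own dataset with a standard score.
    fit = sum(single_task_score(s, d) for s, d in zip(structures, datasets))

    # Encourage related tasks to share edges by penalizing the size of
    # the symmetric difference between their edge sets.
    disagreement = 0
    for s_i, s_j in combinations(structures, 2):
        disagreement += len(structure_edges(s_i) ^ structure_edges(s_j))

    return fit - penalty * disagreement
```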
Second, I examine the relationship between the predictions made by
different supervised learning algorithms and true posterior
probabilities. I show that quasi-maximum-margin methods such as
boosted decision trees and SVMs push probability mass away from 0 and
1, yielding a characteristic sigmoid-shaped distortion in the predicted
probabilities. Naive Bayes, by contrast, pushes probabilities toward 0
and 1. Other models, such as neural nets, logistic regression, and
bagged trees, usually do not have these biases and predict
well-calibrated probabilities. I experiment with two ways of correcting
the biased
probabilities predicted by some learning methods: Platt Scaling and
Isotonic Regression. I qualitatively examine what distortions these
calibration methods are suitable for and quantitatively examine how
much data they need to be effective.
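For concreteness, here is a minimal sketch of the two calibration methods using scikit-learn, assuming a held-out calibration set of raw classifier scores and labels; Platt Scaling is approximated by fitting a one-dimensional logistic regression, and the function and variable names are illustrative rather than taken from the dissertation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression

def platt_scaling(val_scores, val_labels):
    """Platt Scaling: fit a sigmoid mapping raw scores to probabilities
    on a held-out calibration set; returns a calibration function."""
    lr = LogisticRegression()
    lr.fit(val_scores.reshape(-1, 1), val_labels)
    return lambda s: lr.predict_proba(np.asarray(s).reshape(-1, 1))[:, 1]

def isotonic_calibration(val_scores, val_labels):
    """Isotonic Regression: fit a non-decreasing step function mapping
    scores to empirical probabilities on a held-out calibration set."""
    iso = IsotonicRegression(out_of_bounds='clip')
    iso.fit(val_scores, val_labels)
    return iso.predict

# Usage (val_scores, val_labels, test_scores are hypothetical arrays):
# calibrate = platt_scaling(val_scores, val_labels)
# calibrated_probs = calibrate(test_scores)
```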
Third, I present a method for constructing ensembles from libraries of
thousands of models. Model libraries are generated using different
learning algorithms and parameter settings. Forward stepwise
selection is used to add to the ensemble the models that maximize its
performance. The main drawback of ensemble selection is that it
builds ensembles that are very large and slow at test time. This
drawback, however, can be overcome with little or no loss in
performance by using model compression.
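A minimal sketch of greedy forward ensemble selection on a held-out hillclimbing set follows; the interface (an array of per-model predictions, a generic metric, selection with replacement) is an assumption for illustration, not the dissertation's exact implementation.

```python
import numpy as np

def ensemble_selection(model_preds, y_hillclimb, metric, n_steps=50):
    """Greedy forward ensemble selection (illustrative sketch).

    model_preds: array of shape (n_models, n_examples) holding each
        model's predicted probabilities on a held-out hillclimbing set.
    y_hillclimb: true labels on the hillclimbing set.
    metric: function(y_true, y_pred) -> float, higher is better.
    Models may be added multiple times (selection with replacement).
    """
    n_models = model_preds.shape[0]
    counts = np.zeros(n_models, dtype=int)      # times each model was picked
    running_sum = np.zeros(len(y_hillclimb), dtype=float)

    for step in range(1, n_steps + 1):
        best_score, best_m = -np.inf, None
        for m in range(n_models):
            # Ensemble prediction if model m were added at this step.
            candidate = (running_sum + model_preds[m]) / step
            score = metric(y_hillclimb, candidate)
            if score > best_score:
                best_score, best_m = score, m
        counts[best_m] += 1
        running_sum += model_preds[best_m]

    return counts / counts.sum()                # per-model ensemble weights
```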
Sponsorship
The work in this dissertation was supported by NSF grants 0347318, 0412930, 0427914, and 0612031.
Date Issued
2008-07-30
Subject
transfer learning; Bayesian network structure learning; probability calibration; ensemble learning
Type
dissertation or thesis