Modeling Additive Structure and Detecting Interactions with Groves of Trees

Author
Sorokina, Daria
Abstract
Discovery of additive structure is an important step towards understanding a complex multi-dimensional function, because it allows for expressing this function as the sum of lower-dimensional or otherwise simpler components. Modeling additive structure also opens up opportunities for learning better regression models.
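As a small illustration (the function and its component names are hypothetical, not taken from the dissertation), a three-variable function with additive structure might decompose as follows:

```latex
F(x_1, x_2, x_3) = f_1(x_1) + f_{23}(x_2, x_3),
\qquad \text{e.g.} \qquad
F(x_1, x_2, x_3) = \sin(x_1) + x_2 x_3 .
```

Here the three-dimensional function is expressed through a one-dimensional and a two-dimensional component, each of which can be examined separately.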
The term statistical interaction describes the presence of non-additive effects among two or more variables in a function. When variables interact, their effects must be modeled and interpreted simultaneously. Detecting statistical interactions can therefore be critical for domain researchers trying to understand the underlying processes.
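For two variables, the definition above can be checked numerically. A function F(x1, x2) has no interaction exactly when it can be written as g(x1) + h(x2), which is equivalent to the mixed difference F(a, b) - F(a, b') - F(a', b) + F(a', b') being zero for every rectangle. The sketch below is only an illustration of that definition, not a method from the dissertation; the helper names `mixed_difference` and `has_interaction` and both example functions are assumptions. A check on a finite grid can reveal an interaction but cannot prove its absence off the grid.

```python
from itertools import product

def mixed_difference(f, a, a2, b, b2):
    """Second-order mixed difference of f over the rectangle (a, a2) x (b, b2)."""
    return f(a, b) - f(a, b2) - f(a2, b) + f(a2, b2)

def has_interaction(f, xs, ys, tol=1e-9):
    """Return True if any rectangle on the grid has a non-zero mixed difference."""
    return any(
        abs(mixed_difference(f, a, a2, b, b2)) > tol
        for a, a2 in product(xs, repeat=2)
        for b, b2 in product(ys, repeat=2)
    )

# Hypothetical example functions (not from the dissertation).
additive = lambda x1, x2: 3.0 * x1 + x2 ** 2   # sum of one-variable terms
interacting = lambda x1, x2: x1 * x2           # product term cannot be split

grid = [0.0, 0.5, 1.0, 2.0]
print(has_interaction(additive, grid, grid))     # False
print(has_interaction(interacting, grid, grid))  # True
```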
This dissertation analyzes the benefits of modeling additive structure for prediction and interaction detection problems. It describes a new learning algorithm called Groves, which is an ensemble of additive regression trees. Groves builds on two existing techniques, bagging and additive models; combining them allows the ensemble to use large trees while still modeling the additive structure of the response function. The regression version of the algorithm, Additive Groves, and its classification counterpart, Gradient Groves, yield consistently high performance across a variety of problems, outperforming on average a large number of other algorithms.
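The combination of bagging and additive models over regression trees can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the dissertation's implementation: the abstract does not give the training procedure, so the backfitting-style loop (each tree is refit on the residuals of the other trees in its grove), the fixed number of cycles, and all names and parameters (`fit_tree`, `fit_grove`, `fit_bagged_groves`, `min_leaf`, `cycles`, the synthetic target) are assumptions. Only the regression case is shown.

```python
import random

def fit_tree(X, y, min_leaf):
    """Greedy regression tree; returns a function mapping a point to a prediction."""
    n = len(y)
    total, total_sq = sum(y), sum(v * v for v in y)
    leaf_value = total / n
    parent_sse = total_sq - total * total / n
    best = None  # (sse, feature index, threshold)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = left_sq = 0.0
        for k in range(1, n):  # k = number of points sent to the left child
            v = y[order[k - 1]]
            left_sum += v
            left_sq += v * v
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if k < min_leaf or n - k < min_leaf or lo == hi:
                continue
            right_sum = total - left_sum
            sse = (left_sq - left_sum ** 2 / k) + (total_sq - left_sq - right_sum ** 2 / (n - k))
            if best is None or sse < best[0]:
                best = (sse, j, lo)  # points with x[j] <= lo go left
    if best is None or best[0] >= parent_sse - 1e-12:
        return lambda x: leaf_value  # no useful split: make a leaf
    _, j, t = best
    left = [i for i in range(n) if X[i][j] <= t]
    right = [i for i in range(n) if X[i][j] > t]
    left_tree = fit_tree([X[i] for i in left], [y[i] for i in left], min_leaf)
    right_tree = fit_tree([X[i] for i in right], [y[i] for i in right], min_leaf)
    return lambda x: left_tree(x) if x[j] <= t else right_tree(x)

def fit_grove(X, y, n_trees, min_leaf, cycles=3):
    """Additive model: a sum of trees, each refit on the residuals of the others."""
    trees = [lambda x: 0.0] * n_trees
    for _ in range(cycles):  # assumed fixed number of backfitting passes
        for t in range(n_trees):
            residual = [yi - sum(tr(xi) for s, tr in enumerate(trees) if s != t)
                        for xi, yi in zip(X, y)]
            trees[t] = fit_tree(X, residual, min_leaf)
    return lambda x: sum(tr(x) for tr in trees)

def fit_bagged_groves(X, y, n_bags, n_trees, min_leaf, seed=0):
    """Bagging: average groves trained on bootstrap resamples of the data."""
    rng = random.Random(seed)
    groves = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        groves.append(fit_grove([X[i] for i in idx], [y[i] for i in idx], n_trees, min_leaf))
    return lambda x: sum(g(x) for g in groves) / n_bags

# Synthetic, hypothetical target that is additive in x[0] and x[1].
def target(x):
    return x[0] ** 2 + 2.0 * x[1]

rng = random.Random(1)
X_train = [[rng.random(), rng.random()] for _ in range(200)]
y_train = [target(x) for x in X_train]
X_test = [[rng.random(), rng.random()] for _ in range(200)]
y_test = [target(x) for x in X_test]

model = fit_bagged_groves(X_train, y_train, n_bags=5, n_trees=3, min_leaf=5)
mean_y = sum(y_train) / len(y_train)
mse_model = sum((model(x) - t) ** 2 for x, t in zip(X_test, y_test)) / len(y_test)
mse_baseline = sum((mean_y - t) ** 2 for t in y_test) / len(y_test)
print(f"bagged groves MSE: {mse_model:.4f}, constant baseline MSE: {mse_baseline:.4f}")
```

In this sketch `min_leaf` controls how large each tree may grow, and `n_trees` controls how many additive components a single grove contains; both would need tuning on real data.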
The additive nature of Groves makes it particularly useful for interaction detection. This dissertation introduces a new approach to interaction detection based on comparing the performance of restricted and unrestricted predictive models. Groves of trees allow variable interactions to be carefully controlled and are therefore especially well suited to this framework.
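The idea of comparing restricted and unrestricted models can be illustrated on a toy problem. The sketch below does not use Groves: as a stand-in, the restricted model is a simple additive fit g1(x1) + g2(x2) that cannot express an interaction between x1 and x2, and the unrestricted model keeps one free parameter per (x1, x2) cell. The grid, the noise level, the two target functions, and every function name are assumptions made for illustration. A large gap in held-out error between the two models suggests an interaction; how large a gap counts as significant is left open here.

```python
import math
import random
from collections import defaultdict

LEVELS = [0, 1, 2, 3]  # hypothetical discrete values for both variables

def sample(f, reps, rng, noise=0.1):
    """Draw `reps` noisy observations of f at every grid cell."""
    return [(a, b, f(a, b) + rng.gauss(0.0, noise))
            for a in LEVELS for b in LEVELS for _ in range(reps)]

def mean(values):
    return sum(values) / len(values)

def fit_unrestricted(data):
    """One free parameter per (x1, x2) cell: any interaction can be expressed."""
    cells = defaultdict(list)
    for a, b, y in data:
        cells[(a, b)].append(y)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    return lambda a, b: cell_mean[(a, b)]

def fit_restricted(data):
    """Additive fit g1(x1) + g2(x2): x1 and x2 are not allowed to interact."""
    grand = mean([y for _, _, y in data])
    by_a, by_b = defaultdict(list), defaultdict(list)
    for a, b, y in data:
        by_a[a].append(y)
        by_b[b].append(y)
    eff_a = {a: mean(v) - grand for a, v in by_a.items()}
    eff_b = {b: mean(v) - grand for b, v in by_b.items()}
    return lambda a, b: grand + eff_a[a] + eff_b[b]

def rmse(model, data):
    return math.sqrt(mean([(model(a, b) - y) ** 2 for a, b, y in data]))

def interaction_gap(f, seed=0):
    """Held-out RMSE of the restricted model minus that of the unrestricted one."""
    rng = random.Random(seed)
    train, test = sample(f, 5, rng), sample(f, 5, rng)
    return rmse(fit_restricted(train), test) - rmse(fit_unrestricted(train), test)

gap_additive = interaction_gap(lambda a, b: 2.0 * a + b ** 2)  # no interaction
gap_product = interaction_gap(lambda a, b: a * b)              # x1 and x2 interact
print(f"additive target: gap = {gap_additive:.3f}")
print(f"product target:  gap = {gap_product:.3f}")
```

For the additive target the restriction costs almost nothing, so the gap stays near zero; for the product target the restricted model cannot represent the function and its held-out error is much higher.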
The details of the proposed practical approach to interaction detection analysis are demonstrated on real data describing the abundance of different species of birds in the prairies east of the southern Rocky Mountains.
Date Issued
2008-07-30
Subject
Computer Science; Machine Learning; Data Mining; Statistical Interactions; Additive Groves; Gradient Groves; Ecology; Ornithology
Type
dissertation or thesis