Mixtures of Exponential Distributions to Describe the Distribution of Poisson Means in Estimating the Number of Unobserved Classes

Author
Barger, Kathryn Jo-Anne
Abstract
In many fields of study, scientists are interested in estimating the
number of unobserved classes. A biologist may want to find the
number of rare species in an animal population in order to conserve,
manage, and monitor biodiversity; a library manager may want to know
how many non-circulating items are present in a library system; or a
clinical investigator may want to determine the number of unseen
disease occurrences. A traditional way of estimating an unknown
number of classes is by using a negative binomial model (Fisher,
Corbet, and Williams 1943). The negative binomial model is based on
assuming that the numbers of individuals from each class are
independent Poisson samples, and that the means of these Poisson
random variables follow a Gamma distribution. This thesis proposes
a parametric model in which the distribution of the Poisson means is
a finite mixture of exponential distributions. The proposed model
has the following advantages: model simplicity, efficient
computation using the EM algorithm, and a straightforward
interpretation of the fitted model. In addition, the model can be
assessed with a chi-squared goodness-of-fit procedure, an advantage
this parametric model has over commonly used non-parametric
methods.
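To make the model structure concrete: when a Poisson count has a mean that is exponentially distributed with mean theta, the marginal count is geometric on 0, 1, 2, ... with success probability 1 / (1 + theta), so a finite mixture of exponentials yields a finite mixture of geometrics. The sketch below is illustrative only. The function names, the parameterization by component means, and the plug-in estimate of the total number of classes (observed classes divided by the probability of a nonzero count) are assumptions made for this example and are not taken from the thesis.

```python
def geometric_mixture_pmf(k, weights, means):
    """P(X = k) when X | lam ~ Poisson(lam) and lam follows a finite
    mixture of exponential distributions (hypothetical helper).

    weights: mixing proportions, assumed to sum to 1
    means:   means of the exponential components
    """
    if k < 0:
        return 0.0
    total = 0.0
    for w, theta in zip(weights, means):
        # Poisson mixed over Exponential(mean=theta) is geometric on
        # 0, 1, 2, ... with success probability 1 / (1 + theta).
        p = 1.0 / (1.0 + theta)
        total += w * p * (1.0 - p) ** k
    return total


def estimate_total_classes(observed_classes, weights, means):
    """Plug-in estimate of the total number of classes (assumed form):
    observed classes divided by the fitted probability of being seen."""
    p_unseen = geometric_mixture_pmf(0, weights, means)
    return observed_classes / (1.0 - p_unseen)
```

For example, with a single component of mean 1, half of all classes are expected to go unseen, so 100 observed classes would correspond to an estimated total of 200.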
A main accomplishment of this thesis is an efficient method for
computing maximum likelihood (ML) estimates for the proposed
model. Without the EM algorithm, finding ML estimates for
this model can be difficult and time-consuming. The likelihood
function is complicated due to high dimensionality and
non-identifiability of certain parameters. Within the M step of our
algorithm we embed another EM algorithm, which readily carries out
the maximization over the parameters of the finite mixture. We refer
to the overall algorithm as a nested EM. Aitken's acceleration is
used to speed up convergence of the algorithm.
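As a rough illustration of the acceleration idea, Aitken's delta-squared formula extrapolates the limit of a linearly converging sequence from three successive iterates. The sketch below applies it to a generic scalar fixed-point update. The function names are hypothetical, and the abstract does not say how the acceleration is wired into the nested EM, so this should be read as a generic scalar example and not as the thesis's procedure; an EM parameter vector would need a componentwise or vector variant.

```python
def aitken_extrapolate(x0, x1, x2):
    """Aitken's delta-squared estimate of the limit from three
    successive iterates x0, x1, x2 of a convergent sequence."""
    second_diff = x2 - 2.0 * x1 + x0
    if second_diff == 0.0:
        # Iterates have stopped changing; return the latest one.
        return x2
    return x0 - (x1 - x0) ** 2 / second_diff


def accelerated_fixed_point(update, x, tol=1e-10, max_iter=100):
    """Iterate x <- update(x), extrapolating with Aitken's formula
    after every two plain updates (a Steffensen-style scheme)."""
    for _ in range(max_iter):
        x1 = update(x)
        x2 = update(x1)
        x_new = aitken_extrapolate(x, x1, x2)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

For a sequence whose error shrinks by a constant factor each step, such as 2, 1.5, 1.25, ..., the extrapolation recovers the limit 1 from the first three terms, which is why it can shorten slowly converging EM-type iterations.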
Microbial samples from the coast of Massachusetts Bay near Nahant,
Massachusetts, are used to demonstrate data analysis using three
different numbers of components in the finite mixture of the model
described. It is shown that the model produces reasonable estimates
and fits the data satisfactorily. This model was recently
introduced in species richness estimation (Hong et al. 2006),
and its many advantages show promise for continued use
in estimating the number of unobserved classes.
Date Issued
2006-05-04
Subject
EM algorithm; finite mixtures; species richness; Aitken's acceleration; microorganisms
Type
dissertation or thesis