Mixtures of Exponential Distributions to Describe the Distribution of Poisson Means in Estimating the Number of Unobserved Classes
Barger, Kathryn Jo-Anne
In many fields of study, scientists are interested in estimating the number of unobserved classes. A biologist may want to estimate the number of rare species in an animal population in order to conserve, manage, and monitor biodiversity; a library manager may want to know how many non-circulating items are present in a library system; or a clinical investigator may want to determine the number of unseen disease occurrences. A traditional way of estimating an unknown number of classes is to use a negative binomial model (Fisher, Corbet, and Williams 1943). The negative binomial model assumes that the numbers of individuals from each class are independent Poisson samples and that the means of these Poisson random variables follow a gamma distribution. This thesis proposes a parametric model in which the distribution of the Poisson means is a finite mixture of exponential distributions. The proposed model has the following advantages: model simplicity, efficient computation using the EM algorithm, and a straightforward interpretation of the fitted model. In addition, the model can be assessed with a chi-squared goodness-of-fit procedure, a benefit this parametric model has over other commonly used non-parametric methods. A main accomplishment of this thesis is an efficient computation of maximum likelihood (ML) estimates for the proposed model. Without the EM algorithm, finding ML estimates for this model can be difficult and time-consuming, because the likelihood function is complicated by high dimensionality and by the non-identifiability of certain parameters. Within the M step of our algorithm we embed another EM, which maximizes over the parameters of the finite mixture. We refer to the algorithm as a nested EM. Aitken's acceleration is used to speed up convergence of the algorithm.
Microbial samples from the coast of Massachusetts Bay near Nahant, Massachusetts are used to demonstrate data analysis with three different numbers of components in the finite mixture. It is shown that the model produces reasonable estimates and fits the data satisfactorily. This model was recently introduced in species richness estimation (Hong et al. 2006), and its many advantages show promise for continued use in estimating the number of unobserved classes.
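As a rough illustration of the model described above (not the thesis's nested EM): when a class's count is Poisson with a mean drawn from an exponential distribution with mean theta, the marginal count is geometric, P(X = x) = (1/(1+theta)) (theta/(1+theta))^x, so a finite mixture of exponentials yields a finite mixture of geometrics. The Python sketch below evaluates that marginal and, assuming the mixture parameters have already been fitted, converts the number of observed classes n into an estimate of the total number of classes via n / (1 - P(X = 0)). The function names, the parameterization by exponential means, and the example values are hypothetical, and the abstract does not state which estimator of the total the thesis uses.

```python
def mixed_poisson_pmf(x, weights, means):
    """P(X = x) for X | lam ~ Poisson(lam), lam ~ sum_j weights[j] * Exp(mean=means[j]).

    Each exponential component integrates to a geometric distribution,
    so the marginal is a finite mixture of geometrics.
    """
    return sum(
        w * (1.0 / (1.0 + t)) * (t / (1.0 + t)) ** x
        for w, t in zip(weights, means)
    )


def estimate_total_classes(n_observed, weights, means):
    """Scale the observed class count by the probability of being seen at all.

    Assumption: parameters are already fitted; this is the usual
    n / (1 - p0) correction, not necessarily the thesis's estimator.
    """
    p0 = mixed_poisson_pmf(0, weights, means)  # chance a class is never observed
    return n_observed / (1.0 - p0)


if __name__ == "__main__":
    # Hypothetical two-component fit: equal weights, component means 1 and 3.
    weights, means = [0.5, 0.5], [1.0, 3.0]
    print(mixed_poisson_pmf(0, weights, means))
    print(estimate_total_classes(100, weights, means))
```

The estimate of the number of unobserved classes is then the estimated total minus the number observed.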
EM algorithm; finite mixtures; species richness; Aitken's acceleration; microorganisms
dissertation or thesis