Show simple item record

dc.contributor.authorSandler, Mark Moiseevich
dc.date.accessioned2006-07-28T20:54:45Z
dc.date.available2006-07-28T20:54:45Z
dc.date.issued2006-07-28T20:54:45Z
dc.identifier.otherbibid: 6475857
dc.identifier.urihttps://hdl.handle.net/1813/3386
dc.description.abstractMixture models form one of the most fundamental classes of generative models for clustered data. Specific application examples include text classification problems, image segmentation and motion detection, collaborative filtering and many others. However, quite surprisingly, very little had been known about algorithms which have provable performance guarantees within the framework of mixture models. This is the topic we study in this work. Our contribution is twofold. First, for the canonical problem of separating mixtures of continuous distributions in the high-dimensional Euclidean space, we provide the first algorithm that can learn distributions with heavy tails, including those with infinite variance and expectation. We formulate necessary conditions and provide an algorithm which guarantees that the underlying mixture model can be learned by observing only polynomially many samples. We also show that for many classes of distributions, our separation conditions are necessary for {\em any} algorithm which guarantees accurate reconstruction. Second for the case of \emph{discrete mixture models} we give an efficient polynomial time algorithm with provable performance guarantees. Recasting of our algorithm for the text classification problem immediately results in a very fast unsupervised learning method, with an excellent classification accuracy.en_US
dc.format.extent2113545 bytes
dc.format.mimetypeapplication/pdf
dc.language.isoen_USen_US
dc.subjectMixture Modelsen_US
dc.subjectCollaborative Filteringen_US
dc.subjectTheoretical Computer Scienceen_US
dc.titleAlgorithms for Mixture Modelsen_US
dc.typedissertation or thesisen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record

Statistics