JavaScript is disabled for your browser. Some features of this site may not work without it.
Sub-Linear Algorithms For Non-Homogeneous Large Alphabet Source Classification

Author
Xu, Yang
Abstract
Suppose we have several unknown distributions the same discrete countable sample space, namely, {1, 2, 3, 4, ..., n}. and given sequences of samples generated i.i.d from one of the distributions, where the sequence length is smaller than n, known as the sparse sample case. One interesting fundamental question we want to ask is to figure out which distribution the sequence is generated from. Can be viewed as a supervised classification problem using generic model in machine learning. In this thesis, we formulate the problem in an asymptotic way and study the existing algorithms on homogeneous classification problem and closeness testing problem, and extend it to a classification algorithm, mixed 2 distance classifier, using O(n 3 ). Details and theorems of performance guarantees on some specific class of i.i.d distributions is proved in Chapter2. In following chapters we give the performance tables and figures when implementing this idea on synthetic data and real text datasets and outperforms in some of them.
Date Issued
2015-08-17Subject
Classification; Large Alphabet; Learning Theories
Committee Chair
Wagner,Aaron B.
Committee Member
Doerschuk,Peter
Degree Discipline
Electrical Engineering
Degree Name
M.S., Electrical Engineering
Degree Level
Master of Science
Type
dissertation or thesis