Sub-Linear Algorithms For Non-Homogeneous Large Alphabet Source Classification

Other Titles
Authors
Abstract
Suppose we have several unknown distributions the same discrete countable sample space, namely, {1, 2, 3, 4, ..., n}. and given sequences of samples generated i.i.d from one of the distributions, where the sequence length is smaller than n, known as the sparse sample case. One interesting fundamental question we want to ask is to figure out which distribution the sequence is generated from. Can be viewed as a supervised classification problem using generic model in machine learning. In this thesis, we formulate the problem in an asymptotic way and study the existing algorithms on homogeneous classification problem and closeness testing problem, and extend it to a classification algorithm, mixed 2 distance classifier, using O(n 3 ). Details and theorems of performance guarantees on some specific class of i.i.d distributions is proved in Chapter2. In following chapters we give the performance tables and figures when implementing this idea on synthetic data and real text datasets and outperforms in some of them.
Journal / Series
Volume & Issue
Description
Sponsorship
Date Issued
2015-08-17
Publisher
Keywords
Classification; Large Alphabet; Learning Theories
Location
Effective Date
Expiration Date
Sector
Employer
Union
Union Local
NAICS
Number of Workers
Committee Chair
Wagner,Aaron B.
Committee Co-Chair
Committee Member
Doerschuk,Peter
Degree Discipline
Electrical Engineering
Degree Name
M.S., Electrical Engineering
Degree Level
Master of Science
Related Version
Related DOI
Related To
Related Part
Based on Related Item
Has Other Format(s)
Part of Related Item
Related To
Related Publication(s)
Link(s) to Related Publication(s)
References
Link(s) to Reference(s)
Previously Published As
Government Document
ISBN
ISMN
ISSN
Other Identifiers
Rights
Rights URI
Types
dissertation or thesis
Accessibility Feature
Accessibility Hazard
Accessibility Summary
Link(s) to Catalog Record