Cornell University
Library
Cornell UniversityLibrary

eCommons

Help
Log In(current)
  1. Home
  2. Cornell University Graduate School
  3. Cornell Theses and Dissertations
  4. Sub-Linear Algorithms For Non-Homogeneous Large Alphabet Source Classification

Sub-Linear Algorithms For Non-Homogeneous Large Alphabet Source Classification

File(s)
yx287.pdf (345.1 KB)
Permanent Link(s)
https://hdl.handle.net/1813/41179
Collections
Cornell Theses and Dissertations
Author
Xu, Yang
Abstract

Suppose we have several unknown distributions the same discrete countable sample space, namely, {1, 2, 3, 4, ..., n}. and given sequences of samples generated i.i.d from one of the distributions, where the sequence length is smaller than n, known as the sparse sample case. One interesting fundamental question we want to ask is to figure out which distribution the sequence is generated from. Can be viewed as a supervised classification problem using generic model in machine learning. In this thesis, we formulate the problem in an asymptotic way and study the existing algorithms on homogeneous classification problem and closeness testing problem, and extend it to a classification algorithm, mixed 2 distance classifier, using O(n 3 ). Details and theorems of performance guarantees on some specific class of i.i.d distributions is proved in Chapter2. In following chapters we give the performance tables and figures when implementing this idea on synthetic data and real text datasets and outperforms in some of them.

Date Issued
2015-08-17
Keywords
Classification
•
Large Alphabet
•
Learning Theories
Committee Chair
Wagner,Aaron B.
Committee Member
Doerschuk,Peter
Degree Discipline
Electrical Engineering
Degree Name
M.S., Electrical Engineering
Degree Level
Master of Science
Type
dissertation or thesis

Site Statistics | Help

About eCommons | Policies | Terms of use | Contact Us

copyright © 2002-2026 Cornell University Library | Privacy | Web Accessibility Assistance