Representation Learning For Sequence And Comparison Data

Chen, Shuo

Files

sc2247.pdf (1.64 MB)

Permanent Link(s)

https://hdl.handle.net/1813/43697

Collections

Cornell Theses and Dissertations

Author(s)

Chen, Shuo

Abstract

The core idea of representation learning is to learn semantically more meaningful features (usually a vector or set of vectors for each data point) from the dataset, so that they carry more discriminative information and make the given prediction task easier. It often yields better generalization performance and data visualization. In this thesis, we improve the foundation and practice of representation learning methods for two types of data, namely sequences and comparisons:

1. Using music playlist data as an example, we propose the Logistic Markov Embedding method, which learns from sequences of songs and yields vectorized representations of songs. We demonstrate its better generalization performance in predicting the next song to play in a coherent playlist, as well as its ability to produce meaningful visualizations of songs. We also propose an accompanying scalable training method that can be easily parallelized for learning representations on sequences.

2. Motivated by modeling intransitivity (rock-paper-scissors relations) in competitive matchup data (two-player games or sports), we propose the blade-chest model for learning vectorized representations of players. It is then extended to a general framework that predicts the outcome of pairwise comparisons, making use of both object and context features. We show its successful application to matchup and preference prediction.

The two lines of work share the same underlying theme: each object we study is first represented by a parameter vector or vectors, which are used to explain the interactions in the proposed models. These parameter vectors are learned by training on datasets that contain interactions. The learned vectors can then be used to predict any future interaction by simply plugging them back into the proposed models. Also, when the dimensionality of the vectors is small (e.g. 2), plotting them gives interesting insight into the data.
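To make the idea of "plugging learned vectors back into the model" concrete, here is a minimal Python sketch of how a blade-chest style model might score a matchup. The abstract does not give the scoring formula, so everything below is an assumption for illustration: each player has a "blade" and a "chest" vector, the score is one player's blade against the other's chest minus the reverse, and a logistic function turns the score into a win probability. The helper names (`make_player`, `win_probability`) and the hand-picked 2-D vectors are hypothetical and are not learned from any data.

```python
import math


def make_player(angle_deg):
    """Build a hypothetical player with hand-picked 2-D blade and chest vectors."""
    t = math.radians(angle_deg)
    chest = (math.cos(t), math.sin(t))
    blade = (-math.sin(t), math.cos(t))  # the chest vector rotated by 90 degrees
    return {"blade": blade, "chest": chest}


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def win_probability(a, b):
    """Assumed matchup model: P(a beats b) = sigmoid(blade_a . chest_b - blade_b . chest_a)."""
    score = dot(a["blade"], b["chest"]) - dot(b["blade"], a["chest"])
    return 1.0 / (1.0 + math.exp(-score))


# Three illustrative players placed 120 degrees apart (not learned parameters).
players = {
    "rock": make_player(0),
    "scissors": make_player(120),
    "paper": make_player(240),
}

for winner, loser in [("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")]:
    p = win_probability(players[winner], players[loser])
    print(f"P({winner} beats {loser}) = {p:.3f}")
```

With these vectors each of the three printed probabilities is above 0.5, so the players form a cycle that no single scalar strength per player could reproduce. This is the kind of intransitive relation the abstract says motivated the model.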

Date Issued

2016-02-01

Keywords

Representation learning; Sequence; Pairwise comparison

Committee Chair

Joachims, Thorsten

Committee Member

Van Loan, Charles Francis
Bindel, David S.

Degree Discipline

Computer Science

Degree Name

Ph. D., Computer Science

Degree Level

Doctor of Philosophy

Types

dissertation or thesis
