Learning to Represent and Recognize Multimodal Videos
Abstract
In today's digital landscape, the staggering growth of video resources has resulted in a wealth of visual, auditory, and textual information readily available on the internet. To fully harness the potential of learning from multimodal videos, it is crucial to develop efficient techniques for processing and analyzing this information. This dissertation delves into two primary areas: label-efficient representation learning from videos and multimodal video recognition. We leverage the advances in large-scale deep learning to fully exploit the potential of internet videos.

Label-efficient representation learning from videos is motivated by the fact that internet videos often lack high-quality labels. We first propose a contrastive learning-based framework to learn features in an unsupervised manner. This approach pulls together feature representations from the same video and pushes apart those from different videos. We then investigate how to learn more refined temporal features for time-sensitive tasks, such as temporal event classification and detection, and more precise spatial features for location-aware tasks, including tracking and detection. Lastly, we explore the use of stronger transformer backbones to integrate multimodal data and form a unified, robust representation.

Regarding multimodal recognition for videos, we initially examine the challenging case of fine-grained video recognition to determine if different modalities can assist each other in ambiguous scenarios. We create a new expert-curated audiovisual benchmark and discover that multimodal fusion significantly benefits performance. We subsequently investigate open-vocabulary multimodal video recognition and propose a framework that can effectively classify videos from any given category.
Committee Member
Lee, Clarence