Enhanced Representations and Efficient Analysis of Syntactic Dependencies Within and Beyond Tree Structures

Abstract
As a fundamental task in natural language processing, dependency-based syntactic analysis provides useful structural representations of textual data. It is supported by an abundance of multilingual annotations and statistical parsers. A common representation format widely adopted by contemporary computational dependency-based syntactic analysis is single-rooted directed trees, where each edge represents a dependency relation. These governor-dependent relations capture bilexical syntactic modifications and facilitate efficient parsing algorithms that break down the analysis of whole trees into the identification of individual dependency edges. However, it is known that edge-focused dependency-tree representations face practical challenges in properly handling certain linguistic phenomena involving multiple dependency edges, such as valency patterns and certain types of multi-word expressions. Further, dependency tree structures fall short in explicitly representing coordination structures, argument sharing in control and raising constructions, and so on.
This thesis aims to address the aforementioned issues and improve dependency-based syntactic analysis via augmented and enhanced representations within and beyond tree structures, which involves new challenges in the design of computational models, learning regimes from empirical data, and inference procedures to derive the desired structures.
To guide parsers to consider wider structural contexts and to recognize linguistic constructions as a whole, in addition to predicting individual dependency relations, this thesis introduces two parser designs that combine parsing and tagging modules. In the first parser, taggers are trained to predict valency patterns, which encode the number, types, and linear orderings of each word’s dependent syntactic relations (e.g., a transitive verb in English has a subject to its left and a direct object to its right). This method is demonstrated to improve precision on the selected subsets of dependency relations used in the valency patterns. The second effort focuses on headless multi-word expressions (MWEs), which are typically identified with taggers when full syntactic analysis is not required. By integrating a tagging view of the MWEs into decoding processes, the parsers become more accurate in MWE identification.
Certain syntactic constructions, such as coordination, pose extra representational challenges for dependency trees, and this thesis explores two types of enhanced structures beyond dependency trees and presents methods to analyze natural language texts into those formats. The Enhanced Universal Dependencies format removes the tree constraint, so the target structures become connected graphs. This thesis details the design of a tree-graph integrated-format parser, which serves as the basis of the winning solution at the IWPT 2021 shared task, in combination with other techniques including a two-stage finetuning strategy and text pre-processing pipelines powered by pre-training. Finally, this thesis revisits Kahane’s (1997) idea of bubble trees, which mark span boundaries on top of otherwise dependency-based structures, to provide an explicit mechanism to represent coordination structures. The transition-based system developed to parse into such bubble tree structures shows improvement on the task of coordination structure prediction.
Description
185 pages
Date Issued
2021-08
Committee Chair
Lee, Lillian
Committee Member
Sridharan, Karthik
Rooth, Mats
Degree Discipline
Computer Science
Degree Name
Ph. D., Computer Science
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Types
dissertation or thesis