Automatic Hypertext Construction

File(s)
95-1484.ps (1.96 MB)
95-1484.pdf (855.6 KB)
Permanent Link(s)
https://hdl.handle.net/1813/7143
Collections
Computer Science Technical Reports
Author
Allan, James
Abstract

The unprecedented growth of the World Wide Web illustrates the importance of hypertext as a method for organizing the rapidly expanding amount of on-line text. As document collections become larger and more dynamic, however, it is not feasible to construct more than an occasional hypertext manually. This thesis presents entirely automatic methods for gathering documents for a hypertext, linking them, and annotating those connections with a description of the type or nature of the link. The problem of automatically collecting related documents is addressed in Chapter 2, where robust Information Retrieval methods are applied to form high-quality links between documents. A local context check identifies links where ambiguous vocabulary erroneously suggests a relationship. Dynamic part retrieval is employed to select the portions of documents which are most related, allowing parts to be linked when it is more appropriate to link subtopics than entire documents. Chapter 3 presents a taxonomy of hypertext link types and defines the following three classes of links: "pattern-matching" links can be found using simple string-matching methods, "manual" links require substantial application of natural language understanding methods (which are currently beyond the state of the art), and "automatic" links are those which can be found using the methods of this thesis. Chapter 4 begins the work of automatic link typing by describing two novel graphical techniques for visualizing the relationship between two or more documents. "Uniform" visuals display the relationship between documents or document parts without regard to their relative sizes, whereas "varying" visuals include information about sizes and locations. Both methods highlight relationships between documents and motivate the automatic techniques of Chapter 5. Chapter 5, thus, demonstrates automatic methods for identifying the relationships depicted in the visualizations. Using an approach based upon graph simplification, this method automatically identifies revision, summary, expansion, equivalence, comparison, contrast, tangential, and aggregate links. Chapter 6 discusses an informal evaluation of the link typing. Though somewhat inconclusive, the evaluation demonstrates that automatic document linking performs well, but also indicates that much work remains to be done toward understanding automatic link typing.
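
The linking idea summarized in the abstract (a whole-document similarity test followed by a local context check) can be illustrated with a minimal sketch. This is not the thesis's implementation: the raw term-frequency weighting, the sentence-level granularity of the local check, the function names, and the thresholds `global_min` and `local_min` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a stand-in for real IR preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propose_links(docs, global_min=0.3, local_min=0.3):
    """Return index pairs (i, j) that pass both similarity checks.

    global_min and local_min are hypothetical thresholds, not values
    taken from the thesis.
    """
    vectors = [Counter(tokens(d)) for d in docs]
    # Sentence vectors per document, skipping empty fragments.
    sentences = [
        [Counter(tokens(s)) for s in re.split(r"[.!?]", d) if tokens(s)]
        for d in docs
    ]
    links = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            # Global check: overall vocabulary overlap.
            if cosine(vectors[i], vectors[j]) < global_min:
                continue
            # Local context check: at least one sentence pair must also
            # match, so shared but scattered vocabulary is not enough.
            if any(
                cosine(si, sj) >= local_min
                for si in sentences[i]
                for sj in sentences[j]
            ):
                links.append((i, j))
    return links
```

Calling `propose_links` on a list of document strings returns the candidate link pairs. A realistic system would at least remove stop words and weight terms by rarity, since common words inflate raw-count similarity.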

Date Issued
1995-02
Publisher
Cornell University
Keywords
computer science
technical report
Previously Published as
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR95-1484
Type
technical report
