Using White Space for Automated Document Structuring

File(s)
94-1452.pdf (1.14 MB)
94-1452.ps (1.27 MB)
Permanent Link(s)
https://hdl.handle.net/1813/6242
Collections
Computer Science Technical Reports
Author
Rus, Daniela
Summers, Kristen
Abstract

We present and analyze efficient algorithms for the automated recognition and interpretation of layout structures in electronic documents. The key idea is to use the patterns in the distribution of white space in a document to recognize and interpret its components. The recognition algorithm divides the document into a hierarchy of logical elements; the interpretation algorithms classify these divisions as base-text, tables, indented lists, polygonal drawings, and graphs. We present experimental data and discuss an information access application. Our methodology allows the automatic markup of documents (for instance, in the SGML format) and the creation of multi-level indices and browsing tools for electronic libraries.
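
As a rough illustration of the idea of dividing a document into a hierarchy based on white space, the sketch below splits plain-text lines at runs of blank lines, treating wider gaps as higher-level boundaries. It is not the algorithm from the report: the names `Block`, `split_on_gaps`, and `build_hierarchy` are hypothetical, and the sketch assumes that only vertical white space (blank lines) matters, whereas the abstract speaks of white-space patterns in general.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A region of text and the sub-regions found inside it (hypothetical type)."""
    lines: list
    children: list = field(default_factory=list)


def split_on_gaps(lines, min_gap):
    """Split lines into chunks wherever at least min_gap blank lines occur in a row."""
    chunks, current, blanks = [], [], 0
    for line in lines:
        if line.strip() == "":
            blanks += 1
            continue
        if current and blanks >= min_gap:
            # The gap is wide enough to count as a boundary at this level.
            chunks.append(current)
            current = []
        elif current and blanks:
            # Narrower gaps stay inside the chunk for deeper levels to find.
            current.extend([""] * blanks)
        current.append(line)
        blanks = 0
    if current:
        chunks.append(current)
    return chunks


def build_hierarchy(lines, gap=2):
    """Recursively split at gaps of `gap` blank lines, then gap - 1, down to 1."""
    if gap < 1:
        return Block(lines=list(lines))
    chunks = split_on_gaps(lines, gap)
    if len(chunks) <= 1:
        # No boundary at this width; try a narrower one.
        return build_hierarchy(lines, gap - 1)
    return Block(lines=list(lines),
                 children=[build_hierarchy(chunk, gap - 1) for chunk in chunks])


sample = ["Title", "", "", "para one", "", "para two", "", "", "end"]
root = build_hierarchy(sample, gap=2)
```

Here `root` has three top-level children, and the middle one is split again into its two paragraphs. Classifying the resulting blocks (as tables, lists, and so on) is a separate step that this sketch does not attempt.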

Date Issued
1994-09
Publisher
Cornell University
Keywords
computer science
technical report
Previously Published as
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR94-1452
Type
technical report
