Generalized Bottom-Up Parsing
Szymanski, Thomas G.
In this thesis we present a decision procedure for testing the correctness of a broad class of bottom-up parsing machines. Motivated by the work of Colmerauer and Williams, we allow our parsers to find any simple phrase (not necessarily the leftmost) in the input string. Such a parser can be implemented on an automaton using two puchdown stores and can in fact produce a complete parse for an input string in linear time with respect to the length of the input. Certain restrictions are necessary in order for the method to work, but nevertheless it is sufficiently general to handle most existing classes of parsers. Moreover, the adjustment of certain parameters to this decision procedure gives rise in a natural way to such classes of grammars as the LR(k) class of Knuth, the BRC class of Floyd and the BCP class of Williams. Further adjustment of these parameters suggests other, more general, classes of parsable grammars which we investigate here for the first time. Among these new classes of grammars is one first suggested by Knuth and given the name LR(k,t). This class is a generalization of the LR(k) method and intuitively is that class of grammar for which it is possible, in any sentential form, to find one of the t leftmost simple phrases given only that portion of the string to the left of the phrase and the first k characters to its right. We give an exact construction for parsers of this class and present the surprising fact that these grammars can be parsed using a deterministic pushdown automaton. We also investigate a class herein called $LR(k,\infty)$ in which we completely relax the condition that the selected phrase be in any certain location. This latter class, which represents the ultimate left to right bottom-up parser, is shown to be too general to have "nice" decidability properties. A final class of grammars investigated is designated FPFAP(k), that is, the class of grammars which are parsable in a left to right fashion by a finite state automaton using k characters of lookahead. This class is shown to lie strictly between the LR(k,t) and $LR(k,\infty)$ classes. We conclude by demonstrating the relationship between these and other classes of grammars, not only from the point of view of the grammars themselves, but also with regard to the classes of languages induced by the grammars.
computer science; technical report
Previously Published As