Floating-Point Arithmetic and Program Correctness Proofs

Other Titles


This thesis develops tight upper and lower bounds on the relative error in various schemes for performing floating-point arithmetic, proposes axioms for characterizing the significant properties embodied by these schemes, and gives examples to illustrate how these axioms may be used to reason about the correctness of floating-point programs. Three addition schemes are considered: (1) chopped addition, (2) addition with both pre and post-adjustment rounding, and (3) addition with pre-adjustment chopping and post-adjustment rounding. Schemes for performing both rounded and chopped multiplication and division are also considered. Our tight bounds are consistent with the commonly held opinion that a binary base minimizes the maximum relative errors in floating-point arithmetic. Also, these bounds show that one guard digit is optimal for minimizing the maximum relative errors in chopped addition. The bounds derived for each of the addition schemes considered are as tight as possible. One guard digit and two guard bits are shown to be sufficient to round the result of an exact addition to the nearest floating-point number. We show how this scheme can be implemented using a single post-adjustment shift, no rounding overflow, and (for certain implementations) requiring no more time than an addition that chops instead of rounds. Two approaches are considered for axiomatizing floating-point arithmetic. In one approach, a set of floating-point numbers is associated with each floating-point expression, and the assignment statement is modeled as a nondeterministic selector of one of the members in the set. In the alternative approach, the floating-point operations are modeled in terms of two cropping functions whose significant properties are characterized by a small set of axioms. In both cases, the axioms characterizing floating-point arithmetic are used with Dijkstra's weakest pre-condition calculus to provide an axiomatic framework for reasoning about floating-point programs. Finally, the commom practice of modelling the floating-point operations by a single function that chops or rounds the result of the corresponding exact operation is shown to be invalid for many implementations of floating-point arithmetic.

Journal / Series

Volume & Issue



Date Issued



Cornell University


computer science; technical report


Effective Date

Expiration Date




Union Local


Number of Workers

Committee Chair

Committee Co-Chair

Committee Member

Degree Discipline

Degree Name

Degree Level

Related Version

Related DOI

Related To

Related Part

Based on Related Item

Has Other Format(s)

Part of Related Item

Related To

Related Publication(s)

Link(s) to Related Publication(s)


Link(s) to Reference(s)

Previously Published As

Government Document




Other Identifiers


Rights URI


technical report

Accessibility Feature

Accessibility Hazard

Accessibility Summary

Link(s) to Catalog Record