Floating-Point Arithmetic and Program Correctness Proofs

Abstract

This thesis develops tight upper and lower bounds on the relative error in various schemes for performing floating-point arithmetic, proposes axioms for characterizing the significant properties embodied by these schemes, and gives examples to illustrate how these axioms may be used to reason about the correctness of floating-point programs. Three addition schemes are considered: (1) chopped addition, (2) addition with both pre- and post-adjustment rounding, and (3) addition with pre-adjustment chopping and post-adjustment rounding. Schemes for performing both rounded and chopped multiplication and division are also considered. Our tight bounds are consistent with the commonly held opinion that a binary base minimizes the maximum relative errors in floating-point arithmetic. Also, these bounds show that one guard digit is optimal for minimizing the maximum relative errors in chopped addition. The bounds derived for each of the addition schemes considered are as tight as possible. One guard digit and two guard bits are shown to be sufficient to round the result of an exact addition to the nearest floating-point number. We show how this scheme can be implemented using a single post-adjustment shift, with no rounding overflow, and (for certain implementations) in no more time than an addition that chops instead of rounds. Two approaches are considered for axiomatizing floating-point arithmetic. In one approach, a set of floating-point numbers is associated with each floating-point expression, and the assignment statement is modeled as a nondeterministic selector of one of the members in the set. In the alternative approach, the floating-point operations are modeled in terms of two cropping functions whose significant properties are characterized by a small set of axioms.
In both cases, the axioms characterizing floating-point arithmetic are used with Dijkstra's weakest pre-condition calculus to provide an axiomatic framework for reasoning about floating-point programs. Finally, the common practice of modelling the floating-point operations by a single function that chops or rounds the result of the corresponding exact operation is shown to be invalid for many implementations of floating-point arithmetic.
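
To make the terms "chopped" and "rounded" concrete, the following is a minimal sketch of the idealized single-function model that the abstract's final sentence refers to: an exact result is chopped or rounded to a fixed number of significant digits in a given base. It is not taken from the thesis. The names `crop` and `worst_relative_error` are hypothetical, ties are rounded upward as an assumption, and the bounds checked are the standard textbook ones (base^(1-t) for chopping, half of that for rounding), not the tight bounds the thesis derives for guard-digit addition schemes.

```python
from fractions import Fraction


def crop(x, base, digits, mode):
    """Chop or round a positive rational x to `digits` significant base-`base` digits.

    Idealized model only: the exact value is cropped in one step, with no
    guard digits or pre-adjustment shifts (hypothetical helper).
    """
    x = Fraction(x)
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    # Find the exponent e with base**(e-1) <= x < base**e.
    e = 0
    while Fraction(base) ** e <= x:
        e += 1
    while Fraction(base) ** (e - 1) > x:
        e -= 1
    ulp = Fraction(base) ** (e - digits)  # spacing of representable values near x
    scaled = x / ulp                      # lies in [base**(digits-1), base**digits)
    if mode == "chop":
        m = scaled.numerator // scaled.denominator  # truncate toward zero
    elif mode == "round":
        # floor(scaled + 1/2): round to nearest, ties upward (an assumption)
        m = (2 * scaled.numerator + scaled.denominator) // (2 * scaled.denominator)
    else:
        raise ValueError("mode must be 'chop' or 'round'")
    return m * ulp


def worst_relative_error(base, digits, mode, steps=64):
    """Largest |x - crop(x)| / x over a grid of exact rationals in [1, base)."""
    worst = Fraction(0)
    for k in range((base - 1) * steps):
        x = 1 + Fraction(k, steps)
        worst = max(worst, abs(x - crop(x, base, digits, mode)) / x)
    return worst


if __name__ == "__main__":
    for base in (2, 10, 16):
        unit = Fraction(base) ** (1 - 4)  # base**(1 - digits) with digits = 4
        chop = worst_relative_error(base, 4, "chop")
        rnd = worst_relative_error(base, 4, "round")
        print(base, float(chop), float(rnd), float(unit))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the measured errors are not themselves distorted by the host machine's floating-point arithmetic. Because the grid is finite, the printed maxima are only lower estimates of the true worst case.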

Date Issued

1980-08

Publisher

Cornell University

Keywords

computer science; technical report

Previously Published As

http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR80-436

Types

technical report
