STRUCTURED AUTOMATIC DIFFERENTIATION
Abstract
Differentiation is one of the fundamental problems in numerical mathematics. The solution of many optimization problems and other applications requires knowledge of the gradient, the Jacobian matrix, or the Hessian matrix of a given function. Many large-scale optimization applications (e.g., inverse problems) are very complex in nature. It becomes impractical to treat the function evaluation of such problems as a ``black-box'' function, since the computation is structured in some manner, going through a set of defined structured steps, i.e., problem structure. It pays to expose this structure in the computation in order to compute the derivatives efficiently, thus making the solution procedure practical. Automatic differentiation (AD) can compute fast and accurate derivatives of any degree computationally by propagating Taylor series coefficients using the chain rule. AD incurs no truncation error and would yield exact results if the calculations were done in real arithmetic; in other words, the derivatives obtained are accurate to machine precision. This thesis is concerned with the efficient application of AD to large (and complex) optimization problems. The major theme is the exploitation of structure in the user's problem. We present methodologies which allow AD to exploit problem structure. An important idea is the exploitation of sparsity in Jacobian matrices: we present a scheme which combines the forward and reverse modes of AD. Problem structure can be viewed in many different ways; one way is to look at the granularity of the operations involved. For example, differentiation carried out at the level of matrix-vector operations can lead to great savings in time as well as space requirements.
Figuring out the {\em kind} of computation is another way to view structure, e.g., partially separable or composite functions whose structure can be exploited to obtain performance gains. In this thesis we develop a general structure framework which can be viewed hierarchically and allows for structure exploitation at various levels. For example, for time integration schemes employing stencils it is possible to exploit structure at both the stencil level and the timestep level. We also present some advanced structure exploitation ideas, e.g., parallelism in structured computations and the use of structure in implicit computations. The use of AD as a derivative-computing engine naturally automates all the methodologies presented in this work -- we present ways to make the design of numerical optimization software very transparent, and the presentation of problems by the user as easy as possible.
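The abstract's claim that AD propagates derivative values via the chain rule, with no truncation error, can be illustrated with a minimal forward-mode AD sketch using dual numbers. This is an illustrative example only, not code from the thesis; the class name `Dual` and the test function are assumptions chosen for the sketch.

```python
import math

class Dual:
    """A dual number a + b*eps with eps^2 = 0; `dot` carries the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u v)' = u' v + u v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # Chain rule for an elementary function: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# Differentiate f(x) = x * sin(x); analytically f'(x) = sin(x) + x*cos(x).
x = Dual(1.5, 1.0)   # seed dot = 1 to get df/dx
y = x * sin(x)
print(y.val, y.dot)  # derivative agrees with the analytic value to machine precision
```

Because each overloaded operation applies an exact differentiation rule in floating point, the only error in `y.dot` is rounding error, matching the abstract's "accurate to machine precision" remark.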