Center for Advanced Computing
https://hdl.handle.net/1813/569
2020-07-14T14:11:50ZAccurate Solution of Weighted Least Squares by Iterative Methods
https://hdl.handle.net/1813/5598
Accurate Solution of Weighted Least Squares by Iterative Methods
Bobrovnikova, Elena Y.; Vavasis, Stephen A.
We consider the weighted least-squares (WLS) problem with a very ill-conditioned weight matrix. Weighted least-squares problems arise in many applications including linear programming, electrical networks, boundary value problems, and structures. Because of roundoff errors, standard iterative methods for solving a WLS problem with ill-conditioned weights may not give the correct answer. Indeed, the difference between the true and computed solution (forward error) may be large. We propose an iterative algorithm, called MINRES-L, for solving WLS problems. The MINRES-L method is the application of MINRES, a Krylov-space method due to Paige and Saunders, to a certain layered linear system. Using a simplified model of the effects of round off error, we prove that MINRES-L gives answers with small forward error. We present computational experiments for some applications.
1997-02-06T00:00:00ZLocal correlation energies of two-electron atoms and model systems
https://hdl.handle.net/1813/5597
Local correlation energies of two-electron atoms and model systems
Huang, Chien-Jung; Umrigar, C.J.
We present nearly-local definitions of the correlation energy
density, and its potential and kinetic components, and evaluate them for several two-electron systems. This information should provide valuable guidance in constructing better correlation functionals than those in common use. In addition, we demonstrate that the quantum chemistry and the density functional definitions of the correlation energy rapidly approach one another with increasing atomic number.
1997-01-01T00:00:00ZQuality Mesh Generation in Higher Dimensions
https://hdl.handle.net/1813/5596
Quality Mesh Generation in Higher Dimensions
Mitchell, Scott A.; Vavasis, Stephen A.
We consider the problem of triangulating a d-dimensional region. Our mesh generation algorithm, called QMG, is a qradtree-based algorithm that can triangulate any polyhedral region including nonconvexregions with holes. Furthermore, our algorithm guarantees a bounded aspect ratio triangulation provided that the input domain itself has no sharp angles. Finally, our algorithm is guaranteed never to over refine the domain in the sense that the number of simplices produced by QMG is bounded above by a factor times the number produced by any competing algorithm, where the factor depends on the aspect ratio bound satisfied by the competing algorithm. The QMG algorithm has been implemented in C++ and is used as a mesh generator for the finite element method.
1996-12-01T00:00:00ZModel fermion Monte Carlo with correlated pairs II
https://hdl.handle.net/1813/5595
Model fermion Monte Carlo with correlated pairs II
Kalos, M.H.; Schmidt, K.E.
Correlated dynamics can produce stable algorithms for excited states of quantum many-body problems. We study a variety of harmonic oscillator problems to demonstrate the kinds of correlations needed. We show that marginally correct dynamics that produce a stable overlap with an antisymmetrictrial function give the correct fermion ground state.
1996-12-01T00:00:00ZSymmetry, Nonlinear Bifurcation Analysis, and Parallel Computation
https://hdl.handle.net/1813/5594
Symmetry, Nonlinear Bifurcation Analysis, and Parallel Computation
Wohlever, J.C.
In the natural and engineering sciences the equations which model physical systems with symmetry often exhibit an invariance with respect to a particular group "G" of linear transformations. "G" is typically a linear representation of a symmetry group "g" which characterizes the symmetry of the physical system. In this work, we will discuss the natural parallelism which arises while seeking families of solutions to a specific class of nonlinear vector equations which display a special type of group invariance, referred to as equivariance. The inherent parallelism stems for a global de-coupling, due to symmetry, of the full nonlinear equations which effectively splits the original problem into a set of smaller problems. Numerical results from asymmetry-adapted numerical procedure, (MMcontcm.m), written in MultiMATLAB are discussed.
1996-10-01T00:00:00ZMonte Carlo Optimization of Trial Wave Functions in Quantum Mechanics and Statistical Mechanics
https://hdl.handle.net/1813/5593
Monte Carlo Optimization of Trial Wave Functions in Quantum Mechanics and Statistical Mechanics
Nightingale, M.P.; Umrigar, C.J.
This review covers applications of quantum Monte Carlo methods to quantum mechanical problems in the study of electronic and atomic structure, as well as applications to statistical mechanical problems both of static and dynamic nature. The common thread in all these applications is optimization of many-parameter trial states, which is done by minimization of the variance of the local energy or, more generally for arbitrary eigenvalue problems, minimization of the variance of the configurational eigenvalue.
1996-10-01T00:00:00ZA critical assessment of the Self-Interaction Corrected Local Density Functional method and its algorithmic implementation
https://hdl.handle.net/1813/5592
A critical assessment of the Self-Interaction Corrected Local Density Functional method and its algorithmic implementation
Goedecker, S.; Umrigar, C.J.
We calculate the electronic structure of several atoms and small molecules by direct minimization of the Self-Interaction Corrected Local Density Approximation (SIC-LDA) functional. To do this we first derive an expression for the gradient of this functional under the constraint that the orbitals be orthogonal and show that previously given expressions do not correctly incorporate this constraint. In our atomic calculations the SIC-LDA yields total energies, ionization energies and charge densities that are superior to results obtained with the Local Density Approximation (LDA). However, for molecules SIC-LDA gives bond lengths and reaction energies that are inferior to those obtained from LDA. The nonlocal BLYP functional, which we include as a representative GGA functional, out performs both LDA and SIC-LDA forall ground state properties we considered.
1996-10-01T00:00:00ZEuropean Option Pricing with Fixed Transaction Costs
https://hdl.handle.net/1813/5591
European Option Pricing with Fixed Transaction Costs
Aiyer, Ajay Subramanian
In this paper, we study the problem of European option pricing in the presence of fixed transaction costs. The problems of optimal portfolio selection and option pricing in the presence of proportional transaction costs has been extensively studied in the mathematical finance literature. However, much less is known when we have fixed transaction costs. In this paper, we show that calculating the price of an European optioninvolves calculating the value functions of two stochastic impulse control problems and we obtain the explicit expressions for the resultant quasi-variational ine qualities satisfied by the value functions and then carry out a numerical calculation of the option price.
1996-09-01T00:00:00ZOptimal Portfolio Selection with Fixed Transactions Costs in the presence of Jumps and Random Drift
https://hdl.handle.net/1813/5590
Optimal Portfolio Selection with Fixed Transactions Costs in the presence of Jumps and Random Drift
Aiyer, Ajay Subramanian
In this paper, we study the general problem of optimal portfolio selection with fixed transactions costs in the presence of jumps. We extend the analysis of Morton and Pliska to this setting by modeling the return processes of the risky assets in the investor's portfolio as jump-diffusion processes and derive the expression for the related optimal stopping time problem of a Markov process with jumps and explicitly solve it in the situation when the portfolio consists only of one risky asset. We also provide an asymptotic analysis of our model with one risky asset following the ideas of Wilmott and Atkinson. In the process, we also obtain a solution for the "Merton problem" generalized to the situation when there is credit risk. Finally, we consider the case where the drift of the stockprice process is random and unobservable and obtain expressions for the optimal trading policies.
1996-09-01T00:00:00ZNon-normal Dynamics and Hydrodynamic Stability
https://hdl.handle.net/1813/5589
Non-normal Dynamics and Hydrodynamic Stability
Baggett, Jeffrey Scott
This thesis explores the interaction of non-normality and nonlinearity incontinuous dynamical systems. A solution beginning near a linearly stable fixed point may grow large by a linear mechanism, if the linearization is non-normal, until it is swept away by nonlinearities resulting in a much smaller basin of attraction than could possibly be predicted by the spectrum of the linearization. Exactly this situation occurs in certain linearly stable shear flows, where the linearization about the laminar flow may be highly non-normal leading to the transient growth of certain small disturbances by factors which scale with the Reynolds number. These issues are brought into focus in Chapter 1 through the study of atwo-dimensional model system of ordinary differential equations proposed by Trefethen, et al. [Science, 261, 1993].
In Chapter 2, two theorems are proved which show that the basin of attraction of a stable fixed point, in systems of differential equations combining a non-normal linear term with quadratic nonlinearities, can decrease rapidly as the degree of non-normality is increased, often faster than inverse linearly.
Several different low-dimensional models of transition to turbulence are examined in Chapter 3. These models were proposed by more than a dozen authors for a wide variety of reasons, but they all incorporate non-normal linear terms and quadratic nonlinearities. Surprisingly, in most cases, the basin of attraction of the "laminar flow" shrinks much faster than the inverse Reynolds number.
Transition to turbulence from optimally growing linear disturbances, streamwise vortices, is investigated in plane Poiseuille and plane Couette flows in Chapter4. An explanation is given for why smaller streamwise vortices can lead to turbulence in plane Poiseuille flow. In plane Poiseuille flow, the transient linear growth of streamwise streaks caused by non-normality leads directly to a secondary instability.
Certain unbounded operators are so non-normal that the evolution of infinitesimal perturbations to the fixed point is entirely unrelated to the spectrum, even as i to infinity. Two examples of this phenomenonare presented in Chapter 5.
1996-08-01T00:00:00ZStructure and Efficient Hessian Calculation
https://hdl.handle.net/1813/5588
Structure and Efficient Hessian Calculation
Coleman, Thomas F.; Verma, Arun
Modern methods for numerical optimization calculate (or approximate) the matrix of second derivatives, the Hessian matrix, at each iteration. The recent arrival of robust software for automatic differentiation allows for the possibility of automatically computing the Hessian matrix, and the gradient, given a code to evaluate the objective function itself. However, for large-scale problems direct application of automatic differentiation may be unacceptably expensive. Recent work has shown that this cost can be dramatically reduced in the presence of sparsity. In this paper we show that for structured problems it is possible to apply automatic differentiation tools in an economical way - even in the absence of sparsity in the Hessian.
1996-08-01T00:00:00ZStable and Efficient Solution of Weighted Least-Squares Problems with Applications in Interior Point Methods
https://hdl.handle.net/1813/5587
Stable and Efficient Solution of Weighted Least-Squares Problems with Applications in Interior Point Methods
Hough, Patricia D.
In this thesis, we consider two closely related problems. The first is a full-rank weighted least-squares problem with a weight matrix that is positive definite, diagonal, and extremely ill conditioned. The ill-conditioning can cause standard algorithms to compute solutions with, in some cases, no digits of accuracy. Theory suggests the existence of an algorithm that will compute an accurate solution despite the ill-conditioning in the weight matrix. We describe a new algorithm, the Complete Orthogonal Decomposition (COD) Algorithm, for solving the weighted least-squares problem and show that it has this desirable property. In addition, the COD Algorithm is based on standard, well-understood techniques and is straightforward to implement. A natural application for the weighted least-squares problem described in the previous paragraph is interior point methods for linear programming. We discuss the problem in this context, and describe how the COD algorithm can be extended and used in this setting. Unlike other algorithms, this one is stable for interior point methods without assuming nondegeneracy in the linear programming instance. Computational experiments indicate that it is more reliable than other algorithms when the problem is near degenerate. The second problem involves a particular interior point algorithm. In 1994, Vavasis and Ye proposed a new primal-dual path-following interior point method, the layered-step interior point (LIP) method. This algorithm interleaves traditional steps with longer, layered least-squares (LLS) steps. Computation of a LLS step requires solving a weighted least-squares problem similar to the one described above, but the weight matrix also has the property that the weights fall into well-separated groups. This additional structure allows the problem to be broken down into smaller, constrained problems with well-conditioned weight matrices. The smaller problems can then be solved stably with standard algorithms, and the LLS step can be computed. Vavasis and Ye did not propose a particular algorithm for solving the LLS problem. In this thesis, we present an algorithm based on Cholesky factorization. The algorithm is such that a modified version of the sparse Cholesky code of Ng and Peyton of Oak Ridge National Laboratories can be used. Thus, the theoretical results are straight forward, and this algorithm proves to be accurate and efficient in practice.
1996-08-01T00:00:00ZThe Procrustes Problem for Orthogonal Stiefel Matrices
https://hdl.handle.net/1813/5586
The Procrustes Problem for Orthogonal Stiefel Matrices
Bojanczyk, A.W.; Lutoborski, A.
(This abstract contains mathematical symbols that may not reproduce wellin ASCII text.) In this paper we consider the Procrustes problem on the manifold of orthogonal Stiefel matrices. That is, given matrices A epsilon R(m*k), B epsilon R(m*p), m greater tahn or equal to p greater than or equal to k, we seek the minimum of ||A - BQ||2 for all matrices Q epsilon R(p*k), QTQ = I(k*k). We introduce a class of relaxation methods for generating minimizing sequences and offer a geometric interpretation of these methods. Results of numerical experiments illustrating the convergence of the methods are given.
1996-08-01T00:00:00ZScalable Parallel Electronic Structure Calculations on the IBM SP2
https://hdl.handle.net/1813/5585
Scalable Parallel Electronic Structure Calculations on the IBM SP2
Goedecker, Stefan; Hoisie, Adolfy
We have developed a highly efficient and scalable electronic
structure code for parallel computers using message passing. The algorithm takes advantages of the natural parallelism in quantum chemistry problems to obtain very high performance even on a large number of processors. Most of the terms which scale cubically with respect to the number of atoms have been eliminated allowing the treatment of very large systems. It uses one of the most precise versions of Density Functional Theory, namely Self-Interaction Corrected Density Functional Theory.
1996-08-01T00:00:00ZOn the Efficient Methods to Solve ODEs and BVPs Using Automatic Differentiation
https://hdl.handle.net/1813/5584
On the Efficient Methods to Solve ODEs and BVPs Using Automatic Differentiation
Verma, Arun
A large number of physical phenomena are modeled by a system of ODEs
or a system of implicit ODEs. We demonstrate applicability of automatic differentiation (AD) for solving: (1) Boundary value problems in ODEs and implicit ODEs. (2) Initial state and parameter estimation problems. The impact of using AD is two fold. Firstly, efficient methods for computing the gradient vectors and Jacobian matrices have been developed using AD. Secondly the process of getting derivatives via AD is robust, more user friendly, and provides error free derivatives. Furthermore, techniques using AD have been developed which exploit structure in the user's computation, and particularly the structure we observe in boundary value problems or state/parameter estimation problems. We demonstrate by a few experiments the efficiency gained by the usage of AD in solving these problems.
1996-08-01T00:00:00ZPiecewise Differentiable Minimization for Ill-posed Inverse Problems
https://hdl.handle.net/1813/5583
Piecewise Differentiable Minimization for Ill-posed Inverse Problems
Li, Yuying
Based on minimizing a piece wise differentiable lp function subject
to a single inequality constraint, this paper discusses algorithms for a discretized regularization problem for ill-posed inverse problems. We examine computational challenges of solving this regularization problem. Possible minimization algorithms such as the steepest descent method, iteratively weighted least squares (IRLS) method and a recent globally convergent affine scaling Newton approach are considered. Limitations and efficiency of these algorithms are demonstrated using the geophysical travel time tomographic inversion and image restoration applications.
1996-08-01T00:00:00ZTransformation Techniques for Toeplitz and Toeplitz-plus-Hankel Matrices Part II. Algorithms
https://hdl.handle.net/1813/5582
Transformation Techniques for Toeplitz and Toeplitz-plus-Hankel Matrices Part II. Algorithms
Bojanczyk, A.W.; Heinig, George
In the first part of the paper transformations mapping Toeplitz
and Toeplitz-plus-Hankel matrices into generalized Cauchy matrices were studied. In this second part fast algorithms for LU-factorization and inversion of generalized Cauchy matrices are discussed. It is shown that the combination of transformation pivoting techniques leads to algorithms for indefinite Toeplitz and Toeplitz-plus-Hankel matrices that are more stable than the classical ones Special attention is paid to the symmetric and hermitian cases.
1996-08-01T00:00:00ZTransformation Techniques for Toeplitz and Toeplitz-plus-Hankel Matrices Part I. Transformations
https://hdl.handle.net/1813/5581
Transformation Techniques for Toeplitz and Toeplitz-plus-Hankel Matrices Part I. Transformations
Bojanczyk, A.W.; Heinig, George
Transformations of the form A to C1AC2 are investigated that
transform Toeplitz and Toeplitz-plus-Hankel matrices into generalized Cauchy matrices. C1 and C2 are matrices related to the discrete Fourier transformation or to various real trigonometric transformations. Combining these results with pivoting techniques, in part II algorithms for Toeplitz and Toeplitz-plus-Hankel systems will be presented that are more stable than classical algorithms.
1996-08-01T00:00:00ZGeneralized gradient approximations to density functional theory: comparison with exact results
https://hdl.handle.net/1813/5580
Generalized gradient approximations to density functional theory: comparison with exact results
Filippi, Claudia; Gonze, Xavier; Umrigar, C.J.
In order to assess the accuracy of commonly used approximate exchange-correlation density functionals, we present a comparison of accurate exchange and correlation potentials, exchange energy densities and energy components with the corresponding approximate quantities. Four systems are used as illustrative examples: the model system of two electrons in a harmonic potential and the De, Be and Ne atoms. A new ingredient in the paper is the separation of the exchange-correlation potential into exchange and correlation according to the density functional theory definition.
1996-06-01T00:00:00ZSeparation of the Exchange-Correlation Potential into Exchange plus Correlation: an Optimized Effective Potential Approach
https://hdl.handle.net/1813/5579
Separation of the Exchange-Correlation Potential into Exchange plus Correlation: an Optimized Effective Potential Approach
Filippi, Claudia; Umrigar, C.J.; Gonze, Xavier
Most approximate exchange-correlation functionals used within density functional theory are constructed as the sum of two distinct contributions for exchange and correlation. Separating the exchange component from the entire functional is useful since, for exchange, exact relations exist under uniform density scaling and spin scaling. In the past, accurate exchange-correlation potentials have been generated from essentially exact densities but they have not been correctly decomposed into their separate exchange and correlation components (except for two-electron systems). Using a recently proposed method, equivalent to the solution of an optimized effective potential problem with the corresponding orbitals replaced by the exact Kohn-Sham orbitals, we obtain the separation according to the density functional theory definition. We compare the results for the Ne and Be atoms with those obtained by the previously used appromixate separation scheme.
1996-06-01T00:00:00ZSelf-organized criticality, evolution, and extinction
https://hdl.handle.net/1813/5578
Self-organized criticality, evolution, and extinction
Newman, M.E.J.
Statistical analysis indicates that the fossil exctinction record is compatible with a distribution of extinction events whose frequency is related to their size by a power law with exponent tau approx. = 2. This result is in agreement with preductions based on self-organized critical models of extinction, and might well be taken as evidence for self-organizing behavior in terrestrial evolution. We argue however that there is a much simpler explanation for the appearance of a power law in terms of extinctions caused by stresses (either biotic or abiotic) to which species are subjected by their environment. We give an explicit model of this process and discussits properties and implications for the interpretation of the fossil record.
1996-06-01T00:00:00ZMatrix Iterations: The Six Gaps Between Potential Theory and Convergence
https://hdl.handle.net/1813/5577
Matrix Iterations: The Six Gaps Between Potential Theory and Convergence
Driscoll, Tobin A.; Toh, Kim-Chuan; Trefethen, Lloyd N.
The theory of the convergence of Krylov subspace iterations for linear systems of equations (conjugate gradients, biconjugate gradients, GMRES, QMR, Bi-CGSTAB, ...) is reviewed. For a computation of this kind, an estimated asymptotic convergence factor rho less than 1 can be derived by solving a problem of potential theory or conformal mapping. Six approximations are involved in reducing the actual computation to this scalar estimate. These six approximations are discussed in a systematic way and illustrated by a sequence of examples computed with tools of numerical conformal mapping and semidefinite programming.
1996-06-01T00:00:00ZWorkload Evolution on the Cornell Theory Center IBM SP2
https://hdl.handle.net/1813/5576
Workload Evolution on the Cornell Theory Center IBM SP2
Hotovy, Steve
The Cornell Theory Center (CTC) put a 512-node IBM SP2 system into production in early 1995, and extended traces of batch jobs began to be collected in June of that year. An analysis of the workload shows that it has not only grown, but that its characteristics have changed over time. In particular, job duration increased with time, indicative of an expanding production workload. In addition, there was increasing use of parallelism. As the load has increased and larger jobs have become more frequent, the batch management software (IBM's LoadLeveler) has had difficulty in scheduling the requested resources. New policies were established to improve the situation. This paper will profile how the workload has changed over time and give anin-depth look at the maturing workload. It will examine how frequently certain resources are requested and analyze user submittal patterns. It will also describe the policies that were implemented to improve the scheduling situation and their effect on the workload.
1996-06-01T00:00:00ZToward a Portable Parallel Library for Space-Time Adaptive Methods
https://hdl.handle.net/1813/5575
Toward a Portable Parallel Library for Space-Time Adaptive Methods
Lebak, James M.; Durie, Robert C.; Bojancyk, Adam W.
Space-time adaptive processing (STAP) refers to a class of methods for detecting targets using an array of sensors. The output of the array is weighted using data collected from the sensors over a given period of time. An optimal method of calculating weights exists; however, this method is usually computationally impractical. Therefore, various heuristic methods are used that approximate the optimal method. These heuristics use many of the same operations and are computationally demanding. We are in the process of constructing a portable, parallel library of subroutines useful for constructing STAP heuristics. As a first step in this process, we implemented one STAP heuristic, higher-order post-Doppler processing, using three different parallel methods on the IBM SP2 and the Intel Paragon: these methods characterize different parallel approaches to the STAP problem. From implementing these algorithms, we have been able to identify components for our parallel library. We propose models for some of the components and give preliminary timing results for the parallel methods.
1996-06-01T00:00:00ZDomain Decomposition Methods for Conformal Mapping and Eigenvalue Problems
https://hdl.handle.net/1813/5574
Domain Decomposition Methods for Conformal Mapping and Eigenvalue Problems
Driscoll, Tobin Allen
Domain decomposition is widely used in the numerical solution of elliptic boundary value problems. It is appealing in part because of improved efficiency and straightforward parallelization. Conceptually, domain decomposition often exploits a natural feature of elliptic problems: data at one point may have an exceedingly weak influence on the solution at a far-removed point. The application of domain decomposition to other elliptic problems is less fully developed. One such area is numerical conformal mapping. We introduce the SC Toolbox for Matlab, an interactive graphical software package for the Schwarz-Christoffel mapping of polygons. The Toolbox can be used for interior and exterior mapping from several fundamental domains. Elongations in the polygon lead to crowding, in which the preimages of affected vertices are exponentially close together. Such regions are candidates for decomposition. We describe CRDT, an overlapping subdomain method developed with Vavasis for numerical Schwarz-Christoffel mapping. The method uses Delaunay triangulation to decompose the polygon into overlapping quadrilaterals, which in turn define cross-ratios that form the basis of a nonlinear system. Each quadrilateral induces an embedding of the prevertices so that locally, the map can be computed accurately. Apparently CRDT can deal with any degree of crowding, as is demonstrated by examples. Another application in conformal mapping is in Symm's integral equation. An important feature of existing software for Symm's equation is the efficient treatment of corner singularities. Careful generalization to multiple domains allows this treatment to be preserved and extended. An onoverlapping formulation leads to a linear system that is ideal for Schur complementation. The resulting method asymptotically requires a fraction of the single-domain work and is easily parallelized. We also consider a domain decomposition algorithm for the Laplace eigenvalue problem on polygons. This method, an improvement on one described by Descloux and Tolley, searches for the matching of Fourier-Bessel expansions at each corner to locate eigenvalues. We apply the algorithm to the "isospectral drums" discovered by Gordon, Webb, and Wolpert to find 25 eigenvalues to 12 digits. The method is far more accurate and efficient than standard methods for this problem.
1996-05-01T00:00:00ZThe Chebyshev Polynomials of a Matrix
https://hdl.handle.net/1813/5573
The Chebyshev Polynomials of a Matrix
Toh, Kim-Chuan; Trefethen, Lloyd N.
A Chebyshev polynomial of a square matrix A is a monic polynomial of specified degree that minimizes ||p(A)||(sub2). The study of such polynomials is motivated by the analysis of Krylov subspace iterations in numerical linear algebra. An algorithm is presented for computing these polynomials based on reduction to a semidefinite program which is then solved by a primal-dual interior point method. Examples of Chebyshev polynomials of matrices are presented, and it is noted that if A is far from normal, the lemniscates of these polynomials tend to approximate pseudospectra of A.
1996-05-01T00:00:00ZMultiMATLAB: MATLAB on Multiple Processors
https://hdl.handle.net/1813/5572
MultiMATLAB: MATLAB on Multiple Processors
Trefethen, Anne E.; Menon, Vijay S.; Chang, Chi-Chao; Czajkowski, Grezgorz J.; Myers, Chris; Trefethen, Lloyd N.
MATLAB(R), a commercial product of The MathWorks, Inc., has become one of the principal languages of desktop scientific computing. A system is described that enables one to run MATLAB conveniently on multiple processors. Using short, MATLAB-style commands like Eval, Send, Recv, Bcast, Min, and Sum, the user operating within one MATLAB session can start various processes in a fashion that maintains MATLAB's traditional user-friendliness. Multi-processor graphics is also supported. The system currently runs under MPICH on an IBM SP2 or a network of Unix workstations, and extensions are planned to networks of PCs. MultiMATLAB is potentially useful for education in parallel programming, for prototyping parallel algorithms, and for fast and convenient execution of easily parallelizable numerical computations on multiple processors.
1996-05-01T00:00:00ZStructure and Efficient Jacobian Calculation
https://hdl.handle.net/1813/5571
Structure and Efficient Jacobian Calculation
Coleman, Thomas F.; Verma, Arun
Many computational tasks require the determination of the Jacobian matrix, at a given argument, for a large nonlinear system of equations. Calculation or approximation of a Newton step is a related task. The development of robust automatic differentiation (AD) software allows for "painless" and accurate calculation of these quantities; however, straight forward application of AD software on large-scale problems can require an inordinate amount of computation. Fortunately, large-scale systems of nonlinear equations typically exhibit either sparsity or structure in their Jacobian matrices. In this paper we proffer general approaches for exploiting sparsity and structure to yield efficient ways to determine Jacobian matrices (and Newton steps) via automatic differentiation.
1996-03-01T00:00:00ZAvalanches, Scaling, and Coherent Noise
https://hdl.handle.net/1813/5570
Avalanches, Scaling, and Coherent Noise
Newman, M.E.J.; Sneppen, Kim
We present a simple model of a dynamical system driven slowly by externally-imposed coherent noise. Although the system never becomes critical in the sense of possessing spatial correlations of arbitrarily long range, it does organize into a stationary state characterized by avalanches with a universal power-law size distribution. We explain the behavior of the model within a time-averaged approximation, and discuss its connection to the dynamics of earthquakes, the Gutenberg-Richter law, and to recent experiments on avalanches in rice piles.
1996-03-01T00:00:00ZLow-dimensional models of subcritical transition to turbulence
https://hdl.handle.net/1813/5569
Low-dimensional models of subcritical transition to turbulence
Baggett, Jeffrey S.; Trefethen, Lloyd N.
In the past five years, working largely independently, five groups of researchers have proposed low-dimensional models of the behavior of parallel shear flows at high Reynolds numbers. These models are compared, and it is found that they are more similar than their authors have recognized. Among other similarities, most of them exhibit a threshold amplitude c=O(R**alpha) as R to infinity for some alpha less than -1, where R is the Reynolds number, for perturbations of the laminar state that may excite transition to turbulence.
1996-02-01T00:00:00ZAnalysis of the Early Workload on the Cornell Theory Center IBM SP2
https://hdl.handle.net/1813/5568
Analysis of the Early Workload on the Cornell Theory Center IBM SP2
Hotovy, Steven; Schneider, David; O'Donnell, Timothy
Parallel computers have matured to the point where they are capable of running a significant production workload. Characterizing this workload, however, is far more complicated than for the single-processor case. Besides the varying number of processors that may be invoked, the nodes themselves may provide differing computational resources (memory size, for example). In addition, the batch schedulers may introduce further categories of service which must be considered in the analysis. The Cornell Theory Center (CTC) put a 512-node IBM SP2 system into production in early 1995. Extended traces of batch jobs began to be collected in mid-1995 when the usage base became sufficiently large. This paper offers an analysis of this early batch workload.
1996-01-01T00:00:00ZNumerical Conformal Mapping Using Cross-ratios and Delaunay Triangulation
https://hdl.handle.net/1813/5567
Numerical Conformal Mapping Using Cross-ratios and Delaunay Triangulation
Driscoll, Tobin; Vavasis, Stephen A.
We propose a new algorithm for computing the Riemann mapping of the unit disk to a polygon, also known as the Schwarz-Christoffel transformation. The new algorithm, CRDT, is based on cross-ratios of the prevertices, and also on cross-ratios of quadrilaterals in a Delaunay triangulation of the polygon. The CRDT algorithm produces an accurate representation of the Riemann mapping even in the presence of arbitrary long, thin regions in the polygon, unlike any previous conformal mapping algorithm. We believe that CRDT can never fail to converge to the correct Riemann mapping, but the correctness and convergence proof depend on conjectures that we have so far not been able to prove. We demonstrate convergence with computational experiments. The Riemann mapping has applications to problems in two-dimensional potential theory and to finite-difference mesh generation. We use CRDT to produce a mapping and solve a boundary value problem on long, thin regions for which no other algorithm can solve these problems.
1996-02-01T00:00:00ZMulticonfiguration Wavefunctions for Quantum Monte Carlo Calculations of First-row Diatomic Molecules
https://hdl.handle.net/1813/5566
Multiconfiguration Wavefunctions for Quantum Monte Carlo Calculations of First-row Diatomic Molecules
Filippi, Claudia; Umrigar, C.J.
We use the variance minimization method to determine accurate wavefunctions for first-row homonuclear diatomic molecules. The form of the wave function is a product of a sum of determinants and a generalized Jastrow factor. One of the important features of the calculation is that we are including low-lying determinants corresponding to single and double excitations from the Hartree-Fock configuration within the space of orbitals whose atomic principal quantum numbers do not exceed those occurring in the Hartree-Fock configuration. The idea is that near-degeneracy correlation is most effectively described by a linear combination of low-lying determinants whereas dynamic correlation is well described by the generalized Jastrow factor. All the parameters occurring in both the determinantal and the Jastrow parts of the wave function are optimized. The optimized wave functions recover 77-94% of the correlation energy in variational Monte Carlo and 91-99% of the correlation energy in diffusion Monte Carlo.
1996-01-01T00:00:00ZLocality-Conscious Load Balancing: Connectionist Architectural Support
https://hdl.handle.net/1813/5565
Locality-Conscious Load Balancing: Connectionist Architectural Support
Tumuluri, Chaitanya; Choudhary, Alok N.; Mohan, Chilukuri K.
Traditionally, in distributed memory architectures, locality maintenance and load balancing are seen as user level activities involving compiler and runtime system support in software. Such software solutions require an explicit phase of execution, requiring the application to suspend its activities. This paper presents the first (to our knowledge) architecture-level scheme for extracting locality concurrent with the application execution. An artificial neural network coprocessor is used for dynamically monitoring processor reference streams to learn temporally emergent utilities of data elements in ongoing local computations. This facilitates use of kernel-level load balancing schemes thus, easing the user programming burden. The kernel-level scheme migrates data to processor memories evincing higher utilities during load-balancing. The performance of an execution-driven simulation evaluating the proposed coprocessor is presented for three applications. The applications chosen represent the range of load and locality fluxes encounted in parallel programs, with (a) static locality and load characteristics, (b) slowly varying localities for fixed datasetsizes and (c) rapidly fluctuating localities among slowly varying datasetsizes. The performance results indicate the viability and success of the coprocessor in concurrently extracting locality for use in load balancing activities.
1996-01-01T00:00:00ZSolving Unconstrained Discrete-time Optimal Control Problems Using Trust Region Method
https://hdl.handle.net/1813/5564
Solving Unconstrained Discrete-time Optimal Control Problems Using Trust Region Method
Liao, Aiping
Trust region method for a class of large-scale minimization problems,
the unconstrained discrete-time optimal control (DTOC) problems, is considered. Although the trust region algorithms developed in [4] and [13] are very economical they lack the ability to handle the so-called hard case. In this paper, we show that the trust region subproblem can be solved within an acceptable accuracy without forming the Hessian explicitly. The new approach is based on the inverse power method for eigenvalue problem and possesses the ability to handle the hard case. Our proposed approach leads to more efficient algorithms for DTOC problems.
1995-12-01T00:00:00ZA Comparison of Optimization Heuristics for the Data Mapping Problem
https://hdl.handle.net/1813/5563
A Comparison of Optimization Heuristics for the Data Mapping Problem
Chrisochoides, Nikos; Mansour, Nashat; Fox, Geoffrey
In this paper we compare the performance of six heuristics with
suboptimal solutions for the data mapping problem of two dimensional meshes that are used for the numerical solution of Partial Differential Equations(PDEs) on multicomputers. The data mapping heuristics are evaluated with respect to seven criteria covering load balancing, interprocessor communication,flexibility and ease of use for a class of single-phase iterative PDE solvers. Our evaluation suggests that the simple and fast block distribution heuristic can be as effective as the other five complex and computationally expensive algorithms.
1995-12-01T00:00:00ZDealing with Dense Rows in the Solution of Sparse Linear Least Squares Problems
https://hdl.handle.net/1813/5562
Dealing with Dense Rows in the Solution of Sparse Linear Least Squares Problems
Sun, Chunguang
Sparse linear least squares problems containing a few relatively dense rows occur frequently in practice. Straightforward solution of these problems could cause catastrophic fill and delivers extremely poor performance. This paper studies a scheme for solving such problems efficiently by handling dense rows and sparse rows separately. How a sparse matrix is partitioned into dense rows and sparse rows determines the efficiency of the overall solution process. A new algorithm is proposed to find a partition of a sparse matrix which leads to satisfactory or even optimal performance. Extensive numerical experiments are performed to demonstrate the effectiveness of the proposed scheme. A MATLAB implementation is included.
1995-12-01T00:00:00ZPseudospectra of Linear Operators
https://hdl.handle.net/1813/5561
Pseudospectra of Linear Operators
Trefethen, Lloyd N.
The following contains mathematical formulae and symbols that may become distorted in ASCII text format. The advent of ever more powerful computers has brought with it a new way of conceiving some of the fundamental eigenvalue problems of applied mathematics. If a matrix or linear operator "A" is far from normal, its eigenvalues or more generally its spectrum may have little to do with its behavor as measured by quantities such as ||A**N|| or ||exp(tA)||. More may be learned by examining the sets in the complex plane known as the "pseudospectra" of A, defined by level curves of the norm of the resolvent, ||(zI - A)**-1||. Five years ago, the author published a paper that presented computed pseudospectra of thirteen highly non-normal matrices arising in various applications. Since that time, analogous computations have been carried out for differential and integral operators. This paper, a companion to the earlier one, presents ten examples, each chosen to illustrate one or more mathematical or physical principles.
1995-12-01T00:00:00ZThe Efficient Computation of Sparse Jacobian Matrices Using Automatic Differentiation
https://hdl.handle.net/1813/5560
The Efficient Computation of Sparse Jacobian Matrices Using Automatic Differentiation
Coleman, Thomas F.; Verma, Arun
This paper is concerned with the efficient computation of sparse Jacobian matrices of nonlinear vector maps using automatic differentiation (AD). Specifically, we propose the use of a graph coloring technique, bi-coloring, to exploit the sparsity of the Jacobian matrix J and thereby allow for the efficient determination of J using AD software. We analyze both a direct scheme and a substitution process. We discuss the results of numerical experiments indicating significant practical potential of this approach.
1995-12-01T00:00:00ZA Newton Acceleration of the Weiszfeld Algorithm for Minimizing the Sum of Euclidean Distances
https://hdl.handle.net/1813/5559
A Newton Acceleration of the Weiszfeld Algorithm for Minimizing the Sum of Euclidean Distances
Li, Yuying
The Weiszfeld algorithm for continuous location problems can be considered as an iteratively reweighted least squares method. It exhibits linear convergence. In this paper, a Newton type algorithm with similar simplicity is proposed to solve a continuous multifacility location problem with Euclidean distance measure. Similar to the Weiszfeld algorithm, at each iteration the main computation can be solving a weighted least squares problem. A Cholesky factorization of a symmetric positive definite band matrix, typically with a relatively small band width (e.g., a band width of two for a Euclidean location problem on a plane) is required. This new algorithm can be regarded as a Newton acceleration to the Weiszfeld algorithm with fast global and local convergence. The simplicity and efficiency of the proposed algorithm makes it particularly suitable for large-scale Euclidean location problems and parallel implementation. Computational experience also suggests that the proposed algorithm performs remarkably well in the presence of degeneracy and near degeneracy. In addition, it is proven to be globally convergent. Although the local convergence analysis is still under investigation, computation results suggest that it is typically superlinearly convergent.
1995-11-01T00:00:00ZAn Aspect Ratio Bound for Triangulating a d-grid Cut by a Hyperplane
https://hdl.handle.net/1813/5558
An Aspect Ratio Bound for Triangulating a d-grid Cut by a Hyperplane
Mitchell, Scott A.; Vavasis, Stephen A.
We consider the problem of triangulating a d-dimensional uniform grid of d-cubes that is cut by a k-dimensional affine subspace. The goal is to obtain a triangulation with bounded aspect ratio. To achieve this goal, we allow some of the box faces near the affine subspace to be displaced. This problem has applications to finite element mesh generation. For general d and k, the bound on aspect ratio that we attain is double-exponential in d. For the important special case of d = 3, the aspect ratio bound is small enough that the technique is useful in practice.
1995-11-01T00:00:00ZMultithreaded model for dynamic load balancing parallel adaptive PDE computations
https://hdl.handle.net/1813/5557
Multithreaded model for dynamic load balancing parallel adaptive PDE computations
Chrisochoides, Nikos
We present a multithreaded model for the dynamic load-balancing ofnumerical, adaptive computations required for the solution of Partial Differential Equations (PDEs) on multiprocessors. Multithreading is used as a means of exploring concurrency in the processor level inorder to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used as a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDEs. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code re-usability not an easy task, and increases software complexity.
1995-10-01T00:00:00ZA Model for Evolution and Extinction
https://hdl.handle.net/1813/5556
A Model for Evolution and Extinction
Roberts, Bruce W.; Newman, M.E.J.
We present a model for evolution and extinction in large ecosystems. The model incorporates the effects of interactions between species and the influences of abiotic environmental factors. We study the properties of the model by approximate analytic solution and also by numerical simulation, and use it to make predictions about the distribution of extinctions and species lifetimes that we would expect to see in real ecosystems. It should be possible to test these predictions against the fossil record. The model indicates that a possible mechanism for mass extinction is the coincidence of a large coevolutionary avalanche in the ecosystem with a severe environmental disturbance.
1995-08-01T00:00:00ZA Method for Imaging Corrosion Damage in Thin Plates from Electrostatic Data
https://hdl.handle.net/1813/5555
A Method for Imaging Corrosion Damage in Thin Plates from Electrostatic Data
Kaup, Peter G.; Santosa, Fadil; Vogelius, Michael
The problem of quantitative nondestructive evaluation of corrosion in plates is considered. The inpection method uses boundary measurements of currents and voltages to determine the material loss caused by corrosion. The development of the method is based on linearization and the assumption that the plate is thin. The behavior of the method is examined in numerical situations.
1995-08-01T00:00:00ZMonte Carlo Study of the Random-field Ising Model
https://hdl.handle.net/1813/5554
Monte Carlo Study of the Random-field Ising Model
Newman, M.E.J.; Barkema, G.T.
Using a cluster-flipping Monte Carlo algorithm combined with a generalization of the histogram reweighting scheme of Ferrenberg and Swendsen, we have studied the equilibrium properties of the thermal random-field Ising model on a cubic lattice in three dimensions. We have equilibrated systems of L x L x L spins, with values of L up to 32, and for these systems the cluster-flipping method appears to a large extent to overcome the slow equilibration seen in single-spin-flip methods. From the results of our simulations we have extracted values for the critical exponents and the critical temperature and randomnessof the model by finite size scaling. For the exponents we find v=1.02 +- 0.06, B=0.06 +- 0.07, y=1.9 +- 0.2, and mean(y)=2.9 +- 0.2.
1995-07-01T00:00:00ZA Subspace, Interior, and Conjugate Gradient Method for Large-scale Bound-constrained Minimization Problems
https://hdl.handle.net/1813/5553
A Subspace, Interior, and Conjugate Gradient Method for Large-scale Bound-constrained Minimization Problems
Branch, Mary Ann; Coleman, Thomas F.; Li, Yuying
A subspace adaption of the Coleman-Li trust region and interior method is proposed for solving large-scale bound-constrained minimization problems. This method can be implemented with either sparse Cholesky factorization or conjugate gradient computation. Under reasonable conditions the convergence properties of this subspace trust region method are as strong as those of its full-space version. Computational performance on various large-scale test problems are reported; advantages of our approach are demonstrated. Our experience indicates our proposed method represents an efficient way to solve large-scalebound-constrained minimization problems.
1995-07-01T00:00:00ZNonlinear Wave Equations for Relativity
https://hdl.handle.net/1813/5552
Nonlinear Wave Equations for Relativity
van Putten, Maurice H.P.M.; Eardley, Douglas M.
Gravitational radiation is described by canonical Yang-Mills wave equations on the curved space-time manifold, together with evolution equations for the metric in the tangent bundle. The initial data problem is described in Yang-Mills scalar and vector potentials, resulting in Lie-constraints in addition to the familiar Gauss-Codacci relations
1995-07-01T00:00:00ZAutomatic Optimization
https://hdl.handle.net/1813/5551
Automatic Optimization
Liao, Aiping
We propose some automatic techniques for unconstrained optimization with the objective function given by some computer program. We show that the Newton step can be calculated in O(m2) operations where m is the number of stages in the function evaluation program. We also show that methods developed in Coleman and Liao [1] and Liao [8] for unconstrained discrete-time optimal control problems can be modified to handle general cases.
1995-07-01T00:00:00ZFast Compiled Logic Simulation Using Linear BDDs
https://hdl.handle.net/1813/5550
Fast Compiled Logic Simulation Using Linear BDDs
Gupta, Sudeep; Pingali, Keshav
This paper presents a new technique for compiled zero delay logic simulation, and includes extensive experiments that demonstrate its performance on standard benchmarks. Our compiler partitions the circuit into fanout-freeregions (FFRs), transforms each FFR into a linear sized BDD, and converts each BDD into executable code. In our approach, the computation is sublinear in the number of variables within each partition because only one path, from root to leaf, of the BDD is executed; therefore in many cases, substantial computation is avoided. In this way, our approach gets dome to the advantages of oblivious as well as demand-driven evaluation. We investigated the impact of the various heuristics on performance, and based on this data, we recommend good values for design parameters. A performance improvement of up to 67% over oblivious simulation is observed for our benchmarks.
1995-06-01T00:00:00ZModel Fermion Monte Carlo with Correlated Pairs
https://hdl.handle.net/1813/5549
Model Fermion Monte Carlo with Correlated Pairs
Kalos, Malvin H.
The issues that prevent the development of efficient and stable algorithms for fermion Monte Carlo in continuum systems are reexamined with special reference to the implications of the "plus/minus" symmetry. This is a property of many algorithms that use signed walkers, namely that the dynamics are unchanged when the signs of the walkers are interchanged. Algorithms that obey this symmetry cannot exhibit the necessary stability. Specifically, estimates of the overlap with any antisymmetric test function cannot be bounded away from zero in the limit of many iterations. Within the framework of a diffusion Monte Carlo treatment of the Schroedinger equation, it is shown that this symmetry is easily broken for pairs of walkers while at the same time preserving the correct marginal dynamics for each member of the pair. The key is to create different classes of correlations between members of pairs and to use (at least) two distinct correlations for a pair and for the same pair withthe signs exchanged. The ideas are applied successfully for a class of simplemodel problems in two dimensions.
1995-06-01T00:00:00ZParallel Solution of Sparse Linear Least Squares Problems on Distributed-memory Multiprocessors
https://hdl.handle.net/1813/5548
Parallel Solution of Sparse Linear Least Squares Problems on Distributed-memory Multiprocessors
Sun, Chunguang
This paper studies the solution of large-scale sparse linear least squares problems on distributed-memory multiprocessors. The method of corrected semi-normal equations is considered. New block-oriented parallel algorithms are developed for solving the related sparse triangular systems. The arithmetic and communication complexities of the new algorithms applied to regular grid problems are analyzed. The proposed parallel sparse triangular solution algorithms together with a block-oriented parallel sparse QR factorization algorithm result in a highly efficient block-oriented approach to the parallel solution of sparse linear least squares problems on distributed-memory multiprocessors. Performance of the block-oriented approach is demonstrated empirically through an implementation on an IBM Scalable POWER parallel system SP2. The largest problem solved has over two million rows and more than a quarter million columns. The execution speed for the numerical factorization of this problem achieves over 3.7 gigaflops per second on an IBM SP2 machine with 128 processors.
1995-05-01T00:00:00ZA Level-set Approach Inverse Problems Involving Obstacles
https://hdl.handle.net/1813/5547
A Level-set Approach Inverse Problems Involving Obstacles
Santosa, Fadil
An approach for solving inverse problems involving obstacles is proposed. The approach uses a level-set method which has been shown to be effective in treating problems involving moving boundaries. We develop two computational methods based on this idea. One method results in a nonlinear time-dependent partial differential equation for the level-set function whose evolution minimizes the residual in the data fit. The second method is an optimization that generates a sequence of level-set functions that reduces the residual. The methods are illustrated in two applications: a deconvolution problem, and a diffraction screen reconstruction problem.
1995-05-01T00:00:00ZEigenmodes of Isospectral Drums
https://hdl.handle.net/1813/5546
Eigenmodes of Isospectral Drums
Driscoll, Tobin A.
Recently it was proved that there exist nonisomeric planar regions that have identical Laplace spectra. That is, one cannot "hear the shape of a drum." All known examples of such regions are bounded by polygons with reentrant corners. While the isospectrality can be proven mathematically, analytical techniques are unable to produce eigenvalues themselves. Furthermore, standard numerical methods for computing the eigenvalues, such as adaptive finite elements, are highly inefficient. Physical experiments have been performed to measure the spectra, but the accuracy and flexibility of this method are limited. We describe an algorithm due to Descloux and Tolley that blends finite elements with domain decomposition, and show that, with a modification that doubles its accuracy, this algorithm can be used to compute efficiently the eigenvalues for polygonal regions. We present results accurate to twelve digits for the most famous pair of isospectral drums, as well as results for another pair.
1995-05-01T00:00:00ZMass-Extinction: Evolution and the Effects of External Influences on Unfit Species
https://hdl.handle.net/1813/5545
Mass-Extinction: Evolution and the Effects of External Influences on Unfit Species
Newman, M.E.J.; Roberts, B.W.
We present a new model for extinction in which species evolve in bursts or 'avalanches,' during which they become on average more susceptible to environmental stresses such as harsh climates and so are more easily rendered extinct. Results of simulations and analytic calculations using our model show a power-law distribution of extinction sizes which is in reasonable agreement with fossil data. We also see a number of features qualitatively similar to those seen in the fossil record. For example, we seefrequent smaller extinctions in the wake of a large mass extinction, which arise because there is reduced competition for resources in the aftermath of a large extinction event, so that species which would not normally be able to compete can get a foothold, but only until the next cold winter or bad attack of the flu comes along to wipe them out.
1995-03-01T00:00:00ZA Quasi-Newton L2-Penalty Method for Minimization Subject to Nonlinear Constraints
https://hdl.handle.net/1813/5544
A Quasi-Newton L2-Penalty Method for Minimization Subject to Nonlinear Constraints
Coleman, Thomas F.; Yuan, Wei
We present a modified L2 penalty function method for equality constrained optimization problems. The pivotal feature of our algorithm is that at every iterate we invoke a special change of variables to improve the ability of the algorithm to follow the constraint level sets. This change of variables gives rise to a suitable block diagonal approximation to the Hessian which is then used to construct a quasi-Newton method. We show that the complete algorithm is globally convergent with a local Q-superlinearly convergence rate. Preliminary results are given for a few problems.
1995-02-01T00:00:00ZA New Trust Region Algorithm for Equality Constrained Optimization
https://hdl.handle.net/1813/5543
A New Trust Region Algorithm for Equality Constrained Optimization
Coleman, Thomas F.; Yuan, Wei
We present a new trust algorithm for solving nonlinear equality constrained optimization problems. At each iterate a change of variables is performed to improve the ability of the algorithm to follow the constraint level sets. The algorithm employs L2 penalty function for obtaining global convergence. Under certain assumptions we prove that this algorithm globally converges to a point satisfying the second order necessary optimally conditions; the local convergence rate is quadratic. Results of preliminary numerical experiments are presented.
1995-01-01T00:00:00ZEnhancement of Environments for Analysis of Trace Files of Parallel Programs
https://hdl.handle.net/1813/5542
Enhancement of Environments for Analysis of Trace Files of Parallel Programs
Avula, Veena
One of the important phases of parallel programming is performance analysis. Trace data provides information about where time is spent in programs. Since this data is huge, a tool for analyzing and visualizing the trace data is convenient and necessary for performance analysis of parallel programs. Environments which provide such a faclility are many and varied. In this report, we discuss our work on the enhancement of one such environment for accessibility over more platforms and better visualization capabilities. The environment is Pablo.
1994-12-01T00:00:00ZComplete Orthogonal Decomposition for Weighted Least Squares
https://hdl.handle.net/1813/5541
Complete Orthogonal Decomposition for Weighted Least Squares
Hough, Patricia D.; Vavasis, Stephen A.
Consider a full-rank weighted least squares problem in which the weight matrix is highly ill-conditioned. Because of the ill-conditioning, standard methods for solving least-squares problems, QR factorization and the nullspace method for example, break down. G.W. Stewart established a norm bound for such a system of equations, indicating that it may be possible to find an algorithm that gives an accurate solution. S.A. Vavasis proposed a new definition of stability that is based on this result. He also defined the NSH algorithm for solving this least-squares problem and showed that it satisfies his definition of stability. In this paper, we propose a complete orthogonal decomposition algorithm to solve this problem and show that it is also stable. This new algorithm is simpler and more efficient than the NSH method.
1994-12-01T00:00:00ZARCH, An Object-Oriented Library for Asynchronous and Loosely Synchronous System Programming
https://hdl.handle.net/1813/5533
ARCH, An Object-Oriented Library for Asynchronous and Loosely Synchronous System Programming
Adamo, Jean-Marc
ARCH is a C++-based library for asynchronous and loosely synchronous system programming. The current version offers a set of programming constructs that are outlined below:
*** Threads: The construct is presented as a class from which the user can derive his own classes. The class encapsulates a small set of status variables and offers a set of functions for declaration, initialization, scheduling, priority setting, yielding and stopping.
*** Processes: A process is a more regular and structured programming construct whose scheduling and termination obey additional synchronization rules. Together with the synchronous point-to-point communication system offered in the library (see below), processes favor a parallel programming style similar to OCCAM's (actually, an extension of it that removes most static features and allows processes to share data). The semantics of this model is well understood and will undoubtedly facilitate the development of correct large asynchronous code. The library has been designed so that the C++ compiler is able to check the static semantics of programs (complete type checking, send-recv correct matching, ...).
*** Synchronous communication: Threads and processes synchronize and communicate via communication channels. There are four types of communication channels for local or remote synchronization or synchronous point-to-point communication. Inter-processor channels are essentially tools for building virtual topologies. The channel classes offer functions to send to or receive from a channel and get the size of the latest received message. More specialized synchronization-communication tools can be derived from channels.
*** Global data and pointers: Beside threads, the library offers basic tools for developing distributed data abstractions. Global data are data that can be defined at given locations in the distributed memory but are visible from all processors. Global pointers are a generalization of C++ pointers that allow for addressing global data at any place over the distributed memory. As usual pointers, global pointers are subjected to arithmetic and logic manipulations (incrementation, dereferencing, indexing, comparision...). The library provides basic operators for global data and pointer definition.
*** Global read/write functions: Global pointer expressions provide global references over the distributed memory that can subsequently be used asarguments to global read/write functions. These functions allow the processors to get access to all global data regardless of their locations over the distributed memory. In their most complete form, the read/write functions operate as remote procedure calls. At the programmer's level, global read/write functions appear as "one-sided": a read/write operationis executed on the processor that needs to read/write global data but need not be explicitly handled by the processor associated to the memory holding the data.
*** Spread and remote Arrays. Two basic distributed data structures have been built in the library. Spread arrays are arrays that have some of their dimensions spread over the distributed memory according to a given policy. Remote arrays are arrays that are defined at a given place in the distributed memory but can be accessed from any other. The spread and remote array classes (SpreadArray and RemoteArray) provide functions for global reference calculation. Global references can subsequently be used as arguments to global read/write functions. One can specialize global pointers to operate on spread or remote arrays. The global pointer class (Star class) offers distinct arithmetic and logic operator sets for unassigned, spread and remote global pointers. The library encourages parallel code writing in a style that relies on the object-oriented approach: first, build the abstractions that the application at hand relies on; next, make an efficient implementation of the abstraction; and finally, develop the application on top of them. The abstractions can be distributed data types derived from those built in the library (spread and remote arrays: see code of the segmentation algorithm provided with the library) or new distributed types built in the same way or types reused from other applications. This approach should favor parallel code production with many desirable properties such as efficiency, portability, reusability, ... . The library uses MPI as a communication interface. The current implementation runs on the IBM-SP2. Two versions of the library have currently been released.The first one is based on the IBM C++ compiler and MPI library. The second one makes use of the GNU g++ compiler and the MPICH public domain version of MPI. Porting the latter to any parallel machine supporting these two software systems should be straightforward.
1995-12-01T00:00:00ZAn Affine Scaling Algorithm for Minimizing Total Variation in Image Enhancement
https://hdl.handle.net/1813/5532
An Affine Scaling Algorithm for Minimizing Total Variation in Image Enhancement
Li, Yuying; Santosa, Fadil
A computational algorithm is proposed for image enhancement based on total variation minimization with constraints. This constrained minimization problem is introduced by Rudin et al [13,14,15] to enhance blurred and noisy images. Our computational algorithm solves the constrained minimization problem directly by adapting the affine scaling method for the unconstrained l 1 problem [3]. The resulting computational scheme, when viewed as an image enhancement process, has the feature that it can be used in an interactive manner in situations where knowledge of the noise level is either unavailable or unreliable. This computational algorithm can be implemented with a conjugate gradient method. It is further demonstrated that the interactive enhancement process is efficient.
1994-12-01T00:00:00ZA Trust Region and Affine Scaling Method for Nonlinearly Constrained Minimization
https://hdl.handle.net/1813/5531
A Trust Region and Affine Scaling Method for Nonlinearly Constrained Minimization
Li, Yuying
(The following contains mathematical formulae and symbols that may become distorted in ASCII text.) A nonlinearly constrained optimization problem can be solved by the exact penalty approach involving non differentiable functions (summation(i)of |ci(x)|) and (summation(i) of max(0,ci(x))). In the paper, a trust region affine scaling approach based on a 2-norm subproblem is proposed for solving a nonlinear l 1 problem. The (quadratic) approximation and the trust region subproblem are defined using affine scaling techniques. Explicit sufficient decrease conditions based on the approximations are suggested for obtaining a limit point satisfying complementarity, Kuhn-Tucker conditions, and second order necessary conditions. In global convergence analysis of the method is presented in [4].
1994-11-01T00:00:00ZOn Global Convergence of a Trust Region and Affine Scaling Method for Nonlinearly Constrained Minimization
https://hdl.handle.net/1813/5530
On Global Convergence of a Trust Region and Affine Scaling Method for Nonlinearly Constrained Minimization
Li, Yuying
(The following contains mathematical formulae and symbols that may become distorted in ASCII text.) A nonlinearly constrained optimization problem can be solved by the exact penalty approach involving non differentiable functions (summation(i)of |ci(x)|) and (summation(i) of max(0,ci(x))). In [11], a trust region affine scaling approach based on a 2-norm subproblem is proposed for solving a nonlinear l 1 problem. The (quadratic) approximation and the trust region subproblem are defined using affine scaling techniques. Explicit sufficient decrease conditions are proposed to obtain a limitpoint satisfying complementarity, dual feasibility, and second order optimality. In this paper, we present the global convergence properties of this new approach.
1994-11-01T00:00:00ZSlip Complexity in a Crustal-Plane Model of an Earthquake Fault
https://hdl.handle.net/1813/5529
Slip Complexity in a Crustal-Plane Model of an Earthquake Fault
Myers, Christopher R.; Shaw, Bruce E.; Langer, J. S.
We study numerically the behavior of a two-dimensional elastic plate (acrustal plane) that terminates along one of its edges at a homogeneous fault boundary. Slip-weakening friction at the boundary, inertial dynamics in the bulk, and uniform slow loading via elastic coupling to a substrate combine to produce a complex, deterministically chaotic sequence of slipping events. We observe a power-law distribution of small to moderately large events and an excess of very large events. For the smaller events, the moments scale with the rupture length in a manner that is consistent with seismological observations. For the largest events, rupture occurs in the form of narrow propagating pulses.
1994-10-01T00:00:00ZAdvanced Computing Research Institute Annual Research Activity Report September 1993 - September 1994
https://hdl.handle.net/1813/5528
Advanced Computing Research Institute Annual Research Activity Report September 1993 - September 1994
Pingali, Keshav
The Advanced Computing Research Institute (ACRI) is a unit of the Cornell Theory Center and is affiliated with the Cornell Computer Science Department. The ACRI is concerned with research in scientific computation and its application to engineering and scientific problems with the emphasis on the use and potential of advanced computer architecture and environments. Research areas include restructuring compilers for scientific computation, and the design of algorithms for numerical linear algebra, optimization, and differential equations. Currently, ACRI researchers are collaborating on several large-scale applications in the computational sciences, including: protein-folding and related molecular chemistry problems, structural optimization and biomechanics, particle methods for turbulent combustion, discrete-control problems, and the application of boundary elementmethods. The parallel computers available to ACRI for research include the Theory Center machines - a 128-node KSR computer, a 64-node IBM SP, and a network of IBM RS/6000s. This report contains a short summary of the progress made in the last year on each of the four main projects: parallelizing compilers, computational linear algebra, computational optimization, and numerical methods for partial differential equations. Included also are a list of ACRI researchers and their research interests, a list of technical reports produced this last year, and a list of ACRI seminars.
1994-10-01T00:00:00ZGetting CUTE with Matlab
https://hdl.handle.net/1813/5527
Getting CUTE with Matlab
Branch, Mary Ann
CUTE is a testing environment for nonlinear programming algorithms developed by Bongartz, Conn, Gould, and Toint ([CUTE: Constrained and unconstrained testing environment , Research Report RC 18860, IBM T.J. Watson Research Center, Yorktown Heights, USA, 1993]). The test problems in this environment are encoded in standard input format (SIF) and accessed by a set of FORTRAN subroutines. An extension to this environment was developed to allow fast access to the CUTE test problems from Matlab. This report describes this new Matlab interface to CUTE and how to use it.
1994-09-01T00:00:00ZPull the Weighted Center Towards the Solution of LP
https://hdl.handle.net/1813/5526
Pull the Weighted Center Towards the Solution of LP
Liao, A.
In the paper of Liao and Todd [3] two weighted centers are introduced and used to design algorithms for solving systems of linear inequalities. The linear programming problems can be solved via the weighted centers of a sequence of linear inequalities formed by letting the objective be an extra constraint and increasing the lower bound corresponding to the objective function as long as it is possible. In this paper we study the second kind of weighted center of [3] which ismore computationally oriented and show that, under a regularity assumption, the weighted center of the linear inequality with the objective as an extra constraint converges to the solution of the linear programming problem under consideration as the upper bound corresponding to the objective function is pulled towards the infinity. We propose a relaxed version of one of the algorithms of [3]. This modified version does not try to find an accurate center during each iteration; instead, an approximate center which is the k-th feasible iterate is determined in the k-th iteration. We show that this modified algorithm finds an e-solution in finity many iterations. Some limited numerical results are presented to compare our algorithm with the simplex method and indicate that our algorithm is promising.
1994-08-01T00:00:00ZA New Parallel Algorithm for Global Optimization with Application to the Molecular Cluster Problem
https://hdl.handle.net/1813/5525
A New Parallel Algorithm for Global Optimization with Application to the Molecular Cluster Problem
Liao, Aiping
In this paper we present a simple algorithm for global optimization. This algorithm combines random searches with efficient local minimization algorithms. The proposed algorithm begins with an initial "local minimizer." In each iteration, a search direction is generated randomly, along which some points are chosen as the initial points for the local optimization algorithm and several "local minimizers" are obtained. The next iteration is determined by comparing these localminimizers. We will discuss the expected number of iterations for finding a global minimizer with this algorithm. Several variants of the algorithm that take advantage of the partially separable structure are proposed for the Lennard-Jones cluster problem and tested on the IBM SP1 parallel computer. Our numerical results show that our algorithms are promising.
1994-08-01T00:00:00ZThe Lambda Loop Transformation Toolkit (User's Reference Manual)
https://hdl.handle.net/1813/5524
The Lambda Loop Transformation Toolkit (User's Reference Manual)
Li, Wei; Pingali, Keshav
Loop transformations are becoming critical to exploiting parallelism and data locality in parallelizing and optimizing compilers. This document describes the Lambda loop transformation toolkit, an implementation of the non-singular matrix transformation theory, which can represent any linear one-to-one transformation. Lambda has a simple interface, and is independent of any compiler intermediate representation. It has been used in parallelizing compilers for multiprocessor machines as well as optimizing compilers for uniprocessor machines.
1994-08-01T00:00:00ZAll-Electron Study of Gradient Corrections to the Local Density Functional in Metallic Systems
https://hdl.handle.net/1813/5523
All-Electron Study of Gradient Corrections to the Local Density Functional in Metallic Systems
Khein, Alexander; Singh, D. J.; Umrigar, C. J.
Using the all-electron Linearized Augmented Plane Wave (LAPW) method, we calculate the effect of including gradient corrections to the exchange correlation functional on the structural properties of the simple metal Al, transition metals Ta, W, Pt, and noble metals Cu, Ag, Au. For all the systems studied, the local density approximation (LDA) yields bond-lengths that are too short and bulk moduli that are too large. The generalized gradient functional introduced by Perdew and Wang (PW91) yields corrections that are in the right direction (larger bond-lengths and smaller bulk moduli), but it frequently over compensates, particularly for the heavier elements. The PW 91 functional predicts the lattice constant and bulk modulus of Al and Cu more accurately than the LDA but yields values thatare less accurate than the LDA for W, Pt, Au.
1994-08-01T00:00:00ZSoftware Management Tools
https://hdl.handle.net/1813/5522
Software Management Tools
Toll, Thomas F.
This report comprises the final report for a Cornell Theory Center intership during the Spring 1994 semester. The goal of the internship was to organize a major software package for a research group in chemistry to be portable, flexible, and easy to maintain.
1994-08-01T00:00:00ZParallel Multifrontal Solution of Sparse Linear Least Squares Problems on Distributed-memory Multiprocessors
https://hdl.handle.net/1813/5521
Parallel Multifrontal Solution of Sparse Linear Least Squares Problems on Distributed-memory Multiprocessors
Sun, Chunguang
We describe the issues involved in the design and implementation of efficient parallel algorithms for solving sparse linear least squares problems on distributed-memory multiprocessors. We consider both the QR factorization method due to Golub and the method of corrected semi-normal equations due to Bjorck. The major tasks involved are sparse QR factorization, sparse triangular solution and sparse matrix-vector multiplication. The sparse QR factorization is accomplished by a parallel multifrontal scheme recently introduced. New parallel algorithms for solving the related sparse triangular systems and for performing sparse matrix-vector multiplications are proposed. The arithmetic and communication complexities of our algorithms on regular grid problems are presented. Experimental results on an Intel iPSC/860 machine are described.
1994-07-01T00:00:00ZTight Binding Molecular Dynamics on Parallel Computers
https://hdl.handle.net/1813/5520
Tight Binding Molecular Dynamics on Parallel Computers
Goedecker, S.; Colombo, L.
With a new and intrinsically parallel algorithm for Tight Binding Molecular Dynamics we obtain a performance of 3.4 Gigaflops per million dollar on a cluster of 8 Hewlett Packard workstations in a simulation of 216 Silicon atoms. One time step with this new algorithm takes as much time for the 216 atom system as one time step with a conventional algorithm on a NEC-SX3 supercomputer. In addition, the linear scaling of new algorithm allows us to calculate systems of unprecendented size which are not any more accessible by the combination of standard algorithms and vector supercomputers.
1994-06-01T00:00:00ZOptimization and Parallelization of a Commodity Trade Model for the SP1, Using Parallel Programming Tools
https://hdl.handle.net/1813/5519
Optimization and Parallelization of a Commodity Trade Model for the SP1, Using Parallel Programming Tools
Bergmark, Donna; Pottle, Marcia
We compare two different approaches to parallelization of Fortran programs. The first approach is to optimize the serial code so that it runs as fast as possible on a single processor, and then optimize the parallel version. In this paper a variety of parallel programming tools is used to obtain an optimal, parallel version of an economic policy modelling application for the IBM SP1. We apply a new technique called Data Access Normalization; we use an extended ParaScope as our parallel programming environment; we use FORGE 90 as our parallelizer; and we use KAP as our optimizer. We make a number of observations about the effectiveness of these tools. Both strategies obtain a working, parallel program, but use different tools to get there. On this occasion, both KAP and Data Access Normalization lead to the same critical transformation of inverting four of the twelve loop nests in the original program. The next most important optimization is parallel I/O, one of the few transformations that had to be done by hand. Speedups are obtained on the SP1 (using MPLp communication over the High Speed Switch).
1994-06-01T00:00:00ZThe Integration of ParaScope and Lambda
https://hdl.handle.net/1813/5518
The Integration of ParaScope and Lambda
Bergmark, Donna; Presberg, David
We have been experimenting with combining three powerful language tools for large, scientific, parallel Fortran codes. One tool is ParaScope, a programming envirnment; another tool is the Lambda Toolkit, a collection of routines for performing loop transformations using invertible matrices; the third is FORGE 90, a collection of tools for parallelizing Fortran programs. Initial sucess with incorporating the Lambda Toolkit into ParaScope led us to undertake the work leading to a new program preparation strategy, in which one first uses a modified ParaScope to perform DataAccess Normalization, then uses FORGE 90 to produce a parallel program for a distributed memory platform. We describe the details of this strategy and present some performance results for the IBM SP1. We conclude that the combination of ParaScope and the Lambda Toolkit (called "ped-Lambda") is a useful transformation tool.
1994-06-01T00:00:00ZCalculation of Pseudospectra by the Arnoldi Iteration
https://hdl.handle.net/1813/5517
Calculation of Pseudospectra by the Arnoldi Iteration
Toh, Kim-Chuan; Trefethen, Lloyd N.
The Arnoldi iteration, usually viewed as a method for calculating eigenvalues, can also be used to estimate pseudospectra. This possibility may be of practical importance, for in applications involving highly non-normal matrices or operators, such as hydrodynamic stability, pseudospectra may be physically more significant than spectra.
1994-05-01T00:00:00ZRate Equations for the Growth of Cu Islands on Cu(001)
https://hdl.handle.net/1813/5516
Rate Equations for the Growth of Cu Islands on Cu(001)
Biham, Ofer; Barkema, G.T.; Breeman, M.
The kinetics of island nucleation and growth during deposition of Cu atoms on Cu(001) is studied using rate equations. The equations are derived using microscopic calculations of the energy landscape on the surface, previously used in Monte Carl (MC) simulations. This allows a quantitative comparison between the rate equations and the MC results. Our rate equations take into account atoms that fall on the bare substrate as well as on top of existing islands, the mobility of single atoms and small islands, the coalescence of adjacent islands and the possible separation of atoms from island edges. The rate equations are used to explore the island size distribution and island density as a function of the coverage and deposition rates. These rate equations provide a useful and flexible tool that allows to easily modify particular microscopic properties of the system such as the mobility of small islands or the rate of coalescence and examine their effect while leaving all other features intact.
1994-05-01T00:00:00ZExploitation of Latency Hiding on the KSR1 - Case Study: The Barnes Hut Algorithm
https://hdl.handle.net/1813/5515
Exploitation of Latency Hiding on the KSR1 - Case Study: The Barnes Hut Algorithm
Tumuluri, Chaitanya; Choudhary, Alok N.
This study is aimed at examining the performance of dynamic, irregular and loosely synchronous class of applications on the KSR1 distributed shared memory COMA system. The Barnes-Hut tree based algorithm for simulating galactic evolution [1], was chosen as a representative of this class of applications. The performance measures include the overall time-stepping loop execution time, the efficacy of the scaling rules (EES and RCTS) proposed in [2] as well as the computational load balance achieved by the CostZone data partitioning scheme [1] under these scaling rules. We define notions of geographical locality, transfer locality flux and partition locality flux to explain the sources of remote memory accesses in the application. The contributions of our study include two runtime latency hiding techniques PST and PREFH proposed for the effective and automatic utilization of the poststore and prefetch instructions to hide the latencies of remote memory accesses. The architectural support assumed, compiler analysis required and code instrumentation schemes for the implementation of the PST and PREFH techniques are presented in this paper. We also examine the scalability of our schemes under the afore mentioned scaling rules. These schemes were tuned for a 32k particle simulation size on a 112 processor configuration, producing a reduction of 30% in the overall loop execution time of the simulation. Further, a combination of the schemes, PREPST, produced an overall reduction of 50% in the loop execution time of simulation. These improvements were traced to a reduction in the problems of locality fluxes which arose as the application was scaled under the EES and RCTS rules/ Interestingly, the problems of locality fluxes manifested themselves as load imbalance conditions in the application. We found that our schemes did not scale too well under EES scaling, but produced appreciable reductions in execution timings under RCTS scaling. It needs to be emphasized that our work involved the study of a whole application and on a 128 processor KSR1 machine, as opposed to most other work reported to date which examine performances of computational kernels on 32 or 64 processor configurations only.
1994-05-01T00:00:00ZParallel Continuation-Based Global Optimization for Molecular Conformation and Protein Folding
https://hdl.handle.net/1813/5514
Parallel Continuation-Based Global Optimization for Molecular Conformation and Protein Folding
Coleman, Thomas F.; Wu, Zhijun
This paper presents our recent work on developing parallel algorithms and software for solving the global minimization problem for molecular conformation, especially protein folding. Global minimization problems are difficult to solve when the objective functions have many local minimizers, such as the energy functionsfor protein folding. In our approach, to avoid directly minimizing a "difficult" function, a special integral transformation is introduced to transform the function into a class of gradually deformed, but "smoother" or "easier" functions. An optimization procedure is then applied to the new functions successively, to trace their solutions back to the original function. The method can be applied to a large class of nonlinear partially separable functions including energy functions for molecular conformation and protein folding. Mathematical theory for the method, as a special continuation approach to global optimization, is established. Algorithms with different solutions tracing strategies are developed. Different levels of parallelism are exploited for the implementation of the algorithms on massively parallel architectures.
1994-03-01T00:00:00ZFast Wavelet Transforms for Matrices Arising from Boundary Element Methods
https://hdl.handle.net/1813/5513
Fast Wavelet Transforms for Matrices Arising from Boundary Element Methods
Bond, David M.; Vavasis, Stephen A.
(The following contains mathematical formulae and symbols that may become distorted in ASCII text.) For many boundary element methods applied to Laplace's equation in two dimensions, the resulting integral equation has both an integral with a logarithmic kernel and an integral with a discontinuous kernel. If standard collocation methods are used to discretize the integral equation we are left with two dense matrices. We consider expressing these matrices in terms of wavelet bases with compact support via a fast wavelet transform as in Beylkin, Coifman, and Rokhlin. Upper bounds on the size of the wavelet transform elements are obtained. These bounds are then used to show that if the original matrices are of the size N x N, the resulting transformed matrices are sparse, having only O(N log N) significant entries. Some numerical results will also be presented. Unlike Beylkin, Coifman and Rokhlin who use the fast wavelet transform as a numerical approximation to a continuous operator already expressed in a full wavelet basis of L2(IR), we think of the fast wavelet transform as a change of basis matrix for a finite dimension, and apply it to a discretized function or matrix. As a result, we can use this fast wavelet transform as a "black box" transformation in existing boundary element codes.
1994-03-01T00:00:00ZVectorization of Multiple Small Matrix Problems
https://hdl.handle.net/1813/5512
Vectorization of Multiple Small Matrix Problems
Pitsianis, Nikos; Loan, Charles Van
Multiple independent matrix problems of very small size appear in a variety of different fields. In this work, we study the implementation of elementary linear algebra subroutines so as to best use vectorizing compilers and vector hardware for multiple small problem instances. We cheek the performance improvement over the single-instance optimized codes on different vector supercomputers. We also describe how to automate the transformation of a single-instance linear algebra solver into a multiple instance solver
1994-03-01T00:00:00ZParallel Simulation of the Ising Model
https://hdl.handle.net/1813/5511
Parallel Simulation of the Ising Model
MacFarland, T.; Barkema, G. T.
(The following contains mathematical formula and symbols that may become distorted in ASCII text.) New methods for parallelizing Ising model simulations are presented. A parallel single-spin Metropolis algorithm has been implemented with a speedup of 27 on 50 processors of the KSR-1 parallel computer. Our parallel Swendsen-Wang algorithm obtains a speedup of 3.2 on 9 processors of the same computer. Both of these simulations were carried out on 200x200 lattices. The parallel Local Cluster algorithm has been implemented with an almost linear speedup. We also discuss ongoing research using the parallel Local Cluster algorithm.
1994-03-01T00:00:00ZAn Efficient Linear Scaling Algorithm for Tight Bonding Molecular Dynamics
https://hdl.handle.net/1813/5510
An Efficient Linear Scaling Algorithm for Tight Bonding Molecular Dynamics
Goedecker, S.; Colombo, L.
A novel formulation for tight binding total energy calculations and tight binding molecular dynamics, which scales linearily with the size of the system, is presented. The linear complexity allows us to treat systems of very large size and the algorithm is already faster than the best implementation of classical diagonalization for systems of 64 atoms. In addition, it is naturally parallelizable and it permits us therefore to perform molecular dynamics simulations of systems of unprecedented size. Finite electronic temperatures can also be taken into account. We illustrate this method by investigating structural and dynamical properties of solid and liquid carbon at different densities.
1994-03-01T00:00:00ZHow Parallel Programming Tools are Used
https://hdl.handle.net/1813/5509
How Parallel Programming Tools are Used
Torne, Jorge F.
Parallel programming is a hot topic among scientists and engineers. The number of parallel machines available to them is constantly increasing. These powerful machines provide a new research platform for engineers and scientists. In order to exploit all their power, new programming techniques are needed. Fortunately, several parallel programming tools have appeared to make this complex programming endeavor much easier. We analyze paths to parallel programming from traditional scientific programming through parallelization of a molecular simulation application. We hope this paper encourages scientists and engineers at Cornell's Theory Center to take the step towards parallelization.
1994-02-01T00:00:00ZExperiments in the Concurrent Computation of Spatial Association on the KSR1
https://hdl.handle.net/1813/5508
Experiments in the Concurrent Computation of Spatial Association on the KSR1
Marciano, Richard J.; Armstrong, Marc P.; Pavlik, Clarie E.
Spatial association measures, when computed for large data sets, have significant computational requirements. Parallel processing is one way to address these requirements. The design of parallel programs, however, requires careful planning since many factors under programmer control affect the efficiency of the resulting computations. In this paper, two strategies of parallel processing are described using as an illustration a measure of spatial association, G(d). An evaluation is made of the efficiency of the parallel implementations by varying the problem size and the number of processors used in parallel. The results obtained indicate that parallel processing can currently enable analysts to work in near-real-time with problems that range in the tens of thousands of observations; such problems require less than two minutes of execution time on the KSR1. Also super linear speed ups of almost 1500 are obtained for the computation phase of the problem.
1994-01-01T00:00:00ZThe Application of Automatic Differentiation to Problems in Engineering Analysis
https://hdl.handle.net/1813/5507
The Application of Automatic Differentiation to Problems in Engineering Analysis
Chinchalkar, Shirish
Automatic differentiation is a technique of computing the derivative of a function or a subroutine written in a higher level language such as FORTRAN or C. Significant progress has been made in this field in the last few years. Here, we give a short exposition to automatic differentiation and demonstrate its applicability to several fields of engineering analysis.
1993-12-01T00:00:00ZSome Efficient Algorithms for Unconstrained Discrete-time Optimal Control Problems
https://hdl.handle.net/1813/5506
Some Efficient Algorithms for Unconstrained Discrete-time Optimal Control Problems
Liao, Aiping
The differential dynamic programming algorithm (DDP)and the stagewise Newton procedure are two typical examples of efficient local procedures for discrete-timeoptimal control (DTOC) problems. It is desirable to generalize these local procedures to globally convergent methods. One successful globalization was recently proposed by Coleman and Liao [3] which combines the trust region idea with Pantoja's stagewise Newton procedure. In this paper, we propose several algorithms for DTOC problems which combine a modified "dogleg" algorithm with DDP or Pantoja's Newton procedure. These algorithms possess advantages of both the dogleg algorithm and the DDP or the stagewise procedure, i.e.,they have strong global and local convergence properties yet remain economical. Numerical results are presentedto compare these algorithms and the Coleman-Liao algorithm.
1993-11-01T00:00:00ZPseudospectra of the Wave Operator with an Absorbing Boundary
https://hdl.handle.net/1813/5505
Pseudospectra of the Wave Operator with an Absorbing Boundary
Driscoll, Tobin A.; Trefethen, Lloyd N.
For systems which can be described by u(sub t) = Au with a highly non-normal matrix or operator A, the spectrum of A may describe the behavior of the system poorly. One such operator arises from the one-dimensional wave equation on a finite interval with a homogeneous Dirichlet condition at one end and a linear damping condition at the other. In this paper the pseudospectra (norm of the resolvent) of this operator are computed in an energy norm, using analytical techniques and computations with discrete approximations. When the damping condition is perfectly absorbing, the pseudospectra are half-planes parallel to the imaginary axis, and in other cases they are periodic in the imaginary direction and approximate strips of finite thickness. The non-normality of the operator is related to the behavior of the system and the limitations of spectral analysis.
1993-10-01T00:00:00ZAn Accelerated Interior Point Method Whose Running Time Depends Only on A
https://hdl.handle.net/1813/5504
An Accelerated Interior Point Method Whose Running Time Depends Only on A
Vavasis, Stephen A.; Ye, Yinyu
We propose a "layered-step" interior point (LIP) algorithm for linear programming. This algorithm follows the central path, either with shortsteps or with a new type of step called a "layered least squares" (LLS)step. The algorithm returns the exact global minimum after a finite numberof steps - in particular, after O (mathematical symbol omitted) iterations, where c(A) is a function of the coefficient matrix. The LLS steps can be thought of as accelerating a path-following interior point method whenever near-degeneracies occur. One consequence of the new method is a new characterization of the central path: we show that it composed of at most n-squared alternating straight and curved segments. If the LIP algorithm is applied to integer data, we get as another corollary a new proof of a well-known theorom by Tardos that linear programming can be solved in strongly polynomial time provided that A contains small-integerentries.
1993-10-01T00:00:00ZExact Monte Carlo Calculations for Fermions on a Parallel Machine
https://hdl.handle.net/1813/5503
Exact Monte Carlo Calculations for Fermions on a Parallel Machine
Zhang, Shiwei; Kalos, M.H.
We describe how a recently published algorithm--which addresses the sign problem with the context of the Green's function Monte Carlo method--can be implemented in a parallel distributed environment. The method of parallelization maintains large granularity and therefore low overhead. Despite the stochastic nature of the algorithm, good load-balancing can be accomplished and reproducibility is ensured.
1993-08-01T00:00:00ZParallel Structural Optimization Applied to Bone Remodeling on Distributed Memory Machines
https://hdl.handle.net/1813/5502
Parallel Structural Optimization Applied to Bone Remodeling on Distributed Memory Machines
Chinchalkar, Shirish; Coleman, Thomas F.
This paper demonstrates parallel structural optimization methods on distributed memory MIMD machines. We have restricted ourselves to the simpler case of minimizing a multivariate non-linear function subject to bounds on the independent variables, when the objective function is expensive to evaluate as compared to the linear algebra portion of the optimization. This is the case in structural applications, when a large three-dimensional finite element mesh is used to model the structure. This paper demonstrates how parallelism can be exploited during the function and gradient computation as well as the optimization iterations. For the finite element analysis, a 'torus-wrapped' skyline solver is used. The reflective Newton method which attempts to reduce the number of iterations at the expense of more linear algebra per iteration is compared with the more conventional active set method. All code is developed for an Intel iPSC/860, but it can be ported to other distributed memory machines. The methods developed are applied to problems in bone remodeling. In the area of biomechanics, optimization models can be used to predict changes in the distribution of material properties in bone due to the presence of an artificial implant. The model we have used minimizes a linear combination of the mass and strain energy in the entire domain subject to bounds on the densities in each finite element. Early results show that the early reflective Newton method can outperform active set methods when a significant number of variables are not active at the minimum.
1993-07-01T00:00:00ZSolving LP Problems via Weighted Centers
https://hdl.handle.net/1813/5501
Solving LP Problems via Weighted Centers
Liao, Aiping; Todd, Michael J.
The feasibility problem for a system of linear inequalities can be converted into an unconstrained optimization problem using ideas from the ellipsoid method, which can be viewed as a very simple minimization technique for the resulting nonlinear function. Using more sophisticated algorithms, we develop and investigate more efficient methods, which lead to two kinds of weighted centers for the feasible set. With these centers, we develop new algorithms for solving linear programming problems.
1993-07-01T00:00:00ZAn Efficient Trust Region Method for Unconstrained Discrete-Time Optimal Control Problems
https://hdl.handle.net/1813/5500
An Efficient Trust Region Method for Unconstrained Discrete-Time Optimal Control Problems
Coleman, Thomas F.; Liao, Aiping
Discrete-time optimal control (DTOC) problems are large-scaleoptimization problems with a dynamic structure. In previous work this structure has been exploited to provide very fast and efficient local procedures. Two examples are the differential dynamic programming algorithm (DDP) and the stagewise Newton procedure - both require only O(N) operations per iteration, where N is the number of time steps. Both exhibit a quadratic convergence rate. However, most algorithms in this category do not have a satisfactory global convergence strategy. The most popular global strategy is shifting: this sometimes works poorly due to the lack of automatic adjustment to the shifting element. In this paper we propose a method that incorporates the trust region idea with the local stagewise Newton's method. This method possesses advantages of both the trust region idea and the stagewise Newton's method, i.e., our proposed method has strong global and local convergence properties yet remains economical. Preliminary numerical results are presented to illustrate the behavior of the proposed algorithm. We also collect in the Appendix some DTOC problems that have appeared in the literature.
1993-07-01T00:00:00ZThe Effective Energy Transformation Scheme as a General Continuation Approach to Global Optimization with Application to Molecular Conformation
https://hdl.handle.net/1813/5499
The Effective Energy Transformation Scheme as a General Continuation Approach to Global Optimization with Application to Molecular Conformation
Wu, Zhijun
This paper discusses a generalization of the special function transformation scheme for global minimization for molecular conformation used in [3,4,14,15]. Theories for the method as a general continuation approach are established. We show that the method can transform an onlinear objective function into a class of gradually deformed, but "smoother", functions. An optimization procedure can then be applied to the new functions successively, to trace the solution back to theoriginal function. Two types of transformation are defined: the isotropic and the anisotropic. We show that both, although not applicable numerically to arbitrary functions because of the required high dimensional integration, can be applied to a large class of nonlinear partially separable functions, and, in particular, the energy functions for molecular conformation. Methods to compute exactly the required transformations are given. Advantages of this transformation approach over the conventional homotopy methods also are discussed.
1993-07-01T00:00:00ZModeling the Landscape as a Dynamic Mosaic of Patches: Some Computational Aspects
https://hdl.handle.net/1813/5498
Modeling the Landscape as a Dynamic Mosaic of Patches: Some Computational Aspects
Wu, Jianguo
The only thing that is certain about Nature is its patchiness. Patchiness is ubiquitous, occurring across systems, organizational levels, and spatio-temporal scales. Traditional modeling approaches in ecology often fail to recognize spatial patchiness because they usually assume spatial homogeneity. A landscape may be viewed as a hierarchical mosaic system of patches that are different in their age, size, shape, content, and other aspects. The spatial change of the patch mosaic results in the landscape pattern, whereas the phase change of individual patches at the local scale and temporal change in patch mosaics at larger scales gives rise to the landscape dynamics. Following such a patch dynamics conceptualization, a spatially explicit patch dynamic modeling approach has been developed based on a serpentine annual grassland. The model has two basic submodels: a spatially-explicit, age/size- structured patch demographic model and a multi-specific plant population dynamic model of a non-equilibrium island biogeographic type. In this paper, the basic structure and some computational aspects of the model are discussed.
1993-07-01T00:00:00ZPerturbative Forward Walking in the Context of the Mirror Potential Approach to the Fermion Problem
https://hdl.handle.net/1813/5497
Perturbative Forward Walking in the Context of the Mirror Potential Approach to the Fermion Problem
Kalos, Malvin H.
We introduce and discuss a perturbative variant of "forward walking" in Quantum Monte Carlo and develop the theory as applied to many-fermion problems.
1993-06-01T00:00:00ZInitial Experiments in the Integration of ParaScope and Lambda
https://hdl.handle.net/1813/5496
Initial Experiments in the Integration of ParaScope and Lambda
Bergmark, Donna; Presberg, David
This document describes the incorporation of the Lambda loop transformation Toolkit into the ParaScope parallel programming environment. The goal was to extend the functionality of ParaScope, to determine the usefulness of the Lambda Toolkit in environments other than that of its original development, and to evaluate the quality of code generation before and after incorporation of Lambda-based analysis and transformation. We learned that ParaScope could be extended, but only by very brave people; we learned that the Lambda Toolkit could be used by other programming systems to good effect; we also compared two different proposed interfaces for the Lambda Toolkit.
1993-06-01T00:00:00ZMagnetoconvection Dynamics in a Stratified Layer. I. 2D Simulations and Visualization (Revised 10/93)
https://hdl.handle.net/1813/5495
Magnetoconvection Dynamics in a Stratified Layer. I. 2D Simulations and Visualization (Revised 10/93)
Lantz, Steven R.; Sudan, R. N.
To gain insight into the problem of fluid convection below the solar photosphere, time-dependent magnetohydrodynamic convection is studied by numerical simulation of the magneto-anelastic equations, a model appropriate for low Mach numbers. Numerical solutions to the equations are generated on a two-dimensional Cartesian mesh by a finite-difference, predictor-corrector algorithm. The thermodynamic properties of the fluid are held constant at the rigid, stress-free top and bottom boundaries of the computational box, while lateral boundaries are treated as periodic. In most runs the background polytropic fluid configuration is held fixed at Rayleigh number R=5.44 times the critical value, Prandtl number P=1.8, and aspect ratio a=1, while the magnetic parameters are allowed to vary. The resulting dynamical behavior is shown to be strongly influenced by a horizontal magnetic field which is imposed at the bottom boundary. As the field strength increases from zero, an initially unsteady "single-roll" state, featuring complex time dependence, is replaced by a steady "traveling-wave" tilted state; then, an oscillatory or "sloshing" state; then, a steady two-roll state with no tilting; and finally, a stationary state. Because the magnetic field is matched onto a potential field at the top boundary, it can penetrate into the nonconducting region above. By varying the magnetic diffusivity, the concentrations of weak magnetic fields at the top of these flows can be shown to be explainable in terms of an advection-diffusion balance.
1993-05-01T00:00:00ZA Parallel Build-up Algorithm for Global Energy Minimizations of Molecular Clusters Using Effective Energy Simulated Annealing
https://hdl.handle.net/1813/5494
A Parallel Build-up Algorithm for Global Energy Minimizations of Molecular Clusters Using Effective Energy Simulated Annealing
Coleman, Thomas F.; Shalloway, David; Wu, Zhijun
This work studies the build-up method for the global minimizationproblem for molecular conformation, especially protein folding. The problem is hard to solve for large molecules using general minimization approaches because of the enormous amount of required computation. We therefore propose a build-up process to systematically "construct" the optimal molecular structures. A prototype algorithm is designed using the anisotropic effective energy simulated annealing method at each build-up stage. The algorithm has been implemented on the InteliPSC/860 parallel computer, and tested with the Lennard-Jones microcluster conformation problem. The experiments showed that the algorithm was effective for relatively large test problems, and also very suitable for massively parallel computation. In particular, for the 72-atom Lennard-Jones microcluster, the algorithm found a structure whose energy is lower than any others found in previous studies.
1993-05-01T00:00:00ZA Suite of Software Tools for Managing a Large Parallel Programming Project
https://hdl.handle.net/1813/5493
A Suite of Software Tools for Managing a Large Parallel Programming Project
Schwarz, Paul
A suite of software tools is presented for managing a large parallel programming project. The tools were selected recognizing that parallel program development is an iterative process and subject to mistakes and that software tools can be useful for maintaining source code flexibility and portability, tracking revisions, and analyzing variable usage and loop structure within a program. The tools discussed are: make, cpp, RCS, and FORGE 90. The concept of a toy program is introduced as a means for experimenting with a simpler version of an application program. Finally, the use of these tools and techniques is demonstrated as part of an optimization and parallelization effort for a scientific application program called ZELIG.
1993-05-01T00:00:00ZA Modified BFGS Method
https://hdl.handle.net/1813/5492
A Modified BFGS Method
Liao, Ai-Ping
In this paper, we study the global convergence property of the modified BFGS update method proposed by Liao. We show that under certain circumstances this modified BFGS method corrects the eigenvalues better than BFGS does. Our numerical results support this claim and also indicate that the modified BFGS method may be competitive with the BFGS method in general.
1993-04-01T00:00:00Z