A SUBSPACE, INTERIOR, AND CONJUGATE GRADIENT METHOD FOR LARGE-SCALE BOUND-CONSTRAINED MINIMIZATION PROBLEMS   MARY ANN BRANCH¡ , THOMAS F. COLEMAN¢ AND YUYING LI¡ Abstract. A subspace adaptation of the Coleman-Li trust region and interior method is proposed for solving large-scale bound-constrained minimization problems. This method can be implemented with either sparse Cholesky factorization or conjugate gradient computation. Under reasonable conditions the convergence properties of this subspace trust region method are as strong as those of its full-space version. Computational performance on various large-scale test problems are reported; advantages of our approach are demonstrated. Our experience indicates our proposed method represents an efficient way to solve large-scale bound-constrained minimization problems. Key Words. Interior method, trust region method, negative curvature direction, inexact Newton step, conjugate gradients, bound-constrained problem, box-constraints 1. Introduction. Recently Coleman and Li [1, 2, 3] proposed two interior and reflective Newton methods to solve the bound-constrained minimization problem, i.e., (1.1) £¥m¤§¦©in¨  "!#$!&% where ('0)§1023)54(68797A@ , %B'#)§1C2D)§6E797@ , (FG% , and  is a smooth function,  : 1H@PIQ1 1. Both algorithms are interior methods since the iterates )¥SR97 are in the strict interior of the feasible region, i.e., TRU'WVYX`Abac dd ef )¥ : FB$F0%e7 . These two methods differ in that a line search to update iterates is used in [2, 3] while a trust region idea is used in [1]. However, in both cases convergence is accelerated with the use of a novel reflection technique. The line search method version appears to be computationally viable for large-scale quadratic problems [3]. Our main objective here is to investigate solving large-scale bound-constrained nonlinear minimization problems (1.1), using a large-scale adaptation of the Trust-region Interior Reflective (TIR) approach proposed in [1]. The TIR method [1], outlined in FIG. 1, elegantly generalizes the trust region idea for unconstrained minimization to bound-constrained nonlinear minimization. Here f R ddhef g §RiqprR ddsef g 2§R . The crucial role of the (diagonal) affine scaling matrices t R and u R will become clear in v 2. An attractive feature of the TIR method [1] is that the main computation per iteration is solving a w Research partially supported by the Applied Mathematical Sciences Research Program (KC-04-02) of the Office of Energy Research of the U.S. Department of Energy under grant DE-FG02-90ER25013.A000, and by NSF, AFOSR, and ONR through grant DMS-8920550, and by the Advanced Computing Research Institute, a unit of the Cornell Theory Center which receives major funding from the National Science Foundation and IBM Corporation, with additional support from New York State and memxy bCCeoormms oppfuuttieetsrr Corporate Research Institute. Science Department, Cornell University, Ithaca, NY 14850. Science Department and Center for Applied Mathematics, Cornell University, Ithaca NY 14850. 1 The TIR Method [1] [Let 0 Fh€Fƒ‚„F VˆX`Abac‰ ∆0 F Λ† .] 1, 0 F Λ… F Λ† and ‡ 1 F 1F ‡ 2 be given. Let  0 ' For  d 0 1‘A‘A‘ 1. Compute ’R , f R , t R , prR , and u R ; define the quadratic model “ R”ˆ•¥ dd ef f5R– •˜— 1• 2 – ˆprR™— R§d•9e u 2. Compute a step •AR , with TRH—#•Rf'WVˆXg`Abac , based on the subproblem: mh in) “ Riˆ•¥ : j‰t R9• ! j2 ∆R575e 3. Compute kR d TR˜—#•AR9"4ClTRAS— “ R”ˆ•R’ 1• 2 –R u R9•R e 4. 5. If k RUmB€ Update ∆R then set TR‰n 1 d TRq—#•AR as specified below. . Otherwise set gR‰n d 1 TR . Updating Trust Region Size ∆R 1. 2. 3. If k If k If k RU!B€ then set ∆R‰n RU'oY€lr‚i then set RUv&‚ then 1∆R‰'on 1 0't ‡ s 1 ‡ ∆Rpqe 1∆Ru ∆R¥pqe if ∆RUm Λ… then set ∆R‰n 1 ' either s ‡ Rw 1∆ ∆R¥p or s ∆Ru ‡ R¥p 2∆ , otherwise, set ∆R‰n 1 'os ∆Rw min  ‡ 2∆R5 Λ† xp . FIG. 1. The TIR Method for Minimization Subject to Bounds 2 standard unconstrained trust region subproblem: (1.2) h m¤§¦©in¨ ) “ R©ˆ•¥ : j‰t R9• ! j2 ∆Rw75e The method of More´ and Sorensen [4] can be directly applied to (1.2) if Cholesky factorizations of matrices with the structure of pyR can be computed efficiently. However, this method is unsuitable for large-scale problems if the Hessian pyR is not explicitly available or (sparse) Cholesky factorizations are too expensive. Recently, Sorensen [5] proposed a new method for solving the subproblem (1.2) using matrix vector multiplications. Nonetheless, the effectiveness of this approach for large-scale minimization, particularly in the context of our trust region algorithm, is yet to be investigated. We take the view that solving the full space trust region subproblem (1.2) is too costly for a largescale problem. This view is shared by Steihaug [6] who proposes an approximate (conjugate gradient) approach. Steihaug’s approach to (1.2) seems viable although our computational experience (see Table 4) indicates that important negative curvature information can be missed, causing a significant increase in the number of minimization iterations. In this paper, we propose an alternative: an approximate subspace trust region approach (STIR). We verify that, under reasonable conditions, the convergence properties of this STIR method are as strong as those of its full-space version. We explore the use of sparse linear algebra techniques, i.e., sparse Cholesky factorization and preconditioned conjugate gradients, in the context of this approach. In addition, we demonstrate the benefits of our affine scaling, reflection and subspace techniques with computational results. First, for (1.1), our affine scaling technique outperforms the classical Dikin scaling [7], at least in the context of our algorithm. Second, we examine our method with and without reflection. We show the reflection technique can substantially reduce the number of minimization iterations. Third, our computational experiments support the notion that the subspace trust region method is a promising way to solve large-scale bound-constrained nonlinear minimization problems. Compared to the Steihaug [6] approach, the subspace approach is more likely to capture negative curvature information and consequently leads to better computational performance. Finally, our subspace method is competitive with, and often superior to, the active set method in LANCELOT [8]. The paper is organized as follows. In v 2, we briefly summarize the existing TIR method. Then we provide a computational comparison of the subspace trust region method and the Steihaug algorithm in the context of unconstrained minimization in v 3. We introduce a subspace method STIR, and discuss its convergence properties, in v 4. Issues concerning the computation of negative curvature directions and inexact Newton steps are discussed in v 5; computational results are provided indicating that performance is typically not impaired by using an inexact Newton step. Concluding remarks appear in v 7. The convergence analysis of the STIR method is included in the appendix. 2. The TIR Method. In this section we briefly review the full-space TIR method [1], sketched in FIG. 1. This method closely resembles a typical trust region method for unconstrained minimization, min£z¤§¦ ¨ l . The key difference is the presence of the affine scaling (diagonal) matrices t R and u R . Next we briefly motivate these matrices and the TIR algorithm. 3 The trust region subproblem (1.2) and the affine scaling matrices t R and u R arise naturally from examining the first-order Kuhn-Tucker conditions for (1): if a feasible point ˜F{ is a local minimizer, then i| f | d 0 for 1 !GV}!GX and if f |(F 0 then i| is not at any of its bounds. This characterization is expressed in the nonlinear system of equations (2.1) t ~ 2g l d 0 where (2.2) t  dd ef diag € ‚gY¥€ƒ~ 1„ 2‰ and the vector ‚H'…1 @ is defined below: for each 1 !&V†!&X , (i). If f |‡F 0 and %g|lF86 then ‚’| dd ef i|S4ˆ%T| ; (ii). If f |‡v 0 and ‰|‡m{4Š6 then ‚’| dd ef i|S4ˆ‹| ; (iii). If f |‡F 0 and %g| d 6 then ‚’| dd ef 4 1; (iv). If f |‡v 0 and ‰| d 4Š6 then ‚’| dd ef 1. The nonlinear system (2.1) is not differentiable everywhere; nondifferentiability occurs when ‚u| d 0. Hence we avoid such points by maintaining strict feasibility, i.e., restricting ŒRU'WVYX`Abac . A Newton step for (2.1) is then defined and satisfies (2.3)  ˆ R t R§•zRŽ d 4 f ˆR© where (2.4) f ˆR dd ef t R~ 1f R d diag € ‚9R€ 1„ 2 f R5  ˆ R dd ef t R~ 1prR t R~ 1 — diag f R’SR‘ e Here  ‘ U'B1 @T’i@ corresponds to the Jacobian of € ‚¥€ . Each diagonal component of the diagonal matrix  ‘ equals to zero or “ 1. If all the components of  and % are finite,  ‘ d diag  sgn f ” . If ‚’| d 0, we define  |•‘ | d 1. Equation (2.3) suggests the use of the affine scaling transformation:  ˆ dd ef t R¥ . This transformation reduces the constrained problem (1.1) into an unconstrained problem: a local minimizer of (1.1) corresponds to an unconstrained minimizer in the new coordinates  ˆ (for more details, see [1]). Therefore a reasonable way to improve gR is to solve the trust region subproblem (2.5) where hˆm¤§¦©in¨ ) “ ˆR© •zˆ : j •ˆ j 2 ! ∆Ri75 “ ˆR© •zˆ dd ef fiˆR– •Hˆ — 1 2 • ˆ–  ˆR •9ˆ e 4 Let • d t R~ 1•ˆ. Subproblem (2.5) is equivalent to the following problem in the original variable space: (2.6) h m¤§¦©in¨ ) “ R©ˆ•¥ : j‰t R9• ! j2 ∆Rw75 where “ R©ˆ•z dd ef • ––f R†— 1•  2– R§•9 u R dd ef t R diag  f R§SR‘ t Ru  R dd ef prR†— Rue u In addition to the close resemblance to an unconstrained trust region method, the TIR algorithm has strong convergence properties with explicit conditions on steps for optimality. We now describe these conditions. The TIR algorithm requires strict feasibility, i.e., SRŠ——•AR˜'tVˆX`bac . We use ™ R  s›š5Rp to denote the step obtained from š5R with a possible step-back for strict feasibility. Let œ R   denote the minimizer along š5R within the feasible trust region, i.e., œ R   d argmin ) “ R© šiRz œ : jœgt R§š5R j ! ∆Ru”TRž— œ š5RŸ'…a 75e Let ¡Rf'ts›¡ …  1p for some 0 F8¡ … F 1 and ¡R¢4 1 d—£  j š5R j  . Then (2.7) ™ R  s›š5Rp dd ef ¡AR œ R   š5R dd¥ef ¤ œ R   š5R ¡AR œ R   šiR if TR™— œ R   š5RU'cVˆX`¦ac‰ otherwise e The above definition implies that ¡¥R d 1 if TR†— œ R   š5RU'WVYX`Abac . Explicit conditions which yield first and second-order optimality are analogous to those of trust region methods for unconstrained minimization [1]: (AS.3) (AS.4) “ R©q•R§™FC§ “ R  s‹4 t R~ 2f R¥p , j‰t R9•AR j !&§ Assume that ¨ R is a solution to minh ¤z¦ positive constants. Then •¥R satisfies “ VˆX`bace 0∆R©eTR™—0•Rf'WVYX`Abac . ¨ ) “ R©ˆ•¥ : j‰t R9• j ! ∆Rw7 R©ˆ•AR9™F0§e© “ R  s ¨ Rpq j‰t R’•R and §e© !0§ j © 0 a∆nRud§ T0©R™a—#re•tRªw'o Condition (AS.3) is necessary for first-order convergence; (AS.4), together with (AS.3), is necessary for second-order convergence. Both conditions (AS.3) and (AS.4) are extensions of convergence conditions for unconstrained trust region methods. In particular, when  d 4(6 and % d 6 , these assumptions are exactly what is required of trust region methods for unconstrained minimization problems. Satisfaction of both conditions (AS.3) and (AS.4) is not difficult. For example, one can choose •9R so that “ R©q•R§ is the minimum of the values “ R  s ¨ R‰p and “ R  s•4 t R~ 2f RAp . However, this does not lead to an efficient computation process. In [3] and [2], we have utilized a reflection technique to permit further possible reduction of the objective function along a reflection path on the boundary. We have found in [3] and [2] that this reflection process significantly enhances performance for minimizing a general quadratic function subject to simple bounds. 5 s x p pr i FIG. 2. Reflection Technique For all the computational results in this paper, •zR is determined from the best of three points corresponding to “ R  s ¨ R‰p , “ R  s‹4 t R~ 2f Rp and “ R  s ¨SR« p where ¨–R« denotes the piecewise direction path with ¨ R reflected on the first boundary it encounters, see FIG. 2. We can appreciate the convergence results for this approach by observing the role of the affine scaling matrix t R . For the components | which are approaching the “correct” bounds, the sequence of directions )54 t R~ 2f R97 becomes increasingly tangential to these bounds. Hence, the bounds will not prevent a large step size along )54 t R~ 2f Rw7 from being taken. For the components T| which are approaching the “incorrect” bounds, )54 t R~ 2f R97 points away from these bounds in relatively large angles (the corresponding diagonal components of t R are relatively large and f R points away from these bounds). Hence, a reduction of at least “ R  s‹4 t R~ 2f Rp implies the scaled gradient ) t R~ 2f R57 converges to zero (i.e., first-order optimality). The scaling matrix used in our approach is related to, but different from, the scaling typically used in affine scaling methods for linear programming. The affine scaling matrix t Raffine dd ef diag  minYTR¬4 ­R©x%RU4ETR’” [7], commonly used in affine scaling methods for linear programming, is formed from the distance of variables to their closest bounds. Our scaling matrix t R2 equals to t Raffine only when min TR¢4o®Rur%R¯4CTR§ d € ‚iRi€ . (Note that even in this case we employ the square root of the quantities used to define t Raffine.) Before we investigate a subspace adaptation of TIR, we demonstrate the effectiveness of our reflection idea and affine scaling technique. We consider random problem instances of molecule minimization [9, 10], which minimize a quartic subject to bounds on the variables. Table 1 and 2 list the average number of iterations (over ten random test problem instances for each entry) required for the different techniques under comparison. The notation m in front of a number indicates that the average number is at least this number because the iteration number exceeds 1000, the maximum allowed, for some instance. The details of the algorithm implementation are given in v 6. Table 1 demonstrates the significant difference made by a single reflection. The only difference 6 X 100 200 400 800 1000 With Reflection 34.1 41.7 66.8 83.4 93.6 Without Reflection 71.4 >210.1 >425.4 >302.2 > 408.5 TABLE 1 The STIR algorithm with and without reflection: number of iterations X unconstrained: constrained: R t t Raffine R t t Raffine 100 38.6 36.4 36.6 >517.4 200 47.3 49 50.5 >617.6 400 61.4 58.5 65.6 >517.3 800 72.7 73.9 89.7 >1000 1000 93.6 94.6 102.3 >1000 TABLE 2 Comparison of the STIR scaling °(± and Dikin scaling ° a± ffine: number of iterations between the rows with and without reflection is the following. Without reflection, •’R is determined by the best of the two points based on “ R  s ¨ Rp and “ R  s‹4 t R~ 2f Rp ; with reflection, •¥R is determined by the best of the three points based on “ R  s ¨ Rp , “ R  s•4 t R~ 2f Rp and “ R  s ¨–R« p (with reflection). The superiority of using the reflection technique is clearly demonstrated with this problem. In Table 2, we compare the computational advantage of the selection t R over t Raffine: the only difference is the scaling matrix. We differentiate between problems that have an unconstrained solution (no bounds active at a solution) and those with a constrained solution. We observe that, for unconstrained problems, there is no significant difference between the two scaling matrices. However, for the constrained problems we tested, the choice t R is clearly superior. We observe that when t R is used, the number of iterations for a constrained problem is roughly the same as that for the corresponding unconstrained problem. For t Raffine, on the other hand, the number of iterations for a constrained problem is much larger than for the corresponding unconstrained problem. 3. Approximation to the Trust Region Solution in Unconstrained Minimization. There are two possible ways to approximate a full-space trust region solution in unconstrained minimization. Byrd, Schnabel, and Schultz [11] suggest substituting the full trust region subproblem in the unconstrained setting by (3.1) h m¤§¦in¨ ) “ R©q•¥ : • j ! j2 ∆Ru•¬'c²eR975 where ²eR is a low-dimensional subspace. (Our implementation employs a two-dimensional choice for ²eR .) Another possible consideration for the approximation of (1.2) is the Steihaug idea [6], also proposed 7 in the large-scale unconstrained minimization setting. In a nutshell, Steihaug proposes applying the method of preconditioned conjugate gradients (PCG) to the current Newton system until either negative curvature is revealed, the current approximate solution reaches the boundary of the trust region, or the Newton system residual is sufficiently reduced. We believe that a subspace trust region approach better captures the negative curvature information compared to the Steihaug approach [6]. To justify this we have conducted a limited computational study in the unconstrained minimization setting. We implement the subspace method with the subspace ²lR defined by the gradient direction f R and the output of a Modified Preconditioned Conjugate Gradient (MPCG) method applied to the linear Newton system: prRi• d 4 f R . The output is either an inexact Newton step •A³R Ž defined by, (3.2) prR9• ³R Ž d 4 R™—o´zR f such that ´zR R j j‰µij”f j !&‚5R© or a direction of negative curvature, detected by MPCG. Algorithm MPCG is given in greater detail in FIG. 11, Appendix B. Our implementation of the Steihaug method can also be found in Appendix B. Both the Steihaug and subspace implementations are wrapped in a standard trust region framework for the unconstrained minimization problem. For both methods the preconditioning matrix used is ¶ R d—· 2R where · R is the diagonal matrix computed from · R |•| d¹¸ € prR |•| € for pfR |•|†dº 0 and · R |•| d 1 otherwise. The same strategy is used to update used for the subspace method and j ∆R ‘ j‰» (see v 6 for more details). We let for the Steihaug method ([6]). ∆0 d 0e 1 j”f 0 j where the j ‘ j 2 is We used twenty different unconstrained nonlinear test problems. All but four are test problems described in [12], but with all the bound constraints removed. The problems EROSENBROCK and EPOWELL are taken from [13]. The last two problems, molecule problems MOLE1 and MOLE3, are described in [9, 10]. For all problems, the number of variables X is 260. The minimization algorithm terminates when j”f–j 2 ! 10~ 6. We use the parameter ‚ d 0e 0005 in both FIG. 11 and FIG. 12. Tables 3 and 4 compare the Steihaug and subspace methods described above in terms of the number of minimization iterations and the total number of conjugate gradient (CG) iterations. Table 3 shows problems for which negative curvature was not detected, and Table 4 shows problems for which negative curvature was detected. Although not included here, the function values and gradient norms (upon termination) were virtually the same for both methods for all problems. Since these values were essentially the same among the two methods, we only discuss the difference in iterations counts. The difference in minimization and CG iteration counts is plotted in FIG. 3 and FIG. 4. Most notable in Table 3 and the graphs of FIG. 3 is how strikingly similar the results are for the Steihaug and subspace methods; the minimization with each method stops within two iterations of the other in all cases. Furthermore, both methods take an identical number of total CG iterations except for the problem BROWN1 where the Steihaug method takes four more iterations. When negative curvature is encountered, shown in Table 4 and in FIG. 4, the iteration counts for each method are again similar for a few problems. For most problems, however, the Steihaug method takes more iterations, and for some problems the difference is substantial. This is particularly true for the problems CHAINWOOD, MOLE1 and MOLE3 (for CHAINWOOD, problem 3 in FIG. 4, the total difference in iteration counts is 8 Minimization CG Problem Subspace Steihaug Subspace Steihaug 1. BROWN1 27 29 39 43 2. BROWN3 66 66 3. BROYDEN1A 11 11 81 81 4. BROYDEN1B 5 5 34 34 5. BROYDEN2B 7 7 71 71 6. CHAINSING 22 22 188 188 7. CRAGGLEVY 21 21 125 125 8. DEGENSING 22 22 188 188 9. EPOWELL 18 18 72 72 10. GENSING 22 22 83 83 11. TOINTBROY 7 7 58 58 12. VAR 43 43 5590 5590 TABLE 3 Comparison when only positive curvature is encountered: number of iterations Minimization CG Problem Subspace Steihaug Subspace Steihaug 1. AUGMLAGN 36 29 267 228 2. BROYDEN2A 22 19 247 196 3. CHAINWOOD 156 988 3905 3878 4. EROSENBROCK 44 46 52 86 5. GEROSE 23 33 166 165 6. GENWOOD 58 63 304 275 7. MOLE1 46 119 460 376 8. MOLE3 125 186 6311 5356 TABLE 4 Comparison when negative curvature is encountered: number of iterations 9 minimization iterations CG iterations 100 excess Steihaug iterations excess subspace iterations 80 60 40 20 0 1 2 3 4 5 6 7 8 9 10 11 12 positive curvature problems 100 80 60 40 20 0 1 2 3 4 5 6 7 8 9 10 11 12 positive curvature problems FIG. 3. Comparison of subspace and Steihaug trust region methods for unconstrained problems minimization iterations CG iterations 100 *832* 80 60 40 20 0 12345678 negative curvature problems 100 *955* 80 60 40 20 0 12345678 negative curvature problems FIG. 4. Comparison of subspace and Steihaug trust region methods for unconstrained problems 10 explicitly noted as it is beyond the scale of the graph). In general the subspace method does take more CG iterations on problems with negative curvature, but it is these extra relatively inexpensive CG iterations that reduce the total number of minimization iterations. (Again, for the problem MOLE3 the difference in CG iterations is explicitly noted in FIG. 4 as it is beyond the scale of the graph.) A closer examination of the behavior of the two algorithms indeed shows that when negative curvature is not encountered, both methods take similar steps. (In this case, if the trust region is large enough, both methods in FIG. 11 and FIG. 12 will stop under the same conditions after the same number of CG iterations, as displayed in Table 3.) By the nature of the algorithms, if the Steihaug method detects negative curvature, then so will the subspace approach. However if the subspace algorithm detects negative curvature, the Steihaug method may terminate before it finds negative curvature; and then it does not converge (to a local minimizer) as quickly as the subspace method. The important role that negative curvature plays is supported by the fact that the subspace method often moves in a substantial negative curvature direction when the Steihaug method overlooks negative curvature. Furthermore, it is when the trust region radius ∆R is small that the Steihaug method is most likely to stop early and miss negative curvature. Thus it appears that the effectiveness of the Steihaug idea decreases as nonlinearity increases. 4. The STIR Method. Supported by the discussion in v 3, we propose a large-scale subspace adaptation of the TIR method [1] for the bound constrained problem (1.1). In moving from the unconstrained subspace approach to the box-constrained setting, it seems natural to replace the full trust region subproblem (1.2) by the following subspace subproblem (4.1) h m¤§¦©in¨ ) “ R©ˆ•¥ : j‰t R’• ! j2 ∆R©•¬'W²ŒRi75 where ²eR is a small-dimensional subspace in 1†@ , e.g., a two-dimensional subspace. A two-dimensional subspace for the trust region subproblem (2.5) can be selected from the span of the two vectors ) t R~ 1f Rw •ˆRŽ 7 and a negative curvature direction ¼ ˆ R for  ˆ R . This suggests that we form ²"R from the directions ) t R~ 2f Ru t R~ 1•ˆRŽ  t R~ 1¼ ˆ R57 . Will such subspace formulations succeed in achieving optimality? We examine this issue in more detail. It is clear that the including the scaled gradient direction t R~ 2f R in ²eR , and satisfying (AS.3), will guarantee convergence to a point satisfying the first-order optimality conditions. Let us assume for now that )¥TR57 converges to a first-order point    . To guarantee that    is also a second-order point the following conditions must be met. Firstly, it is clear that a “sufficient negative curvature” condition must be carried over from the unconstrained setting [14]. To this end, we can require that sufficient negative curvature of the matrix  ˆ R be captured if it is indefinite, i.e., ²R must contain a direction ¼ R d t R~ 1¼ ˆ R such that (4.2) ¼ ˆ R– j  ¼ ˆ ˆR R ¼ j ˆR !&½$¾5–)54(¿ @iÀ  œTÁ©Â | @   ˆ R§75e Secondly, it is important that a solution to (4.1) lead to a sufficiently large step — the potential difficulty is running into a (bound) constraint immediately. This difficulty can be avoided if the stepsize sequence, along the trust region solution direction, is bounded away from zero. Subsequently, we define: 11 DEFINITION 4.1. A direction sequence )§•zR57 has large-step-size if lim inf R‰Ã¬Ä{€ t R2 •ARi€5F86 . If fast local convergence is desired then the subspace ²lR should also contain a sufficiently accurate approximation to the the Newton direction t R~ 1 •ˆRŽ when  ˆ R is positive definite and •ˆRŽ d 4  ˆ R ~ 1f ˆR . An inexact Newton step •ˆRŽ for problem (1.1) is defined as an approximate solution to  ˆ R’• d 4 f ˆR5 with accuracy ‚wR : (4.3)  ˆ R •ˆ³R Ž d 4 f ˆR™—t´¥R such that j ´zR j‰µij f ˆR j !0‚5Rue Can we select two-dimensional subspaces satisfying all three properties and thus guarantee quadratic (superlinear) convergence to a second-order point? The answer, in theory, is yes — the subspace adaptation of TIR algorithm (STIR) in FIG.5 is an example of a subspace method capable of achieving the desired properties. To ensure convergence to a solution, the solution sequence of the subspace trust region subproblems (4.1) need to have large-step-size. Lemma 1 below indicates that this can be achieved if we set ²˜R d span) ¼ Ru”Å¥Ri7 , where ) ¼ R57 and )¥ÅzR57 are two sequences of uniformly independent vectors in the sense that lim inf ) j ÅzRŠ4 ¼ R 7Æm j 0, each with large-step-size. LEMMA 1. Assume that ) ¼ R57 and )¥ÅzR57 have large-step-size with j‰t R¼ R j d 1 and j‰t RzÅ¥R j d 1. Moreover, lim inf R‰Ã¬Ä ) ÅzR¬4 j ¼ R 7Ÿm j 0. Then the solution sequence ) ¨ R97 to the subproblem (4.1) with ²eR d span)¥ÅzR5 ¼ R57 has large-step-size. Proof. The proof is very straightforward and is omitted here. For the STIR method, a natural extension of the condition (AS.4) necessary for second-order optimality is the following. (AS.5) Assume that ¨ R is a solution to minh ¤z¦ ¨ ) “ R©ˆ•¥ : j‰t R9• j ! ∆Rwd•P'o²eR97 and §e© and § § © 0© 0 are two positive constants. ∆R and gRq—#•ARr'WVˆX`bace Then •¥R satisfies “ R”ˆ•R9HF&§e© “ R  s ¨ Rpq where j‰t R9•R j ! Theorem 2 below, with the proof provided in the Appendix, formalizes the convergence properties of STIR. THEOREM 2. Let the level set Ç d )¥3'…1†@ : H!{l 3'˜a 7 0 be compact and  : aÈIÉ1 be twice continuously differentiable on Ç . Let )¥SRi7 be the sequence generated by the STIR algorithm in FIG.5. Then 1. If (AS.3) is satisfied, then the Kuhn-Tucker condition is satisfied at every limit point. 12 The STIR Method [Let 0 Fh€Fƒ‚„F 1, 0 F Λ… F Λ† and ‡ 1 F 1 F ‡ 2 be given. Let  0 ' VˆX`Abac‰ F ∆0 Λ† .] For  d 0 1‘A‘A‘ 1. Compute ’R , f R , t R , prR , and u R ; define the quadratic model “ R©q•¥ d f5R– •q— 1• 2 – ˆprR™— R’”•9e u 2. Compute a step •¥R , with gRr—•R„'ÊVˆXg`Abac , based on the subspace subproblem, mh in) “ R©ˆ•¥ : j‰t R9• ! j2 ∆Rw•U'W²ŒRi75 where the subspace ²"R is set up as below. 3. Compute kR d TR˜—#•AR9"4ClTRAS— “ R”ˆ•R’ 1• 2 –R u R9•R e 4. 5. If k RUmB€ Update ∆R then set TR‰n 1 d as specified in TRq—#•AR FIG.1. . Otherwise set gR‰n d 1 TR . Determine Subspace ²eR : [Assume that ¼ R d t R~ 1¼ ˆ R where ) ¼ Ru7 has large-step-size. Let 0 F œ F 1 be a small positive constant.] IF  ˆ R is positive definite ²eR dd ef span) t R~ 2f R5 ¼ Rw7 ELSE  ˆ R is not positive definite IF  t R~ 2sgn  f R’” –  Rw t R~ 2sgn  f R§”HF ²eR dd ef span) t R~ 2sgn  f R’7 ELSE ²eR dd ef span) t R~ 2sgn  f R’‰ ¼ R97 END END œ™Ëx̘ËϱrÍ 2± Î Ë ± 2Ë 2 ¼ R–  R¼ R FIG. 5. The STIR Method for Minimization Subject to Bound Constraints 13 2. Assume that both (AS.3) and (AS.5) are satisfied and ¼ ˆ R in FIG. 5 contains sufficient negative curvature information whenever  ˆ R is indefinite, i.e., ¼ ˆ R– j ¼ ˆ ˆR R j ¼ ˆR 2 ! max Ð4Š¿ @9À  œTÁ  min  ˆ R¥d‰ with ¿ @9À m 0 and 0 F œ F 1. Then (a) If every limit point of )¥R57 is nondegenerate, then there is a limit point    at which both the first and second-order necessary conditions are satisfied. (b) If    is an isolated nondegenerate limit point, then both the first and second-order necessary conditions are satisfied at    . (c) If  ˆ   is nonsingular for some limit point    of )¥TR57 and ¼ ˆ R d •ˆRŽ whenever  ˆ R is positive definite, then  ˆ   is positive definite, )¥gR57 converges to    , all iterations are eventually successful, and ) ∆R57 is bounded away from zero. The degeneracy definition is the same as in [1]. DEFINITION 4.2. A point …'Wa is nondegenerate if, for each index V : (4.4) f q| d 0 dSÑ ‰|lF#i|F&%T|Ðe We have established that in principle it is possible to replace the full-dimensional trust region subproblem with a two-dimensional variation. However, the equally strong convergence properties of STIR hinges on obtaining (guaranteed) sufficient negative curvature direction with large-step-size. We discuss this next. 5. Computing Negative Curvature Directions with Large-Step-Size. Is it possible, in principle, to satisfy both the sufficient negative curvature requirement (4.2) and the large-step-size property? The answer is yes: let %SR be a unit eigenvector of  ˆ R corresponding to the most negative eigenvalue, i.e., t hˆeRA%seR qud enÁ cmein)  t  ˆ R~ Rzq%R . 1 %R97 It is easily verified that has large-step-size. for any convergent subsequence limR‰Ã}Ä Á  min  ˆ R§(F 0, However, it is not computationally feasible to compute the (exact) eigenvector %eR . Therefore, approximations, and short cuts, are in order. Can we compute approximate eigenvectors with large-step- size? A good approximation to an eigenvector corresponding to an extreme eigenvalue can usually be obtained through a Lanczos process [15]. Using the Lanczos method for  ˆ R with an initial vector Ò ˆR , approximate eigenvectors at the Ó -th step are computed in the Krylov space Ô   ˆ R5 Ò ˆRu Ó  dd ef span  Ò ˆR5  ˆ R Ò ˆR5A‘‘A‘A 8ˆ RÕ ~ 1Ò ˆR§e In the context of our algorithm, the vectors t R~ 1sgn f R§ or t R~ 1f R are natural choices for the initial vector Ò ˆR when applying the Lanczos method. 14 Our key observation is the following. If a sequence ) t R~ 1Ò ˆR57 has large-step-size then each sequence in t R~ 1 ) Ò ˆR©  ˆ R Ò ˆRw‘A‘A‘  ˆ RÕ ~ 1Ò ˆR97 retains this property. Now assume that ¼ ˆ R is the computed vector from the Lanczos method which contains the sufficient negative curvature information with respect to  ˆ R . It can be verified, based on the recurrence relation, that ) t R~ 1Ò ˆR1 A‘A‘‘‰ t R~ 1 Ò ˆÕR 7 all have large-step-size if the Lanczos vectors ) Ò ˆR1 ‘A‘‘A Ò ˆÕR 7 retain orthogonality. Since ¼ ˆ R is in the Krylov space Ô   ˆ R5 Ò ˆRu Ó  , it is clear that ) ¼ R d t R~ 1¼ ˆ Ri7 has large-step-size. In other words, in order to generate a negative curvature direction sequence with large-step-size, orthogonality needs to be maintained in the Lanczos process. Fortunately, as discussed in [16], it is quite reasonable to assume that until all of the distinct eigenvalues of the original matrix have been approximated well, orthogonality of the Lanczos vectors are well maintained. Since we are only interested in a direction with sufficient negative curvature, we expect that it can be computed before loss of orthogonality occurs. A second (and cheaper) strategy is to employ a modified preconditioned conjugate gradient scheme, e.g., MPCG in FIG.12. Unfortunately, this process is not guaranteed to generate sufficient negative curvature; nonetheless, as indicated in [17], the MPCG output will satisfy the large-step-size property. Finally we consider a modified Cholesky factorization, e.g., [18], to obtain a negative curvature direction. Assume that )  ˆ R57 is indefinite and )§šwR97 is obtained from the modified Cholesky method. We demonstrate below that )§šwR d t R~ 1š5ˆR97 has large-step-size under a nondegeneracy assumption. The negative curvature direction šiˆR d t R9š5R computed from the modified Cholesky method (see [18], page 111) satisfies Ö ÖÖ Ö –R šiˆR dØ× Õ ±  and ¶ŠR –  ˆ R ¶ R™—#ٞR d R diag ˆÚ‰R§ –R  where R is a lower triangular matrix, ¶ R i.e., × Õ ± bVÐ d 0, if V dº Ó R and × Õ ±  R§ Ó d 1. Without loss of generality, we assume that is a permutation Moreover, Ù(R is ¶ R d—Û . matrix and × Õ ± a bounded and is the Ó R th elementary non-negative diagonal vector, matrix. We argue, by contradiÖ ction, that )§šwRi7 has tÖhe large-step-size property. Assume that )§š©R97 does not have this clear that definition property. š5ˆRws jk— 1 : (2.4) of  From np d 0. ˆ R , the –R fiMš5rˆsRotrdÜeÓ oRUv× 4 Õe±r,1afnrcodommthpa otˆnR eš5nRˆRŠtiss—GoafٞloR) wš5t ˆR eR rš5d ˆtR9ri7 ډaRnaÕ gr± euÖ lbaR or× uÕ m±n,adteډrdRPix. mÈwTiÚhthifsouirnmistpodlmiieaesgژothnmaatls0),‚ it is and Õ± 7 converges to zero. From the modified Cholesky factorization, the matrix  ˆ Ris 1 : jk  1 : jkp is indefinite but  ˆ Ris 1 : jk4 1 1 : jk4 1p is positive definite. But this is impossible for sufficiently large  because, again using the definition (2.4) of  ˆ R , )  ˆ Rus 1 : jk  1 : jkp7 converges to a matrix of the form Ý  ˆ Rws 1 : jk4 1 1 : jk4 1p 0 0 ‚5R…Þ where ‚wR is positive (because of the nondegeneracy assumption). Therefore, we conclude that )§šiR57 has large-step-size. 6. Computational Experience. We demonstrate the computational performance of our STIR method given in FIG.5. Below we report our experience with the modified Cholesky and the conjugate gradient 15 (MPCG) implementations. We examine the sensitivity of the STIR method to a starting point. Finally, some limited comparisons with SBMIN of LANCELOT [8] are also made. In the implementation of STIR, we compute •¥R using a reflective technique as shown in FIG.2. The exact trust region updating procedure is given below in FIG.6. Updating Trust Region Size ∆R Let € d 0e 25}‚ d 0e 75, Λ… d 1, Λ† d max ”ß à | min db%T|g4t‹|ˆ 2  1000A 1 , ∆0 d min  0e 1 j”f–j  Λ†  , ‡ 0 d 0e 0625 ‡ d 1 0e 5 ‡ 2 d 2 be given. 1. 2. 3. 4. If k If k If k If k RU! 0 then set ∆R‰n RU'o 0”€p then set d 1 R‰n ∆ RU'oY€lr‚i then set ∆R‰n RUv&‚ then ‡ 1 1 0∆; d max d ∆Rue  ‡ Ru 0∆ ‡ 1 j‰t R9•AR ‰e j if ∆RUm Λ… ∆R‰n d 1 ‡ 2∆R otherwise ∆R‰n d 1 min  max  ∆Rw ‡ 2 j‰t R9•AR j ‰ Λ†  . FIG. 6. Updating Trust Region Size Our experiments were carried out on a Sun Sparc workstation using the Matlab environment. The stopping criteria used are as follows. We stop if either or or lgR¥4tYTR‰n dH! TR‰n 4ˆTR ! 1 œ  1 1 —{€ lTRz¥€á j no 1 negative j2 œ2 curvature has been detected for  ˆ R and j‰t RRÄ fj ! e œ1 We define œ 1 d 10~ 10 and œ dÜâ 2 œ 1µ 10 d 10~ 6. We also impose an upper bound of 600 on the number of iterations. We first report the results of the STIR method using the modified Cholesky factorization. Table 5 lists the number of iterations required for some standard testing problems (for details of these problems see [12]). (For all the results in this paper, the number of iterations is the same as the number of objective function evaluations.) The problem sizes vary from 100 to 10 000. The results in Table 5 indicate that, for these testing problems at least, the number of iterations increases only slightly, if at all, with the problem size. Moreover, in comparison to the unconstrained problems, the presence of the bound restrictions does not seem to increase the number of iterations. This is depicted pictorially in FIG. 7. In this graph, the problem size is plotted versus iteration count. For each problem, the corresponding points have been connected to show how the iteration count relates to the problem size. Our second set of results are for the STIR algorithm but using a conjugate gradient implementation. We use the algorithm MPCG in FIG.12 to find the directions needed to form the subspace ²˜R . The stopping condition applied to the relative residual in MPCG is ‚ d 0e 005. The results are shown in Table 6 and FIG. 8. Again, for these problems the iteration counts are low and steady. The exception is for the problem VAR C with 10 000 variables, where the iteration count jumps to 86. This is one of 16 X Problem 100 200 500 1000 10000 GENROSE U 25 25 25 25 25 GENROSE C 11 11 11 11 10 GENSING U 24 25 25 26 27 GENSING C 18 19 20 20 21 CHAINSING U 23 23 23 23 23 CHAINSING C 16 16 16 16 19 DEGENSING U 22 23 23 40 39 DEGENSING C 28 28 28 28 29 GENWOOD C 9 10 10 10 11 CHAINWOOD C 9 10 10 10 11 BROYDEN1A U 12 12 13 13 14 BROYDEN1A C 11 11 11 11 11 BROYDEN1B U 7 7 7 7 7 BROYDEN1B C 8 8 8 8 8 BROYDEN2A U 13 13 13 14 14 BROYDEN2A C 14 19 17 19 19 BROYDEN2B U 9 9 9 9 9 BROYDEN2B C 13 11 15 14 15 TOINTBROY U 888 8 8 TOINTBROY C 999 9 9 CRAGGLEVY U 16 14 15 16 15 CRAGGLEVY C 29 29 30 30 31 AUGMLAGN C 38 32 35 36 37 BROWN3 U 888 8 8 BROWN3 C 17 10 11 9 11 BVP U 9 10 9 8 8 BVP C 11 11 10 10 7 VAR U 9 9 10 12 15 VAR C 18 18 23 45 38 TABLE 5 STIR method with exact Newton steps: number of iterations 17 100 iterations 50 0100 1000 problem size FIG. 7. STIR performance with exact Newton steps 100 10000 iterations 50 0100 1000 problem size 10000 FIG. 8. STIR method with inexact Newton steps several degenerate problems included in this test set. With a tighter bound ‚ on the relative residual in MPCG, we could decrease the number of minimization iterations for this problem (note that the STIR with exact Newton steps only takes 38 iterations). However, this change would also increase the amount of computation (conjugate gradient iterations). Next we include some results which indicate that our STIR method is fairly insensitive to the starting point. The results in Table 7 were obtained using exact Newton steps on problems of dimension 1000. The results in Table 8 were obtained using the conjugate gradient implementation, also on problems with 1000 variables. The starting points are as follows: original is the suggested starting point according to [12]; upper starts all variables at upper bounds; lower starts all variables at the lower bounds; middle starts at the midpoint between bounds; zero starts each variable at zero (the origin); upper-lower starts the odd variables at the upper and the even variables at the lower bounds; lower-upper is the reverse of this. For all of these, we perturb the starting point slightly if necessary to be strictly feasible. Note that for the problem BROWN3 C, the iteration count is not shown starting at middle and at origin as the gradient is undefined at both these starting points. These results are also shown graphically in FIG. 9 and FIG. 10. From these graphs it is clear that both implementations of STIR are fairly robust when it comes to starting 18 X Problem 100 200 500 1000 10000 GENROSE U 21 21 21 21 21 GENROSE C 10 10 10 10 17 GENSING U 23 23 24 24 25 GENSING C 16 16 16 16 16 CHAINSING U 21 21 21 21 21 CHAINSING C 14 17 19 19 20 DEGENSING U 32 32 33 33 35 DEGENSING C 33 56 35 33 31 GENWOOD C 888 8 8 CHAINWOOD C 8 8 8 8 8 BROYDEN1A U 11 11 11 11 12 BROYDEN1A C 9 8 8 8 8 BROYDEN1B U 6 6 6 6 6 BROYDEN1B C 7 7 7 7 7 BROYDEN2A U 15 15 19 17 20 BROYDEN2A C 10 10 10 10 10 BROYDEN2B U 8 8 8 8 9 BROYDEN2B C 9 9 9 9 9 TOINTBROY U 777 7 7 TOINTBROY C 888 8 8 CRAGGLEVY U 26 26 27 27 29 CRAGGLEVY C 26 26 26 26 27 AUGMLAGN C 26 33 29 34 27 BROWN3 U 777 7 7 BROWN3 C 777 7 8 BVP U 13 13 12 13 25 BVP C 15 15 14 14 15 VAR U 34 35 35 37 36 VAR C 19 21 32 36 86 TABLE 6 STIR method with inexact Newton steps, ãräAã”å¥ãˆæ5ãŒç 0è 005: number of iterations 19 Problem GENROSE C GENSING C CHAINSING C DEGENSING C GENWOOD C CHAINWOOD C BROYDEN1A C BROYDEN1B C BROYDEN2A C BROYDEN2B C CRAGGLEVY C AUGMLAGN C BROWN3 C BVP C VAR C original 11 20 16 28 10 10 11 8 19 14 30 36 9 10 45 upper 27 31 29 47 18 17 24 22 38 30 38 40 28 17 9 Starting Point lower middle zero 33 15 16 45 25 22 33 13 11 39 52 42 14 13 10 14 13 10 25 13 12 19 18 9 38 13 9 34 12 8 33 26 26 26 36 15 14 * * 8 9 10 32 18 21 up-low 43 31 32 39 17 17 25 19 38 33 34 23 28 11 23 low-up 27 32 30 36 17 16 24 21 38 30 37 37 14 17 17 TABLE 7 STIR method with exact Newton steps for é¯ê 1000: number of iterations points. This is in contrast to active set methods where the starting point can have a more dramatic effect on the iteration count. Last we contrast the performance of the STIR method using the conjugate gradient option with the SBMIN algorithm, an active set method, in the LANCELOT software package [8]. In particular, we choose problems where negative curvature is present or where it appears that the “active set” at the solution may be difficult to find. We expect our STIR method to outperform an active set method in these situations; indeed, we have found this to be the case. For these problems, we use the default settings for LANCELOT and adjusted our STIR stopping conditions to be comparable if not more stringent. First consider a constrained convex quadratic problem. The results, given in Table 9, show that our proposed STIR method is markedly superior (by an order of magnitude) to SBMIN on this problem (c.g. it is the total number of conjugate gradient iterations). SBMIN takes many iterations on this problem when the starting point is near some of the bounds — the method mis-identifies the correct active set at the solution and takes many iterations to recover. Our proposed STIR method, a strictly interior method, moves directly to the solution without faltering when started at the same point. Table 10 summarizes the performances of STIR and SBMIN, on a set of constrained problems exhibiting negative curvature. (Again the problems are from [12] except the last two have been constrained differently to display negative curvature.) STIR is significantly better on these problems — this is probably due to the fact that negative curvature is better exploited in our subspace trust region approach than in 20 Problem GENROSE C GENSING C CHAINSING C DEGENSING C GENWOOD C CHAINWOOD C BROYDEN1A C BROYDEN1B C BROYDEN2A C BROYDEN2B C CRAGGLEVY C AUGMLAGN C BROWN3 C BVP C VAR C original 10 16 19 33 8 8 8 7 10 9 26 34 7 14 36 upper 23 27 29 43 14 14 24 21 35 28 39 60 29 21 7 Starting Point lower middle zero 37 17 20 57 26 22 33 11 10 37 42 37 10 11 8 10 11 8 21 13 8 16 13 8 35 13 8 32 12 8 35 27 24 45 32 10 53 * * 14 13 14 34 29 25 up-low 37 30 33 37 13 13 21 16 36 31 35 24 29 14 28 low-up 23 29 28 44 13 13 24 21 35 28 45 46 53 21 8 TABLE 8 STIR method with inexact Newton steps for é¬ê 1000: number of iterations 100 iterations 50 0 orig up low mid zero up−lo lo−up FIG. 9. STIR method with exact Newton steps at varied starting points X d 800 BIGGSB2 inexact STIR iteration c.g. it 16 5551 SBMIN iteration c.g. it 281 53157 TABLE 9 STIR with inexact Newton steps vs. LANCELOT SBMIN on a convex quadratic: number of iterations 21 100 iterations 50 0 orig up low mid zero up−lo lo−up FIG. 10. STIR method with inexact Newton steps at varied starting points inexact STIR X SBMIN X Problem 100 1000 10000 100 1000 10000 AUGMLAGN U CHAINWOOD U 34 30 122 1004 37 29 38 46 8953 6594 m 10000 m 10000 GENWOOD U 62 67 63 439 952 554 GENROSE U 29 29 29 76 76 76 CHAINWOOD NC 17 GENWOOD NC 16 31 24 16 54 48 23 47 m 1000 61 60 TABLE 10 STIR with inexact Newton steps vs. LANCELOT SBMIN when negative curvature exists: number of iterations the Steihaug trust region method, which SBMIN employs. This is consistent with results presented in v 3, e.g., see Table 4. 7. Conclusion. Based on the trust-region interior reflective (TIR) method in [1], we have proposed a subspace TIR method (STIR) suitable for large-scale minimization with bound constraints on the variables. In particular, we consider a two-dimensional STIR in which a subspace is formed from the scaled gradient and (inexact or exact) Newton steps or a negative curvature direction. We have designed and reported on a variety of computational experiments. The results strongly support the different components of our approach: the “subspace idea”, the use of our novel affine scaling matrix, the modified Cholesky factorization and conjugate gradient variations, and the “reflection technique”. Moreover, preliminary experimental comparisons with code SBMIN, from LANCELOT [8], indicate that our proposed STIR method can significantly outperform an active-set approach for some large-scale problems. 22 REFERENCES [1] Thomas F. Coleman and Yuying Li. An interior, trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, to appear. [2] Thomas F. Coleman and Yuying Li. On the convergence of reflective Newton methods for large-scale nonlinear minimization subject to bounds. Mathematical Programming, 67:189–224, 1994. [3] Thomas F. Coleman and Yuying Li. A reflective Newton method for minimizing a quadratic function subject to bounds on the variables. SIAM Journal on Optimization, to appear. [4] Jorge J. More´ and D.C. Sorensen. Computing a trust region step. SIAM Journal on Scientific and Statistical Computing, 4:553–572, 1983. [5] D.C. Sorensen. Minimization of a large scale quadratic function subject to an ellipsoidal constraint. Technical Report TR94-27, Department of Computational and Applied Mathematics, Rice University, 1994. [6] T. Steihaug. The conjugate gradient methods and trust regions in large scale optimization. SIAM Journal on Numerical Analysis, 20:626–637, 1983. [7] I.I. Dikin. Iterative solution of problems of linear and quadratic programming. Doklady Akademiia Nauk SSSR, 174:747– 748, 1967. [8] A. R. Conn, N. I. M. Gould, and Ph. L. Toint. LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A). Springer-Verlag, 1992. [9] Bruce A. Hendrickson. The molecule problem: Determining conformation from pairwise distances. Cornell University Ph.D thesis, Computer Science, 1991. [10] Thomas F. Coleman. Large-scale numerical optimization: Introduction and overview. In Allen Kent and James G. Williams, editors, Encyclopedia of Computer Science and Technology, pages 167–195. Marcel Dekker, INC., 1993. [11] Gerald A. Shultz, R. B. Schnabel, and Richard H. Byrd. A family of trust-region-based algorithms for unconstrained minimization with strong global convergence properties. SIAM Journal on Numerical Analysis, 22(1):47–67, 1985. [12] Andrew R. Conn, N. I. M. Gould, and Ph. L. Toint. Testing a class of methods for solving minimization problems with simple bounds on the variables. Mathematics of Computation, 50(182):399–430, 1988. [13] Jorge J. More´, Burton S. Garbow, and Kenneth. E. Hillstrom. Testing unconstrained optimization software. ACM Transactions on Mathematical Software, 7:17–41, 1981. [14] Richard H. Byrd, Robert B. Schnabel, and Gerald A. Shultz. Approximate solution of the trust region problem by minimization over two-dimensional subspaces. Mathematical Programming, 40:247–263, 1988. [15] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1989. [16] Jane K. Cullum and Ralph A. Willoughby. Lanczos Algorithms for Large symmetric eigenvalue computations, Vol. 1 Theory. Birkhau¨ser, Boston, 1985. [17] Mary Ann Branch. Inexact Reflective Newton Methods for Large-Scale Optimization Subject to Bound Constraints. PhD thesis, Department of Computer Science, Cornell University, 1995. [18] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, 1981. A. Proofs for THEOREM 2. The convergence results (THEOREM 2) for the STIR algorithm can be obtained in a similar manner to THEOREM 3.10 for the full-space trust region and interior reflective method (TIR)[1]. Indeed, first-order optimality is a direct consequence of the condition (AS.3). The second order optimality rests on the fact that the solution subsequence of the subspace trust region subproblem would have large-step-size if the corresponding )  ˆ   7 were indefinite at a limit point (see Lemma 3 below). Moreover, if  ˆ R is positive definite at a limit point then we prove that the step size along the subspace trust region solution is sufficiently large in the following sense: ™ RUvìëeí ∆R min  j‰t R~ 1f R j  j‰t R5•zRŽ ‰ j for some ëeí m 0e 23 Here ™ R is the stepsize, along ¨ R , to the boundary of the feasible region ( see Lemma 4 ). Based on this inequality, it follows that the trust region size is bounded away from zero and Newton steps are eventually successful. Assume that ¨ R is a solution to a subspace trust region subproblem (4.1) with ²lR d span) ¼ R©ÐÅzR97 . Assume that the columns of î R form an orthonormal basis for span) t R¥Å¥R© t R¼ R97 . Then ¨ R d t R~ 1î R¥ï5R where ï5R solves (A.1)  îrR –  ˆ R î Rž— Á R Û qï5R d 4 îrR – t R~ 1f Rw îrR –  ˆ R î R†— Á R Ûfd{ð –R ð Ru and (A.2)  t R R§ ¨–  ˆR t R RH— ¨ Á R j ï5R j 2 d 4¬ t R ¨ R§ – f ˆR©e Next we prove that the subspace trust region solution sequence from the STIR algorithm in FIG.5. has large-step-size if the corresponding sequence )  ˆ Rw7 satisfies that limR‰Ã¬Ä Á  min ˆ R§HF 0. ) ¨ R97 LEMMA 3. If lim of the subspace sup Á trust  min  ˆ R’HF 0 for a region subproblem subsequence, then the corresponding (4.1) has large-step-size. solution subsequence Proof. Consider two subsequences of )  ˆ Rw7 : one sequence satisfies € ²R©€ d 1 and the other sequence has € ²eR©€ d 2. For the subsequence with € ²R©€ d 1, the corresponding trust region solution sequence clearly has large-step-size. )¥Å¥Rw7 For d the ) t R~ subsequence with 2sgn  f R’7 and ) ¼ € ²"R”€ d Rw7 have 2, it is clear that lim inf large-step-size, we have ) thjtaËqt̘̘) ±±ÍÍ ¨ 2sgnñ Î ±‰ò 2sR5g7 nhñ Îa±‰sò Ëlar4 geË-ÏÏst±± eË pj -s7tizm e 0. Since following Lemma 1. We state the following result which is similar to Lemma 8 in [1] and omit the proof. LEMMA 4. Assume that (AS.4) is satisfied. Then §"© 4 “ R©ˆ•AR§™v 2 s min) 1 ™ 2R 7 Á R ∆2R — min) 1 ™ R57 j ð R¥ï5R j 2pq where ™ R is the stepsize along ¨ R d t R~ 1î R¥ï5R to the boundary and ïuR is defined by (A.1). Let • RŽ denote the Newton step (2.3) of (2.1). Then (A.3) diag  f R’SR‘ t R§•¥RŽ d 4 t R~ 1f R¢4 t R~ 1 prR t R~ 1 t R§•¥RŽ e The next result is required to establish that Newton steps • RŽ will eventually lead to successful steps. LEMMA 5. Assume that¨ R is a solution to the subspace subproblem (4.1) with ²lR d span) t R~ 2f R5• RŽ 7 . If )¥TR97 converges to a nondegenerate point    where the second order sufficiency conditions are satisfied, 24 then (A.4) ™ RUvóëŒí ∆R min  j‰t R~ 1f R j  j‰t R9•¥RŽ  j for  sufficiently large, where ™ R is the stepsize to the boundary along ¨ R . Proof. By definition, ™R d m| in  max  ­R | 4DTR ¨ R| |  %R | ¨ 4ˆTR R| | ”‰e For any  , if t R~ 1f R d we have k R t R§• RŽ for some k RU'…1 1, then ¨ R d 4 ∆± Ëq̘±fÍ 1Î ± Ë t R~ 2f R . Hence, if ô51õ v R j”f Ä j , ™ RUvóëŒí ∆R min  j‰t R~ 1f R j  j‰t R9•¥RŽ e j Assume that t R~ 1f R dº k R t R9• RŽ . We first show that if we can establish (A.5) R ¨ d RwÐ4 ‡ t R~ 2f RiS—D§gRi•¥RŽ where §gRfv 0 and ‡ Rfv 0 then (A.4) holds. From (A.5) and  t R’• RŽ  – Ð4 t R~ 1f R’†v 0, we have ®§R t R9•zRŽ  –  R©Ð4 ‡ t R~ 1f R’”Hv 0e Using ¨ ˆR d t RR ¨ d R©Ð4 ‡ t R~ 1f R§–—t§R t R9• RŽ again, j ¨ ˆR j 2 d ‡ R2 j‰t R~ 1f R j 2 — 2 ‡ R t R’•¥RŽ  – ®§gRuÐ4 t R~ 1f R§dS—o§ R2 j‰t R9•¥RŽ j 2e But j ¨ ˆR j ! ∆R . Hence 0 !C§RU! ∆R j‰t R~ 1f R j and 0 ! Rr! ‡ ∆R e j‰t Ri• RŽ j Hence, from (A.3), the boundedness of f R , t R~ 1f R , t R9• RŽ , and the fact that    is a nondegenerate first-order point, it is easy to verify that ™ RUvìëeí ∆R min  j‰t R~ 1f R j  j‰t R5•zRŽ ‰ j for some ëeí m 0e Finally, we need to establish (A.5) under the linearly independent assumption t R~ 1f R dº k R t R9• RŽ . Assume that the columns of î R form an orthonormal basis for span) t R~ 1f R© t R’• RŽ 7 . Then î R î R – t R~ 1f R d t R~ 1f R ,î R î R– t R9• RŽ d t R’• RŽ , and î R – î R d—Û 2 where Û 2 is the 2-by-2 identity matrix. Moreover, R ¨ d 4 t R~ 1î R©s îrR –   ˆ R™— Á  Û  î R‰p~ 1îÆR – t R~ 1f R where Á   v 0 and if Á   m 0, j‰t R ¨ R j d ∆RUm 0. Let ¨ ˆ Á  d t R ¨ Ru Á  and (A.6) ¨ ˆ  Á  dd ef 4 î Rus îrR –   ˆ R†— Á Û  î RpY~ 1îÆR – t R~ 1f Ru for Á v 0e 25 Then there exists §H Á  and ‡  Á  such that ¨ ˆ Á  d §H AÐ4 Á t R~ 1f R9–—  ‡Át Ri•¥RŽ e First, it is clear that §H 0 d 0, ‡  0 d 1. From (A.6), ö Ã}limn–Ä ¨ ˆ Á j ¨ ˆ Á   j d t R~ 1f R  j‰t R~ 1f R j and by the linear independence assumption t R~ 1f R dº k R t R§• RŽ , we have §H  ö Ã}limn–Ä j ¨ ˆ Á Á  j d 1e j‰t R~ 1f R j Hence, for Á sufficiently large, §H Á †m 0. We now prove that ‡  Á  A†v 0 by contradiction. Assume that ‡  Á  AHF 0 (this means that j‰t R§• RŽ m j ∆R ). From continuity of ‡  Á  ,‡  0 d 1 and ‡  Á   (F 0, there exists 0 F Á¯ F Á   so that ‡  Á¯  d 0. This implies that §† Á¯  t R~ 1f R d 4 î Ru îrR –   ˆ RH— Á¯Û  î R§~ 1îrR – t R~ 1f R5e From  ˆ R •ˆRŽ d 4 t R~ 1f R , î R î R – •ˆRŽ d •ˆRŽ , and the columns of î R are linearly independent, there exists ë such that s îrR –  ˆ R î R¥p îfR – t R~ 1f R d ë s îÆR –  ˆ R î R¥p îUR – •zˆRŽ e Again using î R î R– t R~ 1f R d t R~ 1f R and î R î R – •ˆRŽ d •ˆRŽ , we have t R~ 1f R d ë t Ri•¥RŽ  which contradicts the assumption t R~ 1f R dº k R t R9• RŽ . Similarly, we can prove that §H Á  A†v 0 based on §H Á žm holds. This completes the proof. 0 for sufficiently large Á . Therefore (A.5) Now we establish the convergence properties of the STIR algorithm. THEOREM 2. Let the level set Ç d )¥$'$1 @ : l˜!El –P'Wa 7 0 be compact and  : aÊI÷1 be twice continuously differentiable on Ç . Let )¥SRi7 be the sequence generated by the STIR algorithm in FIG.5. Then 1. If (AS.3) is satisfied, then the Kuhn-Tucker condition is satisfied at every limit point. 2. Assume that both (AS.3) and (AS.5) are satisfied and ¼ ˆ R in FIG. 5 contains sufficient negative curvature information whenever  ˆ R is indefinite, i.e., ¼ ˆ R– j ¼ ˆ ˆR R j ¼ ˆR 2 ! max Ð4Š¿ @9À  œTÁ  min  ˆ R¥d‰ with ¿ @9À m 0 and 0 F œ F 1. Then 26 (a) If every limit point is nondegenerate, then there is a limit point    at which both the first and second order necessary conditions are satisfied. (b) If    is an isolated nondegenerate limit point, then both the first and second order necessary conditions are satisfied at    . (c) If  ˆ   is nonsingular for some limit point    of )¥TR57 and ¼ ˆ R d •ˆRŽ whenever  ˆ R is positive definite, then  ˆ   is positive definite, )¥gR57 converges to    , all iterations are eventually successful, and ) ∆R57 is bounded away from zero. Proof. Using Lemma 3, for any subsequence with limR‰Ã¬Ä Á  min  ˆ Rz¯F 0, the corresponding ) ¨ R57 has large-step-size. Therefore, there using Lemma 4, for some ¿ 1 m 0, exists ¿ 0 m 0 such that ™ R3mø¿  0 for  sufficiently largee Hence, 4 “ Riˆ•AR§HvE¿ § © s 12 Á R ∆2R — jð R t R ¨ R j 2pqe Condition (AS.5) then implies that (A.7) §© lTRz"4tlTR‰n HvB€e¿ 11 2 s Á R ∆2R — jð R t R ¨ R j 2pqe Now assume that  ˆ   is positive definite and )¥gR97 converges to    . From Lemma 4, we have §"© 4 “ R©ˆ•AR§™v 2 s min) 1 ™ 2R 7 Á R ∆2R — min) 1 ™ R57 j ð R¥ï5R j 2pqe where ™ R is the stepsize along ¨ R . Let ¿™m 0 be a lower bound for the eigenvalues of  ˆ R . From (A.1), (A.4) in Lemma 5, and j‰t R9• RŽ j ! 1ù j‰t R~ 1f R j , there exists ë m 0 such that (A.8) €“ R  s ¨ RApr€wv ë min) ∆2R  j‰t R9•¥RŽ j 2 75e Using (A.7) and (A.8), the proof is essentially the same as that of Theorem 3.10 in [1]: replacing (3.21) in [1] by (A.7) and (3.22) in [1] by (A.8). B. Implementation Details. We present details of our Steihaug method implementation in FIG.11 and the modified preconditioned conjugate gradient (MPCG) in FIG.12. For more details on the large-step-size property of the steps computed from FIG.12, see [17]. 27 function [• ] = Steihaug(p , f , u , ‚ , ∆) % Note: u is some preconditioning matrix for H. u X d × X f `”úŒ f ;  ½…¾i d X µ 2; d 0; ¨ d 0 0; û d 0 0; ´ 0 d 4 f; must be positive definite. while  F ½…¾i  Step 1: Step 2: Step 3: Step 4: Step 5: Step 6: Step 7: end • d ¨ R , return Solve u ÅzR d ´¥R ; d —  1 if  d 1, š 1 d Å 0 else §gR d ´ R– ~ ÅzR 1 ~ 1µ ´ R– ~ ÅzR 2 ~ 2; šiR d Å¥R ~ —D§gR9š5R 1 ~ 1 end R ‡ d š –R p$šiR if ‡ RØ! 0, compute œ m 0 so that j¨ R™— šiR œ jü d ∆, • d R™— ¨ œ š5R , return else R ™ d ´ R– ~ Å¥R 1 ~ 1 µ¥‡ R ; ¨ R d ¨ R~ 1 — ™ R9š5R ; ´¥R d ´zR ~ 1 4 R§p$š5R ™ end if j¨ R j v ∆ compute œ m 0 so that j¨ RH— šiR œ jü d ∆, • d RH— ¨ œ šiR , return, end if ju ~ 1j ‘ ´zR j j !&‚ª‘ ju ~ 1´ 0 j , • d ¨ R , return, end FIG. 11. The Steihaug algorithm 28 Step 0: Ó % Note: ‚ d 0; ¨ and ¿ d 0 0; š 0 d 0; ´ 0 d 4 fˆ; are positive constants. while Ó F maximum iteration Step 1: Solve ¶ Å Õ d ´ Õ Step 2: Ó d — Ó 1 Step 3: if Ó d 1 š dÅ 10 else § Õ d šÕ d ´ Ֆ ~ ÅÕ~ Å 1 1 Õ~ —D§ 1µ Õ ´ š Ֆ Õ ~ ~ Å 2 1 Õ ~ 2 end Step 4: ‡ Õ d š Ֆ  ˆ š Õ Step 5: if ‡ Õ ! 0, exit: š d š Õ , ¨ d ¨ Õ else if ‡ Õ !E¿¥qš Ֆ ¶ š Õ  , exit: š d 0, ¨ d else ™Õ d ¨Õ d ´Õ d ´ Ֆ ~ ¨ Õ~ Å 1 1 Õ — ~ 1 µ¥‡ ™Õ Õ šÕ ´Õ ~ 1 4 ™ Õ ˆš Õ end Step 6: if ju ~ 1 j ‘ j ´ Õ j !ý‚ª‘ ju ~ 1´ 0 j , exit: ¨ end ¨Õ d ¨ Õ,š d 0 FIG. 12. MPCG (B.1) (B.2) (B.3) 29