A SUBSPACE, INTERIOR, AND CONJUGATE GRADIENT METHOD FOR LARGE-SCALE BOUND-CONSTRAINED MINIMIZATION PROBLEMS   MARY ANN BRANCH¡ , THOMAS F. COLEMAN¢ AND YUYING LI¡
Abstract. A subspace adaptation of the Coleman-Li trust region and interior method is proposed for solving large-scale bound-constrained minimization problems. This method can be implemented with either sparse Cholesky factorization or conjugate gradient computation. Under reasonable conditions the convergence properties of this subspace trust region method are as strong as those of its full-space version.
Computational performance on various large-scale test problems are reported; advantages of our approach are demonstrated. Our experience indicates our proposed method represents an efﬁcient way to solve large-scale bound-constrained minimization problems.

Key Words. Interior method, trust region method, negative curvature direction, inexact Newton step, conjugate gradients, bound-constrained problem, box-constraints

1. Introduction. Recently Coleman and Li [1, 2, 3] proposed two interior and reﬂective Newton methods to solve the bound-constrained minimization problem, i.e.,

(1.1)

£¥m¤§¦©in¨  "!#$!&%

where ('0)§1023)54(68797A@ , %B'#)§1C2D)§6E797@ , (FG% , and  is a smooth function,  : 1H@PIQ1 1. Both algorithms are interior methods since the iterates )¥SR97 are in the strict interior of the feasible region, i.e., TRU'WVYX`Abac dd ef )¥ : FB$F0%e7 . These two methods differ in that a line search to update iterates is used in [2, 3] while a trust region idea is used in [1]. However, in both cases convergence is accelerated with the use of a novel reﬂection technique.
The line search method version appears to be computationally viable for large-scale quadratic problems [3]. Our main objective here is to investigate solving large-scale bound-constrained nonlinear minimization problems (1.1), using a large-scale adaptation of the Trust-region Interior Reﬂective (TIR) approach proposed in [1].
The TIR method [1], outlined in FIG. 1, elegantly generalizes the trust region idea for unconstrained minimization to bound-constrained nonlinear minimization. Here f R ddhef g §RiqprR ddsef g 2§R . The crucial role of the (diagonal) afﬁne scaling matrices t R and u R will become clear in v 2.
An attractive feature of the TIR method [1] is that the main computation per iteration is solving a

w Research partially supported by the Applied Mathematical Sciences Research Program (KC-04-02) of the Ofﬁce of Energy

Research of the U.S. Department of Energy under grant DE-FG02-90ER25013.A000, and by NSF, AFOSR, and ONR through

grant DMS-8920550, and by the Advanced Computing Research Institute, a unit of the Cornell Theory Center which receives

major funding from the National Science Foundation and IBM Corporation, with additional support from New York State and

memxy bCCeoormms oppfuuttieetsrr

Corporate Research Institute. Science Department, Cornell University, Ithaca, NY 14850. Science Department and Center for Applied Mathematics, Cornell

University,

Ithaca

NY

14850.

1

The TIR Method [1]

[Let 0 FhFF VX`Abac ∆0 F Λ .]

1, 0 F

Λ F

Λ and  1 F

1F

 2 be given. Let  0 '

For  d 0 1AA 1. Compute R , f R , t R , prR , and u R ; deﬁne the quadratic model

 R¥ dd ef f5R 

1
2



prR

R§d9e u

2. Compute a step AR , with TRH#Rf'WVXg`Abac , based on the subproblem:

mh in)  Ri¥ : jt

R9 ! j2

∆R575e

3. Compute

kR d

TR#AR9"4ClTRAS  RR

1
2

R

u

R9R

e

4. 5.

If k RUmB Update ∆R

then set TRn 1 d TRq#AR as speciﬁed below.

. Otherwise set gRn

d
1

TR .

Updating Trust Region Size ∆R

1. 2. 3.

If k If k If k

RU!B then set ∆Rn RU'oYlri then set RUv& then

1∆R'on 1 0't  s

1


∆Rpqe 1∆Ru

∆R¥pqe

if ∆RUm Λ then

set ∆Rn 1 '

either

s 

Rw 1∆

∆R¥p

or

s ∆Ru



R¥p 2∆

,

otherwise, set ∆Rn 1 'os ∆Rw min   2∆R5 Λ xp .

FIG. 1. The TIR Method for Minimization Subject to Bounds

2

standard unconstrained trust region subproblem:

(1.2)

h m¤§¦©in¨ )  R©¥ : jt

R9 ! j2

∆Rw75e

The method of More´ and Sorensen [4] can be directly applied to (1.2) if Cholesky factorizations of matrices with the structure of pyR can be computed efﬁciently. However, this method is unsuitable for large-scale problems if the Hessian pyR is not explicitly available or (sparse) Cholesky factorizations are too expensive. Recently, Sorensen [5] proposed a new method for solving the subproblem (1.2) using matrix vector multiplications. Nonetheless, the effectiveness of this approach for large-scale minimization, particularly in the context of our trust region algorithm, is yet to be investigated.
We take the view that solving the full space trust region subproblem (1.2) is too costly for a largescale problem. This view is shared by Steihaug [6] who proposes an approximate (conjugate gradient) approach. Steihaug’s approach to (1.2) seems viable although our computational experience (see Table 4) indicates that important negative curvature information can be missed, causing a signiﬁcant increase in the number of minimization iterations.
In this paper, we propose an alternative: an approximate subspace trust region approach (STIR). We verify that, under reasonable conditions, the convergence properties of this STIR method are as strong as those of its full-space version. We explore the use of sparse linear algebra techniques, i.e., sparse Cholesky factorization and preconditioned conjugate gradients, in the context of this approach.
In addition, we demonstrate the beneﬁts of our afﬁne scaling, reﬂection and subspace techniques with computational results. First, for (1.1), our afﬁne scaling technique outperforms the classical Dikin scaling [7], at least in the context of our algorithm. Second, we examine our method with and without reﬂection. We show the reﬂection technique can substantially reduce the number of minimization iterations. Third, our computational experiments support the notion that the subspace trust region method is a promising way to solve large-scale bound-constrained nonlinear minimization problems. Compared to the Steihaug [6] approach, the subspace approach is more likely to capture negative curvature information and consequently leads to better computational performance. Finally, our subspace method is competitive with, and often superior to, the active set method in LANCELOT [8].
The paper is organized as follows. In v 2, we brieﬂy summarize the existing TIR method. Then we provide a computational comparison of the subspace trust region method and the Steihaug algorithm in the context of unconstrained minimization in v 3. We introduce a subspace method STIR, and discuss its convergence properties, in v 4. Issues concerning the computation of negative curvature directions and inexact Newton steps are discussed in v 5; computational results are provided indicating that performance is typically not impaired by using an inexact Newton step. Concluding remarks appear in v 7. The convergence analysis of the STIR method is included in the appendix.

2. The TIR Method. In this section we brieﬂy review the full-space TIR method [1], sketched in FIG. 1. This method closely resembles a typical trust region method for unconstrained minimization, min£z¤§¦ ¨ l . The key difference is the presence of the afﬁne scaling (diagonal) matrices t R and u R . Next we brieﬂy motivate these matrices and the TIR algorithm.

3

The trust region subproblem (1.2) and the afﬁne scaling matrices t R and u R arise naturally from examining the ﬁrst-order Kuhn-Tucker conditions for (1): if a feasible point F{ is a local minimizer, then i| f | d 0 for 1 !GV}!GX and if f |(F 0 then i| is not at any of its bounds. This characterization is expressed in the nonlinear system of equations

(2.1)

t ~ 2g l d 0

where

(2.2)

t  dd ef diag  gY¥~ 1 2

and the vector H'1 @ is deﬁned below: for each 1 !&V!&X ,

(i). If f |F 0 and %g|lF86 then | dd ef i|S4%T| ; (ii). If f |v 0 and |m{46 then | dd ef i|S4| ; (iii). If f |F 0 and %g| d 6 then | dd ef 4 1; (iv). If f |v 0 and | d 46 then | dd ef 1.

The nonlinear system (2.1) is not differentiable everywhere; nondifferentiability occurs when u| d 0. Hence we avoid such points by maintaining strict feasibility, i.e., restricting RU'WVYX`Abac . A Newton step for (2.1) is then deﬁned and satisﬁes

(2.3)

 ˆ R t R§zR d 4 f ˆR©

where

(2.4)

f ˆR dd ef t R~ 1f R d diag  9R 1 2 f R5  ˆ R dd ef t R~ 1prR t R~ 1  diag f RSR e

Here   U'B1 @Ti@ corresponds to the Jacobian of  ¥ . Each diagonal component of the diagonal matrix   equals to zero or  1. If all the components of  and % are ﬁnite,   d diag  sgn f  . If | d 0, we deﬁne  | | d 1.
Equation (2.3) suggests the use of the afﬁne scaling transformation:  ˆ dd ef t R¥ . This transformation reduces the constrained problem (1.1) into an unconstrained problem: a local minimizer of (1.1) corresponds to an unconstrained minimizer in the new coordinates  ˆ (for more details, see [1]). Therefore a reasonable way to improve gR is to solve the trust region subproblem

(2.5) where

hˆm¤§¦©in¨ )  ˆR© zˆ : j ˆ j 2 ! ∆Ri75

 ˆR© zˆ dd ef fiˆR Hˆ 

1 2



ˆ

 ˆR

9ˆ e

4

Let  d t R~ 1ˆ. Subproblem (2.5) is equivalent to the following problem in the original variable space:

(2.6)

h m¤§¦©in¨ )  R©¥ : jt

R9 ! j2

∆Rw75

where

 R©z dd ef  f R

1  2

R§9

u R dd ef t R diag  f R§SR t Ru



R dd ef prR

Rue u

In addition to the close resemblance to an unconstrained trust region method, the TIR algorithm has

strong convergence properties with explicit conditions on steps for optimality. We now describe these

conditions.

The TIR algorithm requires strict feasibility, i.e., SRAR'tVX`bac . We use  R  s5Rp to denote the

step obtained from 5R with a possible step-back for strict feasibility. Let  R   denote the minimizer along

5R within the feasible trust region, i.e.,  R   d

argmin ) 

R© iRz 

: jgt

R§5R j

!

∆RuTR

 5R'a 75e Let

¡Rf'ts¡   1p for some 0 F8¡  F 1 and ¡R¢4 1 d£  j 5R j  . Then

(2.7)

 R  s5Rp dd ef ¡AR  R   5R dd¥ef ¤

 R   5R ¡AR  R   iR

if TR  R   5RU'cVX`¦ac otherwise e

The above deﬁnition implies that ¡¥R d 1 if TR  R   5RU'WVYX`Abac . Explicit conditions which yield ﬁrst and second-order optimality are analogous to those of trust region
methods for unconstrained minimization [1]:

(AS.3) (AS.4)



R©qR§FC§ 

R  s4

t

R~ 2f R¥p , jt

R9AR j

!&§

Assume that ¨ R is a solution to minh ¤z¦

positive constants. Then ¥R satisﬁes 

VX`bace

0∆R©eTR0Rf'WVYX`Abac .

¨ )  R©¥ : jt

R9 j

!

∆Rw7

R©AR9F0§e© 

R 

s ¨

Rpq

jt

RR

and §e© !0§
j

©
0

a∆nRud§ T0©Ra#retRªw'o

Condition (AS.3) is necessary for ﬁrst-order convergence; (AS.4), together with (AS.3), is necessary

for second-order convergence. Both conditions (AS.3) and (AS.4) are extensions of convergence conditions for unconstrained trust region methods. In particular, when  d 4(6 and % d 6 , these assumptions

are exactly what is required of trust region methods for unconstrained minimization problems.

Satisfaction of both conditions (AS.3) and (AS.4) is not difﬁcult. For example, one can choose 9R

so that 

R©qR§

is the minimum of the values 

R 

s ¨

Rp

and 

R  s4

t

R~ 2f RAp . However, this does not lead to

an efﬁcient computation process. In [3] and [2], we have utilized a reﬂection technique to permit further

possible reduction of the objective function along a reﬂection path on the boundary. We have found in [3]

and [2] that this reﬂection process signiﬁcantly enhances performance for minimizing a general quadratic

function subject to simple bounds.

5

s x
p pr i

FIG. 2. Reﬂection Technique

For all the computational results in this paper, zR is determined from the best of three points

corresponding to 

R 

s ¨

Rp , 

R  s4

t

R~ 2f Rp and 

R  s ¨SR« p where ¨R«

denotes the piecewise direction path with ¨ R

reﬂected on the ﬁrst boundary it encounters, see FIG. 2.

We can appreciate the convergence results for this approach by observing the role of the afﬁne scaling matrix t R . For the components | which are approaching the “correct” bounds, the sequence of directions )54 t R~ 2f R97 becomes increasingly tangential to these bounds. Hence, the bounds will not prevent a large step size along )54 t R~ 2f Rw7 from being taken. For the components T| which are approaching the “incorrect” bounds, )54 t R~ 2f R97 points away from these bounds in relatively large angles (the corresponding diagonal components of t R are relatively large and f R points away from these bounds). Hence, a reduction of at least  R  s4 t R~ 2f Rp implies the scaled gradient ) t R~ 2f R57 converges to zero (i.e., ﬁrst-order optimality).

The scaling matrix used in our approach is related to, but different from, the scaling typically used in afﬁne scaling methods for linear programming. The afﬁne scaling matrix t Rafﬁne dd ef diag  minYTR¬4 ­R©x%RU4ETR [7], commonly used in afﬁne scaling methods for linear programming, is formed from the distance of variables to their closest bounds. Our scaling matrix t R2 equals to t Rafﬁne only when min TR¢4o®Rur%R¯4CTR§ d  iRi . (Note that even in this case we employ the square root of the quantities used to deﬁne t Rafﬁne.)

Before we investigate a subspace adaptation of TIR, we demonstrate the effectiveness of our reﬂection

idea and afﬁne scaling technique. We consider random problem instances of molecule minimization

[9, 10], which minimize a quartic subject to bounds on the variables. Table 1 and 2 list the average

number of iterations (over ten random test problem instances for each entry) required for the different techniques under comparison. The notation m in front of a number indicates that the average number is at

least this number because the iteration number exceeds 1000, the maximum allowed, for some instance.

The details of the algorithm implementation are given in v 6.

Table 1 demonstrates the signiﬁcant difference made by a single reﬂection. The only difference

6

X
100 200 400 800 1000 With Reﬂection 34.1 41.7 66.8 83.4 93.6 Without Reﬂection 71.4 >210.1 >425.4 >302.2 > 408.5

TABLE 1 The STIR algorithm with and without reﬂection: number of iterations

X

unconstrained: constrained:

R t t Rafﬁne
R t t Rafﬁne

100 38.6 36.4 36.6 >517.4

200 47.3
49 50.5 >617.6

400 61.4 58.5 65.6 >517.3

800 72.7 73.9 89.7 >1000

1000 93.6 94.6 102.3 >1000

TABLE 2 Comparison of the STIR scaling °(± and Dikin scaling ° a± fﬁne: number of iterations

between the rows with and without reﬂection is the following. Without reﬂection, R is determined by the

best of the two points based on 

R 

s ¨

Rp

and 

R  s4

t

R~ 2f Rp ; with reﬂection, ¥R

is determined by the best of

the three points based on 

R 

s ¨

Rp , 

R  s4

t

R~ 2f Rp and 

R  s ¨R« p (with reﬂection). The superiority of using the

reﬂection technique is clearly demonstrated with this problem. In Table 2, we compare the computational advantage of the selection t R over t Rafﬁne: the only

difference is the scaling matrix. We differentiate between problems that have an unconstrained solution

(no bounds active at a solution) and those with a constrained solution. We observe that, for unconstrained

problems, there is no signiﬁcant difference between the two scaling matrices. However, for the constrained problems we tested, the choice t R is clearly superior. We observe that when t R is used, the number

of iterations for a constrained problem is roughly the same as that for the corresponding unconstrained problem. For t Rafﬁne, on the other hand, the number of iterations for a constrained problem is much larger

than for the corresponding unconstrained problem.

3. Approximation to the Trust Region Solution in Unconstrained Minimization. There are two possible ways to approximate a full-space trust region solution in unconstrained minimization.
Byrd, Schnabel, and Schultz [11] suggest substituting the full trust region subproblem in the unconstrained setting by

(3.1)

h m¤§¦in¨ ) 

R©q¥

:

 j

! j2

∆Ru¬'c²eR975

where ²eR is a low-dimensional subspace. (Our implementation employs a two-dimensional choice for ²eR .)

Another possible consideration for the approximation of (1.2) is the Steihaug idea [6], also proposed

7

in the large-scale unconstrained minimization setting. In a nutshell, Steihaug proposes applying the method of preconditioned conjugate gradients (PCG) to the current Newton system until either negative curvature is revealed, the current approximate solution reaches the boundary of the trust region, or the Newton system residual is sufﬁciently reduced.
We believe that a subspace trust region approach better captures the negative curvature information compared to the Steihaug approach [6]. To justify this we have conducted a limited computational study in the unconstrained minimization setting.
We implement the subspace method with the subspace ²lR deﬁned by the gradient direction f R and the output of a Modiﬁed Preconditioned Conjugate Gradient (MPCG) method applied to the linear Newton system: prRi d 4 f R . The output is either an inexact Newton step A³R  deﬁned by,

(3.2)

prR9 ³R 

d

4 Ro´zR f

such that

´zR R j jµijf j

!&5R©

or a direction of negative curvature, detected by MPCG. Algorithm MPCG is given in greater detail in

FIG. 11, Appendix B. Our implementation of the Steihaug method can also be found in Appendix B.

Both the Steihaug and subspace implementations are wrapped in a standard trust region framework for

the unconstrained minimization problem. For both methods the preconditioning matrix used is ¶ R d· 2R

where · R is the diagonal matrix computed from · R || d¹¸  prR ||  for pfR ||dº 0 and · R || d 1 otherwise.

The same strategy is used to update used for the subspace method and j

∆R 
j»

(see v 6 for more details). We let for the Steihaug method ([6]).

∆0

d

0e 1 jf 0 j where the j  j 2 is

We used twenty different unconstrained nonlinear test problems. All but four are test problems

described in [12], but with all the bound constraints removed. The problems EROSENBROCK and

EPOWELL are taken from [13]. The last two problems, molecule problems MOLE1 and MOLE3, are described in [9, 10]. For all problems, the number of variables X is 260. The minimization algorithm terminates when jfj 2 ! 10~ 6. We use the parameter  d 0e 0005 in both FIG. 11 and FIG. 12.
Tables 3 and 4 compare the Steihaug and subspace methods described above in terms of the number of

minimization iterations and the total number of conjugate gradient (CG) iterations. Table 3 shows problems

for which negative curvature was not detected, and Table 4 shows problems for which negative curvature

was detected. Although not included here, the function values and gradient norms (upon termination) were

virtually the same for both methods for all problems. Since these values were essentially the same among

the two methods, we only discuss the difference in iterations counts. The difference in minimization and

CG iteration counts is plotted in FIG. 3 and FIG. 4.

Most notable in Table 3 and the graphs of FIG. 3 is how strikingly similar the results are for the

Steihaug and subspace methods; the minimization with each method stops within two iterations of the

other in all cases. Furthermore, both methods take an identical number of total CG iterations except for

the problem BROWN1 where the Steihaug method takes four more iterations. When negative curvature

is encountered, shown in Table 4 and in FIG. 4, the iteration counts for each method are again similar

for a few problems. For most problems, however, the Steihaug method takes more iterations, and for

some problems the difference is substantial. This is particularly true for the problems CHAINWOOD,

MOLE1 and MOLE3 (for CHAINWOOD, problem 3 in FIG. 4, the total difference in iteration counts is

8

Minimization

CG

Problem

Subspace Steihaug Subspace Steihaug

1. BROWN1

27 29

39 43

2. BROWN3

66

66

3. BROYDEN1A

11 11

81 81

4. BROYDEN1B

5 5 34 34

5. BROYDEN2B

7 7 71 71

6. CHAINSING

22 22 188 188

7. CRAGGLEVY

21 21 125 125

8. DEGENSING

22 22 188 188

9. EPOWELL

18 18

72 72

10. GENSING

22 22

83 83

11. TOINTBROY

7 7 58 58

12. VAR

43 43 5590 5590

TABLE 3 Comparison when only positive curvature is encountered: number of iterations

Minimization

CG

Problem

Subspace Steihaug Subspace Steihaug

1. AUGMLAGN

36 29 267 228

2. BROYDEN2A

22 19 247 196

3. CHAINWOOD

156 988 3905 3878

4. EROSENBROCK

44 46

52 86

5. GEROSE

23 33 166 165

6. GENWOOD

58 63 304 275

7. MOLE1

46 119

460 376

8. MOLE3

125 186 6311 5356

TABLE 4 Comparison when negative curvature is encountered: number of iterations

9

minimization iterations CG iterations

100 excess Steihaug iterations excess subspace iterations
80 60 40 20
0 1 2 3 4 5 6 7 8 9 10 11 12 positive curvature problems

100 80 60 40 20 0 1 2 3 4 5 6 7 8 9 10 11 12
positive curvature problems

FIG. 3. Comparison of subspace and Steihaug trust region methods for unconstrained problems

minimization iterations CG iterations

100 *832* 80 60 40 20 0 12345678 negative curvature problems

100 *955* 80 60 40 20 0 12345678 negative curvature problems

FIG. 4. Comparison of subspace and Steihaug trust region methods for unconstrained problems

10

explicitly noted as it is beyond the scale of the graph). In general the subspace method does take more CG iterations on problems with negative curvature, but it is these extra relatively inexpensive CG iterations that reduce the total number of minimization iterations. (Again, for the problem MOLE3 the difference in CG iterations is explicitly noted in FIG. 4 as it is beyond the scale of the graph.)
A closer examination of the behavior of the two algorithms indeed shows that when negative curvature is not encountered, both methods take similar steps. (In this case, if the trust region is large enough, both methods in FIG. 11 and FIG. 12 will stop under the same conditions after the same number of CG iterations, as displayed in Table 3.) By the nature of the algorithms, if the Steihaug method detects negative curvature, then so will the subspace approach. However if the subspace algorithm detects negative curvature, the Steihaug method may terminate before it ﬁnds negative curvature; and then it does not converge (to a local minimizer) as quickly as the subspace method. The important role that negative curvature plays is supported by the fact that the subspace method often moves in a substantial negative curvature direction when the Steihaug method overlooks negative curvature. Furthermore, it is when the trust region radius ∆R is small that the Steihaug method is most likely to stop early and miss negative curvature. Thus it appears that the effectiveness of the Steihaug idea decreases as nonlinearity increases.

4. The STIR Method. Supported by the discussion in v 3, we propose a large-scale subspace adaptation of the TIR method [1] for the bound constrained problem (1.1).
In moving from the unconstrained subspace approach to the box-constrained setting, it seems natural to replace the full trust region subproblem (1.2) by the following subspace subproblem

(4.1)

h m¤§¦©in¨ )  R©¥ : jt

R ! j2

∆R©¬'W²Ri75

where ²eR is a small-dimensional subspace in 1@ , e.g., a two-dimensional subspace. A two-dimensional

subspace for the trust region subproblem (2.5) can be selected from the span of the two vectors

) t

R~ 1f Rw ˆR

7

and a negative curvature direction ¼ ˆ R

for  ˆ R .

This suggests that we form ²"R

from the

directions ) t

R~ 2f Ru t

R~ 1ˆR

 t

R~ 1¼ ˆ R57 . Will such subspace formulations succeed in achieving optimality?

We examine this issue in more detail. It is clear that the including the scaled gradient direction t R~ 2f R in ²eR , and satisfying (AS.3), will

guarantee convergence to a point satisfying the ﬁrst-order optimality conditions. Let us assume for now that )¥TR57 converges to a ﬁrst-order point    . To guarantee that    is also a second-order point the

following conditions must be met. Firstly, it is clear that a “sufﬁcient negative curvature” condition must be carried over from the
unconstrained setting [14]. To this end, we can require that sufﬁcient negative curvature of the matrix  ˆ R be captured if it is indeﬁnite, i.e., ²R must contain a direction ¼ R d t R~ 1¼ ˆ R such that

(4.2)

¼

ˆ R

j

 ¼

ˆ ˆR

R

¼ j

ˆR

!&½$¾5)54(¿ @iÀ  TÁ©Â

| @   ˆ R§75e

Secondly, it is important that a solution to (4.1) lead to a sufﬁciently large step — the potential difﬁculty is running into a (bound) constraint immediately. This difﬁculty can be avoided if the stepsize sequence, along the trust region solution direction, is bounded away from zero. Subsequently, we deﬁne:
11

DEFINITION 4.1. A direction sequence )§zR57 has large-step-size if lim inf RÃ¬Ä{ t R2 ARi5F86 .

If fast local convergence is desired then the subspace ²lR should also contain a sufﬁciently accurate approximation to the the Newton direction t R~ 1 ˆR when  ˆ R is positive deﬁnite and ˆR d 4  ˆ R ~ 1f ˆR . An inexact Newton step ˆR for problem (1.1) is deﬁned as an approximate solution to

 ˆ R d 4 f ˆR5

with accuracy wR :

(4.3)

 ˆ R ˆ³R  d 4 f ˆRt´¥R such that j ´zR jµij f ˆR j !05Rue

Can we select two-dimensional subspaces satisfying all three properties and thus guarantee quadratic

(superlinear) convergence to a second-order point? The answer, in theory, is yes — the subspace adaptation

of TIR algorithm (STIR) in FIG.5 is an example of a subspace method capable of achieving the desired

properties.

To ensure convergence to a solution, the solution sequence of the subspace trust region subproblems

(4.1) need to have large-step-size. Lemma 1 below indicates that this can be achieved if we set ²R d

span) ¼ RuÅ¥Ri7 , where ) ¼ R57 and )¥ÅzR57 are two sequences of uniformly independent vectors in the sense

that lim inf ) j ÅzR4

¼

R

7Æm j

0, each with large-step-size.

LEMMA 1. Assume that ) ¼

R57

and )¥ÅzR57

have large-step-size with jt

R¼

R j

d

1 and jt

RzÅ¥R j

d

1.

Moreover, lim inf RÃ¬Ä

) ÅzR¬4 j

¼

R 7m j

0. Then the solution sequence ) ¨ R97 to the subproblem (4.1) with

²eR d span)¥ÅzR5 ¼ R57 has large-step-size.

Proof. The proof is very straightforward and is omitted here.

For the STIR method, a natural extension of the condition (AS.4) necessary for second-order optimality is the following.

(AS.5) Assume that ¨ R

is a solution to minh ¤z¦ ¨ ) 

R©¥

: jt

R9 j

!

∆RwdP'o²eR97 and §e© and

§ §

© 0©
0

are two positive constants. ∆R and gRq#ARr'WVX`bace

Then

¥R

satisﬁes 

RR9HF&§e© 

R 

s ¨

Rpq where jt

R9R j

!

Theorem 2 below, with the proof provided in the Appendix, formalizes the convergence properties of STIR.

THEOREM 2. Let the level set Ç

d

)¥3'1@

:

H!{l

3'a 7
0

be compact and 

: aÈIÉ1

be twice continuously differentiable on Ç . Let )¥SRi7 be the sequence generated by the STIR algorithm in

FIG.5. Then

1. If (AS.3) is satisﬁed, then the Kuhn-Tucker condition is satisﬁed at every limit point.

12

The STIR Method

[Let 0 FhFF 1, 0 F Λ F Λ and  1 F 1 F  2 be given. Let  0 '

VX`Abac F ∆0

Λ .]

For  d 0 1AA 1. Compute R , f R , t R , prR , and u R ; deﬁne the quadratic model

 R©q¥ d

f5R q

1
2



prR

R9e u

2. Compute a step ¥R , with gRrR'ÊVXg`Abac , based on the subspace subproblem,

mh in)  R©¥ : jt

R9 ! j2

∆RwU'W²Ri75

where the subspace ²"R is set up as below. 3. Compute

kR d

TR#AR9"4ClTRAS  RR

1
2

R

u

R9R

e

4. 5.

If k RUmB Update ∆R

then set TRn 1 d as speciﬁed in

TRq#AR FIG.1.

. Otherwise set gRn

d
1

TR .

Determine Subspace ²eR : [Assume that ¼ R d t R~ 1¼ ˆ R where ) ¼ Ru7 has large-step-size. Let 0 F  F 1 be a small positive constant.]

IF  ˆ R is positive deﬁnite ²eR dd ef span) t R~ 2f R5 ¼ Rw7
ELSE  ˆ R is not positive deﬁnite

IF  t

R~

2sgn  f

R 



Rw t

R~ 2sgn  f R§HF

²eR dd ef span) t R~ 2sgn  f R7

ELSE ²eR dd ef span) t R~ 2sgn  f R ¼ R97

END

END

ËxÌËÏ±rÍ 2± Î Ë ± 2Ë 2 ¼ R 

R¼ R

FIG. 5. The STIR Method for Minimization Subject to Bound Constraints 13

2. Assume that both (AS.3) and (AS.5) are satisﬁed and ¼ ˆ R in FIG. 5 contains sufﬁcient negative curvature information whenever  ˆ R is indeﬁnite, i.e.,

¼

ˆ R j

¼

ˆ ˆR

R j

¼ ˆR
2

!

max Ð4¿

@9À



TÁ


min



ˆ R¥d

with ¿ @9À m 0 and 0 F  F 1. Then

(a) If every limit point of )¥R57 is nondegenerate, then there is a limit point    at which

both the ﬁrst and second-order necessary conditions are satisﬁed. (b) If    is an isolated nondegenerate limit point, then both the ﬁrst and second-order
necessary conditions are satisﬁed at    . (c) If  ˆ   is nonsingular for some limit point    of )¥TR57 and ¼ ˆ R d ˆR whenever  ˆ R is
positive deﬁnite, then  ˆ   is positive deﬁnite, )¥gR57 converges to    , all iterations are eventually successful, and ) ∆R57 is bounded away from zero.

The degeneracy deﬁnition is the same as in [1].

DEFINITION 4.2. A point 'Wa is nondegenerate if, for each index V :

(4.4)

f q| d 0 dSÑ |lF#i|F&%T|Ðe

We have established that in principle it is possible to replace the full-dimensional trust region subproblem with a two-dimensional variation. However, the equally strong convergence properties of STIR hinges on obtaining (guaranteed) sufﬁcient negative curvature direction with large-step-size. We discuss this next.

5. Computing Negative Curvature Directions with Large-Step-Size. Is it possible, in principle,

to satisfy both the sufﬁcient negative curvature requirement (4.2) and the large-step-size property? The

answer is yes: let %SR be a unit eigenvector of  ˆ R corresponding to the most negative eigenvalue, i.e.,

t hˆeRA%seR qud enÁ cmein)  t 

ˆ R~

Rzq%R . 1 %R97

It is easily veriﬁed that has large-step-size.

for

any

convergent

subsequence

limRÃ}Ä

Á


min



ˆ R§(F

0,

However, it is not computationally feasible to compute the (exact) eigenvector %eR . Therefore,

approximations, and short cuts, are in order. Can we compute approximate eigenvectors with large-step-

size?

A good approximation to an eigenvector corresponding to an extreme eigenvalue can usually be obtained through a Lanczos process [15]. Using the Lanczos method for  ˆ R with an initial vector Ò ˆR ,

approximate eigenvectors at the Ó -th step are computed in the Krylov space

Ô   ˆ R5 Ò ˆRu Ó  dd ef span  Ò ˆR5  ˆ R Ò ˆR5AAA 8ˆ RÕ ~ 1Ò ˆR§e

In the context of our algorithm, the vectors t R~ 1sgn f R§ or t R~ 1f R are natural choices for the initial vector Ò ˆR when applying the Lanczos method.

14

Our key observation is the following. If a sequence ) t R~ 1Ò ˆR57 has large-step-size then each sequence in t R~ 1 ) Ò ˆR©  ˆ R Ò ˆRwAA  ˆ RÕ ~ 1Ò ˆR97 retains this property.
Now assume that ¼ ˆ R is the computed vector from the Lanczos method which contains the sufﬁcient negative curvature information with respect to  ˆ R . It can be veriﬁed, based on the recurrence relation, that ) t R~ 1Ò ˆR1 AA t R~ 1 Ò ˆÕR 7 all have large-step-size if the Lanczos vectors ) Ò ˆR1 AA Ò ˆÕR 7 retain orthogonality. Since ¼ ˆ R is in the Krylov space Ô   ˆ R5 Ò ˆRu Ó  , it is clear that ) ¼ R d t R~ 1¼ ˆ Ri7 has large-step-size. In other

words, in order to generate a negative curvature direction sequence with large-step-size, orthogonality

needs to be maintained in the Lanczos process. Fortunately, as discussed in [16], it is quite reasonable

to assume that until all of the distinct eigenvalues of the original matrix have been approximated well,

orthogonality of the Lanczos vectors are well maintained. Since we are only interested in a direction with

sufﬁcient negative curvature, we expect that it can be computed before loss of orthogonality occurs.

A second (and cheaper) strategy is to employ a modiﬁed preconditioned conjugate gradient scheme,

e.g., MPCG in FIG.12. Unfortunately, this process is not guaranteed to generate sufﬁcient negative

curvature; nonetheless, as indicated in [17], the MPCG output will satisfy the large-step-size property.

Finally we consider a modiﬁed Cholesky factorization, e.g., [18], to obtain a negative curvature direction. Assume that )  ˆ R57 is indeﬁnite and )§wR97 is obtained from the modiﬁed Cholesky method. We demonstrate below that )§wR d t R~ 15ˆR97 has large-step-size under a nondegeneracy assumption.
The negative curvature direction iˆR d t R95R computed from the modiﬁed Cholesky method (see

[18], page 111) satisﬁes Ö

ÖÖ

Ö R iˆR dØ× Õ ±  and ¶R   ˆ R ¶ R#ÙR d R diag ÚR§ R 

where R is a lower triangular matrix, ¶ R

i.e., × Õ ± bVÐ d

0, if V dº

Ó

R

and × Õ ±

 R§ Ó

d

1.

Without loss of generality, we assume that

is a permutation Moreover, Ù(R is ¶ R dÛ .

matrix and × Õ ± a bounded and

is the Ó R th elementary non-negative diagonal

vector, matrix.

We argue, by contradiÖ ction, that )§wRi7 has tÖhe large-step-size property. Assume that )§©R97 does not

have this clear that deﬁnition

property. 5ˆRws jk 1 : (2.4) of



From np d 0. ˆ R , the

R

ﬁM5rˆsRotrdÜeÓ oRUv× 4 Õe±r,1afnrcodommthpa otˆnR e5nRˆRtissGoafÙloR) w5t ˆR eR r5d ˆtR9ri7 ÚaRnaÕ gr± euÖ lbaR or× uÕ m±n,adteÚrdRPix. mÈwTiÚhthifsouirnmistpodlmiieaesgÚothnmaatls0),

it is and Õ± 7

converges to zero. From the modiﬁed Cholesky factorization, the matrix  ˆ Ris 1 : jk  1 : jkp is indeﬁnite but  ˆ Ris 1 : jk4 1 1 : jk4 1p

is positive deﬁnite. But this is impossible for sufﬁciently large  because, again using the deﬁnition (2.4) of  ˆ R , )  ˆ Rus 1 : jk  1 : jkp7 converges to a matrix of the form

Ý  ˆ Rws 1 : jk4 1 1 : jk4 1p 0

0 5RÞ

where wR is positive (because of the nondegeneracy assumption). Therefore, we conclude that )§iR57 has large-step-size.

6. Computational Experience. We demonstrate the computational performance of our STIR method given in FIG.5. Below we report our experience with the modiﬁed Cholesky and the conjugate gradient
15

(MPCG) implementations. We examine the sensitivity of the STIR method to a starting point. Finally,

some limited comparisons with SBMIN of LANCELOT [8] are also made. In the implementation of STIR, we compute ¥R using a reﬂective technique as shown in FIG.2. The

exact trust region updating procedure is given below in FIG.6.

Updating Trust Region Size ∆R

Let  d 0e 25} d 0e 75, Λ d 1, Λ d max ß à | min db%T|g4t| 2  1000A 1 ,

∆0 d

min  0e 1 jfj  Λ  ,  0 d

0e 0625



d
1

0e 5  2 d

2 be given.

1. 2. 3. 4.

If k If k If k If k

RU! 0 then set ∆Rn RU'o 0p then set

d 1 Rn ∆

RU'oYlri then set ∆Rn

RUv& then


1 1

0∆; d max d ∆Rue





Ru 0∆



1 jt

R9AR e j

if ∆RUm Λ

∆Rn

d
1

 2∆R

otherwise

∆Rn

d
1

min  max  ∆Rw  2 jt

R9AR j  Λ  .

FIG. 6. Updating Trust Region Size

Our experiments were carried out on a Sun Sparc workstation using the Matlab environment. The stopping criteria used are as follows. We stop if

either or or

lgR¥4tYTRn dH! TRn 4TR ! 1




1

1

{ lTRz¥á

j no

1
negative

j2 2 curvature

has

been

detected

for



ˆ

R

and jt

RRÄ fj

!

e 1

We deﬁne  1 d

10~

10 and 

dÜâ
2

 1µ 10 d

10~ 6. We also impose an upper bound of 600 on the number

of iterations.

We ﬁrst report the results of the STIR method using the modiﬁed Cholesky factorization. Table 5

lists the number of iterations required for some standard testing problems (for details of these problems

see [12]). (For all the results in this paper, the number of iterations is the same as the number of objective function evaluations.) The problem sizes vary from 100 to 10 000. The results in Table 5 indicate that, for

these testing problems at least, the number of iterations increases only slightly, if at all, with the problem

size. Moreover, in comparison to the unconstrained problems, the presence of the bound restrictions does

not seem to increase the number of iterations. This is depicted pictorially in FIG. 7. In this graph, the

problem size is plotted versus iteration count. For each problem, the corresponding points have been

connected to show how the iteration count relates to the problem size.

Our second set of results are for the STIR algorithm but using a conjugate gradient implementation. We use the algorithm MPCG in FIG.12 to ﬁnd the directions needed to form the subspace ²R . The stopping condition applied to the relative residual in MPCG is  d 0e 005. The results are shown in

Table 6 and FIG. 8. Again, for these problems the iteration counts are low and steady. The exception is for the problem VAR C with 10 000 variables, where the iteration count jumps to 86. This is one of

16

X

Problem

100 200 500 1000 10000

GENROSE U

25 25 25 25

25

GENROSE C

11 11 11 11

10

GENSING U

24 25 25 26

27

GENSING C

18 19 20 20

21

CHAINSING U 23 23 23 23

23

CHAINSING C 16 16 16 16

19

DEGENSING U 22 23 23 40

39

DEGENSING C 28 28 28 28

29

GENWOOD C

9 10 10 10

11

CHAINWOOD C 9 10 10 10

11

BROYDEN1A U 12 12 13 13

14

BROYDEN1A C 11 11 11 11

11

BROYDEN1B U 7 7 7

7

7

BROYDEN1B C 8 8 8

8

8

BROYDEN2A U 13 13 13 14

14

BROYDEN2A C 14 19 17 19

19

BROYDEN2B U 9 9 9

9

9

BROYDEN2B C 13 11 15 14

15

TOINTBROY U

888

8

8

TOINTBROY C

999

9

9

CRAGGLEVY U 16 14 15 16

15

CRAGGLEVY C 29 29 30 30

31

AUGMLAGN C 38 32 35 36

37

BROWN3 U

888

8

8

BROWN3 C

17 10 11 9 11

BVP U

9 10 9

8

8

BVP C

11 11 10 10

7

VAR U

9 9 10 12

15

VAR C

18 18 23 45

38

TABLE 5 STIR method with exact Newton steps: number of iterations

17

100

iterations

50

0100 1000

problem size

FIG. 7. STIR performance with exact Newton steps 100

10000

iterations

50

0100 1000

problem size

10000

FIG. 8. STIR method with inexact Newton steps
several degenerate problems included in this test set. With a tighter bound  on the relative residual in MPCG, we could decrease the number of minimization iterations for this problem (note that the STIR with exact Newton steps only takes 38 iterations). However, this change would also increase the amount of computation (conjugate gradient iterations).
Next we include some results which indicate that our STIR method is fairly insensitive to the starting point. The results in Table 7 were obtained using exact Newton steps on problems of dimension 1000. The results in Table 8 were obtained using the conjugate gradient implementation, also on problems with 1000 variables. The starting points are as follows: original is the suggested starting point according to [12]; upper starts all variables at upper bounds; lower starts all variables at the lower bounds; middle starts at the midpoint between bounds; zero starts each variable at zero (the origin); upper-lower starts the odd variables at the upper and the even variables at the lower bounds; lower-upper is the reverse of this. For all of these, we perturb the starting point slightly if necessary to be strictly feasible. Note that for the problem BROWN3 C, the iteration count is not shown starting at middle and at origin as the gradient is undeﬁned at both these starting points. These results are also shown graphically in FIG. 9 and FIG. 10. From these graphs it is clear that both implementations of STIR are fairly robust when it comes to starting
18

X

Problem

100 200 500 1000 10000

GENROSE U

21 21 21 21

21

GENROSE C

10 10 10 10

17

GENSING U

23 23 24 24

25

GENSING C

16 16 16 16

16

CHAINSING U 21 21 21 21

21

CHAINSING C 14 17 19 19

20

DEGENSING U 32 32 33 33

35

DEGENSING C 33 56 35 33

31

GENWOOD C

888

8

8

CHAINWOOD C 8 8 8

8

8

BROYDEN1A U 11 11 11 11

12

BROYDEN1A C 9 8 8

8

8

BROYDEN1B U 6 6 6

6

6

BROYDEN1B C 7 7 7

7

7

BROYDEN2A U 15 15 19 17

20

BROYDEN2A C 10 10 10 10

10

BROYDEN2B U 8 8 8

8

9

BROYDEN2B C 9 9 9

9

9

TOINTBROY U

777

7

7

TOINTBROY C

888

8

8

CRAGGLEVY U 26 26 27 27

29

CRAGGLEVY C 26 26 26 26

27

AUGMLAGN C 26 33 29 34

27

BROWN3 U

777

7

7

BROWN3 C

777

7

8

BVP U

13 13 12 13

25

BVP C

15 15 14 14

15

VAR U

34 35 35 37

36

VAR C

19 21 32 36

86

TABLE 6 STIR method with inexact Newton steps, ãräAãå¥ãæ5ãç 0è 005: number of iterations

19

Problem GENROSE C GENSING C CHAINSING C DEGENSING C GENWOOD C CHAINWOOD C BROYDEN1A C BROYDEN1B C BROYDEN2A C BROYDEN2B C CRAGGLEVY C AUGMLAGN C BROWN3 C BVP C VAR C

original 11 20 16 28 10 10 11 8 19 14 30 36 9 10 45

upper 27 31 29 47 18 17 24 22 38 30 38 40 28 17 9

Starting Point lower middle zero
33 15 16 45 25 22 33 13 11 39 52 42 14 13 10 14 13 10 25 13 12 19 18 9 38 13 9 34 12 8 33 26 26 26 36 15 14 * * 8 9 10 32 18 21

up-low 43 31 32 39 17 17 25 19 38 33 34 23 28 11 23

low-up 27 32 30 36 17 16 24 21 38 30 37 37 14 17 17

TABLE 7 STIR method with exact Newton steps for é¯ê 1000: number of iterations

points. This is in contrast to active set methods where the starting point can have a more dramatic effect on the iteration count.
Last we contrast the performance of the STIR method using the conjugate gradient option with the SBMIN algorithm, an active set method, in the LANCELOT software package [8]. In particular, we choose problems where negative curvature is present or where it appears that the “active set” at the solution may be difﬁcult to ﬁnd. We expect our STIR method to outperform an active set method in these situations; indeed, we have found this to be the case. For these problems, we use the default settings for LANCELOT and adjusted our STIR stopping conditions to be comparable if not more stringent.
First consider a constrained convex quadratic problem. The results, given in Table 9, show that our proposed STIR method is markedly superior (by an order of magnitude) to SBMIN on this problem (c.g. it is the total number of conjugate gradient iterations). SBMIN takes many iterations on this problem when the starting point is near some of the bounds — the method mis-identiﬁes the correct active set at the solution and takes many iterations to recover. Our proposed STIR method, a strictly interior method, moves directly to the solution without faltering when started at the same point.
Table 10 summarizes the performances of STIR and SBMIN, on a set of constrained problems exhibiting negative curvature. (Again the problems are from [12] except the last two have been constrained differently to display negative curvature.) STIR is signiﬁcantly better on these problems — this is probably due to the fact that negative curvature is better exploited in our subspace trust region approach than in
20

Problem GENROSE C GENSING C CHAINSING C DEGENSING C GENWOOD C CHAINWOOD C BROYDEN1A C BROYDEN1B C BROYDEN2A C BROYDEN2B C CRAGGLEVY C AUGMLAGN C BROWN3 C BVP C VAR C

original 10 16 19 33 8 8 8 7 10 9 26 34 7 14 36

upper 23 27 29 43 14 14 24 21 35 28 39 60 29 21 7

Starting Point lower middle zero
37 17 20 57 26 22 33 11 10 37 42 37 10 11 8 10 11 8 21 13 8 16 13 8 35 13 8 32 12 8 35 27 24 45 32 10 53 * * 14 13 14 34 29 25

up-low 37 30 33 37 13 13 21 16 36 31 35 24 29 14 28

low-up 23 29 28 44 13 13 24 21 35 28 45 46 53 21 8

TABLE 8 STIR method with inexact Newton steps for é¬ê 1000: number of iterations

100

iterations

50

0

orig up

low

mid

zero

up−lo

lo−up

FIG. 9. STIR method with exact Newton steps at varied starting points

X d 800 BIGGSB2

inexact STIR iteration c.g. it
16 5551

SBMIN iteration c.g. it
281 53157

TABLE 9 STIR with inexact Newton steps vs. LANCELOT SBMIN on a convex quadratic: number of iterations

21

100

iterations

50

0

orig up

low

mid

zero

up−lo

lo−up

FIG. 10. STIR method with inexact Newton steps at varied starting points

inexact STIR X

SBMIN X

Problem

100 1000 10000 100

1000 10000

AUGMLAGN U CHAINWOOD U

34 30 122 1004

37 29 38 46 8953 6594 m 10000 m 10000

GENWOOD U

62 67

63 439

952

554

GENROSE U

29 29

29 76

76

76

CHAINWOOD NC 17

GENWOOD NC

16

31 24

16 54

48

23 47 m 1000

61 60

TABLE 10 STIR with inexact Newton steps vs. LANCELOT SBMIN when negative curvature exists: number of iterations

the Steihaug trust region method, which SBMIN employs. This is consistent with results presented in v 3, e.g., see Table 4.
7. Conclusion. Based on the trust-region interior reﬂective (TIR) method in [1], we have proposed a subspace TIR method (STIR) suitable for large-scale minimization with bound constraints on the variables. In particular, we consider a two-dimensional STIR in which a subspace is formed from the scaled gradient and (inexact or exact) Newton steps or a negative curvature direction.
We have designed and reported on a variety of computational experiments. The results strongly support the different components of our approach: the “subspace idea”, the use of our novel afﬁne scaling matrix, the modiﬁed Cholesky factorization and conjugate gradient variations, and the “reﬂection technique”. Moreover, preliminary experimental comparisons with code SBMIN, from LANCELOT [8], indicate that our proposed STIR method can signiﬁcantly outperform an active-set approach for some large-scale problems.

22

REFERENCES

[1] Thomas F. Coleman and Yuying Li. An interior, trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, to appear.
[2] Thomas F. Coleman and Yuying Li. On the convergence of reﬂective Newton methods for large-scale nonlinear minimization subject to bounds. Mathematical Programming, 67:189–224, 1994.
[3] Thomas F. Coleman and Yuying Li. A reﬂective Newton method for minimizing a quadratic function subject to bounds on the variables. SIAM Journal on Optimization, to appear.
[4] Jorge J. More´ and D.C. Sorensen. Computing a trust region step. SIAM Journal on Scientiﬁc and Statistical Computing, 4:553–572, 1983.
[5] D.C. Sorensen. Minimization of a large scale quadratic function subject to an ellipsoidal constraint. Technical Report TR94-27, Department of Computational and Applied Mathematics, Rice University, 1994.
[6] T. Steihaug. The conjugate gradient methods and trust regions in large scale optimization. SIAM Journal on Numerical Analysis, 20:626–637, 1983.
[7] I.I. Dikin. Iterative solution of problems of linear and quadratic programming. Doklady Akademiia Nauk SSSR, 174:747– 748, 1967.
[8] A. R. Conn, N. I. M. Gould, and Ph. L. Toint. LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A). Springer-Verlag, 1992.
[9] Bruce A. Hendrickson. The molecule problem: Determining conformation from pairwise distances. Cornell University Ph.D thesis, Computer Science, 1991.
[10] Thomas F. Coleman. Large-scale numerical optimization: Introduction and overview. In Allen Kent and James G. Williams, editors, Encyclopedia of Computer Science and Technology, pages 167–195. Marcel Dekker, INC., 1993.
[11] Gerald A. Shultz, R. B. Schnabel, and Richard H. Byrd. A family of trust-region-based algorithms for unconstrained minimization with strong global convergence properties. SIAM Journal on Numerical Analysis, 22(1):47–67, 1985.
[12] Andrew R. Conn, N. I. M. Gould, and Ph. L. Toint. Testing a class of methods for solving minimization problems with simple bounds on the variables. Mathematics of Computation, 50(182):399–430, 1988.
[13] Jorge J. More´, Burton S. Garbow, and Kenneth. E. Hillstrom. Testing unconstrained optimization software. ACM Transactions on Mathematical Software, 7:17–41, 1981.
[14] Richard H. Byrd, Robert B. Schnabel, and Gerald A. Shultz. Approximate solution of the trust region problem by minimization over two-dimensional subspaces. Mathematical Programming, 40:247–263, 1988.
[15] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1989. [16] Jane K. Cullum and Ralph A. Willoughby. Lanczos Algorithms for Large symmetric eigenvalue computations, Vol. 1
Theory. Birkhau¨ser, Boston, 1985. [17] Mary Ann Branch. Inexact Reﬂective Newton Methods for Large-Scale Optimization Subject to Bound Constraints. PhD
thesis, Department of Computer Science, Cornell University, 1995. [18] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, 1981.

A. Proofs for THEOREM 2. The convergence results (THEOREM 2) for the STIR algorithm can be obtained in a similar manner to THEOREM 3.10 for the full-space trust region and interior reﬂective method (TIR)[1]. Indeed, ﬁrst-order optimality is a direct consequence of the condition (AS.3). The second order optimality rests on the fact that the solution subsequence of the subspace trust region subproblem would have large-step-size if the corresponding )  ˆ   7 were indeﬁnite at a limit point (see Lemma 3 below). Moreover, if  ˆ R is positive deﬁnite at a limit point then we prove that the step size along the subspace trust region solution is sufﬁciently large in the following sense:



RUvìëeí ∆R

min  jt

R~ 1f R j  jt

R5zR

 j

for some ëeí m

0e

23

Here  R is the stepsize, along ¨ R , to the boundary of the feasible region ( see Lemma 4 ). Based on this

inequality, it follows that the trust region size is bounded away from zero and Newton steps are eventually

successful.

Assume that ¨ R is a solution to a subspace trust region subproblem (4.1) with ²lR d span) ¼ R©ÐÅzR97 .

Assume that the columns of î

R

form an orthonormal basis for span) t

R¥Å¥R© t

R¼

R97 . Then ¨ R

d

t R~ 1î R¥ï5R

where ï5R solves

(A.1)

 îrR   ˆ R î R Á R Û qï5R d 4 îrR  t R~ 1f Rw îrR   ˆ R î R Á R Ûfd{ð R ð Ru

and (A.2)

 t

R R§ ¨

 ˆR t

R RH ¨

Á R j ï5R j 2 d

4¬ t

R ¨

R§ 

f ˆR©e

Next we prove that the subspace trust region solution sequence from the STIR algorithm in FIG.5.

has large-step-size if the corresponding sequence )  ˆ Rw7 satisﬁes that limRÃ¬Ä

Á


min

ˆ R§HF

0.

) ¨

R97

LEMMA 3. If lim of the subspace

sup Á trust


min



ˆ RHF

0 for a

region subproblem

subsequence, then the corresponding (4.1) has large-step-size.

solution

subsequence

Proof. Consider two subsequences of )  ˆ Rw7 : one sequence satisﬁes  ²R© d 1 and the other sequence

has  ²eR© d 2.

For the subsequence with  ²R© d 1, the corresponding trust region solution sequence clearly has

large-step-size.

)¥Å¥Rw7

For d

the

) t

R~

subsequence with 2sgn  f R7 and ) ¼

 ²"R d Rw7 have

2, it is clear that lim inf large-step-size, we have

)

thjtaËqtÌÌ) ±±ÍÍ

¨

2sgnñ Î ±ò 2sR5g7 nhñ Îa±sò

Ëlar4 geË-ÏÏst±± eË pj -s7tizm e

0. Since following

Lemma 1.

We state the following result which is similar to Lemma 8 in [1] and omit the proof.

LEMMA 4. Assume that (AS.4) is satisﬁed. Then §"©
4  R©AR§v 2 s min) 1  2R 7 Á R ∆2R  min) 1  R57 j ð R¥ï5R j 2pq where  R is the stepsize along ¨ R d t R~ 1î R¥ï5R to the boundary and ïuR is deﬁned by (A.1).

Let  R denote the Newton step (2.3) of (2.1). Then

(A.3)

diag  f RSR t R§¥R d 4 t R~ 1f R¢4 t R~ 1 prR t R~ 1 t R§¥R e

The next result is required to establish that Newton steps  R will eventually lead to successful steps.

LEMMA 5. Assume that¨ R is a solution to the subspace subproblem (4.1) with ²lR d span) t R~ 2f R5 R 7 . If )¥TR97 converges to a nondegenerate point    where the second order sufﬁciency conditions are satisﬁed,
24

then

(A.4)



RUvóëí ∆R

min  jt

R~ 1f R j  jt

R9¥R

 j

for  sufﬁciently large, where  R is the stepsize to the boundary along ¨ R .

Proof. By deﬁnition,

R d

m| in  max 

­R

|

4DTR ¨ R|

|



%R

| ¨

4TR R|

|

e

For any  , if t R~ 1f R d we have

k R t R§ R

for some k RU'1 1, then ¨ R d

4

∆± ËqÌ±fÍ 1Î

±

Ë

t

R~ 2f R . Hence, if ô51õ

v

R jf

Ä j

,



RUvóëí ∆R

min  jt

R~ 1f R j  jt

R9¥R

e j

Assume that t R~ 1f R dº k R t R9 R . We ﬁrst show that if we can establish

(A.5)

R ¨

d

RwÐ4 

t

R~ 2f RiSD§gRi¥R

where §gRfv 0 and  Rfv 0

then (A.4) holds. From (A.5) and  t

R R

 

Ð4

t

R~ 1f Rv

0, we have

®§R t

R9zR

 

 R©Ð4 

t

R~ 1f RHv

0e

Using ¨ ˆR d

t

RR ¨

d

R©Ð4 

t

R~ 1f R§t§R t

R9 R

again,

j ¨ ˆR j 2 d

 R2 jt R~ 1f R j 2 

2  R t

R¥R

 

®§gRuÐ4

t

R~ 1f R§dSo§ R2 jt

R9¥R

j 2e

But j ¨ ˆR j ! ∆R . Hence

0 !C§RU!

∆R jt R~ 1f R j

and 0 !

Rr! 

∆R e jt Ri R j

Hence, from (A.3), the boundedness of f R , t R~ 1f R , t R9 R , and the fact that    is a nondegenerate ﬁrst-order point, it is easy to verify that



RUvìëeí ∆R

min  jt

R~ 1f R j  jt

R5zR

 j

for some ëeí m

0e

Finally, we need to establish (A.5) under the linearly independent assumption t R~ 1f R dº k R t R9 R .

Assume that the columns of î R form an orthonormal basis for span) t

R~

1f

R© t

R R

7 . Then î R î R  t

R~ 1f R

d

t

R~

1f

R ,î

R î

R

t

R9 R

d

t R R , and î R  î R dÛ 2 where Û 2 is the 2-by-2 identity matrix. Moreover,

R ¨

d

4 t R~ 1î R©s îrR    ˆ R

Á

 Û

 î

Rp~

1îÆR 

t

R~ 1f R

where Á   v

0 and if Á   m

0, jt

R ¨

R j

d

∆RUm

0. Let ¨ ˆ Á  d

t

R ¨

Ru Á



and

(A.6)

¨ ˆ  Á  dd ef 4 î Rus îrR    ˆ R

Á

Û

 î

RpY~

1îÆR 

t

R~ 1f Ru

for Á v

0e

25

Then there exists §H Á



and 

 Á



such that

¨ ˆ Á  d

§H AÐ4 Á

t

R~ 1f R9

 Át

Ri¥R e

First, it is clear that §H 0 d 0,   0 d 1. From (A.6),

ö Ã}limnÄ

¨ ˆ Á j ¨ ˆ Á

  j

d

t R~ 1f R  jt R~ 1f R j

and by the linear independence assumption t R~ 1f R dº k R t R§ R , we have

§H 

ö Ã}limnÄ

j ¨ ˆ

Á Á



j

d

1e jt R~ 1f R j

Hence, for Á sufﬁciently large, §H Á m 0.

We now prove that 

 Á

 Av

0 by contradiction. Assume that 

 Á

 AHF

0 (this means that jt

R§ R

m j

∆R ). From continuity of 

 Á

 ,

 0

d

1 and 

 Á

  (F

0, there exists 0 F

Á¯ F

Á   so that   Á¯  d

0. This

implies that

§ Á¯  t R~ 1f R d

4 î Ru îrR    ˆ RH

Á¯Û

 î

R§~

1îrR 

t

R~ 1f R5e

From  ˆ R ˆR d 4 t R~ 1f R , î R î R  ˆR d ˆR , and the columns of î R are linearly independent, there exists ë such that

s îrR   ˆ R î R¥p îfR  t R~ 1f R d ë s îÆR   ˆ R î R¥p îUR  zˆR e

Again using î

R î

R

t

R~ 1f R

d

t R~ 1f R and î R î R  ˆR

d

ˆR , we have

t R~ 1f R d ë t Ri¥R 

which contradicts the assumption t R~ 1f R dº k R t R9 R . Similarly, we can prove that §H Á  Av 0 based on §H Á m
holds. This completes the proof.

0 for sufﬁciently large Á . Therefore (A.5)

Now we establish the convergence properties of the STIR algorithm.

THEOREM 2. Let the level set Ç

d

)¥$'$1 @

:

l!El

P'Wa 7
0

be compact and 

: aÊI÷1

be twice continuously differentiable on Ç . Let )¥SRi7 be the sequence generated by the STIR algorithm in

FIG.5. Then

1. If (AS.3) is satisﬁed, then the Kuhn-Tucker condition is satisﬁed at every limit point. 2. Assume that both (AS.3) and (AS.5) are satisﬁed and ¼ ˆ R in FIG. 5 contains sufﬁcient negative
curvature information whenever  ˆ R is indeﬁnite, i.e.,

¼

ˆ R j

¼

ˆ ˆR

R j

¼ ˆR
2

!

max Ð4¿

@9À



TÁ


min



ˆ R¥d

with ¿ @9À m 0 and 0 F  F 1. Then

26

(a) If every limit point is nondegenerate, then there is a limit point    at which both the

ﬁrst and second order necessary conditions are satisﬁed.

(b) If    is an isolated nondegenerate limit point, then both the ﬁrst and second order

necessary conditions are satisﬁed at    .

(c) If  ˆ   is nonsingular for some limit point    of )¥TR57 and ¼ ˆ R d ˆR whenever  ˆ R is

positive deﬁnite, then  ˆ   is positive deﬁnite, )¥gR57 converges to    , all iterations are

eventually successful, and ) ∆R57 is bounded away from zero.

Proof. Using Lemma 3, for any subsequence with limRÃ¬Ä

Á


min



ˆ Rz¯F

0, the corresponding ) ¨ R57

has large-step-size. Therefore, there using Lemma 4, for some ¿ 1 m 0,

exists

¿

0

m

0 such that 

R3mø¿ 
0

for 

sufﬁciently largee Hence,

4



RiAR§HvE¿ § © s 12 Á

R ∆2R 

jð

R t

R ¨

R

j 2pqe

Condition (AS.5) then implies that

(A.7)

§©

lTRz"4tlTRn

HvBe¿
11

2

s Á

R

∆2R 

jð

R t

R ¨

R

j 2pqe

Now assume that  ˆ   is positive deﬁnite and )¥gR97 converges to    . From Lemma 4, we have

§"© 4  R©AR§v 2 s min) 1  2R 7 Á R ∆2R  min) 1  R57 j ð R¥ï5R j 2pqe

where  R is the stepsize along ¨ R . Let ¿m 0 be a lower bound for the eigenvalues of  ˆ R . From (A.1), (A.4) in Lemma 5, and jt R9 R j ! 1ù jt R~ 1f R j , there exists ë m 0 such that

(A.8)



R 

s ¨

RAprwv

ë min) ∆2R  jt R9¥R j 2 75e

Using (A.7) and (A.8), the proof is essentially the same as that of Theorem 3.10 in [1]: replacing (3.21) in [1] by (A.7) and (3.22) in [1] by (A.8).
B. Implementation Details. We present details of our Steihaug method implementation in FIG.11 and the modiﬁed preconditioned conjugate gradient (MPCG) in FIG.12.
For more details on the large-step-size property of the steps computed from FIG.12, see [17].

27

function [ ] = Steihaug(p , f , u ,  , ∆)

% Note: u is some preconditioning matrix for H. u

X

d

×

X

f

`ú f

;



½¾i

d

X µ 2;

d

0;

¨

d
0

0; û

d
0

0; ´ 0 d

4 f;

must be positive deﬁnite.

while 

F

½¾i 

Step 1:

Step 2:

Step 3:

Step 4: Step 5:

Step 6:
Step 7: end  d ¨ R , return

Solve u ÅzR d ´¥R ;

d

 

1

if  d

1,  1 d

Å
0

else

§gR d

´

R

~

ÅzR
1

~

1µ

´

R

~

ÅzR
2

~

2;

iR

d

Å¥R

~

D§gR95R
1

~

1

end

R 

d

 R p$iR

if  RØ!

0, compute 

m

0 so that j¨ R

iR  jü

d

∆,  d

R ¨

 5R , return

else

R 

d

´

R ~

Å¥R
1

~

1 µ¥

R

;

¨

R

d

¨ R~ 1 

 R95R ; ´¥R d ´zR ~ 1 4

R§p$5R 

end

if j¨ R j v ∆

compute 

m

0 so that j¨ RH

iR  jü

d

∆,  d

RH ¨

 iR , return, end

if ju

~ 1j

 ´zR j

j

!&ª ju

~ 1´ 0 j , 

d

¨ R , return, end

FIG. 11. The Steihaug algorithm

28

Step 0: Ó % Note:



d

0; ¨ and ¿

d
0

0;  0 d

0; ´ 0 d

4 fˆ;

are positive constants.

while Ó F maximum iteration

Step 1: Solve ¶ Å Õ d ´ Õ

Step 2: Ó

d

 Ó

1

Step 3: if Ó d 1

 dÅ
10

else

§ Õ

d

Õ d

´ Õ ~ ÅÕ~

Å
1
1

Õ~ D§

1µ Õ

´ 

Õ Õ

~ ~

Å
2
1

Õ

~

2

end

Step 4:  Õ d  Õ  ˆ  Õ

Step 5: if  Õ ! 0, exit:  d  Õ , ¨ d ¨ Õ

else if  Õ

!E¿¥q Õ

¶

 Õ  , exit: 

d

0, ¨

d

else Õ d ¨Õ d ´Õ d

´ Õ ~ ¨ Õ~

Å
1
1

Õ 

~

1 µ¥ Õ

Õ Õ

´Õ ~ 1 4  Õ ˆ Õ

end

Step 6: if ju ~ 1 j  j ´ Õ j

!ýª ju

~ 1´ 0 j , exit: ¨

end

¨Õ d ¨ Õ, d

0

FIG. 12. MPCG

(B.1) (B.2)
(B.3)

29