Вычислительные технологии
Том 6, № 4, 2001
A NEW AND EFFICIENT LARGE-UPDATE INTERIOR-POINT METHOD FOR LINEAR OPTIMIZATION
J. Peng, C. Roos Delft University of Technology, Netherlands e-mail: [email protected], [email protected]
T. TERLAKY McMaster University, Hamilton, Ontario, Canada e-mail: [email protected]
This paper presents a refined analysis of a new interior-point method for linear optimization. The analysis uses some new mathematical tools, developed in particular in [11], where a family of similar methods is studied.
1. Introduction
Interior-point methods (IPMs) are among the most effective methods for solving wide classes of Linear Optimization (LO) problems. Since the seminal work of Karmarkar [5], many researchers have proposed and analyzed various IPMs for LO, and a large number of results have been reported. For a survey we refer to recent books on the subject [14, 16, 17, 19]. An interesting fact is that almost all known polynomial-time variants of IPMs use the so-called central path [15] as a guideline to the optimal set, and some variant of Newton's method is used to follow the central path approximately. Therefore, analyzing the behavior of Newton's method plays a prominent role in the theoretical investigation of IPMs.
1.1. Notation
We use the following notational conventions. Throughout the paper, ‖·‖ denotes the 2-norm of a vector, whereas ‖·‖∞ denotes the infinity norm. For any x = (x_1, x_2, …, x_n)ᵀ ∈ ℝⁿ, x_min denotes the smallest and x_max the largest value of the components of x. If also s ∈ ℝⁿ, then xs denotes the coordinatewise product of the vectors x and s. Furthermore, e denotes the all-one vector of length n. Finally, if z ∈ ℝⁿ₊ and f : ℝ₊ → ℝ₊, then f(z) denotes the vector in ℝⁿ₊ whose i-th component is f(z_i), with 1 ≤ i ≤ n.
© J. Peng, C. Roos, T. Terlaky, 2001.
1.2. The gap between theory and practice
At present there is still a gap between the practical behavior of the algorithms and the theoretical performance results, in favor of the practical behavior. This is especially true for so-called primal-dual large-update methods, which are the most efficient methods in practice (see, e.g. Andersen et al. [1]). The aim of this paper is to present a new analysis of the primal-dual Newton algorithm proposed in [10]. The new iteration bound further improves the bound in [10], and almost closes the above gap, at least from a theoretical point of view. The method of [10] belongs to a wider family of methods, introduced and thoroughly studied in [11]. By applying some of the new analysis tools developed in [11] to the method of [10], a relatively simple analysis leads to the presently best possible iteration bound for large-update methods.
To be more concrete we need to go into more detail at this stage. We deal with the following LO-problem:
(P)  min{cᵀx : Ax = b, x ≥ 0},

where A ∈ ℝ^{m×n}, b ∈ ℝ^m, c ∈ ℝⁿ, and its dual problem

(D)  max{bᵀy : Aᵀy + s = c, s ≥ 0}.
We assume that both (P) and (D) satisfy the interior-point condition (IPC), i.e., there exists (x⁰, y⁰, s⁰) such that

Ax⁰ = b, x⁰ > 0, Aᵀy⁰ + s⁰ = c, s⁰ > 0.
It is well known that the IPC can be assumed without loss of generality. In fact we may, and will assume that x0 = s0 = e. For this and some other properties mentioned below, see, e.g., [14]. Finding an optimal solution of (P) and (D) is equivalent to solving the following system
Ax = b, x ≥ 0,
Aᵀy + s = c, s ≥ 0,
xs = 0.   (1)
The basic idea of primal-dual IPMs is to replace the third equation in (1), the so-called complementarity condition for (P) and (D), by the parametrized equation xs = μe, where e denotes the all-one vector and μ > 0. Thus we consider the system

Ax = b, x ≥ 0,
Aᵀy + s = c, s ≥ 0,
xs = μe.   (2)
If rank(A) = m and the IPC holds, then for each μ > 0 the parametrized system (2) has a unique solution. This solution is denoted as (x(μ), y(μ), s(μ)); we call x(μ) the μ-center of (P) and (y(μ), s(μ)) the μ-center of (D). The set of μ-centers (with μ running through all positive real numbers) gives a homotopy path, which is called the central path of (P) and (D). The relevance of the central path for LO was recognized first by Sonnevend [15] and Megiddo [6]. If μ → 0 then the limit of the central path exists, and since the limit points satisfy the complementarity condition, the limit yields optimal solutions for (P) and (D).
IPMs follow the central path approximately. Let us briefly indicate how this goes. Without loss of generality we assume that (x(μ), y(μ), s(μ)) is known for some positive μ. For example, due to the above assumption we may assume this for μ = 1, with x(1) = s(1) = e. We then decrease μ to (1 − θ)μ, for some θ ∈ (0, 1), and solve the following Newton system:

AΔx = 0,
AᵀΔy + Δs = 0,
sΔx + xΔs = μe − xs.   (3)

This system defines a search direction (Δx, Δs, Δy). By taking a step along the search direction, with the step size defined by some line search rule, one constructs a new triple (x, y, s). If necessary, we repeat the procedure until we find iterates that are 'close' to (x(μ), y(μ), s(μ)). Then μ is again reduced by the factor 1 − θ and we apply Newton's method targeting the new μ-centers, and so on. This process is repeated until μ is small enough, say until nμ < ε; at this stage we have found an ε-solution of the problems (P) and (D).
The "closeness" of the iterates (x, y, s) to the μ-center will be quantified by a proximity measure δ(xs, μ) that will be defined later on. More formally, the algorithm can now be described as follows.
Primal-Dual Newton Algorithm for LO
Input:
  a proximity parameter τ;
  an accuracy parameter ε > 0;
  a fixed barrier update parameter θ, 0 < θ < 1;
begin
  x := e; s := e; μ := 1;
  while nμ > ε do
  begin
    μ := (1 − θ)μ;
    while δ(xs; μ) > τ do
    begin
      x := x + αΔx;
      s := s + αΔs;
      y := y + αΔy
    end
  end
end
As said before, we take x = s = e and μ = 1 to initialize the algorithm. The parameter α (0 < α ≤ 1) is the so-called step size (or damping factor) and will be specified later on. If the proximity measure δ(xs; μ) exceeds the threshold value τ then we use one or more damped Newton steps to recenter; otherwise μ is reduced by the factor 1 − θ. This is repeated until nμ < ε. The step size α has to be taken such that the proximity measure decreases by a sufficient amount. In the theoretical analysis the step size α is usually given some default value.
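A minimal executable sketch of the loop above may be helpful. Everything in it (the toy problem data, the backtracking rule for α, the iteration caps) is our illustrative choice, not prescribed by the text, and δ(xs; μ) = ½‖v⁻¹ − v‖ with v = √(xs/μ) anticipates the proximity measure defined in (7) below.

```python
import numpy as np

def delta(x, s, mu):
    v = np.sqrt(x * s / mu)                    # the vector v of (5)
    return 0.5 * np.linalg.norm(1.0 / v - v)   # proximity measure (7)

def newton_direction(A, x, s, mu):
    # dense solve of system (3); fine for toy sizes
    m, n = A.shape
    K = np.zeros((2 * n + m, 2 * n + m))
    K[:m, :n] = A
    K[m:m + n, n:n + m] = A.T
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = np.diag(s)
    K[m + n:, n + m:] = np.diag(x)
    rhs = np.concatenate([np.zeros(m + n), mu * np.ones(n) - x * s])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n + m], d[n + m:]

def primal_dual_newton(A, x, y, s, theta=0.5, tau=1.0, eps=1e-6):
    n = len(x)
    mu = 1.0
    while n * mu >= eps:
        mu *= 1.0 - theta
        for _ in range(100):                   # inner loop, bounded for safety
            if delta(x, s, mu) <= tau:
                break
            dx, dy, ds = newton_direction(A, x, s, mu)
            alpha = 1.0
            while np.any(x + alpha * dx <= 0) or np.any(s + alpha * ds <= 0):
                alpha *= 0.5                   # damp to keep the iterates positive
            x, y, s = x + alpha * dx, y + alpha * dy, s + alpha * ds
    return x, y, s, mu
```

On the toy data A = [1 2 0], b = Ae, c = e + Aᵀy⁰ with y⁰ = 1, the starting point x⁰ = s⁰ = e is strictly feasible, as assumed above, and the loop drives the duality gap xᵀs below (roughly) ε.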
Let us mention that most practical algorithms use the ε-solution to construct a basic solution and then produce an optimal basic solution by crossing over to the Simplex method. An alternative way is to apply a rounding procedure as described by Ye [18] (see also Mehrotra and Ye [7], and Roos et al. [14]).
The choice of the parameter θ plays an important role both in the theory and practice of IPMs. Usually, if θ is a constant independent of the dimension n of the problem, for instance θ = 1/2, then we call the algorithm a large-update (or long-step) method. If θ depends on the problem dimension, such as θ = 1/√n, then the algorithm is named a small-update (or short-step) method. It is now well known that small-update methods have the best iteration bound: they require O(√n log(n/ε)) iterations to produce an ε-solution. On the other hand, large-update methods, in practice much more efficient than small-update methods [1], have a worse iteration bound, namely O(n log(n/ε)) [14, 17, 19]. This phenomenon is what we called "the gap between theory and practice".
1.3. Aim of this paper
Several authors have suggested to use the so called higher-order methods to improve the complexity of large-update IPMs [3, 4, 8, 20, 21]. In these methods, in each iteration one solves some additional equations based on the higher-order approximations to the system (2).
In this paper, as in [10] and [11], we propose a different way to narrow the gap between theory and practice for large-update IPMs. In the next section we briefly discuss how the above mentioned iteration bounds are obtained. Then we point out an inconsistency that occurs in the analysis and which suggests how to improve the iteration bound by slightly changing the search direction. The new search direction is defined in Section 2.4, and the rest of the paper is devoted to the analysis of this new direction. The result is an O(√n log n log(n/ε)) iteration bound. Clearly, this bound almost closes the gap between the iteration bounds for small-update and large-update methods.
The result of this paper improves the result in [10], where the same authors used the same search direction as in this paper but could obtain only an O(n^{2/3} log(n/ε)) iteration bound. The
improvement in this paper is due to the use of some new analytical tools that were developed in [11]. In fact, in [11] a whole class of IPMs is considered, based on so-called self-regular proximity functions, and for some IPMs in this class complexity results are derived that are the same as the result of this paper. The aim of this paper is to concentrate on one member of this class, thus showing that the currently best complexity result for large-update methods can be obtained with a relatively simple analysis. In addition, we devote an extra section to explaining which ideas have led to the new approach. This is done in Section 2.
2. Preliminaries
2.1. The classical approach
As noted before, for the analysis of the Newton process we need to measure the "distance" from a primal-dual pair (x, s) to the ^-center (x(^),s(^)). The most popular tool for this is
the so-called primal-dual logarithmic barrier function. This function is, up to the "constant" −n + n log μ, given by

Φ_c(xs, μ) = xᵀs/μ − Σ_{i=1}^n log x_i s_i − n + n log μ.   (4)
Its usefulness can easily be understood by introducing the vector

v := √(xs/μ).   (5)
Note that the pair (x, s) coincides with the μ-center (x(μ), s(μ)) if and only if v = e. Now defining

ψ_c(t) := t² − 1 − 2 log t,  t > 0,

one may easily verify that Φ_c(xs, μ) can be expressed in terms of the vector v as follows:

Φ_c(xs, μ) = Ψ_c(v) := Σ_{i=1}^n ψ_c(v_i).   (6)
Since ψ_c(t) is strictly convex and attains its minimal value at t = 1, with ψ_c(1) = 0, it follows that Φ_c(xs, μ) is nonnegative and vanishes if and only if v = e, i.e., if and only if xs = μe. Below we need a second proximity measure, a so-called norm-based measure, namely

δ(xs, μ) := (1/2) ‖v − v⁻¹‖ = (1/2) ‖√(xs/μ) − √(μ/(xs))‖.   (7)

Note that

δ(xs, μ) = 0 ⇔ Φ_c(xs, μ) = 0 ⇔ v = e ⇔ xs = μe,

and otherwise δ(xs, μ) > 0 and Φ_c(xs, μ) > 0.
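In code, the quantities (4)–(7) are one-liners. The sketch below (ours, with arbitrary illustrative data) checks that both measures vanish exactly at the μ-center, i.e., when xs = μe.

```python
import numpy as np

def v_vector(x, s, mu):
    return np.sqrt(x * s / mu)                       # definition (5)

def Phi_c(x, s, mu):
    v = v_vector(x, s, mu)
    return np.sum(v**2 - 1.0 - 2.0 * np.log(v))      # barrier (4), written via (6)

def delta(x, s, mu):
    v = v_vector(x, s, mu)
    return 0.5 * np.linalg.norm(1.0 / v - v)         # norm-based measure (7)

x = np.array([0.5, 1.0, 2.0])
s = np.array([2.0, 1.0, 0.5])   # xs = e, so (x, s) is the mu-center for mu = 1
```

Both Phi_c(x, s, 1.0) and delta(x, s, 1.0) are zero for this pair, and both become positive as soon as xs deviates from μe.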
The classical analysis of large-update methods depends on the following two results. The first result estimates the increase in Φ_c(xs, μ) if, at the start of an outer iteration, μ is multiplied by the factor 1 − θ.
Proposition 2.1 (Lemma II.72 in [14]). Let (x, y, s) be strictly feasible and μ > 0. If μ⁺ = (1 − θ)μ then

Φ_c(xs, μ⁺) ≤ Φ_c(xs, μ) + (θ/(1 − θ)) · 2δ√n ρ(δ) + n ψ_c(1/√(1 − θ)),

where δ = δ(xs, μ) and ρ(δ) = δ + √(1 + δ²) = O(δ).
The second result estimates the decrease in Φ_c(xs, μ) during a damped Newton step, with an appropriate damping factor α.
Proposition 2.2 (Section 7.8.2 in [14]). If δ = δ(xs, μ) ≥ 1 there exists a step size α such that

Φ_c(x⁺s⁺, μ) ≤ Φ_c(xs, μ) − 2δ/ρ(δ) = Φ_c(xs, μ) − Θ(1).
With these two results we can derive an iteration bound for a large-update method. Taking τ = 1 in the algorithm, at the start of each outer iteration one has δ = δ(xs, μ) ≤ 1. From this it follows that then Φ_c(xs, μ) = O(1), and also ρ(δ) = O(1). Taking θ = 1/2, or some other fixed constant in the interval (0, 1), it follows from Proposition 2.1 that after the update of μ we have

Φ_c(xs, μ⁺) ≤ O(1) + O(√n) + O(n) = O(n).
As long as δ ≥ 1, each inner iteration decreases the barrier function by at least a constant, by Proposition 2.2. As a consequence, no more than O(n) inner iterations are necessary to reach the situation where δ(xs, μ) < 1. This proves that each outer iteration requires no more than O(n) inner iterations. The number of outer iterations is bounded by (cf. Lemma II.17 in [14])

(1/θ) log(n/ε).   (8)
Multiplication of this number by the number of inner iterations per outer iteration yields a bound for the total number of iterations. If θ = Θ(1), the iteration bound becomes

O(n log(n/ε)).   (9)
This is the currently best bound for large-update methods, derived in the past in many ways and by many authors.
2.2. An alternative approach
To make clear the underlying idea of this paper, we proceed with a second derivation of the iteration bound (9). In this derivation we do not use the logarithmic barrier function; we only use the norm-based proximity measure δ(xs, μ). This analysis was first carried out in [9], and is based on the following two results.
Proposition 2.3 (Lemma IV.36 in [14]). Using the above notation, we have

δ(xs, μ⁺) ≤ (2δ(xs, μ) + θ√n) / (2√(1 − θ)).
Proposition 2.4 (Theorem 3.6 in [9]). If δ = δ(xs, μ) ≥ 1 and α = 1/(2δ²), then

δ(x⁺s⁺, μ)² ≤ δ² − 1/2 = δ² − Θ(1).
Just as in the previous section, with these two results the iteration bound easily follows. If τ = 1, at the start of an outer iteration we have δ = δ(xs, μ) ≤ 1. With θ = 1/2, or some other fixed constant in the interval (0, 1), Proposition 2.3 implies that after the update of μ we have

δ(xs, μ⁺)² ≤ (O(1) + O(√n))² = O(n).

By Proposition 2.4, as long as δ ≥ 1, each inner iteration decreases the squared proximity by at least a constant. As a consequence, no more than O(n) inner iterations are necessary to reach the situation where δ(xs, μ) < 1. Hence, each outer iteration requires no more than O(n) inner iterations. Using (8) again, we obtain (9) as a bound for the total number of iterations.
2.3. Two observations
2.3.1. First observation
In [9] we found another interesting result that suggests a direction where a better complexity bound for the algorithm might be obtained.
Proposition 2.5 (Corollary 4.4 in [9]). Let δ = δ(xs, μ) ≥ 1 and v_min ≥ 1. Then the step size α = 1/(1 + δ) gives

δ(x⁺s⁺, μ)² ≤ δ² − δ²/(4(1 + δ)) ≤ δ² − δ/8.
If we could prove the inequality in Proposition 2.5 without the assumption v_min ≥ 1, this would give the desired iteration bound

O(√n log(n/ε))   (10)
for large-update methods, as we will show below. We need two simple technical results for this purpose; for completeness' sake we include their (short) proofs.
Lemma 2.6 (Lemma 2.1 in [10]). If α ∈ [0, 1], then

(1 − t)^α ≤ 1 − αt,  ∀ t ∈ [0, 1].   (11)

Proof: For any fixed α ≤ 1 the function (1 − t)^α − 1 + αt is decreasing for t ∈ [0, 1] and zero at t = 0. This implies (11). □
Lemma 2.7 (Proposition 2.2 in [10]). Let t_0, t_1, …, t_K be a sequence of positive numbers such that

t_{k+1} ≤ t_k − κ t_k^{1−γ},  k = 0, 1, …, K − 1,   (12)

where κ > 0 and 0 < γ ≤ 1. Then K ≤ t_0^γ/(κγ).
Proof: Using (12), we have

0 < t_{k+1}^γ ≤ (t_k − κ t_k^{1−γ})^γ = t_k^γ (1 − κ t_k^{−γ})^γ ≤ t_k^γ (1 − κγ t_k^{−γ}) = t_k^γ − κγ,

where the second inequality follows from (11). Hence, for each k, t_k^γ ≤ t_0^γ − κγk. Taking k = K we obtain 0 < t_0^γ − κγK, which implies the lemma. □
By taking γ = 1/2, κ = 1/8, and t_k = δ(x^k s^k, μ_k)², where x^k, s^k, and μ_k denote the values of x, s, and μ at the k-th iteration, Lemma 2.7 yields the iteration bound (10).
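Lemma 2.7 is easy to probe numerically: follow the extremal recursion t_{k+1} = t_k − κ t_k^{1−γ} and count steps. The constants below are the ones used above (γ = 1/2, κ = 1/8); the starting value t_0 = 100 is an arbitrary illustrative choice.

```python
import numpy as np

def count_steps(t0, kappa, gamma):
    # worst-case recursion of Lemma 2.7: t_{k+1} = t_k - kappa * t_k^(1 - gamma)
    t, k = t0, 0
    while t - kappa * t**(1.0 - gamma) > 0:
        t -= kappa * t**(1.0 - gamma)
        k += 1
    return k

t0, kappa, gamma = 100.0, 1.0 / 8.0, 0.5
K = count_steps(t0, kappa, gamma)
bound = t0**gamma / (kappa * gamma)   # Lemma 2.7: K <= t0^gamma / (kappa * gamma) = 160
```

The simulated count always stays below the bound, since t_k^γ loses at least κγ per step.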
Comparison of the results in Propositions 2.4 and 2.5 indicates that a larger decrease of the proximity occurs when the smallest coordinate v_min of the vector v is large. Hence, the question becomes how to keep v_min large during the course of the algorithm. In the next section we make another observation that points in the same direction.
2.3.2. Second observation
In the sequel we will frequently use scaled versions of the displacements Δx and Δs. These are defined as follows:

d_x := vΔx/x,  d_s := vΔs/s.   (13)
Now the system (3) defining the Newton search direction can be rewritten as

Ād_x = 0,
Āᵀ(Δy/μ) + d_s = 0,
d_x + d_s = v⁻¹ − v,

where Ā = AV⁻¹X, with V = diag(v), X = diag(x). The last equation in the above system is called the scaled Newton equation. We observe that this equation can be written as

d_x + d_s = v⁻¹ − v = −(1/2) ∇Ψ_c(v),

with Ψ_c(v) denoting the 'scaled' logarithmic barrier function, as defined in (6). This shows that the scaled search direction is, up to the factor 1/2, the steepest descent direction of the scaled logarithmic barrier function. This seems natural when dealing with the analysis of Section 2.1. However, in the analysis of Section 2.2 we are trying to minimize the squared proximity, i.e., ‖v⁻¹ − v‖², and then it seems more attractive to relate the search direction to its gradient, as follows:

d_x + d_s = −(1/2) ∇_v ‖v⁻¹ − v‖² = v⁻³ − v.   (14)
Changing the definition of the search direction in this way, the authors could obtain a better iteration bound for large-update methods in [10], namely O(n^{2/3} log(n/ε)). The analysis in [10] used as a proximity measure, albeit implicitly, ‖v⁻³ − v‖. Note that small coordinates of v (smaller than 1) may blow up this measure. Hence, this measure favours large values of v_min. This strengthens our conclusion of the previous section, namely that the theoretical complexity may improve if we keep v_min large during the course of the algorithm.
We conclude this section by considering the direction (14) in the original space. One has

sΔx + xΔs = μv(d_x + d_s) = μ(v⁻² − v²),

or, equivalently,

sΔx + xΔs = μ²e/(xs) − xs.

Thus the new direction is the Newton direction targeting at μ²e/(xs) instead of μe.
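The algebra above is easy to verify mechanically: with v = √(xs/μ), the scaled right-hand side v⁻³ − v of (14), transported back by the factor μv, must equal μ²e/(xs) − xs. The data below are arbitrary positive vectors.

```python
import numpy as np

x = np.array([0.7, 1.3, 2.1])
s = np.array([1.1, 0.4, 0.9])
mu = 0.8
v = np.sqrt(x * s / mu)

lhs = mu * v * (v**(-3) - v)      # mu * v * (dx + ds), with dx + ds from (14)
rhs = mu**2 / (x * s) - x * s     # Newton right-hand side targeting mu^2 e / (xs)
```

The two vectors agree componentwise, since μv·v⁻³ = μ²/(xs) and μv·v = xs.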
xs
2.4. A class of new directions
Inspired by the result discussed in the previous section, we modify the Newton equation in this paper as follows:

d_x + d_s = −∇Ψ(v) = v^{−q} − v,   (15)

with q > 1, and where the functions Ψ and ψ are defined according to

Ψ(v) := Σ_{i=1}^n ψ(v_i),  ψ(v_i) := (v_i² − 1)/2 + (v_i^{1−q} − 1)/(q − 1).   (16)
Note that since q > 1, the function Ψ(v) has the barrier property, i.e., it goes to infinity if the vector v approaches the boundary of the feasible region, i.e., if one of the coordinates of v goes to zero. We call it a polynomial barrier function. Only in the limit, as q approaches 1, does the barrier function become logarithmic:
lim_{q→1} Ψ(v) = (1/2) Σ_{i=1}^n (v_i² − 1 − 2 log v_i) = (1/2) ( xᵀs/μ − Σ_{i=1}^n log x_i s_i + n log μ − n ).

This is, up to the factor 1/2, precisely the logarithmic barrier function (4).
In the analysis of the algorithm with the modified Newton direction, we also use a norm-based proximity measure σ(v), naturally related to our new barrier function, according to

σ(v) := ‖∇Ψ(v)‖ = ‖d_x + d_s‖ = ‖v^{−q} − v‖.   (17)
Letting Δx, Δy, Δs denote the displacements in the original space, the result of a damped Newton step with damping factor α is denoted as

x⁺ = x + αΔx,  y⁺ = y + αΔy,  s⁺ = s + αΔs.   (18)
The rest of the paper is devoted to the complexity analysis of the large-update algorithm with the modified Newton direction (15). Let us mention that basically the same algorithm, with the same search direction, was proposed and analyzed in [10]. In that paper a parameter p occurs in the definition of the search direction (cf. [10, page 4]); by taking p = q − 1, the definition coincides with (15). By using some new tools in the analysis we obtain in this paper a better complexity result. This is due to the fact that in [10] we had to restrict ourselves to the case p ≤ 2, which corresponds to q ≤ 3, whereas we can now also deal with larger values of q in the analysis.
The rest of the paper is organized as follows. In the next section we derive bounds for the coordinates of the vector v in terms of σ(v), and we derive a lower bound for the maximal feasible step size. In Section 4 we estimate the decrease of the barrier function during a damped step, and in Section 5 the increase due to the update of the barrier parameter μ. With these results, the announced iteration bound easily follows in Section 6.
3. Bounds for v and the step size
Our first lemma in this section provides a lower and an upper bound for the coordinates of v in terms of a(v).
Lemma 3.1. Let σ := σ(v), as defined by (17). Then

v_min ≥ (1 + σ)^{−1/q},  v_max ≤ 1 + σ.
Proof: The lemma is trivial if v_min ≥ 1 and v_max ≤ 1. Consider the case v_min < 1. From (17) we derive

σ = ‖v − v^{−q}‖ ≥ |v_i − v_i^{−q}|,  1 ≤ i ≤ n,

which implies

σ ≥ v_min^{−q} − v_min ≥ v_min^{−q} − 1.

On the other hand, if v_max > 1 then we find

σ ≥ v_max − v_max^{−q} ≥ v_max − 1.

The lemma follows directly from the above inequalities. □
Note that if q goes to infinity, then the lower bound for v_min approaches 1, for all values of σ.
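Lemma 3.1 can be sanity-checked directly; the vector v and the value q = 3 below are arbitrary illustrative choices.

```python
import numpy as np

def sigma(v, q):
    return np.linalg.norm(v**(-q) - v)   # proximity measure (17)

q = 3.0
v = np.array([0.25, 0.8, 1.0, 2.4])
s = sigma(v, q)
lower = (1.0 + s)**(-1.0 / q)   # Lemma 3.1: v_min >= (1 + sigma)^(-1/q)
upper = 1.0 + s                 # Lemma 3.1: v_max <= 1 + sigma
```

For this v the lower bound is nearly tight, because the smallest coordinate dominates σ.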
In the sequel we use the following notations:

Δ̄x := Δx/x = d_x/v,  Δ̄s := Δs/s = d_s/v.   (19)
We then may write

x⁺ = x(e + αΔ̄x),  s⁺ = s(e + αΔ̄s).   (20)

Hence the maximal step size is determined by the vector (Δ̄x, Δ̄s): the step size α is feasible if and only if e + αΔ̄x > 0 and e + αΔ̄s > 0, and this will certainly hold if

1 − α‖(Δ̄x, Δ̄s)‖ > 0.   (21)

Since the displacements Δx and Δs are orthogonal, the scaled displacements d_x and d_s are orthogonal as well, i.e., d_xᵀ d_s = 0. This enables us, using (17), to write

σ(v) = ‖d_x + d_s‖ = ‖(d_x, d_s)‖.   (22)
Now we are ready to state our next result.

Lemma 3.2. One has

‖(Δ̄x, Δ̄s)‖ ≤ σ(1 + σ)^{1/q}.

Consequently, the maximal feasible step size α_max satisfies

α_max ≥ 1/(σ(1 + σ)^{1/q}).

Proof: Using Lemma 3.1 and (22), we may write

‖(Δ̄x, Δ̄s)‖ = ‖(d_x/v, d_s/v)‖ ≤ ‖(d_x, d_s)‖/v_min ≤ σ(1 + σ)^{1/q}.

From (21) we derive that

α_max ≥ 1/‖(Δ̄x, Δ̄s)‖.

Hence the lemma follows. □
In the next section it will become clear that the default step size in our analysis is only a small fraction of the maximal step size in Lemma 3.2.
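Lemma 3.2 can also be checked numerically. Any orthogonal split of g = v^{−q} − v into (d_x, d_s) will do; taking d_x = g, d_s = 0 is the simplest valid choice (then d_xᵀd_s = 0 and ‖(d_x, d_s)‖ = σ). All data below are illustrative.

```python
import numpy as np

q = 3.0
v = np.array([0.6, 1.0, 1.8])
g = v**(-q) - v                    # d_x + d_s, equation (15)
dx, ds = g, np.zeros_like(g)       # a valid orthogonal split: dx^T ds = 0
sig = np.linalg.norm(g)            # equals ||(dx, ds)||, cf. (22)

scaled = np.concatenate([dx / v, ds / v])   # the vector (dx/v, ds/v) of (19)
bound = sig * (1.0 + sig)**(1.0 / q)        # Lemma 3.2 upper bound
alpha_feasible = 0.99 / bound               # a step just below the guaranteed alpha_max
```

Any step size below 1/bound keeps both e + αΔ̄x and e + αΔ̄s strictly positive.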
4. Estimate of the decrease of the barrier function
Our aim in this section is to estimate the decrease of the new barrier function

Φ(xs, μ) := Ψ(v) = Σ_{i=1}^n ψ(v_i)   (23)
during one modified Newton step, for a suitable (feasible) step size α. It will be convenient to introduce

Ψ_b(v) := Σ_{i=1}^n ψ_b(v_i),  ψ_b(v_i) := (v_i^{1−q} − 1)/(q − 1).   (24)

Then we may write

Φ(xs, μ) = Σ_{i=1}^n ( (v_i² − 1)/2 + ψ_b(v_i) ) = (eᵀv² − n)/2 + Ψ_b(v).
Hence, the value of the barrier function after one step is given by

Φ(x⁺s⁺, μ) = (eᵀv⁺² − n)/2 + Ψ_b(v⁺),  v⁺ := √(x⁺s⁺/μ).   (25)
We proceed by deriving a simple expression for v⁺. Using (20) and (19) we find

v⁺² = x⁺s⁺/μ = (xs/μ)(e + αΔ̄x)(e + αΔ̄s) = v²(e + αΔ̄x)(e + αΔ̄s) = (v + αd_x)(v + αd_s).

Hence, using the orthogonality of d_x and d_s, we obtain

eᵀv⁺² = eᵀ(v² + αv(d_x + d_s) + α²d_x d_s) = eᵀv² + αvᵀ(v^{−q} − v).

Furthermore,

Ψ_b(v⁺) = Ψ_b(√((v + αd_x)(v + αd_s))).
At this stage we need a simple, but important lemma. As we will see later, it provides a new and convenient tool in the analysis of our large-update method.
Lemma 4.1 (cf. Lemma 2.2 in [11]). Let t_1 > 0 and t_2 > 0. Then

ψ_b(√(t_1 t_2)) ≤ (1/2)(ψ_b(t_1) + ψ_b(t_2)).   (26)

Proof: One may easily verify that the property in the lemma holds if and only if the function ψ_b(e^z) is convex in z, and this holds if and only if ψ_b′(t) + t ψ_b″(t) ≥ 0 whenever t > 0. Since

ψ_b′(t) = −t^{−q},  ψ_b″(t) = q t^{−q−1},

one has ψ_b′(t) + t ψ_b″(t) = (q − 1) t^{−q} ≥ 0. Hence, the lemma follows. □
The above proof makes clear that the property (26) is equivalent to convexity (with respect to ξ) of the composed function ψ_b(e^ξ). We therefore say that ψ_b is right exponentially convex, or shortly, right e-convex.¹
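Property (26) is easy to test numerically for many random pairs; the value of q and the sampling range below are arbitrary illustrative choices.

```python
import numpy as np

q = 3.0
psi_b = lambda t: (t**(1.0 - q) - 1.0) / (q - 1.0)   # the 'barrier part' (24)

rng = np.random.default_rng(2)
t1 = rng.uniform(0.1, 5.0, 1000)
t2 = rng.uniform(0.1, 5.0, 1000)
lhs = psi_b(np.sqrt(t1 * t2))            # psi_b at the geometric mean
rhs = 0.5 * (psi_b(t1) + psi_b(t2))      # average of the two values
```

The inequality lhs ≤ rhs holds for every sampled pair, as Lemma 4.1 predicts.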
Due to the definition (24) of Ψ_b, the property in Lemma 4.1 extends to Ψ_b, and hence we have²

Ψ_b(v⁺) = Ψ_b(√((v + αd_x)(v + αd_s))) ≤ (1/2) Σ_{i=1}^n (ψ_b(v_i + αd_{xi}) + ψ_b(v_i + αd_{si})).
Substitution of the above results in (25) gives

Φ(x⁺s⁺, μ) ≤ (eᵀv² + αvᵀ(v^{−q} − v) − n)/2 + (1/2) Σ_{i=1}^n (ψ_b(v_i + αd_{xi}) + ψ_b(v_i + αd_{si})).   (27)
Let us denote the change in Φ during the step as f(α). Then

f(α) = Φ(x⁺s⁺, μ) − Φ(xs, μ) ≤ f_1(α),

where

f_1(α) := −Φ(xs, μ) + (eᵀv² + αvᵀ(v^{−q} − v) − n)/2 + (1/2) Σ_{i=1}^n (ψ_b(v_i + αd_{xi}) + ψ_b(v_i + αd_{si})).
Note that f(0) = f_1(0) = 0. We next consider the first and second derivatives of f_1(α) with respect to α. One has

f_1′(α) = (1/2) vᵀ(v^{−q} − v) + (1/2) Σ_{i=1}^n (ψ_b′(v_i + αd_{xi}) d_{xi} + ψ_b′(v_i + αd_{si}) d_{si}).

One easily verifies, using (15) and (17), that

f_1′(0) = −(1/2) σ².   (28)
Furthermore, using that ψ_b″(t) = q t^{−q−1} is monotonically decreasing,

f_1″(α) = (1/2) Σ_{i=1}^n (ψ_b″(v_i + αd_{xi}) d_{xi}² + ψ_b″(v_i + αd_{si}) d_{si}²)
        = (1/2) Σ_{i=1}^n (q(v_i + αd_{xi})^{−q−1} d_{xi}² + q(v_i + αd_{si})^{−q−1} d_{si}²)
        ≤ (q/2)(v_min − ασ)^{−q−1} Σ_{i=1}^n (d_{xi}² + d_{si}²).

The last inequality is due to

v_i + αd_{xi} ≥ v_min − α‖d_x‖ ≥ v_min − ασ,  v_i + αd_{si} ≥ v_min − α‖d_s‖ ≥ v_min − ασ.
¹Recall that f : ℝ → (0, ∞) is called logarithmically convex (or log-convex) if the composite function log ∘ f is convex [2]. A natural way to extend this well-known notion is to call a function f left g-convex (or shortly g-convex) if g ∘ f is convex; in case f ∘ g is convex we use the adjective 'right' to refer to the reversed order of the factors, and call the function f right g-convex. This terminology is consistent with the above definition of right e-convexity: f is right e-convex if and only if f ∘ exp is convex.
²It may be noted that in [11] it is assumed that the barrier function Ψ itself is right e-convex, whereas in this paper we use right e-convexity of its 'barrier part' Ψ_b.
Thus, using also (22), we obtain

f_1″(α) ≤ h(α) := (1/2) q σ² (v_min − ασ)^{−q−1}.   (29)
By integration, this implies, for any α such that v_min − ασ > 0,

f_1′(α) = f_1′(0) + ∫₀^α f_1″(ξ) dξ ≤ f_1′(0) + ∫₀^α h(ξ) dξ.

Since f(α) ≤ f_1(α), by integrating once more we get, using f_1(0) = 0,

f(α) ≤ f_1(α) = ∫₀^α f_1′(ζ) dζ ≤ f_2(α) := f_1′(0)α + ∫₀^α ∫₀^ζ h(ξ) dξ dζ.
Obviously, f_2″(α) = h(α) > 0. Hence f_2(α) is convex in α and twice differentiable. Since f_2(0) = 0, f_2′(0) = f_1′(0) < 0, and f_2″(α) = h(α) goes to infinity as α approaches v_min/σ, the function f_2(α) attains its minimal value at some positive value ᾱ of α, and ᾱ is the stationary point of f_2(α). Hence ᾱ is the solution of the equation

f_2′(α) = f_1′(0) + ∫₀^α h(ξ) dξ = 0.

Using (29), this equation is equivalent to

f_1′(0) + (1/2) q σ² ∫₀^α (v_min − ξσ)^{−q−1} dξ = 0.
Also using (28), and by evaluating the integral, we find that the stationary point ᾱ satisfies

−σ²/2 + (σ/2)((v_min − ᾱσ)^{−q} − v_min^{−q}) = 0.

Solving for ᾱ, we find

ᾱ = (v_min/σ)(1 − (1 + σ v_min^q)^{−1/q}).   (30)
By Lemma 3.1, v_min^q ≥ 1/(σ + 1). Substitution in (30) gives

ᾱ ≥ (v_min/σ)(1 − (1 + σ/(σ + 1))^{−1/q}).
Using (11) in Lemma 2.6, we derive that

(1 + σ/(σ + 1))^{−1/q} = (1 − σ/(2σ + 1))^{1/q} ≤ 1 − σ/(q(2σ + 1)),

and hence we obtain, using Lemma 3.1 once more,

ᾱ ≥ (v_min/σ) · σ/(q(2σ + 1)) = v_min/(q(2σ + 1)) ≥ 1/(q(2σ + 1)(σ + 1)^{1/q}).   (31)
In the sequel we assume σ ≥ 1, and we use the step size

α* = 1/(3qσ(σ + 1)^{1/q}).   (32)

Note that α* ≤ ᾱ. To get an estimate for the value of f_2(α*) we use another useful lemma from [11]. For completeness' sake we include its elementary proof.
Lemma 4.2 (Lemma 3.12 in [11]). Let h(t) be a twice differentiable convex function with h(0) = 0, h′(0) < 0, and let h(t) attain its (global) minimum at t* > 0. If h″(t) is increasing for t ∈ [0, t*], then

h(t) ≤ t h′(0)/2,  0 ≤ t ≤ t*.

Proof: Using the hypothesis of the lemma we may write

h(t) = ∫₀ᵗ h′(ξ) dξ = h′(0)t + ∫₀ᵗ ∫₀^ξ h″(ζ) dζ dξ ≤ h′(0)t + ∫₀ᵗ ξ h″(ξ) dξ
     = h′(0)t + ∫₀ᵗ ξ dh′(ξ) = h′(0)t + (ξh′(ξ))|₀ᵗ − ∫₀ᵗ h′(ξ) dξ ≤ h′(0)t − ∫₀ᵗ h′(ξ) dξ = h′(0)t − h(t).

This implies the lemma. □
We apply this lemma with h = f_2 and t = α. First we verify that the hypotheses of the lemma are satisfied: we have f_2(0) = 0 and f_2′(0) < 0. Furthermore, f_2″(α) = h(α) > 0, and

f_2‴(α) = h′(α) = (q(q + 1)σ³/2)(v_min − ασ)^{−q−2} > 0.

Hence the lemma applies, and using also (28) we obtain

f(α*) ≤ f_2(α*) ≤ (α*/2) f_2′(0) = (α*/2) f_1′(0) = −α*σ²/4.
Finally, by substitution of (32) and using σ ≥ 1, we get

f(α*) ≤ −σ/(12q(σ + 1)^{1/q}) ≤ −σ/(12q(2σ)^{1/q}) ≤ −σ^{(q−1)/q}/(24q).   (33)
Remark 4.3. If v_min ≥ 1 then one may easily verify that the same chain of arguments leads to

f(α*) ≤ −σ²/(4q(σ + 1)) ≤ −σ/(8q),

with

α* = 1/(q(σ + 1)).
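The decrease estimate (33) can be probed numerically. As in our earlier sketches we use the simple orthogonal split d_x = v^{−q} − v, d_s = 0 (so that ‖(d_x, d_s)‖ = σ and d_xᵀd_s = 0), the default step size (32), and an arbitrary v with σ(v) ≥ 1; all concrete numbers are illustrative.

```python
import numpy as np

q = 3.0
psi = lambda t: (t**2 - 1.0) / 2.0 + (t**(1.0 - q) - 1.0) / (q - 1.0)   # (16)
Psi = lambda w: np.sum(psi(w))

v = np.array([0.5, 1.0, 2.0])
g = v**(-q) - v                    # d_x + d_s, the modified Newton equation (15)
dx, ds = g, np.zeros_like(g)       # orthogonal split with ||(dx, ds)|| = sigma
sig = np.linalg.norm(g)            # here sigma is about 7.7, so sigma >= 1

alpha = 1.0 / (3.0 * q * sig * (1.0 + sig)**(1.0 / q))   # default step size (32)
v_plus = np.sqrt((v + alpha * dx) * (v + alpha * ds))    # scaled point after the step
decrease = Psi(v) - Psi(v_plus)
guaranteed = sig**((q - 1.0) / q) / (24.0 * q)           # bound (33)
```

The observed decrease comfortably exceeds the guaranteed amount, as expected from a worst-case bound.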
5. Effect of updating the barrier parameter
We start this section with some technical results. As before, we write

ψ(t) = (t² − 1)/2 + ψ_b(t),  ψ_b(t) = (t^{1−q} − 1)/(q − 1),  q > 1.   (34)

Since

ψ′(t) = t − t^{−q},  ψ″(t) = 1 + q t^{−q−1},

we have ψ″(t) > 1 for all t > 0. This will be used in the proof of the next lemma.
Lemma 5.1. One has

(1/2)(t − 1)² ≤ ψ(t) ≤ (1/2) ψ′(t)²,  t > 0.

Proof: Using that ψ(1) = ψ′(1) = 0, we may write

ψ(t) = ∫₁ᵗ ∫₁^ξ ψ″(ζ) dζ dξ ≥ ∫₁ᵗ ∫₁^ξ dζ dξ = (1/2)(t − 1)²,

which proves the first inequality. The second inequality is obtained as follows:

ψ(t) = ∫₁ᵗ ∫₁^ξ ψ″(ζ) dζ dξ ≤ ∫₁ᵗ ∫₁^ξ ψ″(ζ) ψ″(ξ) dζ dξ = ∫₁ᵗ ψ″(ξ) ψ′(ξ) dξ = ∫₁ᵗ ψ′(ξ) dψ′(ξ) = (1/2) ψ′(t)².

This completes the proof. □
Corollary 5.2. One has ‖v‖ ≤ √n + √(2Ψ(v)).

Proof: Using the first inequality in Lemma 5.1 and the Cauchy — Schwarz inequality we find

2Ψ(v) = 2 Σ_{i=1}^n ψ(v_i) ≥ Σ_{i=1}^n (v_i − 1)² = ‖v‖² − 2eᵀv + n ≥ ‖v‖² − 2‖e‖‖v‖ + ‖e‖² = (‖v‖ − ‖e‖)².

This implies

‖v‖ ≤ ‖e‖ + √(2Ψ(v)) = √n + √(2Ψ(v)),

proving the corollary. □
Corollary 5.3. One has Ψ(v) ≤ (1/2) σ(v)².

Proof: Using the second inequality in Lemma 5.1 and (17) we may write

Ψ(v) = Σ_{i=1}^n ψ(v_i) ≤ (1/2) Σ_{i=1}^n ψ′(v_i)² = (1/2) ‖∇Ψ(v)‖² = (1/2) σ(v)².

Hence the proof is complete. □
We proceed by estimating the increase of ψ(t) when t increases to βt, with β ≥ 1.

Lemma 5.4. Let β ≥ 1. Then

ψ(βt) ≤ ψ(t) + (1/2)(β² − 1)t².

Proof: Using the notation of (34), we may write

ψ(βt) = (β²t² − 1)/2 + ψ_b(βt) = ψ(t) + (1/2)(β²t² − t²) + ψ_b(βt) − ψ_b(t).

Since ψ_b(t) is monotonically decreasing, ψ_b(βt) − ψ_b(t) ≤ 0. Therefore, the lemma follows. □
With the last lemma we are able to estimate the effect of an update of the barrier parameter on the value of the barrier function.

Lemma 5.5. Let θ ∈ (0, 1) and μ⁺ = (1 − θ)μ. Then

Φ(xs, μ⁺) ≤ Φ(xs, μ) + (θ/(2(1 − θ)))(2Ψ(v) + 2√(2nΨ(v)) + n).

Proof: Recall that Φ(xs, μ) := Ψ(v). When changing μ to μ⁺, the vector v becomes v⁺ = βv, with β = 1/√(1 − θ). Using Lemma 5.4 with this β, we write

Ψ(βv) = Σ_{i=1}^n ψ(βv_i) ≤ Σ_{i=1}^n (ψ(v_i) + (1/2)(β² − 1)v_i²) = Ψ(v) + (θ/(2(1 − θ)))‖v‖².

This implies, by Corollary 5.2,

Ψ(βv) ≤ Ψ(v) + (θ/(2(1 − θ)))(√n + √(2Ψ(v)))² = Ψ(v) + (θ/(2(1 − θ)))(2Ψ(v) + 2√(2nΨ(v)) + n).

The proof is completed. □
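A quick numerical check of Lemma 5.5 (with illustrative v, θ, and q): the growth of Ψ under the scaling v → v/√(1 − θ) stays below the stated bound.

```python
import numpy as np

q = 3.0
psi = lambda t: (t**2 - 1.0) / 2.0 + (t**(1.0 - q) - 1.0) / (q - 1.0)   # (16)
Psi = lambda w: np.sum(psi(w))

theta = 0.5
v = np.array([0.7, 1.2, 2.5, 0.9])
n = len(v)
beta = 1.0 / np.sqrt(1.0 - theta)      # after the mu-update, v becomes beta * v
growth = Psi(beta * v) - Psi(v)
bound = theta / (2.0 * (1.0 - theta)) * (2.0 * Psi(v) + 2.0 * np.sqrt(2.0 * n * Psi(v)) + n)
```

The slack between growth and bound reflects the Cauchy — Schwarz step in Corollary 5.2.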
6. New algorithm and its complexity analysis
The algorithm we analyse in this section is a slight modification of the 'classical' primal-dual algorithm considered in Section 1. In the new algorithm we monitor the progress of the inner iterations by the new barrier function Φ(xs, μ), as defined in (23). Thus the algorithm goes as follows.
New Primal-Dual Newton Algorithm for LO
Input:
  a threshold parameter τ ≥ 1;
  an accuracy parameter ε > 0;
  a fixed barrier update parameter θ, 0 < θ < 1;
begin
  x := e; s := e; μ := 1;
  while nμ > ε do
  begin
    μ := (1 − θ)μ;
    while Φ(xs, μ) > τ do
    begin
      x := x + αΔx;
      s := s + αΔs;
      y := y + αΔy
    end
  end
end
At the start of an outer iteration, just before the update of the barrier parameter, we have Φ(xs, μ) ≤ τ. By Lemma 5.5, after the update of the barrier parameter,

Φ(xs, μ⁺) ≤ τ + (θ/(2(1 − θ)))(2τ + 2√(2nτ) + n).
Using the default step size α*, as given by (32), each inner iteration decreases the barrier function by at least

σ^{(q−1)/q}/(24q),   (35)

where σ = σ(v), by (33). Recall that this result is only valid if σ ≥ 1. This certainly holds if τ ≥ 1, because then, at the start of each inner iteration, one has Φ(xs, μ) > 1; by Corollary 5.3 this implies σ > 1. As a consequence of (35) and Corollary 5.3 we find that each inner iteration decreases the barrier function value by at least

Φ^{(q−1)/(2q)}/(24q),
where Φ denotes the barrier function value before the inner iteration. This brings us in a situation where we can apply Lemma 2.7. With κ = 1/(24q) and γ = (q + 1)/(2q) we then obtain that at most

⌈ (48q²/(q + 1)) [τ + (θ/(2(1 − θ)))(2τ + 2√(2nτ) + n)]^{(q+1)/(2q)} ⌉   (36)

inner iterations are needed to recenter. We are now ready for the main complexity result of this paper.
Theorem 6.1. The total number of iterations is at most

⌈ (48q²/(q + 1)) [τ + (θ/(2(1 − θ)))(2τ + 2√(2nτ) + n)]^{(q+1)/(2q)} ⌉ ⌈ (1/θ) log(n/ε) ⌉.

Proof: According to (8) the number of outer iterations is at most

(1/θ) log(n/ε).

Multiplying this number by the bound (36) for the number of inner iterations per outer iteration we obtain the theorem. □
Removing the integer brackets, the iteration bound of Theorem 6.1 becomes

(48q²/(q + 1)) [τ + (θ/(2(1 − θ)))(2τ + 2√(2nτ) + n)]^{(q+1)/(2q)} (1/θ) log(n/ε).

Thus, if θ = 2/3 and τ = n, the bound becomes

(72q²/(q + 1)) ((4 + 2√2)n)^{(q+1)/(2q)} log(n/ε) ≤ 72 (4 + 2√2)^{(q+1)/(2q)} (q²/(q + 1)) n^{(q+1)/(2q)} log(n/ε) ≤ 390 q n^{(q+1)/(2q)} log(n/ε).
Fixing q > 1, this bound improves the best bound known for large-update methods, which is O(n log(n/ε)). Note that the order of magnitude does not change if we take τ = 1, or τ = O(n).
Another interesting choice is q = ½ log n, which minimizes the iteration bound. Then the iteration bound becomes

O(√n log n log(n/ε)),

which is quite close to the best iteration bound currently known for interior-point methods, namely O(√n log(n/ε)).
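The effect of the choice of q can also be checked numerically. Up to lower-order factors the bound behaves like q·n^((q+1)/(2q)); the stationary point of this expression in q is q = ½ log n (natural logarithm assumed here), at which its value equals (e/2)·√n·log n, i.e. O(√n log n). The sketch below (our own) confirms this for n = 10⁶.

```python
import math

def growth(q, n):
    """Dominant factor q * n**((q + 1)/(2*q)) of the iteration bound."""
    return q * n ** ((q + 1) / (2 * q))

n = 10**6
q_star = 0.5 * math.log(n)   # stationary point of log(growth(q, n)) in q
```

Note that any q on the order of log n yields the same O(√n log n log(n/ε)) order; only the constant changes.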
7. Concluding remarks
We have shown that by slightly modifying the classical primal-dual Newton search direction, the complexity of large-update methods can be improved. From a theoretical point of view, the existing gap between small-update and large-update methods has thus been almost closed.
It has become clear that the obtained complexity result depends heavily on the choice of a suitable barrier function. As we have shown, the gradient of the scaled version Ψ(v) of the barrier function naturally induces both a search direction and a norm-based proximity measure.
The barrier function used in this paper is a member of a wider class of barrier functions, each
of which can be used to derive the same complexity bound as obtained in this paper. This is the subject of the recent paper [11], where we introduced the class of self-regular barrier functions. It should be no surprise that the analysis in this paper is much simpler than in [11]: when dealing with one specific barrier function, one can exploit all the nice properties of this function that are not necessarily common to all self-regular barrier functions. Let us also mention that we extended the theory of IPMs based on self-regular barrier functions to Semidefinite Optimization in [11], to Smooth Nonlinear Complementarity Problems in [12], and to optimization over the Second-Order Cone in [13].
Finally, let us briefly discuss how the ideas presented in this paper can be incorporated into practical implementations of IPMs. So far only very limited computational tests have been done. Based on these experiments we are inclined to believe that a dynamic scheme might be a good idea: when v_min is relatively large, the classical Newton direction is used, while when it gets close to zero, we switch to our new search direction, with an appropriate value of the parameter q. Working out the details of such a strategy requires further investigation.
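Such a switching rule could be sketched as follows. This is purely hypothetical: the function name, the threshold value and the choice q = 2 are ours, not taken from the paper.

```python
import numpy as np

def choose_q(v, threshold=0.5, q_new=2):
    """Hypothetical switching rule: use the classical Newton direction
    (q = 1) while the smallest component of the scaled variable v is not
    too small, and the new, more aggressive direction (q = q_new) once
    v_min gets close to zero."""
    return 1 if v.min() >= threshold else q_new
```
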
References
[1] Andersen E. D., Gondzio J., Meszaros Cs., Xu X. Implementation of interior-point methods for large scale linear programming // Interior Point Methods of Mathematical Programming. T. Terlaky (ed.). Dordrecht: Kluwer Acad. Publ., 1996. P. 189-252.
[2] Hiriart-Urruty J. B., Lemaréchal C. Convex Analysis and Minimization Algorithms I. Vol. 305 of A Series of Comprehensive Studies in Mathematics. Berlin: Springer Verlag, 1996.
[3] Hung P., Ye Y. An asymptotically O(√n L)-iteration path-following linear programming algorithm that uses long steps // SIAM J. on Optimization. 1996. Vol. 6. P. 570-586.
[4] Jansen B., Roos C., Terlaky T., Ye Y. Improved complexity using higher-order correctors for primal-dual Dikin affine scaling // Math. Programming. 1997. Vol. 76. P. 117-130.
[5] Karmarkar N. K. A new polynomial-time algorithm for linear programming // Combinatorica. 1984. Vol. 4. P. 373-395.
[6] Megiddo N. Pathways to the optimal set in linear programming // Progress in Math. Programming: Interior-Point and Related Methods. N. Megiddo (ed.). N.Y.: Springer Verlag, 1989. P. 131-158. [Proc. of the 6th Math. Programming Symp. of Japan, Nagoya. 1986. P. 1-35.]
[7] Mehrotra S., Ye Y. On finding the optimal facet of linear programs // Math. Programming. 1993. Vol. 62. P. 497-515.
[8] Monteiro R. D. C., Adler I., Resende M. G. C. A polynomial-time primal-dual affine scaling algorithm for linear and convex quadratic programming and its power series extension // Mathematics of Operations Research. 1990. Vol. 15. P. 191-214.
[9] Peng J., Roos C., Terlaky T. New complexity analysis of the primal-dual Newton method for linear optimization // Annals of Operations Research. To appear.
[10] Peng J., Roos C., Terlaky T. A new class of polynomial primal-dual methods for linear and semidefinite optimization // Europ. J. of Operations Research. Submitted.
[11] Peng J., Roos C., Terlaky T. Self-regular proximities and new search directions for linear and semidefinite optimization // Math. Programming. 2000. Accepted.
[12] Peng J., Roos C., Terlaky T., Yoshise A. Self-regular proximities and new directions for nonlinear P*(k) complementarity problems // Math. of Operations Research. 2000. Submitted.
[13] Peng J., Roos C., Terlaky T. New primal-dual algorithms for second-order cone optimization based on self-regular proximities // SIAM J. on Optimization. 2000. Submitted.
[14] Roos C., Terlaky T., Vial J.-Ph. Theory and Algorithms for Linear Optimization. An Interior-Point Approach. Chichester: John Wiley & Sons, 1997.
[15] Sonnevend G. An "analytic center" for polyhedrons and new classes of global algorithms for linear (smooth, convex) programming // System Modelling and Optimization: Proc. of the 12th IFIP Conf., Budapest, Hungary, Sept. 1985. A. Prekopa, J. Szelezsan, B. Strazicky (Eds). Lecture Notes in Control and Information Sci. Vol. 84. Berlin: Springer Verlag, 1986. P. 866-876.
[16] Vanderbei R. J. Linear Programming: Foundations and Extensions. Boston: Kluwer Acad. Publ., 1996.
[17] Wright S. J. Primal-Dual Interior-Point Methods. SIAM, Philadelphia, USA, 1997.
[18] Ye Y. On the finite convergence of interior-point algorithms for linear programming // Math. Programming. 1992. Vol. 57. P. 325-335.
[19] Ye Y. Interior-Point Algorithms, Theory and Analysis. Chichester: John Wiley & Sons, 1997.
[20] Zhang Y., Zhang D. On polynomiality of the Mehrotra-type predictor-corrector interior-point algorithms // Math. Programming. 1995. Vol. 68. P. 303-318.
[21] Zhao G. Interior-point algorithms for linear complementarity problems based on large neighborhoods of the central path // SIAM J. on Optimization. 1998. Vol. 8. P. 397-413.
Received for publication February 1, 2001