Вычислительные технологии (Computational Technologies)
Vol. 14, No. 3, 2009
The method of interval entropy function for ill-conditioned system of linear equations
H. Wang
College of Science, China University of Mining and Technology, Xuzhou, China
e-mail: [email protected]
In this paper, a new algorithm for solving ill-conditioned systems of linear equations is given. The algorithm is based on the maximum entropy principle and interval analysis theory. Moreover, it is proved that this algorithm is globally convergent under mild conditions. The theoretical analysis and numerical results indicate that the algorithm is effective and reliable.
Keywords: numerical algebra, ill-conditioned system of linear equations, entropy function, interval algorithm.
Introduction
As is known, it is necessary to seek the solution of an ill-conditioned system of linear equations in many fields, such as geophysical inverse problems, graphic information processing, deconvolution, parameter estimation of a model, etc. Although some iterative methods can yield many good results theoretically, the validity of an iteration is closely tied to its convergence conditions. In many practical problems, when we seek the solution of an ill-conditioned system of linear equations by iteration, divergence or a slow rate of convergence usually appears in actual calculation, so the usage of iterative methods is in fact subject to certain limitations. In this paper, we introduce a new tool and construct a new algorithm, expecting to obtain new results.
We consider the following system of linear equations

Ax = b, (0.1)

where A = (a_ij)_{n×n} ∈ R^{n×n} is nonsingular and Cond(A) ≫ 1, x = (x1, x2, ..., xn)^T ∈ x^(0) ⊂ R^n, b = (b1, b2, ..., bn)^T ∈ R^n. Suppose there exists x* ∈ x^(0) such that x* is an isolated solution of Ax = b.

To solve system (0.1) with the idea of unconstrained optimization, we usually convert the problem of solving (0.1) into the following optimization problem:

min f(x) = ||Ax − b||_p, (0.2)

where ||·||_p denotes the p-norm of the vector space. Let p = ∞, then problem (0.2) simplifies as follows:

min_x max_{1≤i≤n} |f_i(x)|, (0.3)
© ИВТ СО РАН, 2009.
where f_i(x) = Σ_{j=1}^n a_ij x_j − b_i (i = 1, 2, ..., n). Obviously, problem (0.3) is a nondifferentiable optimization problem.
Several interval methods are known that provide guaranteed bounds on the solutions of the unconstrained optimization problem (see, e.g., [1, 2]). Interval methods for global optimization are based on the branch-and-bound principle. These methods subdivide the search interval into subintervals (branches) and use bounds on the objective function to eliminate subintervals which cannot contain a global minimizer of the given objective function. These interval methods do not necessarily provide the exact range of the function over the subintervals; however, interval arithmetic is able to provide rigorous bounds on that range. For a more thorough discussion, see [3-7]. In this paper, we propose a new idea for solving the ill-conditioned system of linear equations that requires the following two steps.
First, we use the entropy function to reduce the minimax problem (0.3) to an approximative unconstrained optimization problem with a smooth objective function.
Second, we apply known interval techniques to solve the resulting unconstrained optimization problem.
As a particular case of this idea, we propose a new interval algorithm for solving this problem. The structure of the paper is as follows. In Section 1, we describe entropy functions and their relationship with interval methods. In Section 2, the proposed algorithm is described, and a sufficient condition is given which guarantees that a given multidimensional interval contains the exact solution of system (0.1). The concluding section presents the numerical results, the conclusions, and an outline of directions for future research.
Denotations. The set of all interval vectors on R^n is denoted by IR^n, and the set of all interval vectors on x^(0) is denoted by I(x^(0)). For any x = (x1, ..., xn)^T ∈ IR^n, let x_i = [inf x_i, sup x_i]. Denote the midpoint, width, radius, and absolute value of x by mid(x), wid(x), rad(x), and |x| respectively.
Definition 0.1. An interval function f: IR^n → IR is called the interval extension of a real function f: R^n → R if
1) f(x) = f(x) for any x ∈ R^n from the domain of f;
2) f(x) is inclusion monotonic, i.e., x, y ∈ IR^n, x ⊆ y ⇒ f(x) ⊆ f(y).
Therefore, if f(x) is an interval extension of the function f(x), then always

{f(x) | x ∈ x} ⊆ f(x),

and we get an outer (by superset) estimate of the range of f over the box x ∈ IR^n.
Definition 0.2. An inclusion function f of f: x → R is called of convergence order α > 0 if

wid(f(x)) − wid(f*(x)) = O(wid(x)^α),

that is, if there exists a constant c > 0 such that

wid(f(x)) − wid(f*(x)) ≤ c (wid(x))^α,

where f*(x) = {f(x) | x ∈ x} denotes the exact range.
The further information about interval mathematics can be found in [1, 3].
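The definitions above can be made concrete with a toy interval type. The following sketch is illustrative only (the class and all names are ours, not from the paper): evaluating an arithmetic expression in interval arithmetic yields its natural interval extension, which encloses, but generally overestimates, the exact range.

```python
# Minimal interval arithmetic illustrating Definition 0.1: the natural
# interval extension encloses the exact range and is inclusion monotonic.

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = min(lo, hi), max(lo, hi)

    @staticmethod
    def _wrap(v):
        return v if isinstance(v, Interval) else Interval(v, v)

    def __add__(self, other):
        o = Interval._wrap(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, other):
        o = Interval._wrap(other)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, other):
        o = Interval._wrap(other)
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(p), max(p))

    def __contains__(self, t):
        return self.lo <= t <= self.hi

    def wid(self):
        return self.hi - self.lo

# f(x) = x*x - x on x = [0, 1]: the extension gives [-1, 1], a superset of
# the exact range [-0.25, 0] -- rigorous, but with excess width.
x = Interval(0.0, 1.0)
fx = x * x - x
print(fx.lo, fx.hi)   # -1.0 1.0
```

Note how every true value of f over the box lies inside the computed interval; the overestimation is exactly the "excess width" the paper's subdivision scheme drives to zero.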
1. Main Theory
If a problem is nondifferentiable, it is hard to solve by traditional derivative-based methods. Usually we handle it with a differentiable function that approximates the objective function. A maximum entropy function that uniformly approximates the objective function is investigated in [4, 8-10].
1.1. Maximum Entropy Function
The maximum entropy function originated in the work of L. Boltzmann [11], who studied the relation between entropy and probability in physical systems. Subsequently, entropy concepts and ideas have been introduced in physics, chemistry, meteorology, information science, etc. The maximum entropy function, which was introduced by Jaynes [12], has proved to be useful for obtaining convergent upper bounds on nonsmooth objective functions.
Definition 1.1. Let p be a positive integer, let φ_i: D → R^1, i = 1, ..., k, be given continuously differentiable functions, and let φ(x) = max_{1≤i≤k} φ_i(x). The maximum entropy function φ_p: D → R^1 of φ is defined by

φ_p(x) = (1/p) ln { Σ_{i=1}^k exp(p φ_i(x)) }.

Obviously, φ_p(x) is also continuously differentiable on D.
Lemma 1.1 [12]. Given ε > 0, ∃p̄ > 0 such that ∀p > p̄ and ∀x ∈ D

|φ_p(x) − φ(x)| < ε, (1.1)

lim_{p→∞} φ_p(x) = max_{1≤i≤k} φ_i(x). (1.2)

Also, if 1 ≤ p < q, then

φ_q(x) ≤ φ_p(x). (1.3)
Proof. Let v_i(x) = exp(φ_i(x)) (i = 1, ..., k) and let v(x) = (v_i(x))_{k×1}. For each x ∈ D (∀p ≥ 1)

||v(x)||_p = ( Σ_{i=1}^k v_i(x)^p )^{1/p} = ( Σ_{i=1}^k exp(p φ_i(x)) )^{1/p},

so φ_p(x) = ln(||v(x)||_p). Now

||v(x)||_∞ = lim_{p→∞} ||v(x)||_p = max_{1≤i≤k} {exp(φ_i(x))},

so

ln(||v(x)||_∞) = max_{1≤i≤k} φ_i(x), lim_{p→∞} φ_p(x) = max_{1≤i≤k} φ_i(x).

This proves (1.2). Finally, according to Jensen's inequality,

(1 ≤ p < q) ⇒ (||v(x)||_q ≤ ||v(x)||_p),
we obtain ln(||v(x)||_q) ≤ ln(||v(x)||_p), whence φ_q(x) ≤ φ_p(x). So, (1.3) is proved.

Lemma 1.2. For all x ∈ D, we have

φ(x) ≤ φ_p(x) ≤ φ(x) + (ln k)/p. (1.4)
Proof. It follows that

φ_p(x) − φ(x) = (1/p) ln { Σ_{i=1}^k exp(p φ_i(x)) } − φ(x) = (1/p) ln { Σ_{i=1}^k exp(p(φ_i(x) − φ(x))) }.

By the definition of φ(x), φ_i(x) − φ(x) ≤ 0 (i = 1, ..., k), and there is at least one j ∈ {1, ..., k} such that φ_j(x) − φ(x) = 0. Therefore, 1 ≤ Σ_{i=1}^k exp(p(φ_i(x) − φ(x))) ≤ k, and we have

0 ≤ (1/p) ln { Σ_{i=1}^k exp(p(φ_i(x) − φ(x))) } ≤ (1/p) ln(k).

So, we complete the proof.
From Lemma 1.1 and Lemma 1.2, we conclude that

lim_{p→∞} φ_p(x) = φ(x).

Namely, φ_p(x) is a uniform approximation of max_{1≤i≤k} φ_i(x). Obviously, it is reasonable and feasible to substitute the differentiable function φ_p(x) for φ(x). Let g(x) = max_{1≤i≤n} {|f_i(x)|} = max_{1≤i≤n} max{f_i(x), −f_i(x)}. With the idea of the maximum entropy function, we can transform (0.3) into the approximative unconstrained differentiable optimization problem

min f_p(x), (1.5)

where f_p(x) = (1/p) ln { Σ_{i=1}^n (exp(p f_i(x)) + exp(−p f_i(x))) } and p ≥ 1 is the entropy factor.
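As a quick numerical check of the smoothing, the following sketch evaluates f_p at a point and verifies the bound of Lemma 1.2 with k = 2n terms, g(x) ≤ f_p(x) ≤ g(x) + ln(2n)/p. The 2×2 system here is a small illustrative choice of ours, not one prescribed by the paper; the max-shift in the log-sum-exp is a standard trick to avoid overflow for large p.

```python
import numpy as np

# Entropy smoothing (1.5): f_p uniformly approximates g(x) = max_i |f_i(x)|
# with error at most ln(2n)/p (Lemma 1.2 applied to the 2n functions +-f_i).

def f_p(A, b, x, p):
    r = A @ x - b                        # residuals f_i(x)
    z = np.concatenate([p * r, -p * r])  # exponents for +f_i and -f_i
    m = z.max()                          # shift for numerical stability
    return (m + np.log(np.exp(z - m).sum())) / p

# illustrative 2x2 system (our toy data)
A = np.array([[2.0, -1.0], [-4.0, 7.0]])
b = np.array([2.0, -3.0])
x = np.array([1.0, 0.0])

g = np.abs(A @ x - b).max()              # g(x) = max |f_i(x)|
for p in (1.0, 10.0, 1000.0):
    fp = f_p(A, b, x, p)
    # Lemma 1.2: g <= f_p <= g + ln(2n)/p
    assert g <= fp <= g + np.log(2 * len(b)) / p + 1e-12
```

As p grows, f_p tightens onto g while staying smooth, which is what makes the problem amenable to derivative-based interval tools.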
Theorem 1.1. Given ε > 0, ∃p_k > 0 with p_k ≥ (ln 2n)/ε such that for p = p_k

0 ≤ f_p(x̄) − g(x*) ≤ ε, (1.6)

where x̄ is a solution of problem (1.5) and x* is a solution of problem (0.3).

Proof. Because x* is a solution of problem (0.3) and f_p(x) ≥ g(x) for all x, we have

f_p(x̄) − g(x*) ≥ g(x̄) − g(x*) ≥ 0,

and since x̄ is a solution of problem (1.5),

g(x*) ≤ g(x̄) ≤ f_p(x̄) ≤ f_p(x*).

From (1.4) (applied with k = 2n), we obtain

f_p(x̄) − g(x*) ≤ f_p(x*) − g(x*) ≤ (ln 2n)/p.

Let p = p_k ≥ (ln 2n)/ε, then

0 ≤ f_p(x̄) − g(x*) ≤ ε.

So, we complete the proof.
Since we are interested in the minimum of the function g(x), we will use the following corollary of Theorem 1.1.

Corollary 1.1. We have

inf_{x∈x^(0)} f_p(x) − (ln 2n)/p ≤ inf_{x∈x^(0)} g(x) ≤ inf_{x∈x^(0)} f_p(x) (1.7)

and, therefore,

lim_{p→∞} inf_{x∈x^(0)} f_p(x) = inf_{x∈x^(0)} g(x). (1.8)

Therefore, if p is large enough, the optimal solution of problem (1.5) approximates that of problem (0.3) arbitrarily well in theory.

Our main idea is to solve problem (0.3) by solving the corresponding problem (1.5) with the interval analysis method, using the computed minimum and Corollary 1.1 to estimate the minimum of problem (0.3); at last we get the approximate solution of system (0.1).
1.2. Interval Extension
Now, we consider the interval extension of f_p(x). First of all, we define the interval extensions of f_i(x) (i = 1, ..., n) as follows:

f_i(x) = Σ_{j=1}^n a_ij x_j − b_i (i = 1, ..., n),

and, since each f_i is linear, we have f_i(x) = {f_i(x) | x ∈ x}. Let

u_i = inf f_i(x), ū_i = sup f_i(x) (i = 1, ..., n).

Thus, we can define

f_p(x) = (1/p) ln { Σ_{i=1}^n (exp(p f_i(x)) + exp(−p f_i(x))) }
       = (1/p) [ ln { Σ_{i=1}^n (exp(p u_i) + exp(−p ū_i)) }, ln { Σ_{i=1}^n (exp(p ū_i) + exp(−p u_i)) } ], (1.9)

f'_p(x) = Σ_{i=1}^n a_i (exp(p f_i(x)) − exp(−p f_i(x))) / Σ_{i=1}^n (exp(p f_i(x)) + exp(−p f_i(x))), (1.10)

where a_i = (a_i1, ..., a_in), i = 1, 2, ..., n. Obviously, the interval functions f_p(x) and f'_p(x) are interval extensions of the function f_p(x) and the derivative ∂f_p(x)/∂x on the box x.
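A sketch of the enclosure (1.9): because each f_i is linear, its exact range [u_i, ū_i] over a box follows from endpoint arithmetic, and the two endpoints of f_p(x) come from plugging those bounds termwise into the log-sum-exp. The data below are illustrative choices of ours.

```python
import numpy as np

# Interval extension (1.9) of f_p over a box [xlo, xhi] for linear residuals.

def residual_ranges(A, b, xlo, xhi):
    # exact range of f_i(x) = sum_j a_ij x_j - b_i over the box:
    # positive coefficients pick the matching endpoint, negative the opposite
    pos, neg = np.maximum(A, 0.0), np.minimum(A, 0.0)
    lo = pos @ xlo + neg @ xhi - b
    hi = pos @ xhi + neg @ xlo - b
    return lo, hi

def fp_enclosure(A, b, xlo, xhi, p):
    u, ub = residual_ranges(A, b, xlo, xhi)
    lo = np.log(np.sum(np.exp(p * u) + np.exp(-p * ub))) / p
    hi = np.log(np.sum(np.exp(p * ub) + np.exp(-p * u))) / p
    return lo, hi

# illustrative 2x2 system and box (our toy data)
A = np.array([[2.0, -1.0], [-4.0, 7.0]])
b = np.array([2.0, -3.0])
xlo, xhi = np.array([0.8, -0.2]), np.array([1.2, 0.3])

p = 10.0
lo, hi = fp_enclosure(A, b, xlo, xhi, p)

# any point value of f_p inside the box must lie in [lo, hi]
x = np.array([1.0, 0.0])
r = A @ x - b
val = np.log(np.sum(np.exp(p * r) + np.exp(-p * r))) / p
assert lo <= val <= hi
```

The lower endpoint is valid because exp(p t) ≥ exp(p u_i) and exp(−p t) ≥ exp(−p ū_i) for every t in [u_i, ū_i], and symmetrically for the upper endpoint.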
Suppose that x^(0) is subdivided into N segments x^(j) (j = 1, ..., N) with x^(0) = ∪_{j=1}^N x^(j), such that the widths of the x^(j) satisfy max_{1≤j≤N} wid(x^(j)) → 0 (N → ∞). Let {p_N} be a monotonically increasing sequence of real numbers with p_N ≥ 1, N = 1, 2, ..., such that lim_{N→∞} p_N = +∞, and let

a_N = min_{1≤j≤N} { inf f_{p_N}(x^(j)) }. (1.11)
Theorem 1.2. Given ε > 0, ∃N̄ > 0 such that ∀N ≥ N̄

| min_{x∈x^(0)} max_{1≤i≤n} {|f_i(x)|} − a_N | < ε. (1.12)

Proof. From (1.4), we know that there exists N_1 > 0 such that ∀N ≥ N_1

| min_{x∈x^(0)} max_{1≤i≤n} {|f_i(x)|} − min_{x∈x^(0)} f_{p_N}(x) | = | min_{x∈x^(0)} g(x) − min_{x∈x^(0)} f_{p_N}(x) | < ε/2. (1.13)

On the other hand, there exists x^(j*) such that

inf f_{p_N}(x^(j*)) = min_{1≤j≤N} { inf f_{p_N}(x^(j)) } = a_N.

Let

v_i = min_{x∈x^(j*)} f_i(x), ṽ_i = min_{x∈x^(j*)} (−f_i(x)),

and let x̂ ∈ x^(j*) be a point at which min_{x∈x^(j*)} f_{p_N}(x) is attained. Then, by (1.9), we have

| min_{x∈x^(0)} f_{p_N}(x) − a_N | = | min_{x∈x^(0)} f_{p_N}(x) − inf f_{p_N}(x^(j*)) | ≤ | min_{x∈x^(j*)} f_{p_N}(x) − inf f_{p_N}(x^(j*)) |
= (1/p_N) ln [ Σ_{i=1}^n (exp(p_N f_i(x̂)) + exp(−p_N f_i(x̂))) / Σ_{i=1}^n (exp(p_N v_i) + exp(p_N ṽ_i)) ] → (1/p_N) ln 1 = 0 (N → ∞), (1.14)

since f_i(x̂) − v_i and −f_i(x̂) − ṽ_i are bounded by the widths of the residual ranges on x^(j*), which tend to zero. Hence there exists N_2 > 0 such that ∀N ≥ N_2

| min_{x∈x^(0)} f_{p_N}(x) − a_N | < ε/2.

Therefore, for N̄ = max{N_1, N_2}, from (1.13) and (1.14) we deduce that ∀N ≥ N̄

| min_{x∈x^(0)} max_{1≤i≤n} {|f_i(x)|} − a_N | ≤ | min_{x∈x^(0)} max_{1≤i≤n} {|f_i(x)|} − min_{x∈x^(0)} f_{p_N}(x) | + | min_{x∈x^(0)} f_{p_N}(x) − a_N | < ε/2 + ε/2 = ε.

So, we complete this proof.
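The convergence in Theorem 1.2 can be observed numerically. The sketch below is an illustration under our own assumptions (the same toy 2×2 system as above, a uniform grid, and the schedule p_N = N): as the subdivision is refined, a_N approaches min_x max_i |f_i(x)|, which is 0 for this consistent system.

```python
import numpy as np

# a_N = min over subdivision boxes of the lower endpoint of extension (1.9).

A = np.array([[2.0, -1.0], [-4.0, 7.0]])   # illustrative system (ours)
b = np.array([2.0, -3.0])
pos, neg = np.maximum(A, 0.0), np.minimum(A, 0.0)

def lse(z):                                 # numerically stable log-sum-exp
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

def box_lower_bound(xlo, xhi, p):
    u = pos @ xlo + neg @ xhi - b           # inf of each residual f_i
    ub = pos @ xhi + neg @ xlo - b          # sup of each residual f_i
    return lse(np.concatenate([p * u, -p * ub])) / p

a_vals, p_vals = [], (4.0, 16.0, 64.0)
for N, p in zip((4, 16, 64), p_vals):       # N subintervals per coordinate
    edges = np.linspace(-1.0, 2.0, N + 1)   # x^(0) = [-1, 2]^2
    a_vals.append(min(
        box_lower_bound(np.array([edges[i], edges[j]]),
                        np.array([edges[i + 1], edges[j + 1]]), p)
        for i in range(N) for j in range(N)))

# a_N <= min f_p <= ln(2n)/p_N by (1.4), and the bound tightens as N grows
assert all(a <= np.log(4.0) / p + 1e-9 for a, p in zip(a_vals, p_vals))
assert a_vals[0] < a_vals[-1]
```

The first assertion is exactly the one-sided bound from (1.4); the second shows the excess width of the enclosure shrinking with the subdivision.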
2. Interval Algorithm
In actual computations, an important issue for our algorithm is how to choose the initial interval (box) that includes the solution of the system Ax = b. Here, we give the following result.
Theorem 2.1. Let the matrix A be nonsingular and let x* be some isolated solution of the system Ax = b. Then

x* ∈ ((||b||_∞ + ε) n / λ^{1/2}) E,

where E = (E_0, ..., E_0)^T, E_0 = [−1, 1], ε > 0, and λ is the minimal eigenvalue of A^T A.

Proof. We have

n · ||Ax||_∞ ≥ ||Ax||_2 = (x^T A^T A x)^{1/2} ≥ (λ x^T x)^{1/2}.

Since A is nonsingular, λ > 0 and

||Ax||_∞ ≥ (λ x^T x)^{1/2} / n = λ^{1/2} ||x||_2 / n,

and

||Ax − b||_∞ + ||b||_∞ ≥ ||Ax||_∞ ≥ λ^{1/2} ||x||_2 / n.

For all ε > 0, if ||x||_2 > (||b||_∞ + ε) n / λ^{1/2}, then ||Ax − b||_∞ ≥ λ^{1/2} ||x||_2 / n − ||b||_∞ > ε > 0, so such x cannot solve Ax = b. So, we can deduce that

||x*||_2 ≤ (||b||_∞ + ε) n / λ^{1/2};

furthermore, we have

x* ∈ ((||b||_∞ + ε) n / λ^{1/2}) E.
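Theorem 2.1 translates directly into a few lines of numpy. The sketch below (toy system of our choosing) builds the symmetric box [−r, r]^n with r = (||b||_∞ + ε) n / λ^{1/2} and checks that the exact solution lies inside it.

```python
import numpy as np

# Initial box from Theorem 2.1: r = (||b||_inf + eps) * n / sqrt(lambda_min(A^T A)).

def initial_box(A, b, eps=1e-6):
    n = len(b)
    lam_min = np.linalg.eigvalsh(A.T @ A)[0]   # eigvalsh returns ascending order
    r = (np.linalg.norm(b, np.inf) + eps) * n / np.sqrt(lam_min)
    return np.full(n, -r), np.full(n, r)       # the box r*E = [-r, r]^n

# illustrative 2x2 system (our toy data)
A = np.array([[2.0, -1.0], [-4.0, 7.0]])
b = np.array([2.0, -3.0])

xlo, xhi = initial_box(A, b)
x_star = np.linalg.solve(A, b)
assert np.all(xlo <= x_star) and np.all(x_star <= xhi)
```

The box is deliberately generous (the factor n comes from the crude norm inequality in the proof); the algorithm's deletion rules then shrink it.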
2.1. Test Operator
For Ax = b we define the interval test operator as follows:

K(x) = Db + (I − DA)x, (2.1)

where D is an n × n nonsingular real matrix and I is the n × n unit matrix. It is easy to prove the following theorem.
Theorem 2.2. For each x ∈ I(x^(0)):
1) every solution x* of the system Ax = b within the box x is also contained in K(x), so that x* ∈ x ∩ K(x);
2) if x ∩ K(x) = ∅, then the box x contains no solutions of the system Ax = b;
3) if K(x) ⊆ x, then the box x contains, with certainty, at least one solution of the system Ax = b;
4) if K(x) ⊆ int(x) and A is nonsingular, then the box K(x) contains exactly one solution of the system Ax = b.
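The operator (2.1) is cheap to evaluate in interval arithmetic, since the only interval quantity is x itself. The following sketch (illustrative data, our names) evaluates K(x) = Db + (I − DA)x endpoint-wise and checks property 1) of Theorem 2.2: the solution survives the intersection x ∩ K(x).

```python
import numpy as np

# Interval evaluation of the test operator K(x) = Db + (I - DA)x on [xlo, xhi].

def K(A, b, D, xlo, xhi):
    V = np.eye(len(b)) - D @ A            # point matrix I - DA
    pos, neg = np.maximum(V, 0.0), np.minimum(V, 0.0)
    lo = D @ b + pos @ xlo + neg @ xhi
    hi = D @ b + pos @ xhi + neg @ xlo
    return lo, hi

# illustrative 2x2 system and box (our toy data)
A = np.array([[2.0, -1.0], [-4.0, 7.0]])
b = np.array([2.0, -3.0])
D = np.diag([1 / 2, 1 / 7])               # D = diag(a_11, a_22)^(-1)
xlo, xhi = np.array([0.8, -0.2]), np.array([1.2, 0.3])

klo, khi = K(A, b, D, xlo, xhi)
x_star = np.linalg.solve(A, b)

# property 1): x* lies in x ∩ K(x)
ilo, ihi = np.maximum(klo, xlo), np.minimum(khi, xhi)
assert np.all(ilo <= x_star) and np.all(x_star <= ihi)
```

If the intersection came out empty, property 2) would let us discard the whole box; if K(x) landed strictly inside x, property 4) would certify a unique solution.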
In actual applications, we can also improve the operator K by applying the Gauss-Seidel technique:

K_i(x) = p_i + Σ_{j=1}^n V_ij x_j (i = 1, ..., n), (2.2)

where p = Db, V = I − DA. So, we have

K̃_i(x) = p_i + Σ_{j=1}^{i−1} V_ij x̃*_j(x) + Σ_{j=i+1}^n V_ij x_j (i = 1, ..., n), (2.3)

where x̃*_j(x) = K̃_j(x) ∩ x_j (j = 1, ..., i − 1). The expression for the box K̃(x) can be rewritten in the following matrix form:

K̃(x) = p − L x̃*(x) − U x, (2.4)

where x̃*(x) = K̃(x) ∩ x, (ã_ij) denotes the matrix DA, and

L = ( 0     0    ...  0        0
      ã_21  0    ...  0        0
      ...
      ã_n1  ...       ã_n,n−1  0 ),

U = ( ã_11 − 1  ã_12      ...  ã_1n
      0         ã_22 − 1  ...  ã_2n
      ...
      0         ...  0         ã_nn − 1 )

(for a choice of D giving DA a unit diagonal, such as the one made below, the diagonal entries of U vanish and (2.3) agrees with (2.4)). K̃ is the Gauss-Seidel interval test operator; one can also prove that K̃(x) has the same properties as in Theorem 2.2 and that it has smaller excess width under certain conditions. Therefore, replacing K(x) with K̃(x) in the described algorithm, we usually obtain better results.
In the process of actual computing, the matrix D can be chosen in different ways. For example, let D = (e_1, e_2, ..., e_m, e_1, e_2, ..., e_{n−m})^T, where e_i ∈ R^n, or let D = diag(a_11, a_22, ..., a_nn)^{−1}, and so on. The choice of D affects the result of the iteration.

Example 1. Consider

2x_1 − x_2 = 2,
−4x_1 + 7x_2 = −3,

with the box x = ([0.8, 1.2], [−0.2, 0.3])^T and the exact solution x* = (1.1, 0.2)^T. Let

D = ( 1/2  0
      0    1/7 ),

then

K(x) = Db + (I − DA)x = ([0.9, 1.15], [1/35, 9/35])^T,

so we have

K(x) ∩ x = ([0.9, 1.15], [1/35, 9/35])^T ≈ ([0.9, 1.15], [0.029, 0.257])^T

and

x* ∈ K(x) ∩ x, wid(K(x) ∩ x) < wid(x).

If we let x = ([0.8, 1.2], [−0.2, 0])^T, then

K(x) = Db + (I − DA)x = ([0.9, 1], [1/35, 9/35])^T

and

K(x) ∩ x = ∅,

since [1/35, 9/35] ∩ [−0.2, 0] = ∅; so we have the result x* ∉ ([0.8, 1.2], [−0.2, 0])^T according to Theorem 2.2.
2.2. Region Deletion Rules
Region deletion rules can reduce the computational cost and accelerate the convergence rate of the algorithm.

Deletion rule 1 (feasible region test). If ∃j* ∈ {1, ..., n} such that 0 ∉ f_j*(x), then discard x, which is an infeasible region.

Deletion rule 2 (midpoint test). Let f̃ be an upper bound of min_{x∈x^(0)} f_p(x). For x ∈ I(x^(0)), if inf f_p(x) > f̃, then there is no optimal point in x, and x can be discarded.

In the process of actual computing, f̃ is selected and adjusted according to the algorithm and the interval extension of f_p(x). In this paper, at first we let f̃ = f_p(mid(x^(0))) (supposing that a feasible solution exists in x^(0)); for x ⊆ x^(0), we then update

f̃ = min{f̃, f_p(mid(x))}. (2.5)

Deletion rule 3 (monotonicity test):
1) if sup f'_{p,k}(x) < 0, then let x = (x_1, ..., x_{k−1}, [sup x_k, sup x_k], x_{k+1}, ..., x_n)^T;
2) if inf f'_{p,k}(x) > 0, then let x = (x_1, ..., x_{k−1}, [inf x_k, inf x_k], x_{k+1}, ..., x_n)^T.

Deletion rule 4:
1) if x ∩ K(x) = ∅, then the box x can be discarded;
2) if x ∩ K(x) ≠ ∅, then set x = x ∩ K(x).
2.3. Interval Algorithm
With the preceding ideas in mind, we suggest the following interval algorithm for solving system (0.1). The algorithm starts from x^(0) and generates a list L. The elements of the list L are of the form (x, f_p, f̃) and are arranged in increasing order of f_p, in which f̃ is selected and adjusted according to (2.5) and f_p = inf f_p(x). Each x is obtained from the region of the first element of the list L in the following way: the region is bisected normally to the coordinate direction in which x has the maximal width. The detailed steps of the algorithm are the following.

Step 1. Set x = x^(0), f̃ = f_p(mid(x)), f_p = inf f_p(x), B_0 = +∞. Enter the item (x, f_p, f̃) into the list L.

Step 2. If the list L is empty, then go to Step 10.

Step 3. Extract the first element (x, f_p, f̃) from the list L and remove it from the list L. Set B = min{B_0, f_p}.

Step 4. If f̃ − B < ε for the given tolerance ε > 0, then proceed with Step 8.

Step 5. Bisect x normally to the coordinate direction in which x has the maximal width, to obtain x^(1), x^(2) such that x = x^(1) ∪ x^(2).
Step 6:
1) set i = 1;
2) if there exists j ∈ {1, ..., n} such that 0 ∉ f_j(x^(i)), then go to point 7);
3) compute K(x^(i)) and let x^(i) = K(x^(i)) ∩ x^(i); if x^(i) = ∅, then go to point 7);
4) for k = 1, ..., n: if sup f'_{p,k}(x^(i)) < 0, then set x_k^(i) = [sup x_k^(i), sup x_k^(i)]; if inf f'_{p,k}(x^(i)) > 0, then set x_k^(i) = [inf x_k^(i), inf x_k^(i)];
5) compute f̄ = f_p(mid(x^(i))), f_p = inf f_p(x^(i)), f̃ = min{f̃, f̄};
6) enter the item (x^(i), f_p, f̃) in proper order into the list L;
7) if i < 2, then let i = i + 1 and go to point 2).

Step 7. Discard from the list L all the elements (x, f_p, f̃) that satisfy f_p > f̃; return to Step 2.

Step 8. Print [f_p, f̃] and x.

Step 9. Set B_0 = B and return to Step 2.

Step 10. End.
When working in the actual computing process, it is necessary to keep in mind the following items:
1) according to Corollary 1.1, let p ≥ (ln 2n)/ε;
2) in numerical computation, in order to avoid deletion of an optimal solution lying on the boundary of an interval, we use rounded (outward) interval arithmetic;
3) in this paper, we select D = diag(d_11, ..., d_nn) with

d_ii = 1/a_ii, if a_ii ≠ 0; d_ii = 1, if a_ii = 0.
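The whole algorithm can be condensed into a short branch-and-bound loop. The sketch below is an illustration of the control flow under our own simplifying assumptions, not a verified implementation of the paper's algorithm: it reuses the toy 2×2 system from earlier blocks, keeps the list ordered by the lower bound f_p (a heap), applies the midpoint test and the K-operator contraction (deletion rule 4), and omits the monotonicity test and rounded arithmetic.

```python
import heapq
import numpy as np

A = np.array([[2.0, -1.0], [-4.0, 7.0]])     # illustrative system (ours)
b = np.array([2.0, -3.0])
D = np.diag(1.0 / np.diag(A))                # note 3): d_ii = 1/a_ii
V = np.eye(2) - D @ A
p = 1e3                                      # entropy factor

def lse(z):                                  # stable log-sum-exp
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

def fp_point(x):                             # f_p(x) of problem (1.5)
    r = A @ x - b
    return lse(np.concatenate([p * r, -p * r])) / p

def fp_lower(xlo, xhi):                      # lower endpoint of extension (1.9)
    pos, neg = np.maximum(A, 0.0), np.minimum(A, 0.0)
    u = pos @ xlo + neg @ xhi - b
    ub = pos @ xhi + neg @ xlo - b
    return lse(np.concatenate([p * u, -p * ub])) / p

def contract(lo, hi):                        # deletion rule 4: x := x ∩ K(x)
    pos, neg = np.maximum(V, 0.0), np.minimum(V, 0.0)
    klo = D @ b + pos @ lo + neg @ hi
    khi = D @ b + pos @ hi + neg @ lo
    lo, hi = np.maximum(lo, klo), np.minimum(hi, khi)
    return None if np.any(lo > hi) else (lo, hi)

def solve(xlo, xhi, eps=1e-6, max_iter=100_000):
    fbest = fp_point((xlo + xhi) / 2)        # upper bound f~ (midpoint test)
    heap, tick = [(fp_lower(xlo, xhi), 0, xlo, xhi)], 1
    for _ in range(max_iter):
        if not heap:
            break
        lb, _, lo, hi = heapq.heappop(heap)
        if fbest - lb < eps:                 # enclosure [lb, fbest] is tight
            return lb, fbest, (lo + hi) / 2
        k = int(np.argmax(hi - lo))          # bisect the widest coordinate
        mid = (lo[k] + hi[k]) / 2
        for lo2, hi2 in ((lo, np.r_[hi[:k], mid, hi[k + 1:]]),
                         (np.r_[lo[:k], mid, lo[k + 1:]], hi)):
            box = contract(lo2, hi2)
            if box is None:                  # box certified solution-free
                continue
            lo2, hi2 = box
            fbest = min(fbest, fp_point((lo2 + hi2) / 2))
            lb2 = fp_lower(lo2, hi2)
            if lb2 <= fbest:                 # midpoint test (Step 7, lazily)
                heapq.heappush(heap, (lb2, tick, lo2, hi2))
                tick += 1
    return None

lb, ub, x = solve(np.array([-1.0, -1.0]), np.array([2.0, 2.0]))
# [lb, ub] encloses min f_p, which by (1.4) is at most ln(2n)/p
assert lb <= ub < np.log(4.0) / p + 1e-6
```

The returned pair [lb, ub] mirrors the printed enclosure [f_p, f̃] of Step 8: a guaranteed bracket of the minimum of f_p, with the box midpoint as candidate solution.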
Conclusions
In order to test the efficiency of our algorithm, we consider the linear system Hx = b with the n × n Hilbert matrix. Recall that H is called the Hilbert matrix of order n if its entries are h_ij = 1/(i + j − 1), i, j = 1, ..., n:

H = ( 1    1/2  1/3  1/4  ...
      1/2  1/3  1/4  1/5  ...
      1/3  1/4  1/5  1/6  ...
      ...  ...  ...  ...  ... ).

The matrix is poorly conditioned even for small dimensions. For this reason, serious difficulties arise when computing the solutions of a linear system with this matrix because of the large sensitivity of the solution to errors in actual computing. Thus, the Hilbert matrix has been taken as a good test example for the algorithm of this paper. Let x = (x_1, ..., x_n)^T, b = H(1, ..., 1)^T, and x_i^(0) = [−1, 2], i = 1, ..., n. It is easy to check that the solution is x* = (1, ..., 1)^T. Solving the above problem with traditional algorithms, we obtain the following results.

By the matrix factorization method, we cannot obtain a satisfactory solution because of the condition number Cond(H) ≫ 1 (n ≥ 10). The Jacobi iterative method is divergent; the Gauss-Seidel and SOR iterative methods are convergent, but the rate of convergence of both methods is very slow, and it is hard to obtain the solution with precision less than 10^{-4}, because the spectral radius of the iteration matrix is less than 1 but very close to 1. With our interval algorithm, letting the entropy factor be p = 10^6 (or p large enough), we can obtain the solution with any precision ε > 0.
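The conditioning claims above are easy to check directly; the sketch below (our own quick demonstration, not the paper's experiment) shows the condition number of the Hilbert matrix growing exponentially with n and a floating-point solve of Hx = H·1 already failing to recover x* = (1, ..., 1) at moderate n.

```python
import numpy as np

def hilbert(n):
    i, j = np.indices((n, n))
    return 1.0 / (i + j + 1)               # h_ij = 1/(i + j - 1), 1-based i, j

for n in (4, 8, 12):
    print(n, np.linalg.cond(hilbert(n)))   # grows exponentially with n

n = 12
H = hilbert(n)
x = np.linalg.solve(H, H @ np.ones(n))     # exact solution is the vector of ones
err = np.abs(x - 1.0).max()
print(err)                                 # far above machine precision
```

Around n ≈ 12 the condition number reaches the reciprocal of the double-precision unit roundoff, so a direct factorization loses essentially all accuracy, which is what motivates the guaranteed-enclosure approach of this paper.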
But it is necessary to note also the following problems:
1. Because of the complexity of interval operations, our algorithm may spend more time to obtain a satisfying result (if the given precision is ε ≤ 10^{-10} or smaller); however, the interval method has global convergence and obtains both the solution and its error simultaneously. So, one direction for future research is obtaining additional theoretical results on convergence with new rules of region deletion, and finding a way to produce a box sequence {x^(k)} converging to the solution x* without any a priori information.
2. To obtain the initial box x^(0) by Theorem 2.1, we need to compute the eigenvalues of A^T A, which is itself costly; finding a cheaper way to obtain the initial box is another direction for future research.
In this paper, a new algorithm based on the introduction of maximum entropy function and interval computation has been presented to generate the solution of ill-conditioned linear equations system. The algorithm can be applied to a number of problems such as solving systems of linear inequalities or nonlinear equations and nonlinear inequalities, global optimization, etc.
Acknowledgements: The author thanks the referee for his/her valuable comments and suggestions which improved the original manuscript of this paper.
References
[1] Ratschek H., Rokne J. New computer methods for global optimization. Chichester: Ellis Horwood Limited, 1988.
[2] Wolfe M.A. Interval methods for global optimization // Appl. Math. and Comput. 1996. Vol. 75. P. 179-206.
[3] Moore R.E. Methods and applications of interval analysis. Philadelphia: SIAM, 1979.
[4] Shen Z., Huang Z., Wolfe M.A. An interval maximum entropy method for a discrete minimax problem // Appl. Math. and Comput. 1997. Vol. 87. P. 49-68.
[5] Hansen E. Global optimization using interval analysis: the multi-dimensional case // Appl. Math. and Comput. 1980. Vol. 35. P. 505-512.
[6] Hansen E. Global optimization using interval analysis. N. Y.: Marcel Dekker, Inc. 1992.
[7] Hansen E., Walster G.W. Global optimization using interval analysis. Second edition, revised and expanded. N. Y.: Marcel Dekker, Inc., 2004.
[8] Li X. The coherent function method solving nonlinear program // Sciences in China (A). 1991. Vol. 12. P. 1283-1288.
[9] Li X. The efficient methods for a sort of nondifferentiable optimization problem // Sciences in China(A). 1994. Vol. 24. P. 371-377.
[10] Wang H., Cao D. Smoothing iterative method for a class of nonsmooth equations // Comput. Technol. 2006. Vol. 11, N 6. P. 1-9.
[11] Boltzmann L. Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen // Wien. Akad. Sitzungsber. 1872. Vol. 66. P. 275-370.
[12] Jaynes E.T. Information theory and statistical mechanics // Phys. Rev. 1957. Vol. 106. P. 620-630.
Received for publication 31 October 2008