ON ESTIMATION OF PARAMETERS BY THE MINIMUM DISTANCE METHOD
V. P. SHULENIN
Tomsk State University, Tomsk, RUSSIA [email protected]
ABSTRACT
Parameter estimates constructed by the minimum distance method are briefly called MD-estimates. The minimum distance method was proposed by Wolfowitz (1957); an extensive bibliography was compiled and published by Parr (1981). In this paper the effectiveness of location (shift) parameter estimation based on the weighted Cramer-von Mises distance is discussed. The robustness of this kind of MD-estimates under various supermodels describing deviations from the Gaussian model is considered. Numerical results are given for the case of contaminated normal distributions.
Statement of the problem
Let us first consider the case when the statistical model (X, ℱ_Θ) is given in parametric form. X = {x} denotes the sample space, the elements of which are realizations x = (x_1,...,x_n) of a random vector X = (X_1,...,X_n); ℱ_Θ = {F : F(x, θ), θ ∈ Θ} is a parametric set of admissible probability distributions for the experiment considered; X_1,...,X_n is a sequence of i.i.d. random variables with distribution function F(x, θ) and density f(x, θ), x ∈ R¹, θ ∈ Θ. The functional form of the distribution is defined up to an unknown parameter (scalar or vector), which belongs to a given parameter set Θ. It is required to construct an estimate of the unknown parameter θ ∈ Θ based on a sample X_1,...,X_n from the distribution F(x, θ).
The essence of the minimum distance method
If a distance ρ(F, G) between any two distributions F, G ∈ ℱ is given, then the parameter θ may be estimated by minimizing the distance between the empirical distribution function F_n(x), constructed from the sample X_1,...,X_n, and the distribution function F_θ(x) = F_X(x, θ) adopted in the model (X, ℱ_Θ). Thus, for a chosen distance ρ(F, G), the MD-estimator of θ is defined as
θ̂_n = arg min_{θ∈Θ} ρ(F_n, F_θ).

Various distances can be used to construct MD-estimates (see Parr and Schucany (1980)). For instance, the maximum likelihood method is based on the distance ρ(F_n, F_θ) = −∫ ln f(x, θ) dF_n(x).
In this paper we consider estimates based on the weighted Cramer-von Mises distance

ρ_W(F_n, F_θ) = ∫ [F_n(x) − F_θ(x)]² W(x, F_θ) dF_θ(x),  (1)

where W_θ = W(x, F_θ) is a weight function, which may depend on the d.f. F_θ (or on the density f_θ).
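The distance (1) can be minimized numerically. Below is a minimal sketch (Python with NumPy and SciPy assumed; the function name `md_location`, the grid width `span`, the resolution `m`, and the default normal reference with weight W = φ are our illustrative choices, not the author's):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def md_location(sample, F0=norm.cdf, W=norm.pdf, span=6.0, m=2001):
    """MD-estimate of location: minimize the weighted Cramer-von Mises
    distance rho(theta) = int [F_n(x) - F0(x - theta)]^2 W(x - theta) dx,
    with the integral approximated by a Riemann sum on a fine grid."""
    xs = np.sort(np.asarray(sample, dtype=float))
    x = np.linspace(xs[0] - span, xs[-1] + span, m)   # integration grid
    dx = x[1] - x[0]
    Fn = np.searchsorted(xs, x, side="right") / len(xs)   # empirical d.f.
    rho = lambda t: np.sum((Fn - F0(x - t)) ** 2 * W(x - t)) * dx
    return minimize_scalar(rho, bounds=(xs[0], xs[-1]), method="bounded").x
```

For a sample from N(θ, 1) the minimizer is consistent for θ; with W = φ its asymptotic variance is 1.095/n (see Table 2 below).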
Assuming that ρ_W(F_n, F_θ) is a differentiable function of the parameter θ, denote its derivative by λ_F(θ) = ∂ρ_W(F_n, F_θ)/∂θ. With this notation, the estimate θ̂_n of the parameter θ based on the weighted Cramer-von Mises distance (1) is a solution of the equation λ_F(θ) = 0, where

λ_F(θ) = −2∫ [F_n(x) − F_θ(x)] (∂F_θ(x)/∂θ) W_θ(x) dF_θ(x) + ∫ [F_n(x) − F_θ(x)]² (∂/∂θ)[W_θ(x) f_θ(x)] dx.  (2)
In this paper we consider MD-estimation of the location parameter; in this case F_θ(x) = F_0(x − θ). Let the family of reference distributions be denoted by

ℱ_0 = {F : F_θ(x) = F_0(x − θ), θ ∈ R¹},

where F_0 is a distribution with density f_0. Rewrite (1) as

ρ_{F_n,F_0}(θ, W) = ∫ [F_n(x) − F_0(x − θ)]² W(x − θ) dx.  (3)
Note that the choice of the weight function W in the form of the density of the reference distribution, i.e. W(x) = f_0(x), corresponds to the Cramer-von Mises distance, while the choice W(x) = f_0(x)/[F_0(x)(1 − F_0(x))] gives the Anderson-Darling distance (see, for example, Boos (1981), Shulenin (1993a)). Assuming that ρ_{F_n,F_0}(θ, W) is a differentiable function of the parameter θ, denote its derivative by λ_F(θ) = ∂ρ_{F_n,F_0}(θ, W)/∂θ. Then the equation λ_F(θ) = 0 for obtaining the MD-estimate may be written in the form
λ_F(θ) = −(2/n) Σ_{i=1}^{n} [ (2i − 1)/(2n) − F_0(X_(i) − θ) ] W(X_(i) − θ) = 0,  (4)

where X_(1) ≤ ... ≤ X_(n) are the order statistics of the sample X_1,...,X_n.
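Equation (4) can be solved directly by one-dimensional root finding. A sketch (Python with SciPy assumed; the normal reference F_0 = Φ with W = φ, the helper names, and the bracketing interval are our illustrative choices):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def estimating_eq(theta, xs, F0=norm.cdf, W=norm.pdf):
    """Left-hand side of (4), up to the factor -2/n: sum over the order
    statistics X_(i) of [(2i-1)/(2n) - F0(X_(i)-theta)] * W(X_(i)-theta)."""
    n = len(xs)
    u = (2.0 * np.arange(1, n + 1) - 1.0) / (2.0 * n)
    return np.sum((u - F0(xs - theta)) * W(xs - theta))

def md_root(sample):
    xs = np.sort(np.asarray(sample, dtype=float))
    # for a positive weight the equation changes sign near the sample range
    return brentq(estimating_eq, xs[0] - 1.0, xs[-1] + 1.0, args=(xs,))
```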
Asymptotic normality of the MD-estimators
The asymptotic properties of MD-estimators were studied by several authors (see, for example, Boos (1981), Wiens (1987), Shulenin (1992)). In this paper we discuss the asymptotic properties of the estimator θ̂_n of the location parameter θ which, for a given reference d.f. F_0 and a given weight function W, is a solution of equation (4). There are two versions of the estimation problem:
Version 1. The distribution function F of the observations X_1,...,X_n is known and coincides with the reference distribution function F_0, that is, F = F_0 (or F ∈ ℱ_0).
Version 2. The distribution function of the observations is not known and is not necessarily the same as the reference distribution function, that is, F ≠ F_0 (or F ∉ ℱ_0).
Note that the MD-estimator θ̂_n of the location parameter θ, which is the solution of equation (4), can be written as a functional of the empirical distribution function, θ̂_n = θ(F_n). Here the functional θ(F) is defined either by the relation

min_θ ρ_{F,F_0}(θ, W) = ρ_{F,F_0}(θ(F), W),

or may be given implicitly (as the functional T(F) = θ(F)) by the expression

2∫ [F(x + T(F)) − F_0(x)] f_0(x) W(x) dx − ∫ [F(x + T(F)) − F_0(x)]² W′(x) dx = 0.  (5)
To study the asymptotic properties of the MD-estimators θ̂_n = θ(F_n) of the location parameter θ, we use the von Mises approach (see Serfling (1980), Shulenin (2012)). Consider the expansion

θ(F_n) = θ(F) + V_{1n} + R_{1n},  (6)

where V_{1n} is the approximating statistic and R_{1n} = θ(F_n) − θ(F) − V_{1n} is the remainder of the expansion (6). Let us start by defining the approximating statistic V_{1n} and the remainder R_{1n}. It is necessary to compute the first-order Gateaux differential d_λT(F; G − F) of the functional T(F) defined by (5). Let F_λ = F + λ(G − F), 0 ≤ λ ≤ 1. Replacing the distribution function F in (5) by the d.f. F_λ, we obtain the expression
2∫ {F(x + T(F_λ)) + λ[G(x + T(F_λ)) − F(x + T(F_λ))] − F_0(x)} f_0(x) W(x) dx −
− ∫ {F(x + T(F_λ)) + λ[G(x + T(F_λ)) − F(x + T(F_λ))] − F_0(x)}² W′(x) dx = 0.

Differentiating this expression with respect to λ, setting λ = 0, and taking into account that d_λT(F; G − F) = ∂T(F_λ)/∂λ|_{λ=0} and T(F_λ)|_{λ=0} = T(F) = θ, we get

d_λT(F; G − F) = ∫ [G(x) − F(x)] {[F(x) − F_0(x − θ)] W′(x − θ) − f_0(x − θ) W(x − θ)} dx / ( ∫ f(x) f_0(x − θ) W(x − θ) dx − ∫ [F(x) − F_0(x − θ)] f(x) W′(x − θ) dx ).
From this expression, replacing G by the empirical d.f. F_n, we get the approximating statistic V_{1n}:

V_{1n} = d_λT(F; F_n − F) = n^{−1} Σ_{i=1}^{n} IF(X_i; F, F_0, W).

Here IF(u; F, F_0, W) = d_λT(F; Δ_u − F), −∞ < u < ∞, is the Hampel influence function of the MD-estimator θ̂_n = θ(F_n) of the location parameter θ which, for a given reference d.f. F_0 and a given weight function W, is a solution of equation (4). Note that the expression for the influence function also follows from the above formula by replacing the d.f. G by the distribution function Δ_u degenerate at the point u. The resulting formulas, together with the expansion (6), are the basis for the proof of asymptotic normality of the MD-estimators, which are solutions of equation (4).
The general regularity conditions (which impose restrictions on the behavior of the tails of the d.f. F and of the weight function W), under which √n R_{1n} →_p 0 as n → ∞ and under which the MD-estimator is consistent and asymptotically normal, are given in Boos (1981). In addition, the MD-estimates considered here belong to the family of MD_α-estimates, whose asymptotic properties are described in Shulenin (1992).
To facilitate the formulation of further results, denote by ℱ_S the family of absolutely continuous symmetric distributions. Let the class W_S of weight functions consist of differentiable even functions, W(−x) = W(x), such that

∫ {F(x)(1 − F(x))}^p W(x + c) dx < ∞, p > 0, c ∈ (−∞, +∞).

Theorem. Let (F, F_0) ∈ ℱ_S and W ∈ W_S. Then, provided that

0 < σ²(F; F_0, W) = ∫ IF²(x; F, F_0, W) dF(x) < ∞,

asymptotic normality holds:

L{√n [θ(F_n) − θ(F)] / σ(F; F_0, W)} → N(0, 1), n → ∞.

The asymptotic variance of the MD-estimate with reference d.f. F_0 and weight function W under the distribution F of the observations X_1,...,X_n is equal to D(F; F_0, W) = σ²(F; F_0, W)/n; the Hampel influence function IF(u; F, F_0, W) = −IF(−u; F, F_0, W) of the MD-estimates is calculated by the formulas
IF(u; F, F_0, W) = A_{F,F_0}(u; W) / B_{F,F_0}(W), 0 ≤ u < ∞,  (7)

A_{F,F_0}(u; W) = ∫_0^u W(x) dF(x) − W(u)[F(u) − F_0(u)],  (8)

B_{F,F_0}(W) = ∫_{−∞}^{+∞} f_0(x) W(x) dF(x) − ∫_{−∞}^{+∞} [F(x) − F_0(x)] W′(x) dF(x).  (9)
The proof can be found in Boos (1981), Wiens (1987), Parr and De Wet (1981). Note that for the first version of the estimation problem, when F ∈ ℱ_0, the influence function IF(u; F, W), 0 ≤ u < ∞, is given by
IF(u; F, W) = ∫_{−∞}^{+∞} {F(x) − 1[u ≤ x]} W(x) dF(x) / ∫_{−∞}^{+∞} f²(x) W(x) dx = γ^{−1}(F, W) ∫_0^u f(x) W(x) dx, 0 ≤ u < ∞,  (10)

where γ(F, W) = ∫_{−∞}^{+∞} f²(x) W(x) dx and 1[·] is the indicator function, and the asymptotic variance of the √n MD-estimate is given by

σ²(F, W) = ∫_{−∞}^{+∞} ( ∫_{−∞}^{+∞} {F(y) − 1[u ≤ y]} W(y) dF(y) )² dF(u) / ( ∫_{−∞}^{+∞} f(x) W(x) dF(x) )² = ∫_{−∞}^{+∞} ( ∫_0^x W(y) dF(y) )² dF(x) / ( ∫_{−∞}^{+∞} f²(x) W(x) dx )².  (11)
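Formula (11) is easy to evaluate numerically for F = F_0. A sketch (Python with SciPy's `quad` assumed; the function name `sigma2` and the truncation of the integrals at ±10, adequate for the normal case, are our choices):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def sigma2(W, f=norm.pdf, lim=10.0):
    """Asymptotic variance (11) of the sqrt(n) MD-estimate for F = F0
    symmetric with density f and even weight function W."""
    gamma = quad(lambda x: f(x) ** 2 * W(x), -lim, lim)[0]
    inner = lambda u: quad(lambda y: W(y) * f(y), 0.0, u)[0]
    num = quad(lambda u: inner(u) ** 2 * f(u), -lim, lim)[0]
    return num / gamma ** 2
```

For the normal model this reproduces the corresponding Table 2 entries: `sigma2(lambda x: 1.0)` ≈ π/3 ≈ 1.047 and `sigma2(norm.pdf)` ≈ 1.095.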
Efficient MD-estimators
For the first version of the estimation problem (when the distribution function F of the observations X_1,...,X_n is known and coincides with the reference function of a symmetric distribution F_0), there is an efficient estimate in the class of MD-estimators: its asymptotic variance equals the inverse of the Fisher information I(f_0) about θ in the distribution F_0(x − θ) with density f_0. This estimate is determined by the efficient weight function of the form

W*(x) = a · (d²{−ln f_0(x)}/dx²) / f_0(x).  (12)

This fact was noted earlier in Boos (1981) and Parr and De Wet (1981). Its correctness can be seen as follows. Denote ψ(x) = −f′(x)/f(x); then ψ′(x) = d²{−ln f(x)}/dx², and the weight (12) can be rewritten, taking into account that F = F_0, as W*(x) = a·ψ′(x)/f(x). Substituting this weight function W* ∈ W_S into (11), and taking into account that F ∈ ℱ_S and ψ(0) = 0, we obtain

σ²(F, W*) = ∫_{−∞}^{+∞} ( ∫_0^x W*(y) dF(y) )² dF(x) / ( ∫_{−∞}^{+∞} f²(x) W*(x) dx )² = a² ∫_{−∞}^{+∞} ψ²(x) dF(x) / ( a ∫_{−∞}^{+∞} ψ′(x) f(x) dx )² = a² I(f) / (a I(f))² = 1/I(f).
Example 1. Note that (12) allows one to find the reference distribution F_0 for which the Cramer-von Mises MD-estimator with the weight function W(x) = f_0(x) is asymptotically efficient. In fact, solving the differential equation d²{−ln f_0(x)}/dx² = a · f_0²(x), which corresponds to W(x) = f_0(x), we obtain the density

f_0(x) = 2/[π(e^x + e^{−x})] = (1/π) sech(x), x ∈ R¹,

with the distribution function

F_0(x) = (2/π) arctg(e^x), x ∈ R¹,

which is called the hyperbolic secant distribution. Note that the Fisher information about the parameter θ in the hyperbolic secant density f_0(x) = (1/π) sech(x) is, as for the Cauchy distribution, equal to I(f_0) = 1/2. Hence σ²(F_0, W = f_0) = 2. Note, in addition, that the influence function of the MD-estimate with the weight function W ≡ 1 at F = F_0 is bounded and is given by

IF(x; F_0, W ≡ 1) = [F_0(x) − 1/2] / ∫_0^1 f_0(F_0^{−1}(t)) dt = [(2/π) arctg(e^x) − 1/2] / (2/π²) = π arctg(e^x) − π²/4, x ∈ R¹.
The asymptotic variance of the MD-estimate with weight function W ≡ 1 and F = F_0 is the same as the asymptotic variance of the Hodges-Lehmann estimate HL, and for the distribution F_0(x) = (2/π) arctg(e^x) it is given by

σ²(F_0, W ≡ 1) = 1 / ( 12 [ ∫_0^1 f_0(F_0^{−1}(t)) dt ]² ) = 1 / ( 12 [ (2/π) ∫_0^1 sin(πt/2) cos(πt/2) dt ]² ) = π⁴/48 ≈ 2.029 = σ²(F_0, HL).
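Both values of Example 1 can be checked numerically against formula (11). A sketch (SciPy's `quad` assumed; truncation of the integrals at ±30 is our choice, adequate for the light-tailed sech density):

```python
import numpy as np
from scipy.integrate import quad

f0 = lambda x: 1.0 / (np.pi * np.cosh(x))       # hyperbolic secant density

def sigma2(W, f, lim=30.0):
    # asymptotic variance (11) for F = F0 with density f and even weight W
    gamma = quad(lambda x: f(x) ** 2 * W(x), -lim, lim)[0]
    inner = lambda u: quad(lambda y: W(y) * f(y), 0.0, u)[0]
    num = quad(lambda u: inner(u) ** 2 * f(u), -lim, lim)[0]
    return num / gamma ** 2

print(round(sigma2(f0, f0), 3))              # 2.0 = 1/I(f0): efficient
print(round(sigma2(lambda x: 1.0, f0), 3))   # 2.029 = pi^4/48: HL variance
```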
Example 2. Let the supermodel ℱ*_S = {F_(1), F_(2), F_(3), F_(4), F_(5)} be a finite set of distributions, where F_(1) = Φ is the standard normal distribution, with Fisher information I(f_(1)) = 1; F_(2) is logistic, I(f_(2)) = 1/3; F_(3) is Laplace, I(f_(3)) = 1; F_(4) is Cauchy, I(f_(4)) = 1/2; F_(5) is hyperbolic secant, I(f_(5)) = 1/2. Optimal weight functions of the form (12) for these distributions are given in Table 1 and in Figure 1.
Table 1. Optimal weight functions of the form W*(x) = a · ψ′(x)/f(x)

f_(1) (normal): W*_(1)(x) = 1/φ(x)
f_(2) (logistic): W*_(2)(x) ≡ 1
f_(3) (Laplace): W*_(3)(x) = 2e^{|x|} δ(x − 0)
f_(4) (Cauchy): W*_(4)(x) = (1 − x²)/(1 + x²)
f_(5) (hyperbolic secant): W*_(5)(x) = (2/π)(e^x + e^{−x})^{−1}

Fig. 1. Optimal weight functions for F ∈ ℱ*_S.
Note that the asymptotic variance of the MD-estimate with reference distribution F_0(x) = F(x) and weight function W(x) = 1/f(x) coincides with the asymptotic variance of the sample mean X̄ and is calculated by the formula

σ²(F, W = 1/f) = ∫_{−∞}^{+∞} ( ∫_0^x W(y) dF(y) )² dF(x) / ( ∫_{−∞}^{+∞} f²(x) W(x) dx )² = ∫_{−∞}^{+∞} ( ∫_0^x (1/f(y)) f(y) dy )² dF(x) / ( ∫_{−∞}^{+∞} f²(x)(1/f(x)) dx )² = ∫_{−∞}^{+∞} x² dF(x).
For the weight function W(x) = 1/φ(x), where φ(x) is the standard normal density, the MD-estimator is an efficient estimate of the location parameter θ of the normal distribution, but, like the sample mean X̄, it has the unbounded influence function IF(x; Φ, W = 1/φ) = x, x ∈ R¹, and its sensitivity to gross errors is not bounded, that is, γ*(Φ, W = 1/φ) = ∞. Note also that the choice of the weight function W(x) = 1 leads to an asymptotically efficient MD-estimator for the logistic d.f. F_(2) (the variance in this case coincides with the variance of the HL-estimator), while the absolute efficiency of the MD-estimator with weight function W(x) = f_(2)(x) is AE(F_(2), W = f_(2)) = [3.036·(1/3)]^{−1} = 0.988. Recall that for the logistic distribution F_(2) with density f_(2) the identity f_(2) = F_(2)(1 − F_(2)) holds, and therefore the choice of the weight function in the form inherent in MD-estimation based on the Anderson-Darling distance, W(x) = f_0/[F_0(1 − F_0)], also leads to an efficient MD-estimate for the logistic distribution. For the Laplace distribution with density f_(3)(x) = (1/2) exp(−|x|), x ∈ R¹, the function ψ(x) = −f′_(3)(x)/f_(3)(x) = sign(x) and, therefore, the optimal weight function W*(x) = a·ψ′(x)/f(x) defined by (12) takes the form

W*_(3)(x) = {sign(x)}′ / f_(3)(x) = δ(x − 0)/f_(3)(x) = 2e^{|x|} δ(x − 0).

Using this expression for the optimal weight function together with (11), one can see that the asymptotic variance of the MD-estimate coincides with the asymptotic variance of the sample median X_{1/2}, which is the asymptotically efficient estimate of the parameter θ for the Laplace distribution. In fact, from (11) with the weight function W(x) = δ(x − 0)/f(x) we obtain:
σ²(F, W) = ∫_{−∞}^{+∞} ( ∫_{−∞}^{+∞} {F(y) − 1[u ≤ y]} W(y) dF(y) )² dF(u) / ( ∫_{−∞}^{+∞} f(x) W(x) dF(x) )²

= ∫_{−∞}^{+∞} ( ∫_{−∞}^{+∞} {F(y) − 1[u ≤ y]} δ(y − 0) dy )² dF(u) / ( ∫_{−∞}^{+∞} f(x) δ(x − 0) dx )²

= ∫_{−∞}^{+∞} {F(0) − 1[u ≤ 0]}² dF(u) / f²(0) = 1/(4 f²(0)) = σ²(F, X_{1/2}),

since {F(0) − 1[u ≤ 0]}² = 1/4 for every u.
Note that for the Cauchy distribution the optimal weight function W*_(4)(x) = a(1 − x²)/(1 + x²) is negative outside the interval [−1, 1]. This fact can be explained as follows. From (10) it follows that the weight function W is expressed through the derivative of the influence function as W(u) = γ(F, W)·IF′(u; F, W)/f(u), 0 ≤ u < ∞. So, to "reduce" the influence of outliers on the MD-estimate, its influence function must decrease for large values of the argument and, consequently, the weight function must be negative there, as is observed for the optimal weight function W*_(4)(x) = a(1 − x²)/(1 + x²) for the Cauchy distribution.
Example 3. Consider the family of t-distributions ℱ_r ⊂ ℱ_S, for which the density f_r(x) with r degrees of freedom can be written as

f_r(x) = A(r)(1 + x²/r)^{−(r+1)/2}, x ∈ R¹, A(r) = Γ((r + 1)/2) / (√(rπ) Γ(r/2)).

Using (12), we can see that the optimal weight function for this family of distributions is calculated by the formula

W*(x) = a · r^{−(r+1)/2}(r + 1) A^{−1}(r)(r − x²)(r + x²)^{(r−3)/2}.

Hence, for r = 1 we obtain the optimal weight function for the Cauchy distribution, W*(x)|_{r=1} = a · 2π(1 − x²)/(1 + x²) = W*_(4)(x). The case r → ∞ corresponds to the normal distribution. Since A(r) → 1/√(2π) and (1 + x²/r)^{−(r+1)/2} → e^{−x²/2} as r → ∞, we obtain from the general formula:

lim_{r→∞} W*_r(x) = a · √(2π) exp(x²/2) = a · 1/φ(x) = W*_(1)(x).
Robustness of the MD-estimators
To study robustness, we consider two types of supermodels describing deviations from the Gaussian model of observations. The first supermodel ℱ*_S, which was used in Example 2, is defined as a finite set of given distributions, that is,

ℱ*_S = {F_(1), F_(2), F_(3), F_(4), F_(5)}.

The second supermodel ℱ_{ε,τ}(Φ), called the Gaussian model with scale contamination, is defined as

ℱ_{ε,τ}(Φ) = {F : F_{ε,τ}(x) = (1 − ε)Φ(x) + ε Φ(x/τ)}, 0 ≤ ε < 1, τ > 1,

where Φ(x) is the standard normal distribution function with density φ(x), ε is the proportion of sample contamination, and τ is the scale parameter of the contamination.
Example 4. The first option. First, we consider the properties of MD-estimators within a supermodel under different types of reference d.f. F_0 and weight functions W. For the first version of the estimation problem (when the distribution function F is known and equals the reference distribution function F_0, that is, F ∈ ℱ_0), the influence function of the MD-estimate and its asymptotic variance are given by (10) and (11). Let us consider various types of the weight function W ∈ W_S.

A) Let W(x) ≡ 1 and F(x) = F_0(x). Under these conditions the MD-estimators with weight function W ≡ 1 are B-robust, that is, they have bounded influence functions, defined as

IF(x; F, W ≡ 1) = {2F(x) − 1} / ( 2 ∫_{−∞}^{+∞} f²(x) dx ).

For F = Φ the influence function is IF(x; Φ, W ≡ 1) = √π [2Φ(x) − 1]. The sensitivity to gross errors γ*(F, T) = sup_x |IF(x; F, T)| of the MD-estimators with weight function W ≡ 1 equals γ*(Φ, W ≡ 1) = √π ≈ 1.77.
Fig. 2. Influence functions of MD-estimators for the normal distribution.
Fig. 3. Influence functions of MD-estimators for the Cauchy distribution.
B) Let the weight function coincide with the reference density, W(x) = f_0(x), and F(x) = F_0(x). Under these assumptions the asymptotic variance of the MD-estimate is given by

σ²(F, W = f) = ∫_{−∞}^{+∞} ( ∫_0^x f²(y) dy )² dF(x) / ( ∫_{−∞}^{+∞} f³(x) dx )².
Note that for the Gaussian distribution F(x) = Φ(x) and the weight function W(x) = φ(x) = (1/√(2π)) exp{−x²/2} we obtain from (10) the bounded influence function

IF(x; Φ, W = φ) = (√(3π)/2) erf(x) = (√(3π)/2)[2Φ(x√2) − 1], x ∈ R¹,

where erf(x) = (2/√π) ∫_0^x exp{−t²} dt is the error (Laplace) function and Φ(x) = (1/√(2π)) ∫_{−∞}^x exp{−t²/2} dt.
The sensitivity to gross errors γ*(F, T) of the MD-estimate with weight function W(x) = φ(x) is equal to γ*(Φ, W = φ) = √(3π)/2 ≈ 1.53. In this case the asymptotic variance of the estimate is

σ²(Φ, W = φ) = ∫_{−∞}^{+∞} IF²(x; Φ, W = φ) dΦ(x) = (3π/4) ∫_{−∞}^{+∞} erf²(x) φ(x) dx = (3/2) arctg(2/√5) ≈ 1.095.

The asymptotic variances of the MD-estimators for the cases (A) and (B) were calculated for the following distributions: F_(1) normal, F_(2) logistic, F_(3) Laplace, F_(4) Cauchy, F_(5) hyperbolic secant. Numerical results obtained from these formulas for F_0 = F_(i), i = 1,...,5, and different weight functions are shown in Table 2.
RT&A # 01 (28)
V. P. SHULENIN - ON ESTIMATION OF PARAMETERS BY THE MINIMUM DISTANCE METHOD (Vol 8) 2013 March
Table 2. The asymptotic variance of √n MD-estimators for the supermodel ℱ*_S at F_0 = F_(i), i = 1,...,5

Weight function | F_(1) = Φ | F_(2) | F_(3) | F_(4) | F_(5)
W ≡ 1 | 1.047 (0.96) | 3.000 (1.00) | 1.333 (0.75) | 3.287 (0.61) | 2.029 (0.98)
W_(i)(x) = f_(i)(x) | 1.095 (0.91) | 3.036 (0.99) | 1.200 (0.83) | 2.573 (0.78) | 2.000 (1.00)
W_(i)(x) = f_(i)/[F_(i)(1 − F_(i))] | 1.035 (0.97) | 3.000 (1.00) | 1.262 (0.79) | 2.317 (0.86) | 2.020 (0.99)
W_(i)(x) = 1/f_(i)(x) | 1.000 (1.00) | 3.290 (0.91) | 2.000 (0.50) | ∞ (0.00) | 2.467 (0.81)
W*_(4)(x) = (1 − x²)/(1 + x²) | 1.109 (0.90) | 4.204 (0.71) | 1.230 (0.81) | 2.000 (1.00) | 2.103 (0.95)
The absolute efficiencies of the MD-estimates are given in parentheses; they were calculated by the formula AE(F, W) = [σ²(F, W) I(f)]^{−1}. Note that for distributions with "heavy tails" (Cauchy and Laplace) the absolute efficiency of MD-estimators depends mainly on the choice of the weight function W. For the normal distribution the optimal weight function is W_(1)(x) = 1/f_(1)(x). The weight functions W ≡ 1 and W_(2)(x) = f_(2)/[F_(2)(1 − F_(2))] are optimal for the logistic distribution F_(2). The weight function W*_(4)(x) = (1 − x²)/(1 + x²) is optimal for the Cauchy distribution, and the weight function W_(5)(x) = f_(5)(x) is optimal for the hyperbolic secant distribution F_(5).
Example 5. The second option. Consider the case when F ≠ F_0 and the supermodel ℱ*_S = {F_(1), F_(2), F_(3), F_(4), F_(5)} is a finite set of distributions with F ∈ ℱ*_S. In this case the asymptotic variance of the √n MD-estimator with weight function W ≡ 1 is given by
σ²(F, F_0, W ≡ 1) = 2 ∫_0^∞ [F_0(u) − 1/2]² dF(u) / ( ∫_{−∞}^{+∞} f_0(x) dF(x) )², F ∈ ℱ*_S.  (13)
The numerical values of the asymptotic variance of the √n MD-estimators for F ∈ ℱ*_S and the weight function W ≡ 1, calculated by formula (13), are shown in Table 3.
Table 3. Asymptotic variance of √n MD-estimators for θ̂_(i) = θ̂(F_0 = F_(i), W ≡ 1), i = 1,...,5, F ∈ ℱ*_S

θ̂ \ F | F_(1) | F_(2) | F_(3) | F_(4) | F_(5) | d(θ̂_(i), ℱ*_S)
θ̂_(1) | 1.047 (0.96) | 3.051 (0.98) | 1.383 (0.72) | 2.911 (0.69) | 2.008 (0.99) | 0.42
θ̂_(2) | 1.016 (0.98) | 3.000 (1.00) | 1.524 (0.66) | 3.679 (0.54) | 2.069 (0.97) | 0.57
θ̂_(3) | 1.059 (0.94) | 3.048 (0.98) | 1.333 (0.75) | 2.957 (0.68) | 2.006 (0.99) | 0.41
θ̂_(4) | 1.046 (0.96) | 3.025 (0.99) | 1.385 (0.72) | 3.290 (0.61) | 2.017 (0.99) | 0.48
θ̂_(5) | 1.031 (0.97) | 3.011 (0.99) | 1.439 (0.70) | 3.276 (0.61) | 2.029 (0.98) | 0.49
Note that in Table 3 the absolute efficiencies of the estimates are given in parentheses, calculated by the formula AE(F, θ̂) = {σ²(F, F_0, W ≡ 1) I(f)}^{−1}. The last column of the table gives the defects of the estimates in the supermodel ℱ*_S, calculated from (19).
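As a numerical check, the Table 3 entry σ²(Φ; F_0 = F_(3), W ≡ 1) = 1.059 (normal observations, Laplace reference) follows directly from (13). A sketch (SciPy assumed; the truncation of the integrals at ±12 is our choice):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Laplace reference: density, and d.f. written for u >= 0
f0 = lambda x: 0.5 * np.exp(-abs(x))
F0 = lambda u: 1.0 - 0.5 * np.exp(-u)

num = 2.0 * quad(lambda u: (F0(u) - 0.5) ** 2 * norm.pdf(u), 0.0, 12.0)[0]
den = quad(lambda x: f0(x) * norm.pdf(x), -12.0, 12.0)[0] ** 2
print(round(num / den, 3))   # 1.059, the theta_(3) entry at F_(1) in Table 3
```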
Note 1. A convenient means for comparing the quality of estimates θ̂_1,...,θ̂_k of a given parameter θ of a symmetric distribution F is the concept of the defect of an estimator (see, for example, Andrews et al. (1972), Shulenin (2012)). Let θ̂_1,...,θ̂_k be a finite set of asymptotically normal and unbiased estimates of the location parameter θ, based on a sample X_1,...,X_n from the distribution F, so that

L{√n (θ̂_i − θ) / σ_F(θ̂_i)} → N(0, 1), n → ∞, i = 1,...,k.

The defect of the estimator θ̂_i among the compared estimates θ̂_1,...,θ̂_k for a symmetric distribution F is defined as

DE_F(θ̂_i) = 1 − min{σ²_F(θ̂_1),...,σ²_F(θ̂_k)} / σ²_F(θ̂_i), i = 1,...,k.  (14)

Note that if among the estimators θ̂_1,...,θ̂_k there is an efficient estimate θ̂*, for which σ²_F(θ̂*) = 1/I(f) and, therefore, min{σ²_F(θ̂_1),...,σ²_F(θ̂_k)} = 1/I(f), then the absolute defect of the estimator θ̂_i equals one minus its absolute efficiency:

ADE_F(θ̂_i) = 1 − AE_F(θ̂_i), i = 1,...,k.  (15)
Note 2. The robustness of the compared estimates θ̂_1,...,θ̂_k of the location parameter θ in a supermodel ℱ consisting of a finite set of symmetric distributions, ℱ = {F_1,...,F_r}, is usually studied by plotting the estimates' defects on the plane of two distributions. The defect for the basic (ideal, usually Gaussian) model is laid along the horizontal axis, and the defect for an alternative model belonging to the supermodel ℱ = {F_1,...,F_r} is laid along the vertical axis. With this visual representation of the defects on the plane of the two distributions, preference is given to the estimate closest to the origin. As examples, the absolute defects of estimates are presented on the planes of the distribution pairs "Gauss-Cauchy" and "Gauss-Laplace"; see Figures 4 and 5.
Fig. 4. Defects of estimates in the "Gauss-Cauchy" plane.

Fig. 5. Defects of estimates in the "Gauss-Laplace" plane.
The advantages of the MD-estimates θ̂_(i) = θ̂(F_0 = F_(i), W = f_(i)), i = 1,...,5, for F ∈ ℱ*_S over the family of Winsorized means X̄_α and the family of Hodges-Lehmann estimates HL_α, 0 ≤ α < 1/2, are clearly seen in these figures (they lie closer to the origin).
Note 3. If we want to choose a preferred estimator among the compared estimates θ̂_1,...,θ̂_k within the entire supermodel ℱ = {F_1,...,F_r}, we can use the Euclidean metric in the above notation:

d(θ̂_i; ℱ) = { Σ_{j=1}^{r} [DE_{F_j}(θ̂_i)]² }^{1/2},  (16)

or

Ad(θ̂_i; ℱ) = { Σ_{j=1}^{r} [ADE_{F_j}(θ̂_i)]² }^{1/2}, i = 1,...,k.  (17)

The preference is given to the estimator θ̂_i with the minimal value of d(θ̂_i; ℱ), that is,

d(θ̂_i; ℱ) = min{d(θ̂_1; ℱ),...,d(θ̂_k; ℱ)}.  (18)
For the supermodel ℱ*_S = {F_(1), F_(2), F_(3), F_(4), F_(5)}, formula (16) can be written as

d(θ̂_(i), ℱ*_S) = { Σ_{j=1}^{5} [1 − {σ²(F_(j), θ̂_(i)) I(f_(j))}^{−1}]² }^{1/2} = { Σ_{j=1}^{5} [1 − AE(F_(j), θ̂_(i))]² }^{1/2}, i = 1,...,5.  (19)

According to the criterion (18), the preference among the estimators θ̂_(1),...,θ̂_(5) in the supermodel ℱ*_S should be given to the MD-estimator with the reference Laplace distribution F_0 = F_(3) and weight function W ≡ 1, since this estimator has the minimum value

d(θ̂_(3), ℱ*_S) = min{d(θ̂_(i), ℱ*_S), i = 1,...,5} = 0.41
(see the last column of Table 3). Compare it with d(HL, ℱ*_S) = 0.47 for the Hodges-Lehmann estimate, d(X̄_{0.45}, ℱ*_S) = 0.41 for the Winsorized mean, d(X_{1/2}, ℱ*_S) = 0.51 for the sample median, and d(X̄, ℱ*_S) = 1.14 for the sample mean (Shulenin (2012), p. 256).
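The defect column of Table 3 can be reproduced from (19) and the tabulated absolute efficiencies. A sketch of the computation (NumPy assumed; the array layout is ours):

```python
import numpy as np

# absolute efficiencies AE(F_(j), theta_(i)) from Table 3, rows i = 1..5
ae = np.array([
    [0.96, 0.98, 0.72, 0.69, 0.99],   # theta_(1): F0 normal
    [0.98, 1.00, 0.66, 0.54, 0.97],   # theta_(2): F0 logistic
    [0.94, 0.98, 0.75, 0.68, 0.99],   # theta_(3): F0 Laplace
    [0.96, 0.99, 0.72, 0.61, 0.99],   # theta_(4): F0 Cauchy
    [0.97, 0.99, 0.70, 0.61, 0.98],   # theta_(5): F0 hyperbolic secant
])
d = np.sqrt(((1.0 - ae) ** 2).sum(axis=1))   # Euclidean defect distance (19)
print(np.round(d, 2))   # [0.42 0.57 0.41 0.48 0.49], as in Table 3
```

The minimum 0.41 is indeed attained by the Laplace-reference estimator θ̂_(3).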
Example 6. The second option. Consider the Gaussian model with scale contamination ℱ_{ε,τ}(Φ). Let the reference distribution be normal, F_0 = Φ, while the distribution of the observations is a normal distribution with scale contamination, F ∈ ℱ_{ε,τ}(Φ). Under these assumptions the asymptotic variance of the √n MD-estimate for W ≡ 1 is calculated by the formula

σ²(F_{ε,τ}, Φ, W ≡ 1) = 2 ∫_0^∞ [Φ(x) − 1/2]² [(1 − ε)φ(x) + (ε/τ)φ(x/τ)] dx / ( ∫_{−∞}^{+∞} φ(x)[(1 − ε)φ(x) + (ε/τ)φ(x/τ)] dx )²

= { π(1 − ε)/6 + ε·arctg(τ²/√(2τ² + 1)) } / { (1 − ε)/√2 + ε/√(τ² + 1) }².
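A sketch of the closed-form expression above (plain Python; the function name is ours, and the values reproduce the W ≡ 1, τ = 3 entries of Table 5):

```python
import math

def var_md_w1(eps, tau):
    """Asymptotic variance of the sqrt(n) MD-estimate (W = 1, F0 = Phi)
    under the contaminated normal F_{eps,tau}."""
    num = math.pi * (1.0 - eps) / 6.0 \
        + eps * math.atan(tau ** 2 / math.sqrt(2.0 * tau ** 2 + 1.0))
    den = ((1.0 - eps) / math.sqrt(2.0)
           + eps / math.sqrt(tau ** 2 + 1.0)) ** 2
    return num / den

print(round(var_md_w1(0.0, 3.0), 3))   # 1.047 = pi/3: no contamination
print(round(var_md_w1(0.3, 3.0), 3))   # 2.019, as in Table 5
```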
For the weight function W(x) = f_0(x) = φ(x) the asymptotic variance of the √n MD-estimator is given by

σ²(F_{ε,τ}, Φ, W = φ) = 2 ∫_0^∞ ( ∫_0^u φ(x) dF_{ε,τ}(x) − φ(u)[F_{ε,τ}(u) − Φ(u)] )² dF_{ε,τ}(u) / ( ∫_{−∞}^{+∞} φ²(x) dF_{ε,τ}(x) − ∫_{−∞}^{+∞} φ′(x)[F_{ε,τ}(x) − Φ(x)] dF_{ε,τ}(x) )² = Σ_{i=1}^{20} A_i(ε, τ) / [2 B²(ε, τ)],

where B(ε, τ) and A_i(ε, τ), i = 1,...,20, are certain functions of the parameters ε and τ. The numerical values of the asymptotic variance of the √n MD-estimators for F ∈ ℱ_{ε,τ}(Φ) at different weight functions are given in Table 5.
Table 5. The asymptotic variance of √n MD-estimators for F ∉ ℱ_0, F = F_{ε,τ}, F_0 = Φ

W, τ \ ε | 0.00 | 0.01 | 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30
W ≡ 1, τ = 3 | 1.047 (0.95) | 1.071 (0.96) | 1.171 (0.97) | 1.307 (0.97) | 1.458 (0.95) | 1.625 (0.94) | 1.811 (0.93) | 2.019 (0.93)
W ≡ 1, τ = 5 | 1.047 (0.95) | 1.078 (0.95) | 1.210 (0.93) | 1.395 (0.90) | 1.607 (0.86) | 1.851 (0.83) | 2.132 (0.80) | 2.459 (0.78)
W = φ, τ = 3 | 1.095 (0.91) | 1.117 (0.92) | 1.209 (0.93) | 1.333 (0.94) | 1.470 (0.95) | 1.620 (0.95) | 1.786 (0.95) | 1.972 (0.96)
W = φ, τ = 5 | 1.095 (0.91) | 1.122 (0.92) | 1.237 (0.91) | 1.393 (0.90) | 1.562 (0.89) | 1.749 (0.88) | 1.956 (0.87) | 2.187 (0.87)
The absolute efficiencies of the MD-estimates, calculated by the formula AE(F_{ε,τ}, θ̂) = {σ²(F_{ε,τ}, W) I(f_{ε,τ})}^{−1}, where I(f_{ε,τ}) is the Fisher information about the location parameter for distributions from the supermodel ℱ_{ε,τ}(Φ), are given in the table in parentheses.
Fig. 6 shows the absolute efficiency of the estimates for F ∈ ℱ_{ε,τ}(Φ). It is clearly seen that the MD-estimates with reference function F_0 = Φ and weight function W(x) = φ(x), as well as with weight function W ≡ 1, provide high absolute efficiency for 0 ≤ ε ≤ 0.3. The absolute efficiency of the sample mean X̄ decreases sharply, while that of the sample median X_{1/2} grows slowly, remaining at a low level.
Fig. 6. Absolute efficiency of estimates for F ∈ ℱ_{ε,τ}(Φ), τ = 3.
Example 7. Adaptive version. The properties of MD-estimates depend strongly on the choice of the weight function W for distributions with "heavy tails". Therefore, the study of the efficiency and robustness properties of MD-estimates (for the case F ∉ ℱ_0) opens the possibility of an adaptive approach to the choice of the reference distribution F_0 and the weight function W within a given supermodel, based on sample estimates of functionals that determine the "degree of heaviness of the tails" of distributions (see Shulenin (1993a)). Adaptive selection of the weight function can provide the required quality of MD-estimates for a given supermodel.
Consider the example of the supermodel ℱ_{ε,τ}(Φ) = {F : F(x) = Φ_{ε,τ}(x)}. We assume that the proportion of contamination ε may vary within the limits 0 ≤ ε ≤ 0.3 and that the scale parameter is τ = 3. For this supermodel, with the reference function F_0 = Φ, define the adaptive weight function Ŵ as

Ŵ(x) = 1/φ(x), if 1.71 < Q(F_n) ≤ 1.76;
Ŵ(x) = 1, if 1.76 < Q(F_n) ≤ 1.86;  (20)
Ŵ(x) = φ(x), if 1.86 < Q(F_n) ≤ 1.91,
where Q(F_n) is the sample estimate of the functional Q(F; ν, μ), which characterizes the "degree of heaviness of the distribution tails" and is defined in Shulenin (1993a). The sample estimate Q(F_n) is based on the sample X_1,...,X_n and may be written as

Q(F_n; ν, μ) = [ (1/k)( Σ_{i=n−k+1}^{n} X_(i) − Σ_{i=1}^{k} X_(i) ) ] / [ (1/m)( Σ_{i=n−m+1}^{n} X_(i) − Σ_{i=1}^{m} X_(i) ) ], k = [νn], m = [μn].  (21)

Here the parameters ν and μ satisfy the inequalities 0 < ν ≤ μ ≤ 0.5 (we take ν = 0.2, μ = 0.5), and X_(1),...,X_(n) are the order statistics of the sample X_1,...,X_n.
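The statistic (21) and the rule (20) can be sketched as follows (Python with NumPy/SciPy assumed; the function names are ours, and handling Q(F_n) values outside the ranges listed in (20) by falling back to the nearest regime is our implementation choice):

```python
import numpy as np
from scipy.stats import norm

def q_tail(sample, nu=0.2, mu=0.5):
    """Tail-heaviness functional (21): averaged spread of the extreme
    [nu*n] order statistics over that of the extreme [mu*n] ones."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)
    k, m = int(nu * n), int(mu * n)
    spread = lambda j: xs[n - j:].mean() - xs[:j].mean()
    return spread(k) / spread(m)

def adaptive_weight(sample):
    """Adaptive choice (20) of the weight function W for F0 = Phi."""
    q = q_tail(sample)
    if q <= 1.76:
        return lambda x: 1.0 / norm.pdf(x)   # light tails: near-normal
    elif q <= 1.86:
        return lambda x: np.ones_like(x)     # moderate tails
    return norm.pdf                          # heavy tails
```

For a standard normal sample Q(F_n) concentrates near 1.75, which falls in the first range of (20); contamination inflates Q and pushes the rule toward the more robust weights.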
Note that with the weight function chosen in the form (20), the absolute efficiency of the adaptive MD-estimates does not fall below 0.95 when the contamination proportion satisfies 0 ≤ ε ≤ 0.3. That is, within the given supermodel the absolute efficiency satisfies 0.95 ≤ AE(Φ_{ε,τ}, Ŵ) ≤ 1 for τ = 3, 0 ≤ ε ≤ 0.3, n ≥ 40 (see Figure 6). If we do not adapt the weight function and use, for example, the Anderson-Darling weight function W(x) = φ(x)/[Φ(x)(1 − Φ(x))], the absolute efficiency of MD-estimates within the supermodel ℱ_{ε,τ}(Φ) can fall to the level of 0.47.
Conclusion
We studied the asymptotic properties of MD-estimators of the location parameter θ based on the weighted Cramer-von Mises distance. It is shown that these estimates are B-robust, that is, their influence functions are bounded and, therefore, they are "protected" against outliers in the sample. For the case F ∈ ℱ_0, optimal weight functions are given that make the MD-estimates asymptotically efficient. For the Gaussian model with scale contamination (F ∈ ℱ_{ε,τ}(Φ), τ = 3) the absolute efficiency of MD-estimates with weight function W ≡ 1 does not fall below 0.93 for 0 ≤ ε ≤ 0.3, and it increases from 0.91 to 0.96 for the weight function W = φ.
Summarizing, we note that MD-estimators of the parameter θ are closely connected with the other robust M-, L- and R-estimators (see Shulenin and Tarasenko (1994), Shulenin and Serykh (1993), Shulenin (1995)). The properties of MD-estimators in some cases coincide with those of many well-known estimates of the location parameter θ, for example, the Hodges-Lehmann estimates, the sample mean and the sample median. Note also that the above asymptotic results provide quite a good approximation to the properties of MD-estimators for sample sizes n ≥ 20; this is confirmed by numerous computer simulation results. The studied efficiency and robustness properties of MD-estimates open (for the case F ∉ ℱ_0) the possibility of using an adaptive approach to the choice of the reference distribution function F_0 and the weight function W within a given supermodel, based on sample estimates of functionals that determine the "degree of heaviness of the tails" of distributions (see Example 7 and Shulenin (2010), Shulenin (2010a)).
Acknowledgement
The author expresses gratitude to Prof. F. P. Tarasenko for valuable comments and help in preparing the English version of the paper.
REFERENCES
Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., Tukey, J. W. (1972). Robust Estimates of Location: Survey and Advances. Princeton, N.J.: Princeton Univ. Press, 375 p.
Bickel, P. J. (1976). Another look at robustness: a review of reviews and some new developments. Scand. J. Statist. 3, 145-168.
Boos, D. D. (1981). Minimum distance estimators for location and goodness of fit. J. Amer. Statist. Assoc. 76, N.375, 663-670.
Parr, W. C. and Schucany, W. R. (1980). Minimum distance and robust estimation. J. Amer. Statist. Assoc. 75, No. 371, 616-624.
Parr, W. C. (1981). Minimum distance estimation: a bibliography. Comm. Statist. A10, 1205-1224.
Parr, W. C. and De Wet, T. (1981). On minimum weighted Cramer-von Mises statistical estimation. Comm. Statist. A10(12), 1149-1166.
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley, 371 p.
Shulenin, V. P. (1992). Asymptotic properties and robustness of MD-estimates. Theory of Probability and its Applications, 37(4), 816-818 (in Russian).
Shulenin, V. P. and Serykh, A. P. (1993). Robust and nonparametric algorithms for processing data of physical experiments. Izvestiya Vuzov. Fizika, No. 10, 128-136 (in Russian).
Shulenin, V. P. (1993a). Introduction to Robust Statistics. Tomsk: Tomsk University Press, 227 p. (in Russian).
Shulenin, V. P. and Tarasenko, F. P. (1994). Connections of MD-estimates with classes of robust estimates of location parameter. 12th Prague Conf. on Inform. Theory. August 29 - September 2, 220-223.
Shulenin, V. P. (1995). Efficiency bounds for estimates constructed by the minimum Cramer-von Mises distance method. Izvestiya Vuzov. Fizika, No. 9, 84-89 (in Russian).
Shulenin, V. P. (2010). Properties of adaptive Hodges-Lehmann estimates in the asymptotics and for finite sample sizes. Vestnik Tomskogo Gosudarstvennogo Universiteta. Upravlenie, Vychislitel'naya Tekhnika i Informatika, No. 2(11), 96-112 (in Russian).
Shulenin, V. P. (2010a). Efficient and robust Cramer-von Mises MD-estimates. Vestnik Tomskogo Gosudarstvennogo Universiteta. Upravlenie, Vychislitel'naya Tekhnika i Informatika, No. 3(12), 107-121 (in Russian).
Shulenin, V. P. (2012). Mathematical Statistics. Part 3. Robust Statistics: textbook. Tomsk: NTL Publishing House, 520 p. (in Russian).
Wiens, D. P. (1987). Robust weighted Cramer-von Mises estimators of location, with minimax variance in ε-contamination neighbourhoods. The Canadian Journal of Statistics, 15, No. 3, 269-278.
Wolfowitz, J. (1957). The minimum distance method. Ann. Math. Statist. 28, 75-88.