TOMSK STATE UNIVERSITY JOURNAL (Vestnik Tomskogo Gosudarstvennogo Universiteta)
2009. Control, Computer Engineering and Informatics. No. 3(8)
UDC 519.25
Sergey S. Tarima, Yuriy G. Dmitriev
STATISTICAL ESTIMATION WITH POSSIBLY INCORRECT MODEL ASSUMPTIONS
We combine a consistent (base) estimator of a population parameter with one or several other, possibly inconsistent, estimators. Some or all of the assumptions used for calculating the latter estimators may be incorrect. The approach suggested in this manuscript is not restricted to parametric families and can easily be used for improving the efficiency of estimators built under nonparametric or semiparametric models. The combined estimator minimizes the mean squared error (MSE) in a family of linear combinations of the considered estimators when all variances and covariances used in its structure are known. In real-life problems these variances and covariances are estimated, generating an empirical version of the combined estimator. The combined estimator as well as its empirical version are consistent, and the asymptotic properties of these estimators are presented. The combined estimator is applicable when analysts can use several different procedures for estimating the same population parameter, with different assumptions associated with the use of each non-base estimator. Our estimator remains consistent in the presence of wrong assumptions in the non-base estimating procedures. In addition to the theoretical results of this manuscript, simulation studies describe the properties of the estimator combining the Kaplan-Meier estimator with the censored-data exponential estimator of a survival curve. Another set of simulation examples combines semiparametric Cox regression with exponential regression on right-censored data.
Keywords: model misspecification, robust estimation, minimum mean squared error, multimodel inference.
In many applied problems researchers are challenged with statistical estimation of a population parameter $\theta$. In some cases $\theta$ is expressed as a functional $\theta = \int g(y)\,dF(y)$, where the real-valued and possibly multidimensional function $g(y)$ is known but the distribution $F(y)$ may be either completely unknown (the nonparametric case) or unknown up to some restrictions (for example, symmetry, or membership in a parametric family). Different degrees of uncertainty about $F(y)$ are expressed by different sets of assumptions and lead to different estimating procedures. The common part is that all of these procedures attempt to estimate the same $\theta$. The quality of estimation depends strongly on how well the assumptions are used in an estimating procedure and on whether these assumptions are correct.
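To make the distinction concrete, consider $\theta = P(Y > 1) = \int I(y > 1)\,dF(y)$. The following minimal Python sketch (our illustration with simulated data, not code from the paper) contrasts the nonparametric plug-in with a parametric plug-in that assumes an exponential $F$:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=200)  # hypothetical i.i.d. sample

# theta = P(Y > 1) = integral of g(y) = I(y > 1) with respect to F
g = lambda v: (v > 1.0).astype(float)

theta_nonparametric = g(y).mean()            # empirical plug-in, no model assumptions
theta_exponential = np.exp(-1.0 / y.mean())  # plug-in assuming an exponential F

print(theta_nonparametric, theta_exponential)
```

Both procedures target the same $\theta$; the parametric one is more efficient when the exponential assumption holds and inconsistent when it does not.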
There are situations when $\theta$ is not easily expressed via $\int g(y)\,dF(y)$, for example, when $\theta$ is a regression coefficient or a distribution parameter. Then a model-dependent interpretation should be applied to $\theta$. For example, $\theta$ may be defined as a hazard ratio between two groups in a proportional hazards regression model. Different assumptions on the baseline hazard lead to different estimating procedures: the Cox model deals with a nonparametric baseline hazard, while Weibull and exponential baseline hazards lead to parametric regression models. We emphasize that the interpretation of $\theta$ stays the same; thus different estimating procedures can compete for use in estimating $\theta$.
Often researchers choose a single estimating procedure and proceed as if the underlying assumptions are correct. These procedures start with choosing a functional form of a model and then proceed with variable selection. A detailed review of model selection procedures can be found in [1]. The major focus of recent statistical research is on variable selection methods; see Fraiman et al. [5], Radchenko and James [8], and Fan and Li [4].
Multimodel inference avoids reliance on a single model by combining several models. Bayesian model averaging is discussed by Hoeting et al. [6]. The frequentist counterpart is presented by Hjort and Claeskens [7], who performed averaging over a set of parametric models with the same parametric form but different numbers of variables.
In our work we attempt to improve the properties of a base estimator by guessing at additional restrictions, thus creating grounds for using other, possibly more efficient, estimators of $\theta$. Our approach can deal with misspecification of the functional form of a model as well as with misspecification of the set of variables.
Section 1 derives the estimator, its asymptotic properties are considered in Section 2, and Section 3 illustrates the performance of the combined estimator for various scenarios of survival function estimation.
1. Estimator
Let $Y_1, \ldots, Y_n$ be an independent sample from an unknown distribution. If there is no additional information of any kind, we can estimate $\theta$ via a base estimator $\hat\theta_n^{(0)}$. Hereafter the subscript $n$ highlights dependence on a sample of size $n$. We assume that the base estimator is an asymptotically unbiased estimator of $\theta$. Further, we assume that there exist $S$ sets of possibly incorrect assumptions. Each of these $S$ sets of assumptions can be used for building another estimator of $\theta$, $\hat\theta_n^{(s)}$, $s = 1, \ldots, S$. If one set of assumptions, say the $s'$-th, is correct, we may reasonably expect $\hat\theta_n^{(s')}$ to be a more efficient estimator of $\theta$ than $\hat\theta_n^{(0)}$. However, it is not known which of the $S$ sets of assumptions are correct and which are not. It is also possible that all sets are false.
In order to avoid dealing directly with different sets of regularity conditions we assume that (1) for every $s$, $E\hat\theta_n^{(s)} = \theta_n^{(s)} \to \theta^{(s)}$ as $n \to \infty$, where $\theta^{(0)} = \theta$; (2) $E[\hat\theta_n^{(s)}]^2 < \infty$ for every $s$ and every $n$; (3) under a correctly chosen model, $a_{ns}(\hat\theta_n^{(s)} - \theta_n^{(s)})$ has a finite variance for every $n$, including the limiting case $n = +\infty$, where $a_{ns}$ is a sequence of positive numbers diverging to $+\infty$ as $n \to \infty$. Consider the family of estimators of $\theta$

$$\hat\theta_n(\Lambda_n) = \hat\theta_n^{(0)} + \sum_{s=1}^{S} \lambda_{ns}\left(\hat\theta_n^{(s)} - \hat\theta_n^{(0)}\right),$$

where $\Lambda_n = (\lambda_{n1}, \ldots, \lambda_{nS})$. The mean squared error of $\hat\theta_n(\Lambda_n)$ is

$$\mathrm{MSE}(\hat\theta_n(\Lambda_n)) = E\left(\hat\theta_n^{(0)} - \theta + \sum_{s=1}^{S}\lambda_{ns}\Delta_n^{(s)}\right)^2.$$

For every $\Lambda_n$,

$$\frac{\partial}{\partial\lambda_{ns}}\mathrm{MSE}(\hat\theta_n(\Lambda_n)) = 2E\left[\left(\hat\theta_n^{(0)} - \theta^{(0)}\right)\Delta_n^{(s)}\right] + 2\sum_{j=1}^{S}\lambda_{nj}E\left[\Delta_n^{(j)}\Delta_n^{(s)}\right],$$

or, in matrix form,

$$\frac{\partial}{\partial\Lambda_n}\mathrm{MSE}(\hat\theta_n(\Lambda_n)) = 2E\left[\left(\hat\theta_n^{(0)} - \theta^{(0)}\right)\Delta_n^T\right] + 2\Lambda_n E\left[\Delta_n\Delta_n^T\right],$$

where

$$\Delta_n = \left(\Delta_n^{(1)}, \ldots, \Delta_n^{(S)}\right)^T = \left(\hat\theta_n^{(1)} - \hat\theta_n^{(0)}, \ldots, \hat\theta_n^{(S)} - \hat\theta_n^{(0)}\right)^T.$$

From

$$\frac{\partial}{\partial\Lambda_n}\mathrm{MSE}(\hat\theta_n(\Lambda_n)) = 0$$

we find

$$\Lambda_{n0} = -E\left[\left(\hat\theta_n^{(0)} - \theta^{(0)}\right)\Delta_n^T\right]E^{-1}\left[\Delta_n\Delta_n^T\right].$$
Since $\det\{E[(\Delta_n - a)(\Delta_n - a)^T]\}$ is minimized at $a = E\Delta_n$ and $\det\{\mathrm{cov}(\Delta_n, \Delta_n^T)\} \geq 0$, the matrix of second derivatives

$$\frac{\partial^2}{\partial\Lambda_n\,\partial\Lambda_n^T}\mathrm{MSE}(\hat\theta_n(\Lambda_n)) = 2E\left[\Delta_n\Delta_n^T\right]$$

is nonnegative definite, which assures that $\Lambda_{n0}$ delivers the smallest MSE among the $\hat\theta_n(\Lambda_n)$. The case when the determinant equals zero corresponds to multiple solutions for $\Lambda_{n0}$, but the MSE stays at its minimum for each of them; the Moore-Penrose generalized inverse can be used for selecting one of these solutions. Then,
$$\hat\theta_n(\Lambda_{n0}) = \hat\theta_n^{(0)} - E\left[\left(\hat\theta_n^{(0)} - \theta^{(0)}\right)\Delta_n^T\right]E^{-1}\left[\Delta_n\Delta_n^T\right]\Delta_n \qquad (1)$$

provides the smallest MSE among all $\hat\theta_n(\Lambda_n)$, which is

$$E\left(\hat\theta_n^{(0)} - \theta^{(0)}\right)^2 - E\left[\left(\hat\theta_n^{(0)} - \theta^{(0)}\right)\Delta_n^T\right]E^{-1}\left[\Delta_n\Delta_n^T\right]E\left[\left(\hat\theta_n^{(0)} - \theta^{(0)}\right)\Delta_n^T\right]^T. \qquad (2)$$
Due to the quadratic form on the right-hand side, the mean squared error of $\hat\theta_n(\Lambda_{n0})$ is never higher than $\mathrm{MSE}(\hat\theta_n^{(0)})$. Formulas (1) and (2) cannot be used directly because the expectations $E[\cdot]$ in their expressions are not known. Applying estimators $\hat{E}[\cdot]$ instead of $E[\cdot]$ leads to
$$\hat\theta_n(\hat\Lambda_{n0}) = \hat\theta_n^{(0)} - \hat{E}\left[\left(\hat\theta_n^{(0)} - \theta^{(0)}\right)\Delta_n^T\right]\hat{E}^{-1}\left[\Delta_n\Delta_n^T\right]\Delta_n \qquad (3)$$

and

$$\widehat{\mathrm{MSE}}\left(\hat\theta_n(\hat\Lambda_{n0})\right) = \hat{E}\left[\left(\hat\theta_n^{(0)} - \theta^{(0)}\right)^2\right] - \hat{E}\left[\left(\hat\theta_n^{(0)} - \theta^{(0)}\right)\Delta_n^T\right]\hat{E}^{-1}\left[\Delta_n\Delta_n^T\right]\hat{E}\left[\left(\hat\theta_n^{(0)} - \theta^{(0)}\right)\Delta_n^T\right]^T. \qquad (4)$$
The quantities $\hat{E}[\cdot]$ can be plug-in estimators, where the unknown distribution is substituted by its empirical estimator. Hereafter, the estimator with $\hat{E}[\cdot]$ in place of $E[\cdot]$ will be called the empirical combined (EC) estimator. Following Efron [3] and Davison and Hinkley [2], we use the nonparametric bootstrap for estimating the unknown $E[\cdot]$; see the simulation studies in Section 3. Asymptotic properties of $\hat\theta_n(\Lambda_{n0})$ and $\hat\theta_n(\hat\Lambda_{n0})$ are presented in Section 2.
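To make the construction concrete, here is a minimal Python sketch of the EC estimator of equation (3); it is our illustration, not code from the paper. The unknown moments $E[(\hat\theta_n^{(0)} - \theta^{(0)})\Delta_n^T]$ and $E[\Delta_n\Delta_n^T]$ are replaced by nonparametric-bootstrap estimates, and the Moore-Penrose pseudo-inverse (numpy.linalg.pinv) handles a possibly singular moment matrix. Function names and the toy example are our assumptions.

```python
import numpy as np

def ec_estimate(data, base_est, alt_ests, n_boot=100, seed=0):
    """Empirical combined (EC) estimator, eq. (3): combine a consistent base
    estimator with possibly inconsistent alternatives via bootstrap moments."""
    rng = np.random.default_rng(seed)
    n = len(data)
    theta0 = base_est(data)
    delta = np.array([est(data) - theta0 for est in alt_ests])  # Delta_n

    # Bootstrap replicates of the base estimator and of Delta_n
    b0 = np.empty(n_boot)
    bd = np.empty((n_boot, len(alt_ests)))
    for b in range(n_boot):
        sample = data[rng.integers(0, n, n)]
        t0 = base_est(sample)
        b0[b] = t0
        bd[b] = [est(sample) - t0 for est in alt_ests]

    # E[(theta0_hat - theta0) Delta^T] is estimated by a bootstrap covariance
    # (using unbiasedness of the base estimator), E[Delta Delta^T] by the
    # bootstrap second moment of Delta_n
    cov0d = ((b0 - b0.mean())[:, None] * (bd - bd.mean(axis=0))).mean(axis=0)
    e_dd = bd.T @ bd / n_boot
    lam = -cov0d @ np.linalg.pinv(e_dd)   # Lambda_n0, Moore-Penrose inverse
    return theta0 + lam @ delta

# Example: base = sample mean (consistent); alternative = sample median, which
# estimates the mean only under a symmetry assumption that is false here.
data = np.random.default_rng(1).exponential(size=100)
print(ec_estimate(data, np.mean, [np.median]))
```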
2. Asymptotic properties
Large-sample properties of $\hat\theta_n^{(s)}$ ($s = 1, \ldots, S$) are usually known, which allows describing asymptotic results for the combined estimator and its empirical version.

Theorem 1. Let $E|\hat\theta_n^{(i)}\hat\theta_n^{(j)}| < +\infty$ ($i, j = 0, \ldots, S$) and, as $n \to \infty$,

1. the sequences of positive real numbers $a_{ni} \to \infty$;

2. $\eta_{ni} = a_{ni}(\hat\theta_n^{(i)} - \theta_n^{(i)})$, where $E(\eta_{ni}) = 0$, $E(\eta_{ni}\eta_{nj}) = \sigma_{ij} < \infty$, and $\theta_n^{(0)} = \theta$;

3. $a_{n0}/a_{ni} = k_{ni} \to k_i \leq 1$.

Then $\xi_n = a_{n0}(\hat\theta_n(\Lambda_{n0}) - \theta)$ converges to a limit $\xi$ with $E(\xi_n) \to E(\xi) = 0$ and

$$\mathrm{MSE}(\xi_n) \to \mathrm{MSE}(\xi) = \sigma_{00} - \left\|k_i\sigma_{0i} - \sigma_{00}\right\|_{i=1,\ldots,S} \left\|k_ik_j\sigma_{ij} - k_i\sigma_{0i} - k_j\sigma_{0j} + \sigma_{00} + \tau_{ij}\right\|^{-1}_{i,j=1,\ldots,S} \left\|k_j\sigma_{0j} - \sigma_{00}\right\|^{T}_{j=1,\ldots,S},$$

where $\tau_{ij} = \lim_{n\to\infty} a_{n0}^2\bar\Delta_n^{(i)}\bar\Delta_n^{(j)}$ and $\bar\Delta_n^{(i)} = \theta_n^{(i)} - \theta_n^{(0)} = E\Delta_n^{(i)}$.
Proof. Since $\mathrm{MSE}(\hat\theta_n(\Lambda_{n0})) \leq \mathrm{MSE}(\hat\theta_n^{(0)})$ by construction, condition 2 leads to $E(\xi_n) \to 0$. Further,

$$\mathrm{MSE}(\xi_n) = a_{n0}^2 E\left[\left(\hat\theta_n^{(0)} - \theta\right)^2\right] - a_{n0}^2 E\left[\left(\hat\theta_n^{(0)} - \theta\right)\Delta_n^T\right]E^{-1}\left[\Delta_n\Delta_n^T\right]E\left[\left(\hat\theta_n^{(0)} - \theta\right)\Delta_n^T\right]^T = E\left(\eta_{n0}^2\right) - E\left[\eta_{n0}\,a_{n0}\Delta_n^T\right]E^{-1}\left[a_{n0}\Delta_n\,a_{n0}\Delta_n^T\right]E\left[\eta_{n0}\,a_{n0}\Delta_n^T\right]^T. \qquad (5)$$

Taking into consideration

$$a_{n0}\Delta_n^{(i)} = a_{n0}\left(\hat\theta_n^{(i)} - \hat\theta_n^{(0)}\right) = \frac{a_{n0}}{a_{ni}}\,a_{ni}\left(\hat\theta_n^{(i)} - \theta_n^{(i)}\right) - a_{n0}\left(\hat\theta_n^{(0)} - \theta_n^{(0)}\right) + a_{n0}\left(\theta_n^{(i)} - \theta_n^{(0)}\right) = k_{ni}\eta_{ni} - \eta_{n0} + a_{n0}\bar\Delta_n^{(i)}$$

and denoting $k_n = (k_{n1}, \ldots, k_{nS})^T$, $K_n = \mathrm{diag}(k_n)$, $\zeta_n = (\eta_{n1}, \ldots, \eta_{nS})^T$, and $\mathbf{1} = (1, \ldots, 1)^T$, the mean squared error can be rewritten as

$$\mathrm{MSE}(\xi_n) = E\left(\eta_{n0}^2\right) - E\left[\eta_{n0}\left(K_n\zeta_n - \eta_{n0}\mathbf{1} + a_{n0}\bar\Delta_n\right)^T\right] E^{-1}\left[\left(K_n\zeta_n - \eta_{n0}\mathbf{1} + a_{n0}\bar\Delta_n\right)\left(K_n\zeta_n - \eta_{n0}\mathbf{1} + a_{n0}\bar\Delta_n\right)^T\right] E\left[\eta_{n0}\left(K_n\zeta_n - \eta_{n0}\mathbf{1} + a_{n0}\bar\Delta_n\right)^T\right]^T$$

$$= E\left(\eta_{n0}^2\right) - \left\|k_{ni}E(\eta_{n0}\eta_{ni}) - E(\eta_{n0}^2)\right\|_{i=1,\ldots,S} \left\|k_{ni}k_{nj}E(\eta_{ni}\eta_{nj}) - k_{ni}E(\eta_{ni}\eta_{n0}) - k_{nj}E(\eta_{n0}\eta_{nj}) + E(\eta_{n0}^2) + a_{n0}^2\bar\Delta_n^{(i)}\bar\Delta_n^{(j)}\right\|^{-1}_{i,j=1,\ldots,S} \left\|k_{nj}E(\eta_{n0}\eta_{nj}) - E(\eta_{n0}^2)\right\|^{T}_{j=1,\ldots,S}.$$

Then, if the inverse exists for every $n$,

$$\mathrm{MSE}(\xi_n) \to \sigma_{00} - \left\|k_i\sigma_{0i} - \sigma_{00}\right\|_{i=1,\ldots,S} \left\|k_ik_j\sigma_{ij} - k_i\sigma_{0i} - k_j\sigma_{0j} + \sigma_{00} + \tau_{ij}\right\|^{-1}_{i,j=1,\ldots,S} \left\|k_j\sigma_{0j} - \sigma_{00}\right\|^{T}_{j=1,\ldots,S},$$

where $\tau_{ij} = \lim_{n\to\infty} a_{n0}^2\bar\Delta_n^{(i)}\bar\Delta_n^{(j)}$. Q.E.D.
A note on the inverse. The $\tau_{ij}$ depend on the rates of convergence of $\bar\Delta_n^{(i)}$ and $\bar\Delta_n^{(j)}$ and on their limits.

1. If $\bar\Delta_n^{(i)}$ converges to a constant $\neq 0$ (so that $\hat\theta_n^{(i)}$ is not a consistent estimator of $\theta$) and $a_{n0}^2\bar\Delta_n^{(i)}\bar\Delta_n^{(j)}$ does not converge to zero, then $\tau_{ij} = -\infty$ or $\tau_{ij} = +\infty$.

2. If $\bar\Delta_n^{(i)}$ converges to a constant $\neq 0$ and $\bar\Delta_n^{(j)} = 0$ (an unbiased estimator for every $n$), then $\tau_{ij} = 0$.

3. If $\bar\Delta_n^{(i)}$ and $\bar\Delta_n^{(j)}$ each converge to zero faster than $a_{n0}^{-1}$, then $\tau_{ij} = 0$.

4. In some cases when the rates of convergence are the same (for example, $a_{nj} = \sqrt{n}$ for all $j$), $\tau_{ij}$ is a nonzero constant.
Hence, some $\tau_{ij}$ may be infinite. On the practical side, however, for every finite $n$ infinity is never reached. In the expression for $\mathrm{MSE}(\xi_n)$, the matrix with elements

$$b_{ij} = k_{ni}k_{nj}E(\eta_{ni}\eta_{nj}) - k_{ni}E(\eta_{ni}\eta_{n0}) - k_{nj}E(\eta_{n0}\eta_{nj}) + E(\eta_{n0}^2) + a_{n0}^2\bar\Delta_n^{(i)}\bar\Delta_n^{(j)}$$

has to be inverted. Denote this matrix by $B_n = \|b_{ij}\|_{i,j=1,\ldots,S}$. A generalized inverse of $B_n$ always exists but is not necessarily unique; non-uniqueness is usually a result of linearly dependent rows (and columns) of $B_n$, which comes from linear dependence among some of the $\hat\theta_n^{(i)}$, $i = 1, \ldots, S$. If $\det(B_n) = 0$, then the use of the Moore-Penrose generalized inverse resolves the multiplicity problem.

Another problem arises when $B_n$ is of high dimensionality: a large sample size is then needed for estimating $B_n$. A possible solution is to use only those principal components which correspond to eigenvalues above some cutoff, say 10% of the sum of all eigenvalues.
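A minimal sketch of this eigenvalue truncation (our illustration; the 10% cutoff follows the text, everything else is an assumption):

```python
import numpy as np

def truncated_pinv(B, cutoff=0.10):
    """Invert a symmetric moment matrix using only the principal components
    whose eigenvalues exceed `cutoff` times the sum of all eigenvalues."""
    w, v = np.linalg.eigh(B)                      # eigenvalues ascending
    keep = w > cutoff * w.sum()                   # drop noise-dominated directions
    return (v[:, keep] / w[keep]) @ v[:, keep].T  # sum_k v_k v_k^T / w_k

B = np.array([[2.0, 1.9], [1.9, 2.0]])            # nearly collinear alternatives
print(truncated_pinv(B))
```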
Theorem 2. If the conditions of Theorem 1 hold, then the use of $\hat\theta_n(\hat\Lambda_{n0})$ instead of $\hat\theta_n(\Lambda_{n0})$ does not change the asymptotic properties.

Proof. The optimal parameter

$$\Lambda_{n0} = -\left\|k_{ni}E(\eta_{n0}\eta_{ni}) - E(\eta_{n0}^2)\right\|_{i=1,\ldots,S} \left\|k_{ni}k_{nj}E(\eta_{ni}\eta_{nj}) - k_{ni}E(\eta_{ni}\eta_{n0}) - k_{nj}E(\eta_{n0}\eta_{nj}) + E(\eta_{n0}^2) + a_{n0}^2\bar\Delta_n^{(i)}\bar\Delta_n^{(j)}\right\|^{-1}_{i,j=1,\ldots,S} \qquad (6)$$

is a continuous function of the moments $\sigma_{nij} = E(\eta_{ni}\eta_{nj})$, $i, j = 0, \ldots, S$, because $\Lambda_{n0}$ can be represented as a ratio of two polynomials in these moments; the multiplicity of inverses is resolved through the Moore-Penrose generalized inverse, which is unique. Similarly, $\hat\theta_n(\Lambda_{n0})$ is also a continuous function of the $\sigma_{nij}$ and of $\mu_{n0} = E(\eta_{n0})$.

By the delta method,

$$\hat\theta_n(\hat\Lambda_{n0}) = \hat\theta_n(\Lambda_{n0}) + \sum_{i,j=0}^{S} \frac{\partial\hat\theta_n(\Lambda_{n0}^{*})}{\partial\sigma_{nij}}\left(\hat\sigma_{nij} - \sigma_{nij}\right) + \frac{\partial\hat\theta_n(\Lambda_{n0}^{*})}{\partial\mu_{n0}}\left(\hat\mu_{n0} - \mu_{n0}\right) = \hat\theta_n(\Lambda_{n0}) + o_p\left(a_{n0}^{-1}\right), \qquad (7)$$

where $\Lambda_{n0}^{*}$ is located between $\hat\Lambda_{n0}$ and $\Lambda_{n0}$. Thus,

$$a_{n0}\left(\hat\theta_n(\hat\Lambda_{n0}) - \theta\right) = a_{n0}\left(\hat\theta_n(\hat\Lambda_{n0}) - \hat\theta_n(\Lambda_{n0})\right) + a_{n0}\left(\hat\theta_n(\Lambda_{n0}) - \theta\right) = o_p(1) + a_{n0}\left(\hat\theta_n(\Lambda_{n0}) - \theta\right), \qquad (8)$$

which assures that the asymptotic distributions of $\hat\theta_n(\hat\Lambda_{n0})$ and $\hat\theta_n(\Lambda_{n0})$ are the same. Q.E.D.
3. Simulation studies
3.1. Combining the Kaplan-Meier estimator with the censored exponential likelihood estimator of a survival function
Consider a right-censored sample $T_1, \ldots, T_n$, where $T_i = \min(X_i, C_i)$, $X_i \stackrel{d}{=} X \sim F_X$, $C_i \stackrel{d}{=} C \sim F_C$, and $X$ is independent of $C$. This sample is accompanied by the censoring indicators $\delta_i = I(X_i \leq C_i)$. If an exponential model is assumed for $X$, then it depends on a single rate parameter $\lambda$. The censored-data likelihood is

$$l(\lambda) = \prod_{i=1}^{n} \left[\lambda \exp(-\lambda T_i)\right]^{\delta_i} \left[\exp(-\lambda T_i)\right]^{1-\delta_i} = \prod_{i=1}^{n} \lambda^{\delta_i} \exp(-\lambda T_i).$$

The maximum of $l(\lambda)$ is reached at $\hat\lambda = \left(\sum_{i=1}^{n}\delta_i\right)\Big/\left(\sum_{i=1}^{n}T_i\right)$, leading to $\hat S_{n1}(t) = \exp(-\hat\lambda t)$, an estimate of $S(t)$ which is consistent if the data actually came from an exponential distribution. Otherwise, the Kaplan-Meier (KM) estimator [9], $\hat S_{n0}(t)$, can be used. Our objective is to estimate the survival curve by combining $\hat S_{n0}(t)$ and $\hat S_{n1}(t)$.
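In code, the censored-exponential MLE and its survival curve are one line each; a minimal sketch (ours, with made-up data):

```python
import numpy as np

def exp_survival(time, event):
    """Censored-data exponential MLE: lambda_hat = sum(delta) / sum(T),
    returning the fitted survival function S(t) = exp(-lambda_hat * t)."""
    rate = event.sum() / time.sum()
    return lambda t: np.exp(-rate * t)

# Hypothetical right-censored sample: X ~ Exp(1), C ~ Exp(0.75)
rng = np.random.default_rng(2)
x, c = rng.exponential(1.0, 50), rng.exponential(1 / 0.75, 50)
time, event = np.minimum(x, c), (x <= c)
print(exp_survival(time, event)(1.0))  # estimate of S(1) = exp(-1)
```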
At every time point the KM estimator is asymptotically normal. For finite sample sizes some normalizing transformations may be needed; see Klein et al. [10] for details. The combined estimator can be used with transformed estimators of the survival function in a similar manner.
The combined estimator $\hat\theta_n(\Lambda_{n0})$ becomes

$$\hat S_n(t) = \hat S_{n0}(t) - \left(E\left[\hat S_{n0}(t)\Delta_n\right] - S_{n0}(t)E\left[\Delta_n\right]\right) E^{-1}\left[\Delta_n^2\right]\Delta_n,$$

where $\Delta_n = \hat S_{n0}(t) - \hat S_{n1}(t)$ and $\hat S_n(t)$ is the combined estimator of $S(t)$. Estimating $E[\hat S_{n0}(t)\Delta_n] - S_{n0}(t)E[\Delta_n]$ with $\widehat{\mathrm{cov}}(\hat S_{n0}(t), \Delta_n)$ and $E[\Delta_n^2]$ via $\widehat{\mathrm{var}}(\Delta_n) + \Delta_n^2$, we construct the EC estimator

$$\hat S_n(t) = \hat S_{n0}(t) - \widehat{\mathrm{cov}}\left(\hat S_{n0}(t), \Delta_n\right)\Delta_n\left(\widehat{\mathrm{var}}(\Delta_n) + \Delta_n^2\right)^{-1}$$

and

$$\widehat{\mathrm{MSE}}\left(\hat S_n(t)\right) = \widehat{\mathrm{var}}\left(\hat S_{n0}(t)\right) - \widehat{\mathrm{cov}}^2\left(\hat S_{n0}(t), \Delta_n\right)\left(\widehat{\mathrm{var}}(\Delta_n) + \Delta_n^2\right)^{-1}.$$
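Putting the pieces together, a self-contained sketch of the EC survival estimator (our illustration): the KM estimator is implemented directly, and the cov and var terms are estimated from nonparametric bootstrap replicates, as in the text.

```python
import numpy as np

def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t): product over event times u <= t of
    (1 - deaths(u) / at_risk(u))."""
    s = 1.0
    for u in np.sort(np.unique(time[event & (time <= t)])):
        at_risk = (time >= u).sum()
        deaths = (event & (time == u)).sum()
        s *= 1.0 - deaths / at_risk
    return s

def ec_survival(time, event, t, n_boot=100, seed=0):
    """EC estimator of S(t): combine KM with the censored-exponential fit."""
    rng = np.random.default_rng(seed)
    s0 = km_survival(time, event, t)
    s1 = np.exp(-event.sum() / time.sum() * t)         # exponential plug-in
    delta = s0 - s1                                    # Delta_n
    reps = np.empty((n_boot, 2))                       # bootstrap (KM, Delta)
    n = len(time)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)
        tb, eb = time[idx], event[idx]
        s0b = km_survival(tb, eb, t)
        reps[b] = s0b, s0b - np.exp(-eb.sum() / tb.sum() * t)
    cov = np.cov(reps.T)                               # 2x2 bootstrap covariance
    return s0 - cov[0, 1] * delta / (cov[1, 1] + delta**2)

# Hypothetical data: X ~ Exp(1) censored by C ~ Exp(0.75)
rng = np.random.default_rng(3)
x, c = rng.exponential(1.0, 100), rng.exponential(1 / 0.75, 100)
time, event = np.minimum(x, c), (x <= c)
print(ec_survival(time, event, t=1.0))   # combined estimate of S(1)
```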
Simulation settings. To assess the performance of $\hat S_n(t)$ we consider two scenarios: (1) $X_1, \ldots, X_n$ with survival function $\exp(-t)$ (standard exponential) and (2) $X_1, \ldots, X_n$ with survival function $\exp(-t^2)$ (Weibull with scale parameter 1 and shape parameter 2). In each case censoring follows an exponential distribution with rate 0.75. For estimating the unknown quantities in $\Lambda_{n0}$ we used 100 bootstrap samples. Single-experiment estimates of the survival curve under different sample sizes (30 and 300) and different distributional assumptions (exponential and Weibull) are presented in Figure 1. In order to assess the MSEs of the estimators, 10,000 simulations were performed in each of the two Monte-Carlo experiments (exponential and Weibull) for both sample sizes; the resulting MSEs are plotted in Figure 2.
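The Monte-Carlo MSE curves of Figure 2 can be reproduced along the following lines; this sketch reuses ec_survival from the block above, and the time grid and the reduced number of replications are our choices for speed.

```python
# Scenario (2): X ~ Weibull(shape 2, scale 1), censoring C ~ Exp(0.75).
import numpy as np

rng = np.random.default_rng(4)
grid, n, n_sim = np.linspace(0.2, 3.0, 15), 30, 200   # small n_sim for speed
err = np.zeros_like(grid)
for _ in range(n_sim):
    x, c = rng.weibull(2.0, n), rng.exponential(1 / 0.75, n)
    time, event = np.minimum(x, c), (x <= c)
    est = np.array([ec_survival(time, event, t) for t in grid])
    err += (est - np.exp(-grid**2)) ** 2              # true S(t) = exp(-t^2)
print(err / n_sim)  # Monte-Carlo MSE of the EC estimator along the grid
```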
[Figure 1: four panels of survival-curve estimates versus time; panels: Exp(1), n = 30; Exp(1), n = 300; Weib(2,1), n = 30; Weib(2,1), n = 300]
Fig. 1. Four experiments with different sample sizes and distributions. The dotted line is the parametric Exp(1) estimator, the thin solid line is the Kaplan-Meier estimator, and the thick solid line is the combined estimator
The KM estimator is bounded between 0 and 1, and at a fixed sample size and time point its distribution is not necessarily normal, which means that a large sample may be needed before the normal approximation can be relied upon.
Figure 2 shows the dynamics of the mean squared error for $\hat S_{n0}(t)$, $\hat S_{n1}(t)$, and $\hat S_n(t)$. The data were drawn from the standard exponential distribution (the correct model is used for building $\hat S_{n1}(t)$; the first two panels) and from the Weibull distribution (an incorrect model assumption is used for $\hat S_{n1}(t)$; the last two panels). Not surprisingly, in the first two panels the MSE of $\hat S_{n1}(t)$ is always smaller than the MSEs of the other two estimators, while the Kaplan-Meier estimator produces the highest MSE among the estimators (this is the price we pay for not using parametric assumptions). The MSE of $\hat S_n(t)$ is located between those of the other two estimators, and its advantage over the KM estimator is clearly seen up to $t = 2.5$.
[Figure 2: four panels of MSE versus time; panels: 1000 simulations, n = 30, data drawn from Exp(1); 1000 simulations, n = 300, data drawn from Exp(1); 1000 simulations, n = 30, data drawn from Weib(2,1); 1000 simulations, n = 300, data drawn from Weib(2,1)]
Fig. 2. Monte-Carlo MSEs. The dotted line is the Monte-Carlo MSE of the parametric Exp(1) estimator, the thin solid line is the Monte-Carlo MSE of the Kaplan-Meier estimator, and the thick solid line is the Monte-Carlo MSE of the combined estimator
The last two panels show $\hat S_n(t)$ computed with a wrong parametric assumption at $n = 30$ and $n = 300$: the standard exponential assumption is violated, since the data come from a Weibull distribution. In the ranges where the MSE of the fitted standard exponential is much higher than the MSE of the KM estimator, the MSE of $\hat S_n(t)$ is close to the MSE of the KM estimator. Near the point where the exponential survival curve crosses the KM survival curve, the MSE of $\hat S_n(t)$ is slightly smaller than the KM MSE: at this time point the wrong parametric assumption still leads to unbiased estimation of the survival probability (the Weibull and standard exponential survival curves cross). To the left ($t \in (0.3, 0.8)$) and to the right ($t \in (1.1, 1.6)$) of the crossing area, the MSE of $\hat S_n(t)$ is slightly higher than the MSE of the KM estimator. This slight increase in MSE does not contradict Theorem 2; it comes from the variability associated with estimating $\Lambda_{n0}$. At the same time, moving further away from the crossing point, this MSE difference goes to zero. In all these situations (except perhaps the crossing-point area) the use of $\hat S_n(t)$ is preferable to the use of $\hat S_{n1}(t)$.
3.2. Combining regression parameter estimators from proportional hazards and exponential models
The Cox proportional hazards model allows estimating log hazard ratios (the regression coefficients of the model) under a nonparametric baseline hazard. However, if a parametric form of the baseline hazard is known, then the Cox model is not the most efficient model for estimating log hazard ratios. For example, if the baseline hazard is constant, then the Cox proportional hazards model can safely be substituted by censored-data exponential regression; censored-data Weibull regression can be used with a Weibull baseline hazard. If the proportional hazards assumption is violated for one or several covariates, the Cox model stratified on these covariates can be used. All these models (stratified Cox, Cox, Weibull, and exponential regressions) adjust for the confounding effects of the same variables, and the interpretation of the regression parameters stays the same; the difference between the models is incorporated solely in the baseline hazard assumptions. The least restricted of the four is the stratified Cox model, which is built on stratum-specific nonparametric baseline hazards. If we assume that the baseline hazard is the same in all strata, the Cox model can be used. Further, assigning a Weibull form to the baseline hazard, only two baseline parameters need to be estimated (shape and scale). Setting the shape parameter equal to one makes the Weibull hazard constant.
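For the parametric competitor, the censored-data exponential regression log-likelihood is $\sum_i \left[\delta_i \beta^T x_i - T_i \exp(\beta^T x_i)\right]$ (hazard $\exp(\beta^T x_i)$, constant baseline equal to one). A minimal sketch of fitting it with scipy (our code, not the authors'; a Cox fit would come from a standard survival package):

```python
import numpy as np
from scipy.optimize import minimize

def exp_regression(time, event, X):
    """MLE for exponential (constant-baseline-hazard) regression:
    hazard_i = exp(beta' x_i); log-lik = sum(delta*beta'x - T*exp(beta'x))."""
    def negll(beta):
        lp = X @ beta
        return -(event * lp - time * np.exp(lp)).sum()
    fit = minimize(negll, np.zeros(X.shape[1]), method="BFGS")
    return fit.x

# Tiny hypothetical example: one binary covariate with log hazard ratio -1
rng = np.random.default_rng(5)
a = rng.integers(0, 2, 200).astype(float)
y = rng.exponential(1.0, 200) / np.exp(-1.0 * a)   # hazard exp(-a)
cens = rng.exponential(2.0, 200)
time, event, X = np.minimum(y, cens), (y <= cens).astype(float), a[:, None]
print(exp_regression(time, event, X))              # approximately [-1]
```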
In this section we present Monte-Carlo studies with 10,000 repetitions each. The Cox model regression parameter estimates will be improved under the guess of a constant baseline hazard.
Simulation settings. We generate $N$ independent multidimensional observations $(Y, A, B, C, D, E, F)$, where $A \sim \mathrm{Bern}(0.5)$, $B \sim \mathrm{Bern}(0.5)$, $C \sim \mathrm{Bern}(0.5)$, $D \sim \mathrm{Bern}(0.5)$, $E \sim N(0,1)$, $F \sim N(0,1)$, and $Y$ is exponential with hazard $\exp(\beta_A A + \beta_B B + \beta_C C + \beta_D D + \beta_E E + \beta_F F)$, where $\beta = (\beta_A, \beta_B, \beta_C, \beta_D, \beta_E, \beta_F) = (-1, 0, 1, 0.5, 0.2, -0.5)$. The actual sample sizes ($N$) used with the different simulation settings are given in the captions of Tables 1, 2, 3, and 4.
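The data-generating step, as we read it (hazard $\exp(\beta^T x)$; our code and naming):

```python
import numpy as np

def simulate(N, beta=(-1, 0, 1, 0.5, 0.2, -0.5), seed=0):
    """Draw (Y, A, B, C, D, E, F): four Bernoulli(0.5) and two N(0,1)
    covariates, with Y exponential with hazard exp(beta' x)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([rng.integers(0, 2, (N, 4)),      # A, B, C, D
                         rng.standard_normal((N, 2))])    # E, F
    Y = rng.exponential(1.0, N) / np.exp(X @ np.asarray(beta, float))
    return Y, X

Y, X = simulate(75)
print(Y[:3], X[:3])
```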
Monte-Carlo Experiment 1: constant baseline hazard, six predictors, a correctly specified exponential model. Results of 10,000 Monte-Carlo experiments are given in Table 1. This table shows that the MSEs of the exponential regression parameter estimates are up to 25% smaller than the MSEs of the Cox model regression parameter estimates. This is not surprising, since the baseline hazard is constant in our experiment and censored-data exponential regression is a valid alternative to the Cox model. The MSEs of the parameter estimates of the EC estimator are up to 10% smaller than the MSEs of the Cox model regression parameter estimates.
Table 1
Regression parameter estimates under an exponential baseline, β = (-1, 0, 1, 0.5, 0.2, -0.5), sample size N = 75. Monte Carlo means with root MSEs (RMSE) in parentheses

              β_A        β_B        β_C        β_D        β_E        β_F
Cox          -1.0768     0.0119     1.0680     0.5412     0.2244    -0.5407
             (0.4232)   (0.3755)   (0.4062)   (0.3824)   (0.1872)   (0.2085)
Exponential  -1.0208     0.0124     1.0159     0.5109     0.2119    -0.5101
             (0.3497)   (0.3096)   (0.2987)   (0.2988)   (0.1716)   (0.5373)
EC           -1.0618     0.0185     1.0690     0.5457     0.2222    -0.5279
             (0.3994)   (0.3503)   (0.3687)   (0.3526)   (0.1850)   (0.2009)
In order to estimate the unknown expectations we use the nonparametric bootstrap. After this bootstrap-based estimation the EC estimator is no longer optimal in terms of the smallest MSE. Moreover, the higher the dimensionality of $\Delta_n$, the more expectations have to be estimated. To balance the dimensionality of $\Delta_n$ against the amount of noise associated with its estimation, we set to zero all eigenvalues contributing less than 10% of the sum of all eigenvalues, as sketched in Section 2. Thus, only a few principal components (six or fewer, often two) are used in the estimating procedure.
Monte-Carlo Experiment 2: constant baseline hazard, six predictors, an incorrect set of parameters for the exponential model (assumed $\beta_C = \beta_D = \beta_E = \beta_F = 0$). Results of 10,000 Monte-Carlo experiments are given in Table 2. From this table we see that the wrong assumption has only a minor influence on the EC estimator. Moreover, since maximum likelihood estimators do not necessarily provide the smallest MSE, we may occasionally see a better MSE for the EC estimator.
Table 2
Regression parameter estimates under an exponential baseline, β = (-1, 0, 1, 0.5, 0.2, -0.5), N = 75. The exponential model uses the incorrect assumption β_C = β_D = β_E = β_F = 0. Monte Carlo means with root MSEs (RMSE) in parentheses

              β_A        β_B        β_C        β_D        β_E        β_F
Cox          -1.1056    -0.0055     1.0930     0.5521     0.2224    -0.5431
             (0.4227)   (0.3734)   (0.4118)   (0.3787)   (0.1983)   (0.2117)
Exponential  -0.5101     0.4995     0          0          0          0
             (0.6020)   (0.5992)   (1.0000)   (0.5000)   (0.2000)   (0.5000)
EC           -1.0628     0.0178     1.0168     0.5148     0.2145    -0.5250
             (0.4059)   (0.3629)   (0.3889)   (0.3538)   (0.1906)   (0.2018)
Tables 1 and 2 show only a minor improvement associated with the use of the EC estimator.
Monte-Carlo Experiments 3 and 4: constant baseline hazard, the 2-predictor case. Consider a simpler case with $\beta = (\beta_A, \beta_B) = (-1, 1)$ and $N = 40$. Table 3 presents the results of Experiment 3 (a correct guess of a constant baseline hazard). Table 4 shows the results of Experiment 4 (a partially incorrect guess: we correctly assumed a constant baseline hazard but also assumed $\beta_B = 0$). We observe a larger decrease in MSE compared with the 6-predictor case (Tables 1 and 2).
In Tables 3 and 4 the MSE of the EC estimator is never larger than the MSE of the Cox regression parameter estimators. Moreover, we observe an interesting effect: partially incorrect model assumptions may actually make the MSE of the EC estimator smaller compared to the Cox model.
Table 3
Regression parameter estimates under different model assumptions (Cox, exponential, and combined estimators), β = (-1, 1), N = 40. The model uses the correct guess that the baseline hazard is constant. Monte Carlo means with root MSEs (RMSE) in parentheses

              β_A        β_B
Cox          -1.0933     1.0420
             (0.5543)   (0.5364)
Exponential  -1.0368     1.0168
             (0.4317)   (0.3399)
EC           -1.0353     1.0011
             (0.4830)   (0.4061)
Table 4
Regression parameter estimates under an exponential baseline and β = (-1, 1), N = 40. We guessed that the baseline hazard is constant (correct) and that β_B = 0 (wrong). Monte Carlo means with root MSEs (RMSE) in parentheses

              β_A        β_B
Cox          -1.0700     1.0976
             (0.6093)   (0.5920)
Exponential  -0.4745     0
             (0.6842)   (1.0000)
EC           -0.8662     0.6820
             (0.4930)   (0.5008)
Monte-Carlo Experiment 5: the same settings as in Table 1 but with $N = 500$. The results of Experiment 5 are presented in Table 5 and are similar to the findings of our previous experiments. Slight improvements in MSE are seen for the EC estimator in all cases except one: the MSE of $\hat\beta_E$ became slightly higher. This is a result of either simulation error or the estimation of $\Lambda_{n0}$.
Table 5
Regression parameter estimates under a constant baseline hazard, β = (-1, 0, 1, 0.5, 0.2, -0.5), N = 500. Monte Carlo means with root MSEs (RMSE) in parentheses

              β_A        β_B        β_C        β_D        β_E        β_F
Cox          -1.0180    -0.0030     1.0089     0.5057     0.2001    -0.5063
             (0.1334)   (0.1265)   (0.1293)   (0.1224)   (0.0623)   (0.0706)
Exponential  -1.0122     0.0032     1.0041     0.5035     0.1987    -0.5032
             (0.1204)   (0.1097)   (0.1089)   (0.1037)   (0.0615)   (0.0655)
EC           -1.0164     0.0037     1.0087     0.5061     0.2003    -0.5063
             (0.1291)   (0.1204)   (0.1209)   (0.1149)   (0.0624)   (0.0694)
Conclusion
In this manuscript we suggest an estimating procedure combining a consistent estimator of a population parameter with one or several other, possibly inconsistent, estimators. The combined estimator provides the smallest MSE among all linear combinations of the base and the other estimators. If the assumptions used for constructing one or more of the non-base estimators are correct, then we expect the combined estimator to have a smaller MSE than the base estimator. If there is no correlation between the base and non-base estimators, the combined estimator is equal to the base estimator. The combined estimator depends on unknown second moments; their estimation is performed via the nonparametric bootstrap, leading to the empirical combined (EC) estimator. This estimator uses the first two moments of the estimators incorporated in its structure.
The EC estimator is consistent and can be used for improving the efficiency of nonparametric estimators in the presence of a possibly more efficient parametric estimator. Even if the parametric estimator is calculated under an incorrect model assumption, the limiting risk of the EC estimator is not higher than the limiting risk of the original nonparametric estimator. A simulation example improving the efficiency of the Kaplan-Meier estimator with a parametric model guess illustrates the use of the EC estimator.
Our approach makes it possible to perform multimodel inference in the presence of model misspecification. Compared to the model averaging approach of Hjort and Claeskens [7], we do not restrict our estimating procedure to variable selection within a family of parametric models. The simulation studies show how Cox regression parameter estimates can be combined with parametric regression parameter estimates.
REFERENCES

1. Burnham K.P., Anderson D.R. Model selection and multimodel inference. Springer, 2002.
2. Davison A.C., Hinkley D.V. Bootstrap methods and their application. Cambridge: Cambridge University Press, 1997.
3. Efron B. Censored data and the bootstrap // J. Amer. Statist. Ass. 1981. V. 76. P. 312-319.
4. Fan J., Li R. Variable selection via penalized likelihood // J. Amer. Statist. Ass. 2001. V. 96. P. 1348-1360.
5. Fraiman R., Justel A., Svarc M. Selection of variables for cluster analysis and classification rules // J. Amer. Statist. Ass. 2008. V. 103. P. 1294-1303.
6. Hoeting J.A., Madigan D., Raftery A.E., Volinsky C.T. Bayesian model averaging: a tutorial // Statistical Science. 1999. V. 14. P. 382-417.
7. Hjort N.L., Claeskens G. Frequentist model average estimators // J. Amer. Statist. Ass. 2003. V. 98. P. 879-899.
8. Radchenko P., James G.M. Variable inclusion and shrinkage algorithms // J. Amer. Statist. Ass. 2008. V. 103. P. 1304-1315.
9. Kaplan E.L., Meier P. Nonparametric estimation from incomplete observations // J. Amer. Statist. Ass. 1958. V. 53. P. 457-481.
10. Klein J.P., Logan B., Harhoff M., Andersen P.K. Analyzing survival curves at a fixed point in time // Statistics in Medicine. 2007. V. 26. P. 4505-4519.
11. Klein J.P., Moeschberger M.L. Survival analysis. Springer, 2003.
Sergey S. Tarima
Division of Biostatistics, Department of Population Health, Medical College of Wisconsin, Milwaukee, WI 53226, USA. E-mail: [email protected]
Yuriy G. Dmitriev
Informatics Problems Department, Siberian Branch of the Russian Academy of Sciences (Tomsk), and Tomsk State University
E-mail: [email protected]
Received 16 April 2009.