VESTNIK OF TOMSK STATE UNIVERSITY
2009. Mathematics and Mechanics. No. 3(7)
УДК 519.2
V. Konev, S. Pergamenshchikov
NON-PARAMETRIC ESTIMATION IN A SEMIMARTINGALE REGRESSION MODEL.
PART 1. ORACLE INEQUALITIES¹
This paper considers the problem of estimating a periodic function in a continuous time regression model with a general square integrable semimartingale noise. An adaptive model selection procedure is proposed, and sharp non-asymptotic oracle inequalities are derived.
Keywords: Non-asymptotic estimation; Non-parametric regression; Model selection; Sharp oracle inequality; Semimartingale noise.
AMS 2000 Subject Classifications: Primary: 62G08; Secondary: 62G05
1. Introduction
Consider a regression model in continuous time
$$ dy_t = S(t)\,dt + d\xi_t, \quad 0 \le t \le n, \tag{1} $$

where $S$ is an unknown 1-periodic function from $\mathbb{R}$ to $\mathbb{R}$ with $S \in L_2[0,n]$, and $(\xi_t)_{t\ge 0}$ is a square integrable unobservable semimartingale noise such that, for any function $f$ from $L_2[0,n]$, the stochastic integral

$$ I_n(f) = \int_0^n f_s\, d\xi_s \tag{2} $$

is well defined with

$$ \mathbf{E}\, I_n(f) = 0 \quad\text{and}\quad \mathbf{E}\, I_n^2(f) \le \sigma^* \int_0^n f_s^2\, ds, \tag{3} $$

where $\sigma^*$ is some positive constant.
An important example of the disturbance $(\xi_t)_{t\ge 0}$ is the process

$$ \xi_t = \varrho_1 w_t + \varrho_2 z_t, \tag{4} $$

where $\varrho_1$ and $\varrho_2$ are unknown constants with $|\varrho_1| + |\varrho_2| > 0$, $(w_t)_{t\ge 0}$ is a standard Brownian motion, and $(z_t)_{t\ge 0}$ is a compound Poisson process defined as

$$ z_t = \sum_{j=1}^{N_t} Y_j, \tag{5} $$

where $(N_t)_{t\ge 0}$ is a standard homogeneous Poisson process with unknown intensity $\lambda > 0$ and $(Y_j)_{j\ge 1}$ is an i.i.d. sequence of random variables with

$$ \mathbf{E}\, Y_j = 0 \quad\text{and}\quad \mathbf{E}\, Y_j^2 = 1. \tag{6} $$
¹ The paper is supported by the RFBR grant 09-01-00172-a.
Let $(T_k)_{k\ge 1}$ denote the arrival times of the process $(N_t)_{t\ge 0}$, that is,

$$ T_k = \inf\{ t \ge 0 : N_t = k \}. \tag{7} $$

As is shown in Lemma A.2, the condition (3) holds for the noise (4) with $\sigma^* = \varrho_1^2 + \lambda \varrho_2^2$.
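As an illustration, the variance identity $\sigma^* = \varrho_1^2 + \lambda\varrho_2^2$ for the noise (4) can be probed by direct Monte Carlo sampling of $\xi_n = I_n(1)$; a minimal sketch follows, where the Rademacher jump sizes $Y_j$ (which satisfy (6)) and the particular values of $\varrho_1$, $\varrho_2$, $\lambda$ are illustrative choices, not taken from the paper.

```python
import numpy as np

# Monte Carlo check of E I_n(1) = 0 and E I_n^2(1) = sigma* n for the
# noise (4); parameter values are illustrative only.
rng = np.random.default_rng(0)
rho1, rho2, lam, n, M = 0.5, 1.0, 2.0, 10.0, 200_000

w_n = rng.normal(0.0, np.sqrt(n), M)      # Brownian motion at time n
N_n = rng.poisson(lam * n, M)             # Poisson counts on [0, n]
# sum of N Rademacher variables Y_j (E Y = 0, E Y^2 = 1) via a binomial trick
z_n = 2.0 * rng.binomial(N_n, 0.5) - N_n
xi_n = rho1 * w_n + rho2 * z_n            # xi_n = I_n(1) for f = 1

sigma_star = rho1**2 + lam * rho2**2
print(xi_n.mean())        # close to 0
print(xi_n.var() / n)     # close to sigma_star
```

The empirical mean and normalized variance reproduce (3) with equality for $f \equiv 1$, as the semimartingale here is a square integrable martingale.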
The problem is to estimate the unknown function $S$ in the model (1) on the basis of the observations $(y_t)_{0\le t\le n}$.
This problem enables one to solve the following problem of functional statistics. Let the observations $(x^k)_{1\le k\le n}$ be a segment of a sequence of independent identically distributed random processes $x^k = (x_t^k)_{0\le t\le 1}$ defined on the interval $[0,1]$, which obey the stochastic differential equations

$$ dx_t^k = S(t)\,dt + d\xi_t^k, \quad x_0^k = x_0, \quad 0 \le t \le 1, \tag{8} $$

where $(\xi^k)_{1\le k\le n}$ is an i.i.d. sequence of random processes $\xi^k = (\xi_t^k)_{0\le t\le 1}$ with the same distribution as the process (4). The problem is to estimate the unknown function $S(t) \in L_2[0,1]$ on the basis of the observations $(x^k)_{1\le k\le n}$. This model can be reduced to (1), (4) in the following way. Let $y = (y_t)_{0\le t\le n}$ denote the process defined as

$$ y_t = \begin{cases} x_t^1, & \text{if } 0 \le t \le 1; \\ y_{k-1} + x_{t-k+1}^k - x_0, & \text{if } k-1 < t \le k,\ 2 \le k \le n. \end{cases} $$
This process satisfies the stochastic differential equation
$$ dy_t = S(t)\,dt + d\xi_t, $$

where $S(t) = S(\{t\})$ and

$$ \xi_t = \begin{cases} \xi_t^1, & \text{if } 0 \le t \le 1; \\ \xi_{k-1} + \xi_{t-k+1}^k, & \text{if } k-1 < t \le k,\ 2 \le k \le n; \end{cases} $$

here $\{t\} = t - [t]$ is the fractional part of the number $t$.
In this paper we consider the estimation problem for the regression model (1) in $L_2[0,1]$, with the quality of an estimate $\hat S$ measured by the mean integrated squared error (MISE)

$$ \mathcal{R}(\hat S, S) := \mathbf{E}_S\, \| \hat S - S \|^2, \tag{9} $$

where $\mathbf{E}_S$ stands for the expectation with respect to the distribution $\mathbf{P}_S$ of the process (1) given $S$, and

$$ \| f \|^2 := \int_0^1 f^2(t)\, dt. $$
It is natural to treat this problem from the standpoint of the model selection approach. The origin of this method goes back to the early seventies, to the pioneering papers by Akaike [1] and Mallows [16], who proposed to introduce a penalty into a log-likelihood type criterion. Further progress was made by Barron, Birgé and Massart [2, 17], who developed a non-asymptotic model selection method enabling one to derive non-asymptotic oracle inequalities for a Gaussian non-parametric regression model with i.i.d. disturbances. An oracle inequality bounds the risk of the estimate from above via the minimal risk over a chosen family of estimates. Galtchouk and Pergamenshchikov [6] developed the Barron-Birgé-Massart technique by treating the problem of estimating a non-parametric drift function in a diffusion process from the standpoint of sequential analysis. Fourdrinier and Pergamenshchikov [5] extended the Barron-Birgé-Massart method to models with dependent observations; in contrast to all the above-mentioned papers on the model selection method, where the estimation procedures were based on least squares estimates, they proposed to use an arbitrary family of projective estimates in an adaptive estimation procedure and discovered that improved least squares estimates can be employed to increase the estimation quality. Konev and Pergamenshchikov [14] applied this improved model selection method to the non-parametric estimation of a periodic function in a continuous time model with a coloured noise having unknown spectral characteristics. In all the cited papers, non-asymptotic oracle inequalities have been derived which enable one to establish the optimal convergence rate for the minimax risks. Moreover, in the latter paper the oracle inequalities have been obtained for robust risks as well.
In addition to the optimal convergence rate, an important problem is that of the efficiency of adaptive estimation procedures. In order to examine the efficiency property, one has to obtain oracle inequalities in which the factor at the principal term is close to unity.
The first result in this direction is most likely due to Kneip [13], who obtained, for a Gaussian regression model, an oracle inequality with a factor close to unity at the principal term. Oracle inequalities of this type were obtained as well in [3] and, for inverse problems, in [4]. It will be observed that the derivation of the oracle inequalities in all these papers rests upon the fact that, by applying the Fourier transformation, the initial model can be reduced to a statistical Gaussian model with independent observations. Such a transformation is possible only for Gaussian models with independent homogeneous observations, or for inhomogeneous ones with known correlation characteristics. This restriction significantly narrows the area of application of such estimation procedures and rules out a broad class of models including, in particular, the heteroscedastic regression models widely used in econometrics (see, for example, [12]). For constructing adaptive procedures in the case of inhomogeneous observations, one needs to amend the approach to the estimation problem. Galtchouk and Pergamenshchikov [7, 8] have developed a new estimation method intended for heteroscedastic regression models. The heart of this method is to combine the Barron-Birgé-Massart non-asymptotic penalization method [2] with the Pinsker weighted least squares method minimizing the asymptotic risk (see, for example, [18, 19]). Combining these approaches results in a significant improvement of the estimation quality (see the numerical example in [7]). As was shown in [8] and [9], the Galtchouk-Pergamenshchikov procedure is efficient with respect to the robust minimax risk, i.e. the minimax risk with an additional supremum over the whole family of admissible model distributions. In the sequel [10, 11], this approach has been applied to the problem of drift estimation in a diffusion process.
In this paper we apply this procedure to the estimation of a regression function S in a semimartingale regression model (1). The rest of the paper is organized as follows. In Section 2 we construct the model selection procedure on the basis of weighted least squares estimates and state the main results in the form of oracle inequalities for the quadratic risks. Section 3 gives the proofs of all theorems. In Appendix some technical results are established.
2. Model selection
This Section gives the construction of a model selection procedure for estimating a function S in (1) on the basis of weighted least square estimates and states the main results.
For estimating the unknown function $S$ in model (1), we apply its Fourier expansion with respect to the trigonometric basis $(\phi_j)_{j\ge 1}$ in $L_2[0,1]$ defined as

$$ \phi_1 = 1, \quad \phi_j(x) = \sqrt{2}\, \mathrm{Tr}_j(2\pi [j/2]\, x), \quad j \ge 2, \tag{10} $$

where the function $\mathrm{Tr}_j(x) = \cos(x)$ for even $j$ and $\mathrm{Tr}_j(x) = \sin(x)$ for odd $j$; $[x]$ denotes the integer part of $x$. The corresponding Fourier coefficients

$$ \theta_j = (S, \phi_j) = \int_0^1 S(t)\, \phi_j(t)\, dt \tag{11} $$
can be estimated as
$$ \hat\theta_{j,n} = \frac{1}{n} \int_0^n \phi_j(t)\, dy_t. \tag{12} $$

In view of (1), we obtain

$$ \hat\theta_{j,n} = \theta_j + \frac{1}{\sqrt{n}}\, \xi_{j,n}, \qquad \xi_{j,n} = \frac{1}{\sqrt{n}}\, I_n(\phi_j), \tag{13} $$

where $I_n$ is given in (2).
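In discrete time the integral (12) is approximated by a Riemann-Stieltjes sum over the increments of $y$. A minimal sketch (the grid step and the test signal are my own choices): for the noiseless path $dy_t = S(t)\,dt$ with $S(t) = \cos(2\pi t)$ one has $\theta_2 = 1/\sqrt{2}$ and all other coefficients zero, which the sums reproduce up to discretization error.

```python
import numpy as np

def phi(j, t):
    """Trigonometric basis (10): phi_1 = 1, then sqrt(2) cos / sin pairs."""
    if j == 1:
        return np.ones_like(t)
    k = j // 2
    return np.sqrt(2.0) * (np.cos(2*np.pi*k*t) if j % 2 == 0 else np.sin(2*np.pi*k*t))

n, dt = 2, 1.0 / 2000.0                  # n periods, grid step
t = np.arange(0.0, n, dt)
S = np.cos(2 * np.pi * t)                # 1-periodic test signal
dy = S * dt                              # noiseless increments of (1)

# hat theta_{j,n} = (1/n) int_0^n phi_j(t) dy_t, cf. (12)
theta_hat = np.array([np.sum(phi(j, t) * dy) / n for j in range(1, 6)])
print(theta_hat)   # approx [0, 1/sqrt(2), 0, 0, 0]
```

With noise present, the same sums acquire the perturbation $\xi_{j,n}/\sqrt{n}$ of (13).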
For any sequence $x = (x_j)_{j\ge 1}$, we set

$$ |x|^2 = \sum_{j=1}^{\infty} x_j^2 \quad\text{and}\quad \#(x) = \sum_{j=1}^{\infty} \mathbf{1}_{\{|x_j| > 0\}}. \tag{14} $$
Now we impose additional conditions on the noise $(\xi_t)_{t\ge 0}$.

$\mathbf{C}_1$) There exists a positive constant $\sigma > 0$ such that the sequence $\varsigma_{j,n} = \mathbf{E}\,\xi_{j,n}^2 - \sigma$, $j \ge 1$, satisfies, for any $n \ge 1$, the inequality

$$ c_1^*(n) = \sup_{x \in \mathcal{H},\ \#(x) \le n} | B_{1,n}(x) | < \infty, $$

where $\mathcal{H} = [-1,1]^{\infty}$ and

$$ B_{1,n}(x) = \sum_{j=1}^{\infty} x_j\, \varsigma_{j,n}. \tag{15} $$

$\mathbf{C}_2$) Assume that, for all $n \ge 1$,

$$ c_2^*(n) = \sup_{|x| \le 1,\ \#(x) \le n} \mathbf{E}\, B_{2,n}^2(x) < \infty, $$

where

$$ B_{2,n}(x) = \sum_{j=1}^{\infty} x_j\, \tilde\xi_{j,n} \quad\text{with}\quad \tilde\xi_{j,n} = \xi_{j,n}^2 - \mathbf{E}\,\xi_{j,n}^2. \tag{16} $$
As is stated in Theorem 2, conditions $\mathbf{C}_1$) and $\mathbf{C}_2$) hold for the process (4). Further we introduce a class of weighted least squares estimates for $S(t)$ defined as

$$ \hat S_\gamma = \sum_{j=1}^{\infty} \gamma(j)\, \hat\theta_{j,n}\, \phi_j, \tag{17} $$

where $\gamma = (\gamma(j))_{j\ge 1}$ is a sequence of weight coefficients such that

$$ 0 \le \gamma(j) \le 1 \quad\text{and}\quad 0 < \#(\gamma) \le n. \tag{18} $$

Let $\Gamma$ denote a finite set of weight sequences $\gamma = (\gamma(j))_{j\ge 1}$ with these properties, $\nu = \mathrm{card}(\Gamma)$ be its cardinality and

$$ \mu = \max_{\gamma \in \Gamma} \#(\gamma). \tag{19} $$
The model selection procedure for the unknown function $S$ in (1) will be constructed on the basis of the estimates $(\hat S_\gamma)_{\gamma\in\Gamma}$. The choice of a specific set of weight sequences $\Gamma$ will be discussed at the end of this section. In order to find a proper weight sequence $\gamma$ in the set $\Gamma$, one needs to specify a cost function. When choosing an appropriate cost function one can use the following argument. The empirical squared error

$$ \mathrm{Err}_n(\gamma) = \| \hat S_\gamma - S \|^2 $$

can be written as

$$ \mathrm{Err}_n(\gamma) = \sum_{j=1}^{\infty} \gamma^2(j)\, \hat\theta_{j,n}^2 - 2 \sum_{j=1}^{\infty} \gamma(j)\, \hat\theta_{j,n}\, \theta_j + \sum_{j=1}^{\infty} \theta_j^2. \tag{20} $$
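Since the basis (10) is orthonormal, the identity (20) is simply Parseval's equality applied to $\hat S_\gamma - S$; the snippet below, with arbitrary made-up coefficient vectors, confirms the expansion numerically in coefficient space.

```python
import numpy as np

rng = np.random.default_rng(1)
J = 50
theta = rng.normal(size=J)                     # true Fourier coefficients of S
theta_hat = theta + 0.1 * rng.normal(size=J)   # noisy coefficient estimates
gamma = rng.uniform(0.0, 1.0, size=J)          # a weight sequence

# By Parseval, Err_n(gamma) = sum_j (gamma_j * theta_hat_j - theta_j)^2
err_direct = np.sum((gamma * theta_hat - theta) ** 2)
# Expansion (20)
err_expanded = (np.sum(gamma**2 * theta_hat**2)
                - 2.0 * np.sum(gamma * theta_hat * theta)
                + np.sum(theta**2))
print(abs(err_direct - err_expanded))   # zero up to rounding
```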
Since the Fourier coefficients $(\theta_j)_{j\ge 1}$ are unknown, the weight coefficients $(\gamma(j))_{j\ge 1}$ cannot be determined by minimizing this quantity. To circumvent this difficulty, one replaces the products $\hat\theta_{j,n}\,\theta_j$ by their estimators

$$ \tilde\theta_{j,n} = \hat\theta_{j,n}^2 - \frac{\hat\sigma_n}{n}, \tag{21} $$

where $\hat\sigma_n$ is an estimator of the quantity $\sigma$ in condition $\mathbf{C}_1$).
For this change in the empirical squared error, one has to pay some penalty. Thus, one comes to the cost function of the form

$$ J_n(\gamma) = \sum_{j=1}^{\infty} \gamma^2(j)\, \hat\theta_{j,n}^2 - 2 \sum_{j=1}^{\infty} \gamma(j)\, \tilde\theta_{j,n} + \rho\, \hat P_n(\gamma), \tag{22} $$

where $\rho$ is some positive constant and $\hat P_n(\gamma)$ is the penalty term defined as

$$ \hat P_n(\gamma) = \frac{\hat\sigma_n\, |\gamma|^2}{n}. \tag{23} $$

In the case when the value of $\sigma$ in $\mathbf{C}_1$) is known, one can put $\hat\sigma_n = \sigma$ and

$$ P_n(\gamma) = \frac{\sigma\, |\gamma|^2}{n}. \tag{24} $$
Substituting in (17) the weight coefficients minimizing the cost function, that is,

$$ \hat\gamma = \mathrm{argmin}_{\gamma \in \Gamma}\, J_n(\gamma), \tag{25} $$

leads to the model selection procedure

$$ S_* = \hat S_{\hat\gamma}. \tag{26} $$

It will be noted that $\hat\gamma$ exists, since $\Gamma$ is a finite set. If the minimizing sequence $\hat\gamma$ in (25) is not unique, one can take any minimizer.
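A compact sketch of the selection rule (25), (26) in coefficient form, for a toy family of projection weights $\gamma_k(j) = \mathbf{1}_{\{j \le k\}}$ (the family, the coefficient values and $\hat\sigma_n$ below are illustrative, not from the paper): for each candidate it evaluates the cost (22) with the penalty (23) and keeps the minimizer.

```python
import numpy as np

def cost_J(gamma, theta_hat, sigma_hat, n, rho):
    """Cost function (22) with penalty (23)."""
    theta_tilde = theta_hat**2 - sigma_hat / n       # estimators (21)
    penalty = sigma_hat * np.sum(gamma**2) / n       # hat P_n(gamma), (23)
    return (np.sum(gamma**2 * theta_hat**2)
            - 2.0 * np.sum(gamma * theta_tilde) + rho * penalty)

n, rho, sigma_hat = 100, 0.1, 1.0
# toy coefficient estimates: two large signal coefficients, then small residue
theta_hat = np.array([1.0, 0.5] + [0.01] * 8)
# family Gamma of projection weights gamma_k = 1_{j <= k}
Gamma = [np.where(np.arange(1, 11) <= k, 1.0, 0.0) for k in range(1, 11)]

costs = [cost_J(g, theta_hat, sigma_hat, n, rho) for g in Gamma]
k_best = int(np.argmin(costs)) + 1
print(k_best)   # the penalized cost keeps only the two large coefficients
```

Here a coefficient is retained only when $\hat\theta_{j,n}^2$ exceeds the per-coefficient price $(2+\rho)\hat\sigma_n/n$, which is exactly the hard-thresholding effect of the penalty.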
Theorem 1. Assume that the conditions $\mathbf{C}_1$) and $\mathbf{C}_2$) hold with $\sigma > 0$. Then, for any $n \ge 1$ and $0 < \rho < 1/3$, the estimator (26) satisfies the oracle inequality

$$ \mathcal{R}(S_*, S) \le \frac{1 + 3\rho - 2\rho^2}{1 - 3\rho}\, \min_{\gamma\in\Gamma} \mathcal{R}(\hat S_\gamma, S) + \frac{1}{n}\, \mathcal{B}_n^*(\rho), \tag{27} $$

where the risk $\mathcal{R}(\cdot, S)$ is defined in (9),

$$ \mathcal{B}_n^*(\rho) = \frac{6\mu\, \mathbf{E}_S\, |\hat\sigma_n - \sigma|}{1 - 3\rho} + \Psi_n(\rho) \quad\text{and}\quad \Psi_n(\rho) = \frac{2\sigma\sigma^*\nu + 4\sigma c_1^*(n) + 2\nu c_2^*(n)}{\sigma\rho\,(1 - 3\rho)}. \tag{28} $$
Now we check conditions $\mathbf{C}_1$) and $\mathbf{C}_2$) for the model (1) with the noise (4) and arrive at the following result.

Theorem 2. Suppose that the coefficients $\varrho_1$ and $\varrho_2$ in the model (1), (4) are such that $\varrho_1^2 + \varrho_2^2 > 0$ and $\mathbf{E}\, Y_1^4 < \infty$. Then the estimation procedure (26), for any $n \ge 1$ and $0 < \rho < 1/3$, satisfies the oracle inequality (27) with

$$ \sigma = \sigma^* = \varrho_1^2 + \lambda \varrho_2^2, \quad c_1^*(n) = 0, \quad\text{and}\quad \sup_{n \ge 1} c_2^*(n) \le 4 \sigma^* ( \sigma^* + \varrho_2^2\, \mathbf{E}\, Y_1^4 ). $$
The proofs of Theorems 1, 2 are given in Section 3.
Corollary 3. Let the conditions of Theorem 1 hold and the quantity $\sigma$ in $\mathbf{C}_1$) be known. Then, for any $n \ge 1$ and $0 < \rho < 1/3$, the estimator (26) satisfies the oracle inequality

$$ \mathcal{R}(S_*, S) \le \frac{1 + 3\rho - 2\rho^2}{1 - 3\rho}\, \min_{\gamma\in\Gamma} \mathcal{R}(\hat S_\gamma, S) + \frac{1}{n}\, \Psi_n(\rho), $$

where $\Psi_n(\rho)$ is given in (28).
1. Estimation of $\sigma$
Now we consider the case of an unknown quantity $\sigma$ in condition $\mathbf{C}_1$). One can estimate $\sigma$ as

$$ \hat\sigma_n = \sum_{j=l}^{n} \hat\theta_{j,n}^2 \quad\text{with}\quad l = [\sqrt{n}\,] + 1. \tag{29} $$

Proposition 4. Suppose that the conditions of Theorem 1 hold and the unknown function $S(t)$ is continuously differentiable for $0 \le t \le 1$ and such that

$$ \| \dot S \|_1 = \int_0^1 | \dot S(t) |\, dt < +\infty. \tag{30} $$

Then, for any $n \ge 1$,

$$ \mathbf{E}_S\, | \hat\sigma_n - \sigma | \le \frac{K_n(S)}{\sqrt{n}}, \tag{31} $$

where $K_n(S) = 4 \| \dot S \|_1^2 + \sigma + 2 \sqrt{\sigma^*}\, \| S \| + \sqrt{c_2^*(n)} + \dfrac{c_1^*(n)}{\sqrt{n}}$.
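The estimator (29) uses only the tail coefficients $j > \sqrt{n}$, where the signal contribution $\theta_j^2$ is negligible for smooth $S$, so that the sum is close to $\sigma (n-l+1)/n$. A quick sanity check under the idealized assumption $\hat\theta_{j,n} \approx \xi_{j,n}/\sqrt{n}$ with Gaussian $\xi_{j,n} \sim \mathcal{N}(0, \sigma)$ for $j \ge l$ (the Gaussian tail is my simplifying assumption, not part of the model):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 10_000, 2.0
l = int(np.sqrt(n)) + 1                  # l = [sqrt(n)] + 1 as in (29)

# idealized tail coefficients: pure noise of variance sigma / n
theta_hat_tail = rng.normal(0.0, np.sqrt(sigma / n), n - l + 1)
sigma_hat = np.sum(theta_hat_tail**2)    # estimator (29)
print(sigma_hat)   # close to sigma * (n - l + 1) / n
```

The residual bias $\sigma(l-1)/n \le \sigma/\sqrt{n}$ is one of the terms collected in $K_n(S)$.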
The proof of Proposition 4 is given in Section 3. Theorem 1 and Proposition 4 imply the following result.
Theorem 5. Suppose that the conditions of Theorem 1 hold and $S$ satisfies the conditions of Proposition 4. Then, for any $n \ge 1$ and $0 < \rho < 1/3$, the estimator (26) satisfies the oracle inequality

$$ \mathcal{R}(S_*, S) \le \frac{1 + 3\rho - 2\rho^2}{1 - 3\rho}\, \min_{\gamma\in\Gamma} \mathcal{R}(\hat S_\gamma, S) + \frac{1}{n}\, \mathcal{D}_n(\rho), \tag{32} $$

where $\mathcal{D}_n(\rho) = \Psi_n(\rho) + \dfrac{6\mu\, K_n(S)}{(1 - 3\rho)\sqrt{n}}$.
2. Specification of weights in the selection procedure (26)
Now we specify the weight coefficients $(\gamma(j))_{j\ge 1}$ in the way proposed in [7] for a heteroscedastic discrete time regression model. Consider a numerical grid of the form

$$ \mathcal{A}_n = \{1, \ldots, k^*\} \times \{t_1, \ldots, t_m\}, $$

where $t_i = i\varepsilon$ and $m = [1/\varepsilon^2]$. We assume that both parameters $k^* \ge 1$ and $0 < \varepsilon \le 1$ are functions of $n$, i.e. $k^* = k^*(n)$ and $\varepsilon = \varepsilon(n)$, such that

$$ \lim_{n\to\infty} k^*(n) = +\infty, \quad \lim_{n\to\infty} \frac{k^*(n)}{\ln n} = 0, \quad \lim_{n\to\infty} \varepsilon(n) = 0 \quad\text{and}\quad \lim_{n\to\infty} n^{\delta}\, \varepsilon(n) = +\infty \tag{33} $$

for any $\delta > 0$. One can take, for example, for $n \ge 1$,

$$ \varepsilon(n) = \frac{1}{\ln(n+1)} \quad\text{and}\quad k^*(n) = \sqrt{\ln(n+1)}. $$

For each $\alpha = (\beta, t) \in \mathcal{A}_n$, we introduce the weight sequence $\gamma_\alpha = (\gamma_\alpha(j))_{j\ge 1}$ given as

$$ \gamma_\alpha(j) = \mathbf{1}_{\{1 \le j \le j_0\}} + \left( 1 - (j/\omega_\alpha)^{\beta} \right) \mathbf{1}_{\{j_0 < j \le \omega_\alpha\}}, \tag{34} $$

where $j_0 = j_0(\alpha) = [\omega_\alpha / \ln n]$,

$$ \omega_\alpha = ( A_\beta\, t\, n )^{1/(2\beta+1)} \quad\text{and}\quad A_\beta = \frac{(\beta+1)(2\beta+1)}{\pi^{2\beta}\, \beta}. $$

We set

$$ \Gamma = \{ \gamma_\alpha,\ \alpha \in \mathcal{A}_n \}. \tag{35} $$

It will be noted that in this case $\nu = k^* m$.
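The construction (33)-(35) can be written out directly; the sketch below builds $\mathcal{A}_n$ and the weights (34) for a given $n$, using the example choices $\varepsilon(n) = 1/\ln(n+1)$ and $k^*(n) = \sqrt{\ln(n+1)}$. The constant $A_\beta$ follows the reconstruction above and should be checked against the original source.

```python
import numpy as np

def weight_family(n):
    """Grid A_n and Pinsker-type weights (34); sketch of (33)-(35)."""
    eps = 1.0 / np.log(n + 1.0)
    k_star = max(1, int(np.sqrt(np.log(n + 1.0))))
    m = int(1.0 / eps**2)
    Gamma = {}
    for beta in range(1, k_star + 1):
        for i in range(1, m + 1):
            t = i * eps
            A_beta = (beta + 1) * (2 * beta + 1) / (np.pi**(2 * beta) * beta)
            omega = (A_beta * t * n) ** (1.0 / (2 * beta + 1))
            j0 = int(omega / np.log(n))
            j = np.arange(1, int(omega) + 1, dtype=float)
            gamma = np.where(j <= j0, 1.0, 1.0 - (j / omega) ** beta)
            gamma[gamma < 0.0] = 0.0       # safety clip at the support edge
            Gamma[(beta, i)] = gamma
    return Gamma, k_star, m

Gamma, k_star, m = weight_family(1000)
print(len(Gamma), k_star * m)    # nu = card(Gamma) = k* m
```

Each weight sequence equals one on the first $j_0$ coordinates, decays polynomially up to $\omega_\alpha$, and vanishes afterwards, in agreement with (18).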
Remark 1. It will be observed that the specific form of the weights (34) was proposed by Pinsker [19] for the filtering problem with known smoothness of the regression function observed with an additive Gaussian white noise in continuous time. Nussbaum [18] used these weights for the Gaussian regression estimation problem in discrete time.

The minimal mean square risk, called the Pinsker constant, is attained by the weighted least squares estimate with the weights (34) in which the index $\alpha$ depends on the smoothness order of the function $S$. In our case the smoothness order is unknown and, instead of one estimate, one has to use a whole family of estimates containing, in particular, the optimal one.

The problem is to study the properties of the whole class of estimates. Below we derive an oracle inequality for this class which yields the best mean square risk up to multiplicative and additive constants, provided that the smoothness of the unknown function $S$ is not available. Moreover, it will be shown that the multiplicative constant tends to unity and the additive one vanishes as $n \to \infty$ at a rate faster than any minimax rate.
In view of the assumptions (33), for any $\delta > 0$, one has

$$ \lim_{n\to\infty} \frac{\nu}{n^{\delta}} = 0. $$

Moreover, by (34), for any $\alpha \in \mathcal{A}_n$,

$$ \sum_{j=1}^{\infty} \mathbf{1}_{\{\gamma_\alpha(j) > 0\}} \le \omega_\alpha. $$

Therefore, taking into account that $A_\beta \le 1$ for $\beta \ge 1$, we get, for any $\delta > 0$,

$$ \lim_{n\to\infty} \frac{\mu}{n^{1/3 + \delta}} = 0. $$

Applying these limiting relations to the analysis of the asymptotic behavior of the additive term $\mathcal{D}_n(\rho)$ in (32), one comes to the following result.
Theorem 6. Suppose that the conditions of Theorem 1 hold and $S$ satisfies the conditions of Proposition 4. Then, for any $n \ge 1$ and $0 < \rho < 1/3$, the estimator (26) with the weight coefficients (35) satisfies the oracle inequality (32) with the additive term $\mathcal{D}_n(\rho)$ obeying, for any $\delta > 0$, the limiting relation

$$ \lim_{n\to\infty} \frac{\mathcal{D}_n(\rho)}{n^{\delta}} = 0. $$
3. Proofs
1. Proof of Theorem 1. Substituting (22) in (20) yields, for any $\gamma \in \Gamma$,

$$ \mathrm{Err}_n(\gamma) = J_n(\gamma) + 2 \sum_{j=1}^{\infty} \gamma(j)\, \check\theta_{j,n} + \| S \|^2 - \rho\, \hat P_n(\gamma), \tag{36} $$

where

$$ \check\theta_{j,n} = \tilde\theta_{j,n} - \theta_j\, \hat\theta_{j,n} = \frac{1}{\sqrt{n}}\, \theta_j\, \xi_{j,n} + \frac{1}{n}\, \varsigma_{j,n} + \frac{1}{n}\, \tilde\xi_{j,n} + \frac{\sigma - \hat\sigma_n}{n}, $$

and the sequences $(\varsigma_{j,n})_{j\ge 1}$ and $(\tilde\xi_{j,n})_{j\ge 1}$ are defined in conditions $\mathbf{C}_1$) and $\mathbf{C}_2$). Denoting

$$ L(\gamma) = \sum_{j=1}^{\infty} \gamma(j), \qquad M(\gamma) = \frac{1}{\sqrt{n}} \sum_{j=1}^{\infty} \gamma(j)\, \theta_j\, \xi_{j,n}, \tag{37} $$

and taking into account the definition of the "true" penalty term in (24), we rewrite (36) as

$$ \mathrm{Err}_n(\gamma) = J_n(\gamma) + \frac{2(\sigma - \hat\sigma_n)}{n}\, L(\gamma) + 2 M(\gamma) + \frac{2}{n}\, B_{1,n}(\gamma) + 2 \sqrt{P_n(\gamma)}\, \frac{B_{2,n}(e(\gamma))}{\sqrt{\sigma n}} + \| S \|^2 - \rho\, \hat P_n(\gamma), \tag{38} $$

where $e(\gamma) = \gamma / |\gamma|$ and the functions $B_{1,n}$ and $B_{2,n}$ are defined in (15) and (16).
Let $\gamma_0 = (\gamma_0(j))_{j\ge 1}$ be a fixed sequence in $\Gamma$ and $\hat\gamma$ be as in (25). Substituting $\gamma_0$ and $\hat\gamma$ in (38), we consider the difference

$$ \mathrm{Err}_n(\hat\gamma) - \mathrm{Err}_n(\gamma_0) = J_n(\hat\gamma) - J_n(\gamma_0) + \frac{2(\sigma - \hat\sigma_n)}{n}\, L(x) + \frac{2}{n}\, B_{1,n}(x) + 2 M(x) $$
$$ + 2 \sqrt{P_n(\hat\gamma)}\, \frac{B_{2,n}(\hat e)}{\sqrt{\sigma n}} - 2 \sqrt{P_n(\gamma_0)}\, \frac{B_{2,n}(e_0)}{\sqrt{\sigma n}} - \rho\, \hat P_n(\hat\gamma) + \rho\, \hat P_n(\gamma_0), $$

where $x = \hat\gamma - \gamma_0$, $\hat e = e(\hat\gamma)$ and $e_0 = e(\gamma_0)$. Note that, by (19),

$$ | L(x) | \le | L(\hat\gamma) | + | L(\gamma_0) | \le 2\mu. $$

Therefore, making use of condition $\mathbf{C}_1$) and taking into account that the cost function $J_n$ attains its minimum at $\hat\gamma$, one comes to the inequality

$$ \mathrm{Err}_n(\hat\gamma) \le \mathrm{Err}_n(\gamma_0) + \frac{4\mu}{n}\, |\hat\sigma_n - \sigma| + \frac{2}{n}\, c_1^*(n) + 2 M(x) $$
$$ + 2 \sqrt{P_n(\hat\gamma)}\, \frac{| B_{2,n}(\hat e) |}{\sqrt{\sigma n}} + 2 \sqrt{P_n(\gamma_0)}\, \frac{| B_{2,n}(e_0) |}{\sqrt{\sigma n}} - \rho\, \hat P_n(\hat\gamma) + \rho\, \hat P_n(\gamma_0). \tag{39} $$
Applying the elementary inequality

$$ 2 | ab | \le \varepsilon a^2 + \varepsilon^{-1} b^2 \tag{40} $$

with $\varepsilon = \rho$ implies the estimate

$$ 2 \sqrt{P_n(\gamma)}\, \frac{| B_{2,n}(e(\gamma)) |}{\sqrt{\sigma n}} \le \rho\, P_n(\gamma) + \frac{B_{2,n}^2(e(\gamma))}{n \sigma \rho}. $$

We recall that $0 < \rho < 1$. Therefore, noting that $\rho P_n(\gamma) - \rho \hat P_n(\gamma) = \rho (\sigma - \hat\sigma_n) |\gamma|^2 / n$, from here and (39) it follows that

$$ \mathrm{Err}_n(\hat\gamma) \le \mathrm{Err}_n(\gamma_0) + 2 M(x) + \frac{2 B_{2,n}^*}{n \sigma \rho} + \frac{2 c_1^*(n)}{n} + \frac{1}{n}\, |\hat\sigma_n - \sigma| \left( |\hat\gamma|^2 + |\gamma_0|^2 + 4\mu \right) + 2 \rho\, P_n(\gamma_0), $$

where $B_{2,n}^* = \sup_{\gamma\in\Gamma} B_{2,n}^2(e(\gamma))$. In view of (18) and (19), one has

$$ \sup_{\gamma\in\Gamma} |\gamma|^2 \le \mu. $$

Thus, one gets

$$ \mathrm{Err}_n(\hat\gamma) \le \mathrm{Err}_n(\gamma_0) + 2 M(x) + \frac{2 B_{2,n}^*}{n \sigma \rho} + \frac{2 c_1^*(n)}{n} + \frac{6\mu}{n}\, |\hat\sigma_n - \sigma| + 2 \rho\, P_n(\gamma_0). \tag{41} $$
In view of condition $\mathbf{C}_2$), one has

$$ \mathbf{E}_S\, B_{2,n}^* \le \sum_{\gamma\in\Gamma} \mathbf{E}_S\, B_{2,n}^2(e(\gamma)) \le \nu\, c_2^*(n), \tag{42} $$

where $\nu = \mathrm{card}(\Gamma)$.
Now we examine the term $M(x)$ in the right-hand side of (41). Substituting (13) in (37) and taking into account (3), one obtains that, for any non-random sequence $x = (x(j))_{j\ge 1}$ with $\#(x) < \infty$,

$$ \mathbf{E}_S\, M^2(x) \le \sigma^*\, \frac{1}{n} \sum_{j=1}^{\infty} x^2(j)\, \theta_j^2 = \frac{\sigma^*}{n}\, \| S_x \|^2, \tag{43} $$

where $S_x = \sum_{j=1}^{\infty} x(j)\, \theta_j\, \phi_j$. Let us denote

$$ Z^* = \sup_{x \in \Gamma_1} \frac{n\, M^2(x)}{\| S_x \|^2}, $$

where $\Gamma_1 = \Gamma - \gamma_0$. In view of (43), this quantity can be estimated as

$$ \mathbf{E}_S\, Z^* \le \sum_{x \in \Gamma_1} \frac{n\, \mathbf{E}_S\, M^2(x)}{\| S_x \|^2} \le \sigma^* \nu. \tag{44} $$

Further, making use of the inequality (40) with $a = \| S_x \|$, $b = M(x)/\| S_x \|$ and $\varepsilon = \rho$, one gets

$$ 2 | M(x) | \le \rho\, \| S_x \|^2 + \frac{Z^*}{n \rho}. \tag{45} $$
Note that, for any $x \in \Gamma_1$,

$$ \| \hat S_x \|^2 - \| S_x \|^2 = \sum_{j=1}^{\infty} x^2(j) \left( \hat\theta_{j,n}^2 - \theta_j^2 \right) \ge 2 M_1(x), $$

where $\hat S_x = \sum_{j=1}^{\infty} x(j)\, \hat\theta_{j,n}\, \phi_j$ and

$$ M_1(x) = \frac{1}{\sqrt{n}} \sum_{j=1}^{\infty} x^2(j)\, \theta_j\, \xi_{j,n}. $$

Since $| x(j) | \le 1$ for any $x \in \Gamma_1$, one gets, similarly to (43),

$$ \mathbf{E}_S\, M_1^2(x) \le \frac{\sigma^*}{n}\, \| S_x \|^2. \tag{46} $$

Denoting

$$ Z_1^* = \sup_{x \in \Gamma_1} \frac{n\, M_1^2(x)}{\| S_x \|^2}, $$

one has, as in (44),

$$ \mathbf{E}_S\, Z_1^* \le \sigma^* \nu. \tag{47} $$

By the same argument as in (45), one derives

$$ 2 | M_1(x) | \le \rho\, \| S_x \|^2 + \frac{Z_1^*}{n \rho}. $$
From here, one finds the upper bound for $\| S_x \|^2$:

$$ \| S_x \|^2 \le \frac{\| \hat S_x \|^2}{1 - \rho} + \frac{Z_1^*}{n \rho (1 - \rho)}. $$

Using this bound in (45) gives

$$ 2 M(x) \le \frac{\rho\, \| \hat S_x \|^2}{1 - \rho} + \frac{Z^* + Z_1^*}{n \rho (1 - \rho)}. $$

Setting $x = \hat x = \hat\gamma - \gamma_0$ in this inequality and taking into account that

$$ \| \hat S_{\hat x} \|^2 = \| \hat S_{\hat\gamma} - \hat S_{\gamma_0} \|^2 \le 2 \left( \mathrm{Err}_n(\hat\gamma) + \mathrm{Err}_n(\gamma_0) \right), $$

we obtain

$$ 2 M(\hat x) \le \frac{2 \rho \left( \mathrm{Err}_n(\hat\gamma) + \mathrm{Err}_n(\gamma_0) \right)}{1 - \rho} + \frac{Z^* + Z_1^*}{n \rho (1 - \rho)}. $$
From here and (41), it follows that

$$ \mathrm{Err}_n(\hat\gamma) \le \frac{1 + \rho}{1 - 3\rho}\, \mathrm{Err}_n(\gamma_0) + \frac{2(1 - \rho)}{n (1 - 3\rho)} \left( \frac{B_{2,n}^*}{\sigma\rho} + c_1^*(n) + 3\mu\, |\hat\sigma_n - \sigma| \right) + \frac{Z^* + Z_1^*}{n \rho (1 - 3\rho)} + \frac{2 \rho (1 - \rho)}{1 - 3\rho}\, P_n(\gamma_0). \tag{48} $$
Taking the expectation and applying (42), (44) and (47) yields

$$ \mathcal{R}(S_*, S) \le \frac{1 + \rho}{1 - 3\rho}\, \mathcal{R}(\hat S_{\gamma_0}, S) + \frac{2(1 - \rho)}{n (1 - 3\rho)} \left( \frac{\nu c_2^*(n)}{\sigma\rho} + c_1^*(n) + 3\mu\, \mathbf{E}_S\, |\hat\sigma_n - \sigma| \right) + \frac{2 \sigma^* \nu}{n \rho (1 - 3\rho)} + \frac{2 \rho (1 - \rho)}{1 - 3\rho}\, P_n(\gamma_0). $$

Using the upper bound for $P_n(\gamma_0)$ in Lemma A.1, one obtains

$$ \mathcal{R}(S_*, S) \le \frac{1 + 3\rho - 2\rho^2}{1 - 3\rho}\, \mathcal{R}(\hat S_{\gamma_0}, S) + \frac{1}{n}\, \mathcal{B}_n^*(\rho), $$

where $\mathcal{B}_n^*(\rho)$ is defined in (28). Since this inequality holds for each $\gamma_0 \in \Gamma$, the proof of Theorem 1 is complete.
2. Proof of Theorem 2. We have to verify conditions $\mathbf{C}_1$) and $\mathbf{C}_2$) for the process (4).

Condition $\mathbf{C}_1$) holds with $c_1^*(n) = 0$: this follows from Lemma A.2 if one puts $f = g = \phi_j$, $j \ge 1$, since in this case $\mathbf{E}\,\xi_{j,n}^2 = \sigma^* n^{-1} \int_0^n \phi_j^2(t)\, dt = \sigma^*$, i.e. $\varsigma_{j,n} = 0$. Now we check condition $\mathbf{C}_2$). By the Ito formula,

$$ I_t^2(f) = 2 \int_0^t I_{s-}(f)\, dI_s(f) + \varrho_1^2 \int_0^t f^2(s)\, ds + \varrho_2^2 \sum_{0 < s \le t} f^2(s)\, (\Delta z_s)^2, $$

and, by Lemma A.2,

$$ \mathbf{E}\, I_t^2(f) = \sigma^* \int_0^t f^2(s)\, ds. $$

Therefore, putting

$$ \tilde I_t(f) = I_t^2(f) - \mathbf{E}\, I_t^2(f), $$

we obtain

$$ d\tilde I_t(f) = \varrho_2^2 f^2(t)\, dm_t + 2 I_{t-}(f)\, f(t)\, d\xi_t, \quad \tilde I_0(f) = 0, \quad\text{where}\quad m_t = \sum_{0 < s \le t} (\Delta z_s)^2 - \lambda t. \tag{49} $$

Now we set

$$ \bar I_t(x) = \sum_{j=1}^{\infty} x_j\, \tilde I_t(\phi_j), $$

where $x = (x_j)_{j\ge 1}$ with $\#(x) \le n$ and $|x| \le 1$; note that $B_{2,n}(x) = \bar I_n(x)/n$. This process obeys the equation

$$ d\bar I_t(x) = \varrho_2^2\, \Phi_t(x)\, dm_t + 2\, \zeta_{t-}(x)\, d\xi_t, \quad \bar I_0(x) = 0, $$

where

$$ \Phi_t(x) = \sum_{j\ge 1} x_j\, \phi_j^2(t) \quad\text{and}\quad \zeta_t(x) = \sum_{j\ge 1} x_j\, I_t(\phi_j)\, \phi_j(t). $$
Now we show that

$$ \mathbf{E} \int_0^n \bar I_{t-}(x)\, d\bar I_t(x) = 0. \tag{50} $$

Indeed, note that

$$ \int_0^n \bar I_{t-}(x)\, d\bar I_t(x) = \varrho_2^2 \sum_{j\ge 1} x_j \int_0^n \bar I_{t-}(x)\, \phi_j^2(t)\, dm_t + 2 \sum_{j\ge 1} x_j \int_0^n \tilde I_{t-}(\phi_j)\, \zeta_{t-}(x)\, d\xi_t. $$

Therefore, Lemma A.4 directly implies

$$ \mathbf{E} \int_0^n \bar I_{t-}(x)\, \phi_j^2(t)\, dm_t = \sum_{l\ge 1} x_l\, \mathbf{E} \int_0^n I_{t-}^2(\phi_l)\, \phi_j^2(t)\, dm_t - \sum_{l\ge 1} x_l\, \mathbf{E} \int_0^n \left( \mathbf{E}\, I_{t-}^2(\phi_l) \right) \phi_j^2(t)\, dm_t = 0. $$

Moreover, we note that

$$ \int_0^n \tilde I_{t-}(\phi_j)\, \zeta_{t-}(x)\, d\xi_t = \sum_{l\ge 1} x_l \int_0^n \tilde I_{t-}(\phi_j)\, I_{t-}(\phi_l)\, \phi_l(t)\, d\xi_t $$

and

$$ \int_0^n \tilde I_{t-}(\phi_j)\, I_{t-}(\phi_l)\, \phi_l(t)\, d\xi_t = \int_0^n I_{t-}^2(\phi_j)\, I_{t-}(\phi_l)\, \phi_l(t)\, d\xi_t - \int_0^n \left( \mathbf{E}\, I_{t-}^2(\phi_j) \right) I_{t-}(\phi_l)\, \phi_l(t)\, d\xi_t. $$

From Lemma A.5 it follows that

$$ \mathbf{E} \int_0^n I_{t-}^2(\phi_j)\, I_{t-}(\phi_l)\, \phi_l(t)\, d\xi_t = 0, $$

while the last integral has zero expectation since $\mathbf{E}\, I_{t-}^2(\phi_j)$ is non-random and $\xi$ is a zero-mean square integrable martingale; thus we come to (50). Furthermore, by the Ito formula one obtains
$$ \bar I_n^2(x) = 2 \int_0^n \bar I_{t-}(x)\, d\bar I_t(x) + 4 \varrho_1^2 \int_0^n \zeta_t^2(x)\, dt + \varrho_2^4 \sum_{k=1}^{\infty} \Phi_{T_k}^2(x)\, Y_k^4\, \mathbf{1}_{\{T_k \le n\}} $$
$$ + 4 \varrho_2^2 \sum_{k=1}^{\infty} \zeta_{T_k-}^2(x)\, Y_k^2\, \mathbf{1}_{\{T_k \le n\}} + 4 \varrho_2^3 \sum_{k=1}^{\infty} \Phi_{T_k}(x)\, \zeta_{T_k-}(x)\, Y_k^3\, \mathbf{1}_{\{T_k \le n\}}. $$

By Lemma A.3 one has $\mathbf{E}\, ( \zeta_{T_k-}(x) \mid \mathcal{G}_k ) = 0$, so the last sum has zero expectation. Therefore, taking into account (50) and the independence of the $(Y_j)$, we calculate

$$ \mathbf{E}\, \bar I_n^2(x) = 4 \varrho_1^2\, \mathbf{E} \int_0^n \zeta_t^2(x)\, dt + \varrho_2^4\, \mathbf{E} Y_1^4\, D_{1,n}(x) + 4 \varrho_2^2\, D_{2,n}(x), \tag{51} $$

where $D_{1,n}(x) = \sum_{k=1}^{\infty} \mathbf{E}\, \Phi_{T_k}^2(x)\, \mathbf{1}_{\{T_k \le n\}}$ and $D_{2,n}(x) = \sum_{k=1}^{\infty} \mathbf{E}\, \zeta_{T_k-}^2(x)\, \mathbf{1}_{\{T_k \le n\}}$.
By applying Lemma A.2, one has

$$ \mathbf{E} \int_0^n \zeta_t^2(x)\, dt = \sum_{i,j} x_i x_j \int_0^n \phi_i(t)\, \phi_j(t)\, \mathbf{E}\, I_t(\phi_i)\, I_t(\phi_j)\, dt = \sigma^* \sum_{i,j} x_i x_j \int_0^n \phi_i(t)\, \phi_j(t) \left( \int_0^t \phi_i(s)\, \phi_j(s)\, ds \right) dt $$
$$ = \frac{\sigma^*}{2} \sum_{i,j} x_i x_j \left( \int_0^n \phi_i(t)\, \phi_j(t)\, dt \right)^2 \le \frac{\sigma^* n^2}{2}, \tag{52} $$

where we used that $\int_0^n \phi_i \phi_j\, dt = n \mathbf{1}_{\{i=j\}}$ by the orthonormality of the basis on each period, together with $|x| \le 1$. Further, since $\phi_j^2 \le 2$, the Cauchy-Schwarz-Bunyakovsky inequality and $\#(x) \le n$, $|x| \le 1$ give

$$ \Phi_t^2(x) \le \Big( \sum_{j\ge 1} | x_j |\, \phi_j^2(t) \Big)^2 \le 4\, \#(x)\, |x|^2 \le 4n, $$

whence, the function $\Phi_t(x)$ being non-random,

$$ D_{1,n}(x) = \lambda \int_0^n \Phi_t^2(x)\, dt \le 4 \lambda n^2. \tag{53} $$
Finally, we write the process $\zeta_t(x)$ as

$$ \zeta_t(x) = \int_0^t Q_x(t,s)\, d\xi_s \quad\text{with}\quad Q_x(t,s) = \sum_{j\ge 1} x_j\, \phi_j(s)\, \phi_j(t). $$

Putting

$$ \bar D_{2,n} = \mathbf{E} \sum_{k=2}^{\infty} \mathbf{1}_{\{T_k \le n\}} \sum_{l=1}^{k-1} Q_x^2(T_k, T_l) $$

and applying Lemma A.3, we obtain

$$ D_{2,n}(x) = \varrho_1^2 \sum_{k=1}^{\infty} \mathbf{E} \int_0^{T_k} Q_x^2(T_k, s)\, ds\, \mathbf{1}_{\{T_k \le n\}} + \varrho_2^2\, \bar D_{2,n} = \varrho_1^2\, \lambda \int_0^n \int_0^t Q_x^2(t,s)\, ds\, dt + \varrho_2^2\, \bar D_{2,n}. $$

Moreover, one can rewrite the second term as

$$ \bar D_{2,n} = \sum_{l=1}^{\infty} \mathbf{E}\, \mathbf{1}_{\{T_l \le n\}} \sum_{k=l+1}^{\infty} Q_x^2(T_k, T_l)\, \mathbf{1}_{\{T_k \le n\}} = \lambda^2 \int_0^n \int_0^t Q_x^2(t,s)\, ds\, dt. $$

Thus, since $\int_0^n \int_0^n Q_x^2(t,s)\, ds\, dt = n^2 \sum_i x_i^2 \le n^2$,

$$ D_{2,n}(x) \le ( \lambda \varrho_1^2 + \lambda^2 \varrho_2^2 ) \int_0^n \int_0^n Q_x^2(t,s)\, ds\, dt \le ( \lambda \varrho_1^2 + \lambda^2 \varrho_2^2 )\, n^2 = \lambda\, \sigma^*\, n^2. \tag{54} $$

Since $\mathbf{E}\, B_{2,n}^2(x) = \mathbf{E}\, \bar I_n^2(x) / n^2$, the equation (51) and the bounds (52)-(54) yield

$$ \mathbf{E}\, B_{2,n}^2(x) \le 2 \varrho_1^2 \sigma^* + 4 \lambda \varrho_2^4\, \mathbf{E} Y_1^4 + 4 \lambda \varrho_2^2 \sigma^* \le 4 \sigma^* ( \sigma^* + \varrho_2^2\, \mathbf{E} Y_1^4 ), $$

which implies the validity of condition $\mathbf{C}_2$) for the process (4). Hence Theorem 2.
3. Proof of Proposition 4. Substituting (13) in (29) yields

$$ \hat\sigma_n = \sum_{j=l}^{n} \theta_j^2 + \frac{2}{\sqrt{n}} \sum_{j=l}^{n} \theta_j\, \xi_{j,n} + \frac{1}{n} \sum_{j=l}^{n} \xi_{j,n}^2. \tag{55} $$

Further, denoting

$$ x'_j = \mathbf{1}_{\{ l \le j \le n \}} \quad\text{and}\quad x''_j = \frac{1}{\sqrt{n}}\, \mathbf{1}_{\{ l \le j \le n \}}, $$

we represent the last term in (55) as

$$ \frac{1}{n} \sum_{j=l}^{n} \xi_{j,n}^2 = \frac{1}{n}\, B_{1,n}(x') + \frac{1}{\sqrt{n}}\, B_{2,n}(x'') + \frac{n - l + 1}{n}\, \sigma, $$

where the functions $B_{1,n}(\cdot)$ and $B_{2,n}(\cdot)$ are defined in conditions $\mathbf{C}_1$) and $\mathbf{C}_2$). Combining these equations leads to the inequality

$$ \mathbf{E}_S\, | \hat\sigma_n - \sigma | \le \sum_{j \ge l} \theta_j^2 + \frac{2}{\sqrt{n}}\, \mathbf{E}_S \Big| \sum_{j=l}^{n} \theta_j\, \xi_{j,n} \Big| + \frac{1}{n}\, | B_{1,n}(x') | + \frac{1}{\sqrt{n}}\, \mathbf{E}\, | B_{2,n}(x'') | + \frac{l - 1}{n}\, \sigma. $$

By Lemma A.6 and conditions $\mathbf{C}_1$), $\mathbf{C}_2$), taking into account that $l - 1 = [\sqrt{n}\,] \ge \sqrt{n} - 1$ and therefore $\sum_{j\ge l} \theta_j^2 \le 4 \| \dot S \|_1^2 / \sqrt{n}$ and $(l-1)\sigma/n \le \sigma/\sqrt{n}$, one gets

$$ \mathbf{E}_S\, | \hat\sigma_n - \sigma | \le \frac{4 \| \dot S \|_1^2}{\sqrt{n}} + \frac{2}{\sqrt{n}}\, \mathbf{E}_S \Big| \sum_{j=l}^{n} \theta_j\, \xi_{j,n} \Big| + \frac{c_1^*(n)}{n} + \frac{\sqrt{c_2^*(n)}}{\sqrt{n}} + \frac{\sigma}{\sqrt{n}}. $$

In view of the inequality (3), the remaining expectation can be estimated as

$$ \mathbf{E}_S \Big| \sum_{j=l}^{n} \theta_j\, \xi_{j,n} \Big| \le \sqrt{ \sigma^* \sum_{j=l}^{n} \theta_j^2 } \le \sqrt{\sigma^*}\, \| S \|. $$

Hence Proposition 4.
4. Appendix
A.1. Property of the penalty term (24)
Lemma A.1. Assume that condition $\mathbf{C}_1$) holds with $\sigma > 0$. Then, for any $n \ge 1$ and $\gamma \in \Gamma$,

$$ P_n(\gamma) \le \mathbf{E}_S\, \mathrm{Err}_n(\gamma) + \frac{c_1^*(n)}{n}. $$

Proof. By the definition of $\mathrm{Err}_n(\gamma)$, one has

$$ \mathrm{Err}_n(\gamma) = \sum_{j=1}^{\infty} \left( ( \gamma(j) - 1 )\, \theta_j + \frac{\gamma(j)}{\sqrt{n}}\, \xi_{j,n} \right)^2. $$

In view of condition $\mathbf{C}_1$), this leads to the desired result:

$$ \mathbf{E}_S\, \mathrm{Err}_n(\gamma) \ge \frac{1}{n} \sum_{j=1}^{\infty} \gamma^2(j)\, \mathbf{E}\, \xi_{j,n}^2 \ge P_n(\gamma) - \frac{c_1^*(n)}{n}. $$
A.2. Properties of the process (4)
Lemma A.2. Let $f$ and $g$ be any non-random functions from $L_2[0,n]$ and let $(I_t(f))_{t\ge 0}$ be defined by (2) with the noise (4). Then, for any $0 \le t \le n$,

$$ \mathbf{E}\, I_t(f)\, I_t(g) = \sigma^* \int_0^t f(s)\, g(s)\, ds, $$

where $\sigma^* = \varrho_1^2 + \lambda \varrho_2^2$.
This lemma is a direct consequence of Ito's formula, as is the following result.

Lemma A.3. Let $Q$ be a bounded $[0,\infty) \to \mathbb{R}$ function measurable with respect to $\mathcal{B}[0,+\infty) \otimes \mathcal{G}_k$, where

$$ \mathcal{G}_k = \sigma\{ T_1, \ldots, T_k \} \quad\text{with some } k \ge 2. \tag{A.1} $$

Then

$$ \mathbf{E}\, \big( I_{T_k-}(Q) \mid \mathcal{G}_k \big) = 0 \quad\text{and}\quad \mathbf{E}\, \big( I_{T_k-}^2(Q) \mid \mathcal{G}_k \big) = \varrho_1^2 \int_0^{T_k} Q^2(s)\, ds + \varrho_2^2 \sum_{l=1}^{k-1} Q^2(T_l). $$
Now we study stochastic cadlag processes $\eta = (\eta_t)_{0\le t\le n}$ of the form

$$ \eta_t = \sum_{l=0}^{\infty} u_l(t)\, \mathbf{1}_{\{ T_l \le t < T_{l+1} \}} \tag{A.2} $$

(with $T_0 = 0$), where $u_0(t)$ is a function measurable with respect to $\sigma\{ w_s, s \le t \}$ and each coefficient $u_l(t)$, $l \ge 1$, is a function measurable with respect to $\sigma\{ w_s, s \le t;\ Y_1, \ldots, Y_l;\ T_1, \ldots, T_l \}$.
Now we establish the following result.

Lemma A.4. Let $\eta = (\eta_t)_{0\le t\le n}$ be a non-negative stochastic process of the form (A.2) such that

$$ \mathbf{E} \int_0^n \eta_u\, du < \infty. $$

Then

$$ \mathbf{E} \int_0^n \eta_{u-}\, dm_u = 0, $$

where the process $m = (m_t)$ is defined in (49).

Proof. Note that the stochastic integral with respect to the martingale (49) can be written as

$$ \int_0^n \eta_{u-}\, dm_u = \sum_{0 < u \le n} \eta_{u-}\, (\Delta z_u)^2 - \lambda \int_0^n \eta_u\, du = \sum_{k=1}^{\infty} \eta_{T_k-}\, Y_k^2\, \mathbf{1}_{\{T_k \le n\}} - \lambda \int_0^n \eta_u\, du. $$

Therefore, taking into account the representation (A.2), we obtain

$$ \int_0^n \eta_{u-}\, dm_u = \Upsilon_1 - \lambda\, \Upsilon_2, \tag{A.3} $$

where $\Upsilon_1 = \sum_{k=1}^{\infty} u_{k-1}(T_k-)\, Y_k^2\, \mathbf{1}_{\{T_k \le n\}}$ and $\Upsilon_2 = \int_0^n \eta_u\, du$.
Recalling that $\mathbf{E}\, Y_k^2 = 1$, that $u_{k-1} \ge 0$ and that $Y_k$ is independent of $u_{k-1}(T_k-)\, \mathbf{1}_{\{T_k \le n\}}$, we calculate

$$ \mathbf{E}\, \Upsilon_1 = \sum_{k=1}^{\infty} \mathbf{E}\, u_{k-1}(T_k-)\, \mathbf{1}_{\{T_k \le n\}}. $$

Moreover, the functions $(u_k)$ are cadlag processes, therefore the Lebesgue measure of the set $\{ t \in \mathbb{R}_+ : u_k(t-) \ne u_k(t) \}$ equals zero. Thus, since $T_k - T_{k-1}$ is exponential with parameter $\lambda$ and independent of the arguments of $u_{k-1}$,

$$ \mathbf{E}\, u_{k-1}(T_k-)\, \mathbf{1}_{\{T_k \le n\}} = \lambda\, \mathbf{E}\, \mathbf{1}_{\{T_{k-1} \le n\}} \int_0^{n - T_{k-1}} u_{k-1}(T_{k-1} + v)\, e^{-\lambda v}\, dv. $$

This implies

$$ \mathbf{E}\, \Upsilon_1 = \lambda \sum_{l=0}^{\infty} \mathbf{E}\, \mathbf{1}_{\{T_l \le n\}} \int_0^{n - T_l} u_l(T_l + v)\, e^{-\lambda v}\, dv. \tag{A.4} $$

Similarly, we obtain

$$ \mathbf{E}\, \Upsilon_2 = \sum_{l=0}^{\infty} \mathbf{E}\, \mathbf{1}_{\{T_l \le n\}} \int_{T_l}^{n} u_l(t)\, \mathbf{1}_{\{t < T_{l+1}\}}\, dt = \sum_{l=0}^{\infty} \mathbf{E}\, \mathbf{1}_{\{T_l \le n\}} \int_0^{n - T_l} u_l(T_l + v)\, e^{-\lambda v}\, dv. \tag{A.5} $$

Substituting (A.4) and (A.5) in (A.3) implies the assertion of Lemma A.4.
Lemma A.5. Assume that $\mathbf{E}\, Y_1^4 < \infty$. Then, for any measurable bounded non-random functions $f$ and $g$, one has

$$ \mathbf{E} \int_0^n I_{t-}^2(f)\, I_{t-}(g)\, g(t)\, d\xi_t = 0. $$

Proof. First we note that

$$ \mathbf{E} \int_0^n I_{t-}^2(f)\, I_{t-}(g)\, g(t)\, dz_t = \mathbf{E} \sum_{j\ge 1} I_{T_j-}^2(f)\, I_{T_j-}(g)\, g(T_j)\, \mathbf{1}_{\{T_j \le n\}}\, \mathbf{E}\, Y_j = 0. $$

Therefore, to prove the lemma one has to show that

$$ \mathbf{E} \int_0^n I_{t-}^2(f)\, I_{t-}(g)\, g(t)\, dw_t = 0. \tag{A.6} $$

To this end we represent the stochastic integral $I_t(f)$ as

$$ I_t(f) = \varrho_1\, I_t^{w}(f) + \varrho_2\, I_t^{z}(f), $$

where $I_t^{w}(f) = \int_0^t f_s\, dw_s$ and $I_t^{z}(f) = \int_0^t f_s\, dz_s$. Note that

$$ \mathbf{E}\, | I_t^{z}(f) |^4 \le M^4\, \mathbf{E}\, Y_1^4\, \mathbf{E}\, N_n^4 < \infty, \quad\text{where}\quad M = \sup_{0 \le t \le n} ( | f(t) | + | g(t) | ). $$
Therefore, taking into account that the processes $(w_t)$ and $(z_t)$ are independent and that all moments of the Gaussian integrals $I_t^{w}$ are finite, we get

$$ \mathbf{E} \int_0^n I_t^4(f)\, ( I_t^{w}(g) )^2\, g^2(t)\, dt < \infty, \quad\text{i.e.}\quad \mathbf{E} \int_0^n I_t^2(f)\, I_t^{w}(g)\, g(t)\, dw_t = 0. $$

Similarly, we obtain

$$ \mathbf{E} \int_0^n ( I_t^{w}(f) )^2\, I_t^{z}(g)\, g(t)\, dw_t = 0 \quad\text{and}\quad \mathbf{E} \int_0^n I_t^{w}(f)\, I_t^{z}(f)\, I_t^{z}(g)\, g(t)\, dw_t = 0. $$

Therefore, to show (A.6) one has to check that

$$ \mathbf{E} \int_0^n \tilde\eta_t\, dw_t = 0, \tag{A.7} $$

where $\tilde\eta_t = ( I_t^{z}(f) )^2\, I_t^{z}(g)\, g(t)$. Taking into account that the processes $(\tilde\eta_t)$ and $(w_t)$ are independent, we get

$$ \mathbf{E} \Big| \int_0^n \tilde\eta_t\, dw_t \Big| \le \mathbf{E} \left( \int_0^n \tilde\eta_t^2\, dt \right)^{1/2} \le \sqrt{n}\, \mathbf{E} \sup_{0 \le t \le n} | \tilde\eta_t |. $$

Here the last term can be estimated as

$$ \mathbf{E} \sup_{0 \le t \le n} | \tilde\eta_t | \le M^4\, \mathbf{E} \Big( \sum_{j=1}^{N_n} | Y_j | \Big)^3 < \infty. $$

Hence the stochastic integral $\int_0^n \tilde\eta_t\, dw_t$ is an integrable random variable and, by conditioning on the process $z$,

$$ \mathbf{E} \int_0^n \tilde\eta_t\, dw_t = \mathbf{E}\, \mathbf{E} \Big( \int_0^n \tilde\eta_t\, dw_t \,\Big|\, z_u,\ 0 \le u \le n \Big) = 0. $$

Thus we obtain (A.7), which implies (A.6). Hence Lemma A.5.
A.3. Property of the Fourier coefficients
Lemma A.6. Suppose that the function $S$ in (1) is differentiable and satisfies the condition (30). Then the Fourier coefficients (11) satisfy the inequality

$$ \sup_{l \ge 2}\ l \sum_{j \ge l} \theta_j^2 \le 4\, \| \dot S \|_1^2. $$

Proof. In view of (10), integration by parts gives, for $p \ge 1$,

$$ \theta_{2p} = -\frac{\sqrt{2}}{2\pi p} \int_0^1 \dot S(t)\, \sin(2\pi p t)\, dt $$

and

$$ \theta_{2p+1} = \frac{\sqrt{2}}{2\pi p} \int_0^1 \dot S(t)\, ( \cos(2\pi p t) - 1 )\, dt = -\frac{\sqrt{2}}{\pi p} \int_0^1 \dot S(t)\, \sin^2(\pi p t)\, dt. $$

From here it follows that, for any $j \ge 2$,

$$ \theta_j^2 \le \frac{4\, \| \dot S \|_1^2}{j^2}. $$

Taking into account that

$$ \sup_{l \ge 2}\ l \sum_{j \ge l} \frac{1}{j^2} \le 2, $$

we arrive at the desired result.
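The bound of Lemma A.6 can be probed numerically on a concrete 1-periodic function; for the triangle wave $S(t) = |t - 1/2|$ one has $\| \dot S \|_1 = 1$, so the lemma predicts $l \sum_{j\ge l} \theta_j^2 \le 4$ for every $l \ge 2$ (the test function and the truncation level are my own choices).

```python
import numpy as np

dt = 1.0 / 20_000
t = np.arange(0.0, 1.0, dt)
S = np.abs(t - 0.5)          # 1-periodic triangle wave, ||S'||_1 = 1

def phi(j, t):
    """Trigonometric basis (10)."""
    if j == 1:
        return np.ones_like(t)
    k = j // 2
    return np.sqrt(2.0) * (np.cos(2*np.pi*k*t) if j % 2 == 0 else np.sin(2*np.pi*k*t))

J = 400
theta = np.array([np.sum(S * phi(j, t)) * dt for j in range(1, J + 1)])
tails = [l * np.sum(theta[l-1:] ** 2) for l in range(2, J + 1)]
print(max(tails))   # stays below the bound 4 * ||S'||_1^2 = 4
```

In fact, for this smooth piecewise-linear example the tails are far below the bound, reflecting the $1/j^2$ decay established in the proof.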
Acknowledgments
This research has been carried out in the framework of the State Contract No. 02.740.11.5026.
REFERENCES
1. Akaike H. A new look at the statistical model identification // IEEE Trans. on Automatic Control. 1974. P. 716-723.
2. Barron A., Birgé L., and Massart P. Risk bounds for model selection via penalization // Probab. Theory Relat. Fields. 1999. P. 301-415.
3. Cao Y. and Golubev Y. On oracle inequalities related to a polynomial fitting // Math. Meth. Stat. 2005. No. 4. P. 431-450.
4. Cavalier L., Golubev G.K., Picard D., and Tsybakov A. Oracle inequalities for inverse problems // Ann. Statist. 2002. P. 843-874.
5. Fourdrinier D. and Pergamenshchikov S. Improved model selection method for the regression with dependent noise // Ann. Institute Statist. Math. 2007. No. 3. P. 435-464.
6. Galtchouk L. and Pergamenshchikov S. Non-parametric sequential estimation of the drift in diffusion processes // Math. Meth. Stat. 2004. No. 1. P. 25-49.
7. Galtchouk L. and Pergamenshchikov S. Sharp non-asymptotic oracle inequalities for non-parametric heteroscedastic regression models // J. Non-parametric Stat. 2009. V. 21. No. 1. P. 1-16.
8. Galtchouk L. and Pergamenshchikov S. Adaptive asymptotically efficient estimation in heteroscedastic non-parametric regression // J. Korean Statist. Soc. 2009. URL: http://ees.elsivier.com/jkss
9. Galtchouk L. and Pergamenshchikov S. Adaptive asymptotically efficient estimation in heteroscedastic non-parametric regression via model selection. 2009. URL: http://hal.archives-ouvertes.fr/hal-00326910/fr/
10. Galtchouk L. and Pergamenshchikov S. Adaptive sequential estimation for ergodic diffusion processes in quadratic metric. Part 1. Sharp non-asymptotic oracle inequalities // Prepublication 2007/06, IRMA, Université Louis Pasteur de Strasbourg, 2007.
11. Galtchouk L. and Pergamenshchikov S. Adaptive sequential estimation for ergodic diffusion processes in quadratic metric. Part 2. Asymptotic efficiency // Prepublication 2007/07, IRMA, Université Louis Pasteur de Strasbourg, 2007.
12. Goldfeld S.M. and Quandt R.E. Nonlinear Methods in Econometrics. North-Holland, London, 1972.
13. Kneip A. Ordered linear smoothers // Ann. Stat. 1994. P. 835-866.
14. Konev V.V. and Pergamenshchikov S.M. General model selection estimation of a periodic regression with a Gaussian noise // Ann. Institute Statist. Math. 2008. URL: http://dx.doi.org/10.1007/s10463-008-0193-1
15. Jacod J. and Shiryaev A.N. Limit theorems for stochastic processes. V. 1. N.Y.: Springer, 1987.
16. Mallows C. Some comments on Cp // Technometrics. 1973. P. 661-675.
17. Massart P. A non-asymptotic theory for model selection // ECM Stockholm. 2004. P. 309-323.
18. Nussbaum M. Spline smoothing in regression models and asymptotic efficiency in L2 // Ann. Statist. 1985. P. 984-997.
19. Pinsker M.S. Optimal filtration of square integrable signals in Gaussian white noise // Probl. Inform. Transm. 1981. P. 120-133.
INFORMATION ABOUT THE AUTHORS:
Konev Victor, Department of Applied Mathematics and Cybernetics, Tomsk State University, Lenin str. 36, 634050 Tomsk, Russia. E-mail: [email protected]
Pergamenshchikov Serguei, Laboratoire de Mathématiques Raphaël Salem, Avenue de l'Université, BP. 12, Université de Rouen, F76801, Saint Etienne du Rouvray Cedex, France, and Department of Mathematics and Mechanics, Tomsk State University, Lenin str. 36, 634041 Tomsk, Russia. E-mail: [email protected]
The article was accepted for publication on 26.08.2009.