A Pareto II Model With Inliers at Zero and One Based on TYPE-II Censored Samples

Bavagosai Pratima; K. Muralidharan

Department of Statistics, Faculty of Science The Maharaja Sayajirao University of Baroda, Vadodara 390002, India Email: [email protected], [email protected]

Abstract

Inliers (instantaneous or early failures) occur naturally in a life test, where some items fail immediately or within a short time because of mechanical failure, inferior quality, or faulty construction of items and components. The inconsistency of such life data is modeled using a nonstandard mixture of distributions, where degeneracy can occur at the discrete points zero and one. In this paper, parameter estimation based on a Type-II censored sample from a Pareto type II distribution with discrete mass at zero and one is studied. Maximum likelihood estimators (MLEs) are developed for the unknown parameters. The Fisher information matrix and the asymptotic variance-covariance matrix of the MLEs are derived. The uniformly minimum variance unbiased estimators (UMVUEs) of the model parameters, the density function, the reliability function, and some other parametric functions are obtained along with their standard errors. The model is applied to several real data sets and compared with the Weibull inliers model.

Keywords: early failures; failure time distribution; infant mortality rate; inliers; instantaneous failures; type-II censored sample.

I. Introduction

There is a plethora of examples of phenomena concerning nature, life and human activities where real data do not conform to standard distributions. In such cases, we use either mixtures of standard distributions of similar types or non-standard mixtures of a degenerate distribution and a standard distribution, which may again be discrete or continuous. Since inliers are inconsistent observations, generally the result of instantaneous and early failures, modeling with inliers involves non-standard mixtures of distributions. In the case of instantaneous failures, the random variable has a discrete probability mass at the origin (that is, life is zero) together with some positive lifetimes; in the case of early failures, the failure times are small in relation to the other lifetimes. These occurrences may be due to mechanical failure, inferior quality, faulty construction, or defective parts of items and components. Such failures usually violate the assumption of a single-mode distribution, and hence the usual modeling and inference procedures may not be accurate in practice. [2] was the first to discuss the inference problem of instantaneous failures in life testing, and provided efficient estimation of parametric functions under various probability models. [13] introduced the term inliers in connection with the estimation of $(p, \theta)$ in an early failure model whose modified failure time distribution (FTD) is exponential with mean $\theta$, assuming $p$ known. Later on, many authors have studied these kinds of models (see [18], [13] and [15]).

There are many practical contexts, where inliers can be natural occurrences of the specific situations involved and degeneracy can happen at two discrete points and a positive distribution for the remaining lifetimes. Some of the situations are as follows:

1. The size of tumor lesions is of interest in treating patients with hematologic malignancy. The measurement is zero for patients in whom lesions are absent (or have disappeared during treatment). Patients whose lesions are present at baseline but do not meet the definition of measurable disease may be assigned the measurement 1. Otherwise, lesions can be accurately measured, with the longest diameter recorded in at least one dimension by chest x-ray, CT scan, or calipers on clinical examination. Similarly, in studies of bone lesions, leptomeningeal disease, ascites, pleural/pericardial effusions, lymphangitis cutis/pulmonitis, inflammatory breast disease, and abdominal masses, the effect is either absent, or present but not followed by CT or MRI (and so considered non-measurable), or accurately measurable on a continuous scale.

2. In the mass production of technological hardware components intended to function over a period of time, some components may fail on installation and therefore have zero life length; some components do not fail on installation but fail after a negligible life (which may be coded as one for simplicity); and the others have a life length that is a positive random variable whose distribution may take different forms.

3. In a clinical trial laboratory, a particular drug is designed and given to a certain species of hens so that the new chicks have a weight greater than usual. The weight gain of chicks may be modeled as a continuous distribution with discrete mass at 'zero' and 'one', where zero denotes chicks with no weight gain, one denotes chicks with negligible weight gain, and the remaining chicks have a weight gain measured on a continuous scale.

4. The rainfall measurement at a place recorded during a season is modeled as a continuous distribution, with a discrete mass at 'zero' where zero measures those days having no rainfall, and at 'one', one measures those days with no rain but humid and cloudy conditions, and a continuous variable having some positive amount of rain.

5. In studies of genetic birth defects, children can be characterized by three variables: first, a discrete variable indicating whether a child is affected and born dead; second, a discrete variable indicating whether an affected child dies in the neonatal period; and third, a continuous variable measuring the survival time of affected children born alive. We may consider this as a nonstandard mixture of a mass point at "zero" (for children born dead), a mass point at "one" (for children with neonatal death), and a nontrivial continuous distribution for the other surviving children.

Similarly, one can contemplate many such examples in practical situations involving degeneracy at two or more points and positive configurations of observations. The authors of [16] and [17] have modeled the above situation using the exponential and Weibull distributions, respectively. In this article, we model the inliers situation using Type-II censored lifetime data from a Pareto II distribution. Under this scheme, if $n$ units are placed on test and the experiment is terminated after a prefixed number of failures, say $c < n$, then the observed failure times are $X_{(1)}, X_{(2)}, \ldots, X_{(c)}$, where $X_{(c)} < X_{(n)}$. The remaining $n - c$ items are regarded as censored. The Pareto family of distributions is well known in the literature for its capability in modeling heavy-tailed data. The Pareto Type II distribution (also called the Lomax distribution with location parameter zero) has the probability density function (pdf)

$$f(x;\alpha) = \frac{\theta}{\beta}\left(1+\frac{x}{\beta}\right)^{-(\theta+1)}, \quad x>0,\ \beta>0,\ \theta>0 \tag{1}$$

where $\alpha = (\beta,\theta)$, $\beta > 0$ is a scale parameter and $\theta > 0$ is a shape parameter. The Pareto distribution has been used in connection with studies of income, property values, insurance risk, migration, size of cities and firms, word frequencies, business mortality, service time in queuing systems, etc. The paper by [1] contains a detailed list of important areas where heavy-tailed distributions are found applicable. There are also recent applications of the Pareto distribution in data sets on earthquakes, forest fire areas, fault lengths on Earth and Venus, and on oil and gas field sizes; see [22] for details.
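As a quick illustration of the Pareto Type II (Lomax) distribution with scale $\beta$ and shape $\theta$, the following minimal sketch evaluates its density, distribution function and survival function. The function names are hypothetical and chosen only for this example.

```python
def lomax_pdf(x, beta, theta):
    """Pareto II (Lomax) density with scale beta and shape theta; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return (theta / beta) * (1.0 + x / beta) ** (-(theta + 1.0))


def lomax_cdf(x, beta, theta):
    """Distribution function F(x) = 1 - (1 + x/beta)^(-theta) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + x / beta) ** (-theta)


def lomax_sf(x, beta, theta):
    """Survival function 1 - F(x)."""
    return 1.0 - lomax_cdf(x, beta, theta)
```

The survival function decays polynomially in $x$, which is the heavy-tail behaviour referred to above.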

The presentation of the paper is as follows: The model description is given in Section II. In Section III, we derive the MLE of the unknown parameters along with the interval estimation of parameters. The UMVU estimation of model parameters and various parametric functions are given in Section IV. For illustration, we consider four real datasets for implementing the proposed model in Section V.

II. Model description

If 0 and 1 are natural occurrences of a life test as described above, along with other positive observations, then the distribution function of such an inliers model can be written as:

$$H(x;p_1,p_2,\alpha) = \begin{cases} 0, & x<0 \\ p_1, & 0 \le x < 1 \\ p_1+p_2, & x = 1 \\ p_1+p_2+(1-p_1-p_2)\dfrac{F(x;\alpha)-F(1;\alpha)}{1-F(1;\alpha)}, & x>1 \end{cases} \tag{2}$$

The probability measure generated by $H(\cdot)$ is composed of three measures, say $\mu_1$, $\mu_2$ and $\mu_3$, where $\mu_3$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$ and $\mu_1$ and $\mu_2$ are singular with respect to the Lebesgue measure on $\mathbb{R}$. The corresponding likelihood function of the model is

$$h(x;p_1,p_2,\alpha) = \begin{cases} p_1, & x = 0 \\ p_2, & x = 1 \\ (1-p_1-p_2)\dfrac{f(x;\alpha)}{1-F(1;\alpha)}, & x>1 \end{cases} \tag{3}$$

where $p_1$ and $p_2$ are the proportions of 0 and 1 observations respectively. For $\beta = 1$, the Pareto Type II inliers distribution has the likelihood function

$$h(x;p_1,p_2,\theta) = \begin{cases} p_1, & x = 0 \\ p_2, & x = 1 \\ (1-p_1-p_2)\dfrac{\theta}{2}\left(\dfrac{1+x}{2}\right)^{-(\theta+1)}, & x>1 \end{cases} \tag{4}$$

The parameter estimates are obtained in the next section.
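To make the mixed discrete-continuous structure concrete, the sketch below evaluates the inliers model with $\beta = 1$: point masses $p_1$ at 0 and $p_2$ at 1, and a Pareto II density truncated to $x > 1$. The function names are hypothetical, and the closed form of the distribution function for $x > 1$ assumes $F(x) = 1-(1+x)^{-\theta}$, so that $[F(x)-F(1)]/[1-F(1)] = 1-((1+x)/2)^{-\theta}$.

```python
def inlier_density(x, p1, p2, theta):
    """Mass at 0 and 1, truncated Pareto II density (beta = 1) for x > 1."""
    if x == 0:
        return p1
    if x == 1:
        return p2
    if x > 1:
        return (1.0 - p1 - p2) * (theta / 2.0) * ((1.0 + x) / 2.0) ** (-(theta + 1.0))
    return 0.0  # no probability on (0, 1) or below 0


def inlier_cdf(x, p1, p2, theta):
    """Distribution function of the inliers model with beta = 1."""
    if x < 0:
        return 0.0
    if x < 1:
        return p1
    if x == 1:
        return p1 + p2
    return p1 + p2 + (1.0 - p1 - p2) * (1.0 - ((1.0 + x) / 2.0) ** (-theta))
```

For example, with $p_1 = 0.2$, $p_2 = 0.1$ and $\theta = 1$, the distribution function at $x = 3$ is $0.3 + 0.7 \times 0.5 = 0.65$.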

III. Maximum Likelihood Estimation of $\underline{\theta} = (p_1, p_2, \theta)$

Suppose $n$ items are placed on a life test, of which $r_1$ items have life zero, $r_2$ items have life 1, and the remaining $n - r_1 - r_2$ items have lives greater than 1, denoted by $X_1, X_2, \ldots, X_{n-r_1-r_2}$. Under Type-II censoring, the experiment terminates after a prefixed number of failures $n - r_1 - r_2 - c$ out of the $n - r_1 - r_2$ items, where $n - r_1 - r_2 - c < n - r_1 - r_2$. Clearly, if $n - r_1 - r_2 - c = n - r_1 - r_2$, then the experiment is not terminated early and all $n - r_1 - r_2$ lifetimes are observed. Let $n - r_1 - r_2 - c^* = \min(n - r_1 - r_2 - c,\ n - r_1 - r_2)$ and let $X_{(1)}, X_{(2)}, \ldots, X_{(n-r_1-r_2-c^*)}$ denote the ordered observed failure times of these $n - r_1 - r_2 - c^*$ items from $h$ as given in (4).

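Then the likelihood function can be written as

$$L(\underline{x};\underline{\theta}) = \prod_{i} h(x_{(i)};\underline{\theta})$$

If we define

$$I_1(x) = \begin{cases} 1, & x = 0 \\ 0, & \text{otherwise} \end{cases}$$

and

$$I_2(x) = \begin{cases} 1, & x = 1 \\ 0, & \text{otherwise} \end{cases}$$

then the likelihood function can be written as

$$L(\underline{x};\underline{\theta}) = p_1^{r_1} p_2^{r_2} (1-p_1-p_2)^{(n-r_1-r_2)} \frac{(n-r_1-r_2)!}{c^*!}\, \theta^{\,n-r_1-r_2-c^*} \prod_{i=1}^{n-r_1-r_2-c^*} \frac{1}{1+x_{(i)}} \exp\left\{-\theta\left[\sum_{i=1}^{n-r_1-r_2-c^*}\log\left(\frac{1+x_{(i)}}{2}\right) + c^*\log\left(\frac{1+x_{(n-r_1-r_2-c^*)}}{2}\right)\right]\right\} \tag{5}$$

where $r_1 = \sum_{i=1}^{n-c^*} I_1(x_{(i)})$ and $r_2 = \sum_{i=1}^{n-c^*} I_2(x_{(i)})$ denote the number of zero and one observations respectively. We now investigate the following four possible cases of likelihood estimates: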

Case (i). $r_1 = n$, that is $r_2 = 0$. The likelihood function reduces to $L(\underline{x};\underline{\theta}) = p_1^{n}$, which is maximized at $p_1 = 1$, so the maximum likelihood estimator is $\hat{p}_1 = 1$. Since $L(\underline{x};\underline{\theta})$ is free of the other parameters, their maximum likelihood estimators do not exist.

Case (ii). $r_2 = n$, that is $r_1 = 0$. The likelihood function reduces to $L(\underline{x};\underline{\theta}) = p_2^{n}$, which is maximized at $p_2 = 1$, so the maximum likelihood estimator is $\hat{p}_2 = 1$. Since $L(\underline{x};\underline{\theta})$ is free of the other parameters, their maximum likelihood estimators do not exist.

Case (iii). $r_1 < n$, $r_2 < n$ but $r_1 + r_2 = n$. The likelihood function reduces to $L(\underline{x};\underline{\theta}) = p_1^{r_1} p_2^{r_2}$. Here $p_1 + p_2 \le 1$, so $L(\underline{x};\underline{\theta}) \le (r_1/n)^{r_1}(r_2/n)^{r_2}$, and hence $\hat{p}_1 = r_1/n$ and $\hat{p}_2 = r_2/n$. The maximum likelihood estimators of the other parameters do not exist.

Case (iv). $r_1 + r_2 < n$. The log-likelihood function is given by

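$$\log L(\underline{x};\underline{\theta}) = r_1\log p_1 + r_2\log p_2 + (n-r_1-r_2)\log(1-p_1-p_2) + \log (n-r_1-r_2)! - \log c^*! + (n-r_1-r_2-c^*)\log\theta - \sum_{i=1}^{n-r_1-r_2-c^*}\log(1+x_{(i)}) - \theta\left\{\sum_{i=1}^{n-r_1-r_2-c^*}\left[\log\left(\frac{1+x_{(i)}}{2}\right)\right] + c^*\left[\log\left(\frac{1+x_{(n-r_1-r_2-c^*)}}{2}\right)\right]\right\} \tag{6}$$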

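The maximum likelihood estimator of the parameter $\underline{\theta} = (p_1, p_2, \theta)$ is obtained by solving the following likelihood equations:

$$\frac{\partial \log L(\underline{x};\underline{\theta})}{\partial p_1} = \frac{r_1}{p_1} - \frac{n-r_1-r_2}{1-p_1-p_2} = 0 \tag{7}$$

$$\frac{\partial \log L(\underline{x};\underline{\theta})}{\partial p_2} = \frac{r_2}{p_2} - \frac{n-r_1-r_2}{1-p_1-p_2} = 0 \tag{8}$$

and

$$\frac{\partial \log L(\underline{x};\underline{\theta})}{\partial \theta} = \frac{n-r_1-r_2-c^*}{\theta} - \left\{\sum_{i=1}^{n-r_1-r_2-c^*}\left[\log\left(\frac{1+x_{(i)}}{2}\right)\right] + c^*\left[\log\left(\frac{1+x_{(n-r_1-r_2-c^*)}}{2}\right)\right]\right\} = 0 \tag{9}$$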

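Solving (7) and (8) simultaneously, we get

$$\hat{p}_1 = \frac{r_1}{n} \tag{10}$$

$$\hat{p}_2 = \frac{r_2}{n} \tag{11}$$

From (9), the estimate of $\theta$ is

$$\hat{\theta} = \frac{n-r_1-r_2-c^*}{\sum_{i=1}^{n-r_1-r_2-c^*}\left[\log(1+x_{(i)})-\log 2\right] + c^*\left[\log(1+x_{(n-r_1-r_2-c^*)})-\log 2\right]} \tag{12}$$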

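The approximate $100(1-\alpha)\%$ confidence intervals for $p_1$, $p_2$ and $\theta$ are respectively given by $\hat{p}_1 \pm z_{\alpha/2}\sqrt{\hat{p}_1(1-\hat{p}_1)/n}$, $\hat{p}_2 \pm z_{\alpha/2}\sqrt{\hat{p}_2(1-\hat{p}_2)/n}$ and $\hat{\theta} \pm z_{\alpha/2}\sqrt{\hat{\theta}^2/(n\hat{p}^*-c^*)}$, where $p^* = 1 - p_1 - p_2$.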
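The estimation steps of this section can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function name `fit_mle` is hypothetical, observations in $(0,1)$ are assumed to have been coded as 1 beforehand, and the standard error of $\hat{\theta}$ is taken from the observed information $(n-r_1-r_2-c^*)/\theta^2$ implied by the likelihood equation for $\theta$. The data are the California earthquake loss ratios of Table 1.

```python
import math


def fit_mle(data, c_star):
    """MLEs of (p1, p2, theta) from coded data, censoring the c_star largest values."""
    n = len(data)
    r1 = sum(1 for x in data if x == 0)
    r2 = sum(1 for x in data if x == 1)
    positives = sorted(x for x in data if x > 1)
    m = len(positives) - c_star          # number of observed failures above 1
    observed = positives[:m]
    # sum of log((1 + x)/2) over observed values plus the censoring term
    z3 = sum(math.log((1 + x) / 2) for x in observed)
    z3 += c_star * math.log((1 + observed[-1]) / 2)
    p1, p2, theta = r1 / n, r2 / n, m / z3
    se = (math.sqrt(p1 * (1 - p1) / n),
          math.sqrt(p2 * (1 - p2) / n),
          theta / math.sqrt(m))          # assumed: observed-information SE
    return (p1, p2, theta), se


raw = [17.4, 0.0, 0.6, 3.4, 0.0, 0.0, 0.7, 1.5, 2.2, 9.2, 0.9, 0.0,
       2.9, 5.0, 1.3, 9.3, 22.8, 11.5, 129.8, 47.0, 17.2, 12.8, 3.2]
coded = [1.0 if 0 < x < 1 else x for x in raw]   # values below 1 coded as 1
estimates, std_errors = fit_mle(coded, c_star=1)
```

With one censored observation this gives estimates of roughly 0.174, 0.130 and 0.614 for $p_1$, $p_2$ and $\theta$.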

IV. Unbiased estimation

Many authors have studied the problem of minimum variance unbiased estimation for different classes of distributions. [23], [12] and [5] have studied the estimation problem for power series distribution, [20] has studied the same for generalized power series distribution, [7] and [5] have studied for modified power series distribution. [19] has studied the UMVUE of parameters for the multivariate modified power series distribution. All these studies include discrete distributions only. [9] has studied the problem of MVU estimation in one parameter exponential family of distributions which includes power series distribution, modified power series distribution and univariate continuous distributions. Further, a characterization property of power series distribution using one and two moments was given by [14]. [8] extended this for the one-parameter exponential family of distribution which includes all earlier cases. [10] have further studied MVU estimation in the multi-parameter exponential family of distributions. Here, we propose the distributional properties of complete sufficient statistic and study UMVU estimation for various parametric functions of the model. The model in (4) can be expressed as

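$$h(x;\underline{\theta}) = \frac{(a(x))^{(1-C_1(x)-C_2(x))}\prod_{i=1}^{3}(h_i(\underline{\theta}))^{C_i(x)}}{g(\underline{\theta})} \tag{13}$$

where $a(x) = \dfrac{1}{1+x}$; $h_1(\underline{\theta}) = \dfrac{p_1}{\theta(1-p_1-p_2)}$; $h_2(\underline{\theta}) = \dfrac{p_2}{\theta(1-p_1-p_2)}$; $h_3(\underline{\theta}) = e^{-\theta}$; $g(\underline{\theta}) = \dfrac{1}{\theta(1-p_1-p_2)}$; $C_1(x) = I_1(x)$; $C_2(x) = I_2(x)$ and $C_3(x) = \left[\log\left(\frac{1+x}{2}\right)\right](1 - I_1(x) - I_2(x))$. Also $a(x) > 0$, the $C_i(x)$, $i = 1, 2$ and 3, are nontrivial real-valued statistics, and $g(\underline{\theta})$ and $h_i(\underline{\theta})$ are at least twice differentiable functions of $\theta_i$, $i = 1, 2$ and 3. Here $g(\underline{\theta}) = \int (a(x))^{(1-C_1(x)-C_2(x))}\prod_{i=1}^{3}(h_i(\underline{\theta}))^{C_i(x)}\,dx$. The density in (13) so obtained is defined with respect to a measure $\mu(x)$ which is the sum of the Lebesgue measure over $(1,\infty)$ and point masses at 0 and 1. It is a well-known form of a three-parameter exponential family with natural parameters

$$(\eta_1,\eta_2,\eta_3) = \left(\log\left(\frac{p_1}{\theta(1-p_1-p_2)}\right),\ \log\left(\frac{p_2}{\theta(1-p_1-p_2)}\right),\ \log(e^{-\theta})\right)$$

generated by the underlying indexing parameters $\underline{\theta} = (p_1, p_2, \theta)$. Hence $C(X) = (C_1(X), C_2(X), C_3(X)) = \left(I_1(X),\ I_2(X),\ \left[\log\left(\frac{1+X}{2}\right)\right](1 - I_1(X) - I_2(X))\right)$ is jointly complete sufficient for $\underline{\theta} = (p_1, p_2, \theta)$. The distributional properties of $C(X) = (C_1(X), C_2(X), C_3(X))$ are presented in Appendix A. We now propose some uniformly minimum variance unbiased estimators for the parameters and some parametric functions of the model (13) in the subsections below.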

I. Uniformly Minimum Variance Unbiased Estimation of parameters

For the Type-II censored sample discussed in the previous section, consider the following transformation


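$$Y_1 = (n-r_1-r_2)\left[\log\left(\frac{1+X_{(1)}}{2}\right)\right],$$

and

$$Y_i = (n-r_1-r_2-i+1)\left\{\left[\log\left(\frac{1+X_{(i)}}{2}\right)\right] - \left[\log\left(\frac{1+X_{(i-1)}}{2}\right)\right]\right\}, \quad i = 2, 3, \ldots, n-r_1-r_2-c^* \tag{14}$$

It can be seen that

$$\sum_{i=1}^{n-r_1-r_2-c^*} Y_i = \sum_{i=1}^{n-r_1-r_2-c^*}\left[\log\left(\frac{1+x_{(i)}}{2}\right)\right] + c^*\left[\log\left(\frac{1+x_{(n-r_1-r_2-c^*)}}{2}\right)\right]$$

and the Jacobian of the transformation is

$$|J| = \frac{c^*!\,\prod_{i=1}^{n-r_1-r_2-c^*}(1+x_{(i)})}{(n-r_1-r_2)!} \tag{15}$$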
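The identity between the sum of the transformed variables and the censored log-sum can be checked numerically. The following is a minimal sketch; the function name `normalized_spacings` and the toy sample are hypothetical.

```python
import math


def normalized_spacings(observed, n_pos):
    """Y_i = (n_pos - i + 1) * (u_(i) - u_(i-1)), with u = log((1 + x)/2), u_(0) = 0.

    observed: the uncensored failure times (all > 1)
    n_pos:    total number of items with life > 1, i.e. n - r1 - r2
    """
    u = [math.log((1 + x) / 2) for x in sorted(observed)]
    spacings, prev = [], 0.0
    for i, ui in enumerate(u, start=1):
        spacings.append((n_pos - i + 1) * (ui - prev))
        prev = ui
    return spacings


# toy example: 5 items above 1, the 2 largest censored (c* = 2)
obs = [1.5, 2.0, 4.0]
y = normalized_spacings(obs, n_pos=5)
u = [math.log((1 + x) / 2) for x in obs]
lhs = sum(y)
rhs = sum(u) + 2 * u[-1]
```

Here `lhs` and `rhs` agree up to floating-point error, which is the telescoping identity used to pass from the ordered failure times to the $Y_i$.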

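Using (14) and (15),

$$h(\underline{y};\underline{\theta}) = p_1^{r_1} p_2^{r_2} (1-p_1-p_2)^{(n-r_1-r_2)}\, \theta^{(n-r_1-r_2-c^*)}\, e^{-\theta\sum_{i=1}^{n-r_1-r_2-c^*} y_i} \tag{16}$$

where

$$Z_1 = \sum_{i=1}^{n-c^*} C_1(X_i) = \sum_{i=1}^{n-c^*} I_1(Y_i) = r_1,$$

$$Z_2 = \sum_{i=1}^{n-c^*} C_2(X_i) = \sum_{i=1}^{n-c^*} I_2(Y_i) = r_2$$

and

$$Z_3 = \sum_{i=1}^{n-c^*} C_3(X_i) = \sum_{i=1}^{n-r_1-r_2-c^*} Y_i$$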

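Hence, by the Neyman factorization theorem, $Z = (Z_1, Z_2, Z_3)$ is jointly sufficient for $\underline{\theta} = (p_1, p_2, \theta)$. Also,

$$h(\underline{y};\underline{\theta}) = \frac{n!}{r_1!\, r_2!\, (n-r_1-r_2)!}\, p_1^{r_1} p_2^{r_2} (1-p_1-p_2)^{(n-r_1-r_2)} \cdot \frac{\theta^{(n-r_1-r_2-c^*)}\, e^{-\theta\sum_{i} y_i}}{\dfrac{n!}{r_1!\, r_2!\, (n-r_1-r_2)!}}$$

$$= P(Z_1 = r_1, Z_2 = r_2)\ h(\underline{y};\theta \mid Z_1 = r_1, Z_2 = r_2)$$

Here the distribution of $(Z_1, Z_2)$ is trinomial, which is a complete family of distributions, and

$$h(\underline{y};\theta \mid Z_1 = r_1, Z_2 = r_2) = \frac{\theta^{(n-r_1-r_2-c^*)}\, e^{-\theta\sum_{i} y_i}}{\dfrac{n!}{r_1!\, r_2!\, (n-r_1-r_2)!}}$$

which belongs to the one-parameter exponential family. Hence $Z_3 \mid Z_1, Z_2$ is complete sufficient for $\theta$ and also a member of the exponential family. The distribution of $Z_3 \mid Z_1, Z_2$ is Gamma with parameters $(n - r_1 - r_2 - c^*, \theta)$, with pdf

$$h(z_3;\theta \mid n-r_1-r_2-c^*) = \frac{z_3^{(n-r_1-r_2-c^*-1)}\, \theta^{\,n-r_1-r_2-c^*}\, e^{-\theta z_3}}{\Gamma(n-r_1-r_2-c^*)}, \quad z_3 > 0;\ \theta > 0$$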

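which depends only on $\theta$ and is also a complete family of distributions. Therefore, using the result of [11], $Z = (Z_1, Z_2, Z_3)$ is complete sufficient for $\underline{\theta} = (p_1, p_2, \theta)$. The joint distribution of $Z = (Z_1, Z_2, Z_3)$ is

$$h_Z(z;\underline{\theta}) = \frac{n!}{r_1!\, r_2!\, (n-r_1-r_2)!}\, p_1^{r_1} p_2^{r_2} (1-p_1-p_2)^{(n-r_1-r_2)}\, \frac{z_3^{(n-r_1-r_2-c^*-1)}}{\Gamma(n-r_1-r_2-c^*)}\, \theta^{\,n-r_1-r_2-c^*}\, e^{-\theta z_3},$$

$$0 \le r_1, r_2 \le n - c^*;\ z_3 > 0;\ 0 < p_1, p_2 < 1;\ \theta > 0$$

$$= B(z_1,z_2,z_3,c^*,n)\, \frac{\prod_{i=1}^{3}(h_i(\underline{\theta}))^{z_i}}{[g(\underline{\theta})]^{n-c^*}}\, (1-p_1-p_2)^{c^*}$$

where

$$B(z_1,z_2,z_3,c^*,n) = \begin{cases} \dfrac{n!}{r_1!\, r_2!\, (n-r_1-r_2)!}\, \dfrac{z_3^{(n-r_1-r_2-c^*-1)}}{\Gamma(n-r_1-r_2-c^*)}, & z_3 > 0;\ r_1 + r_2 - 1 < n - c^* \\ 1, & z_3 = 0;\ r_1 = 0 \text{ or } r_2 = 0 \end{cases} \tag{17}$$

with $z_i \in T(n-c^*) \subseteq \mathbb{R}$. Here $z = (z_1, z_2, z_3, c^*, n)$ and $B(z_1,z_2,z_3,c^*,n)$ are such that

$$\frac{[g(\underline{\theta})]^{n-c^*}}{(1-p_1-p_2)^{c^*}} = \int_{z_1 \in T(n-c^*)}\int_{z_2 \in T(n-c^*)}\int_{z_3 \in T(n-c^*)} B(z_1,z_2,z_3,c^*,n) \prod_{i=1}^{3}(h_i(\underline{\theta}))^{z_i}\, dz_1\, dz_2\, dz_3$$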

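Since $E(C_1(x)) = p_1$, $E(C_2(x)) = p_2$ and $E(C_3(x)) = \dfrac{1-p_1-p_2}{\theta}$ (see Appendix A for details), we have $E(Z_1) = E\left(\sum_{j=1}^{n-c^*} C_1(x_j)\right) = \sum_{j=1}^{n-c^*} E(I_1(y_j)) = (n-c^*)\,p_1$, $E(Z_2) = E\left(\sum_{j=1}^{n-c^*} C_2(x_j)\right) = \sum_{j=1}^{n-c^*} E(I_2(y_j)) = (n-c^*)\,p_2$,

and

$$E(Z_3) = E\left(\sum_{j=1}^{n-c^*} C_3(x_j)\right) = \sum_{i=1}^{n-r_1-r_2-c^*} E(Y_i) = \frac{(n-c^*)(1-p_1-p_2)}{\theta},$$

which in turn give the UMVUEs of $p_1$, $p_2$ and $\theta$ as

$$\tilde{p}_1 = \frac{Z_1}{n-c^*} = \frac{r_1}{n-c^*} \tag{18}$$

$$\tilde{p}_2 = \frac{Z_2}{n-c^*} = \frac{r_2}{n-c^*} \tag{19}$$

and

$$\tilde{\theta} = \frac{(n-c^*)(1-\tilde{p}_1-\tilde{p}_2)}{Z_3} \tag{20}$$

For the variance computation, see Appendix A. Note that the likelihood estimates and the minimum variance unbiased estimates of the parameters coincide everywhere when $c^* = 0$.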
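A minimal sketch of the three estimators follows, written in terms of the sufficient statistics. The function name is hypothetical, and the value of $Z_3$ used in the example (about 24.4222) is an assumption: it is what the censored log-sum evaluates to for the California earthquake data of Table 1 with $c^* = 1$ and values below 1 coded as 1.

```python
def umvue_parameters(r1, r2, z3, n, c_star):
    """Estimators of p1, p2 and theta based on (Z1, Z2, Z3) = (r1, r2, z3)."""
    p1 = r1 / (n - c_star)
    p2 = r2 / (n - c_star)
    theta = (n - c_star) * (1.0 - p1 - p2) / z3
    return p1, p2, theta


# earthquake data: n = 23, four zeros, three ones, one censored observation
p1_u, p2_u, theta_u = umvue_parameters(r1=4, r2=3, z3=24.4222, n=23, c_star=1)
```

Note that $(n-c^*)(1-\tilde{p}_1-\tilde{p}_2) = n-r_1-r_2-c^*$, so the estimate of $\theta$ has the same value as the maximum likelihood estimate, while the estimates of $p_1$ and $p_2$ use the divisor $n-c^*$ instead of $n$.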

II. Uniformly Minimum Variance Unbiased Estimation of parametric functions

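Let $X_1, X_2, \ldots, X_{n-c^*}$ be a Type-II censored random sample from (13). Then there exists a UMVUE of $H(\underline{\theta})$ if and only if $H(\underline{\theta})[g(\underline{\theta})]^{n-c^*}/(1-p_1-p_2)^{c^*}$ can be expressed in the form

$$\frac{H(\underline{\theta})[g(\underline{\theta})]^{n-c^*}}{(1-p_1-p_2)^{c^*}} = \int_{z_1 \in T(n-c^*)}\int_{z_2 \in T(n-c^*)}\int_{z_3 \in T(n-c^*)} \alpha(z_1,z_2,z_3,c^*,n) \prod_{i=1}^{3}(h_i(\underline{\theta}))^{z_i}\, dz_1\, dz_2\, dz_3$$

Thus, the UMVUE of a function $H(\underline{\theta})$ of $\underline{\theta}$ in $h(x \mid \underline{\theta})$ is given by

$$\psi(Z_1,Z_2,Z_3,c^*,n) = \frac{\alpha(Z_1,Z_2,Z_3,c^*,n)}{B(Z_1,Z_2,Z_3,c^*,n)}, \quad B(Z_1,Z_2,Z_3,c^*,n) \ne 0$$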

The following results are now obvious.

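Result 1 The UMVUE of $\prod_{i=1}^{3}(h_i(\underline{\theta}))^{k_i} = \left(\dfrac{1}{\theta(1-p_1-p_2)}\right)^{k_1+k_2} p_1^{k_1} p_2^{k_2} e^{-\theta k_3}$ is given by

$$H_{k_1,k_2,k_3}(z_1,z_2,z_3,c^*,n) = \frac{B(z_1-k_1,\ z_2-k_2,\ z_3-k_3,\ c^*,n)}{B(z_1,z_2,z_3,c^*,n)}$$

$$= \frac{(r_1)_{k_1}(r_2)_{k_2}\left(1-\dfrac{k_3}{z_3}\right)^{(n-r_1-r_2-c^*-1)}(z_3-k_3)^{k_1+k_2}}{[n-r_1-r_2+1]_{k_1+k_2}\,[n-r_1-r_2-c^*]_{k_1+k_2}},$$

where $k_1 < r_1$; $k_2 < r_2$; $k_3 < z_3$; $k_1 + k_2 < n - r_1 - r_2 - c^*$; $r_1 + r_2 - 1 < n - c^*$, and $(r)_k = r(r-1)\cdots(r-k+1)$ and $[r]_k = r(r+1)\cdots(r+k-1)$.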

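Corollary 1 If $k_1 \ne 0$, $k_2 = 0$ and $k_3 = 0$, then the UMVUE of $(h_1(\underline{\theta}))^{k_1} = \left(\dfrac{p_1}{\theta(1-p_1-p_2)}\right)^{k_1}$ is given by

$$H_{k_1}(z_1,z_2,z_3,c^*,n) = \frac{B(z_1-k_1, z_2, z_3, c^*,n)}{B(z_1,z_2,z_3,c^*,n)} = \frac{(r_1)_{k_1}\, z_3^{k_1}}{[n-r_1-r_2+1]_{k_1}\,[n-r_1-r_2-c^*]_{k_1}},$$

$k_1 < r_1$; $k_1 < n - r_1 - r_2 - c^*$; $r_1 + r_2 - 1 < n - c^*$.

Corollary 2 If $k_1 = 0$, $k_2 \ne 0$ and $k_3 = 0$, then the UMVUE of $(h_2(\underline{\theta}))^{k_2} = \left(\dfrac{p_2}{\theta(1-p_1-p_2)}\right)^{k_2}$ is given by

$$H_{k_2}(z_1,z_2,z_3,c^*,n) = \frac{B(z_1, z_2-k_2, z_3, c^*,n)}{B(z_1,z_2,z_3,c^*,n)} = \frac{(r_2)_{k_2}\, z_3^{k_2}}{[n-r_1-r_2+1]_{k_2}\,[n-r_1-r_2-c^*]_{k_2}},$$

$k_2 < r_2$; $k_2 < n - r_1 - r_2 - c^*$; $r_1 + r_2 - 1 < n - c^*$.

Corollary 3 If $k_1 = 0$, $k_2 = 0$ and $k_3 \ne 0$, then the UMVUE of $(h_3(\underline{\theta}))^{k_3} = e^{-\theta k_3}$ is given by

$$H_{k_3}(z_1,z_2,z_3,c^*,n) = \frac{B(z_1, z_2, z_3-k_3, c^*,n)}{B(z_1,z_2,z_3,c^*,n)} = \left(1-\frac{k_3}{z_3}\right)^{(n-r_1-r_2-c^*-1)},$$

$k_3 < z_3$; $r_1 + r_2 - 1 < n - c^*$.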
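The ratio $B(z_1-k_1, z_2-k_2, z_3-k_3, c^*, n)/B(z_1,z_2,z_3,c^*,n)$ reduces to a product of falling and rising factorials, which is easy to evaluate directly. The sketch below is an illustration only: the function names are hypothetical, $(r)_k$ is read as the falling factorial and $[r]_k$ as the rising factorial, and the value $z_3 \approx 24.4222$ is an assumption obtained from the earthquake data of Table 1 with $c^* = 1$.

```python
def falling(r, k):
    """(r)_k = r (r-1) ... (r-k+1); equals 1 when k = 0."""
    out = 1
    for j in range(k):
        out *= r - j
    return out


def rising(a, k):
    """[a]_k = a (a+1) ... (a+k-1); equals 1 when k = 0."""
    out = 1
    for j in range(k):
        out *= a + j
    return out


def umvue_h(k1, k2, k3, r1, r2, z3, c_star, n):
    """Estimate of h1^k1 * h2^k2 * h3^k3 from the sufficient statistics."""
    n_pos = n - r1 - r2           # items with life greater than 1
    m = n_pos - c_star            # observed failures among them
    k = k1 + k2
    num = falling(r1, k1) * falling(r2, k2) * (1 - k3 / z3) ** (m - 1) * (z3 - k3) ** k
    return num / (rising(n_pos + 1, k) * rising(m, k))


stats = dict(r1=4, r2=3, z3=24.4222, c_star=1, n=23)
h1_hat = umvue_h(1, 0, 0, **stats)
h3_hat = umvue_h(0, 0, 1, **stats)
h_all = umvue_h(1, 1, 1, **stats)
```

Under these assumptions the three values are close to 0.3831, 0.5569 and 0.0499, in line with the earthquake column of Table 3.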

Result 2 The UMVUE of the variance of $H_{k_1,k_2,k_3}(Z_1,Z_2,Z_3,c^*,n)$ is given by

$$\widehat{\mathrm{var}}[H_{k_1,k_2,k_3}(Z_1,Z_2,Z_3,c^*,n)] = H^2_{k_1,k_2,k_3}(Z_1,Z_2,Z_3,c^*,n) - H_{2k_1,2k_2,2k_3}(Z_1,Z_2,Z_3,c^*,n)$$

$$= \left(\frac{(r_1)_{k_1}(r_2)_{k_2}\left(1-\dfrac{k_3}{z_3}\right)^{(n-r_1-r_2-c^*-1)}(z_3-k_3)^{k_1+k_2}}{[n-r_1-r_2+1]_{k_1+k_2}\,[n-r_1-r_2-c^*]_{k_1+k_2}}\right)^2 - \frac{(r_1)_{2k_1}(r_2)_{2k_2}\left(1-\dfrac{2k_3}{z_3}\right)^{(n-r_1-r_2-c^*-1)}(z_3-2k_3)^{2(k_1+k_2)}}{[n-r_1-r_2+1]_{2(k_1+k_2)}\,[n-r_1-r_2-c^*]_{2(k_1+k_2)}},$$

where $2k_1 < r_1$; $2k_2 < r_2$; $2k_3 < z_3$; $2(k_1+k_2) < n - r_1 - r_2 - c^*$; $r_1 + r_2 - 1 < n - c^*$.

Corollary 4 The UMVUE of the variance of $H_{k_1}(Z_1,Z_2,Z_3,c^*,n)$ is given by

$$\widehat{\mathrm{var}}[H_{k_1}(Z_1,Z_2,Z_3,c^*,n)] = H^2_{k_1}(Z_1,Z_2,Z_3,c^*,n) - H_{2k_1}(Z_1,Z_2,Z_3,c^*,n)$$

$$= \left(\frac{(r_1)_{k_1}\, z_3^{k_1}}{[n-r_1-r_2+1]_{k_1}\,[n-r_1-r_2-c^*]_{k_1}}\right)^2 - \frac{(r_1)_{2k_1}\, z_3^{2k_1}}{[n-r_1-r_2+1]_{2k_1}\,[n-r_1-r_2-c^*]_{2k_1}},$$

$2k_1 < r_1$; $2k_1 < n - r_1 - r_2 - c^*$; $r_1 + r_2 - 1 < n - c^*$.

Corollary 5 The UMVUE of the variance of $H_{k_2}(Z_1,Z_2,Z_3,c^*,n)$ is given by

$$\widehat{\mathrm{var}}[H_{k_2}(Z_1,Z_2,Z_3,c^*,n)] = H^2_{k_2}(Z_1,Z_2,Z_3,c^*,n) - H_{2k_2}(Z_1,Z_2,Z_3,c^*,n)$$

$$= \left(\frac{(r_2)_{k_2}\, z_3^{k_2}}{[n-r_1-r_2+1]_{k_2}\,[n-r_1-r_2-c^*]_{k_2}}\right)^2 - \frac{(r_2)_{2k_2}\, z_3^{2k_2}}{[n-r_1-r_2+1]_{2k_2}\,[n-r_1-r_2-c^*]_{2k_2}},$$

$2k_2 < r_2$; $2k_2 < n - r_1 - r_2 - c^*$; $r_1 + r_2 - 1 < n - c^*$.

Corollary 6 The UMVUE of the variance of $H_{k_3}(Z_1,Z_2,Z_3,c^*,n)$ is given by

$$\widehat{\mathrm{var}}[H_{k_3}(Z_1,Z_2,Z_3,c^*,n)] = H^2_{k_3}(Z_1,Z_2,Z_3,c^*,n) - H_{2k_3}(Z_1,Z_2,Z_3,c^*,n)$$

$$= \left(1-\frac{k_3}{z_3}\right)^{2(n-r_1-r_2-c^*-1)} - \left(1-\frac{2k_3}{z_3}\right)^{(n-r_1-r_2-c^*-1)},$$

$2k_3 < z_3$; $r_1 + r_2 - 1 < n - c^*$.

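Result 3 The UMVUE of $[g(\underline{\theta})]^k = \left(\dfrac{1}{\theta(1-p_1-p_2)}\right)^k$, $k \ne 0$, as per the model given in (13) is

$$G_k(z_1,z_2,z_3,c^*,n) = \frac{B(z_1,z_2,z_3,c^*,n+k)}{B(z_1,z_2,z_3,c^*,n)} = \frac{[n+1]_k\, z_3^{k}}{[n-r_1-r_2+1]_k\,[n-r_1-r_2-c^*]_k},$$

$k < n - r_1 - r_2 - c^*$; $r_1 + r_2 - 1 < n - c^*$.

Result 4 The UMVUE of the variance of $G_k(Z_1,Z_2,Z_3,c^*,n)$ is given by

$$\widehat{\mathrm{var}}[G_k(Z_1,Z_2,Z_3,c^*,n)] = G^2_k(Z_1,Z_2,Z_3,c^*,n) - G_{2k}(Z_1,Z_2,Z_3,c^*,n)$$

$$= \left(\frac{[n+1]_k\, z_3^{k}}{[n-r_1-r_2+1]_k\,[n-r_1-r_2-c^*]_k}\right)^2 - \frac{[n+1]_{2k}\, z_3^{2k}}{[n-r_1-r_2+1]_{2k}\,[n-r_1-r_2-c^*]_{2k}},$$

$2k < n - r_1 - r_2 - c^*$; $r_1 + r_2 - 1 < n - c^*$.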
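The estimate of $[g(\underline{\theta})]^k$ and its estimated standard error can be sketched as follows, using the pattern "square of the estimate at $k$ minus the estimate at $2k$" for the variance. The function names are hypothetical, $[a]_k$ is read as the rising factorial, and $z_3 \approx 24.4222$ is an assumed value computed from the earthquake data of Table 1 with $c^* = 1$.

```python
import math


def rising(a, k):
    """[a]_k = a (a+1) ... (a+k-1); equals 1 when k = 0."""
    out = 1
    for j in range(k):
        out *= a + j
    return out


def umvue_g(k, r1, r2, z3, c_star, n):
    """Estimate of g(theta)^k = (1 / (theta (1 - p1 - p2)))^k."""
    n_pos = n - r1 - r2
    m = n_pos - c_star
    return rising(n + 1, k) * z3 ** k / (rising(n_pos + 1, k) * rising(m, k))


def umvue_g_se(k, r1, r2, z3, c_star, n):
    """Standard error from the variance estimate G_k^2 - G_2k."""
    g_k = umvue_g(k, r1, r2, z3, c_star, n)
    g_2k = umvue_g(2 * k, r1, r2, z3, c_star, n)
    return math.sqrt(g_k ** 2 - g_2k)


g_hat = umvue_g(1, r1=4, r2=3, z3=24.4222, c_star=1, n=23)
g_se = umvue_g_se(1, r1=4, r2=3, z3=24.4222, c_star=1, n=23)
```

Under these assumptions the estimate is close to 2.2986 with a standard error near 0.6407, matching the last row of the earthquake column in Table 3.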

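Result 5 For fixed $x$, the UMVUE of the density given in (13) is

$$\phi_x(z_1,z_2,z_3,c^*,n) = (a(x))^{(1-C_1(x)-C_2(x))}\, \frac{B(z_1-C_1(x),\ z_2-C_2(x),\ z_3-C_3(x),\ c^*,\ n-1)}{B(z_1,z_2,z_3,c^*,n)}$$

$$= \left(\frac{1}{1+x}\right)^{(1-I_1(x)-I_2(x))} \frac{(r_1)_{I_1(x)}\,(r_2)_{I_2(x)}\,(n-r_1-r_2)_{(1-I_1(x)-I_2(x))}\,(n-r_1-r_2-c^*-1)_{(1-I_1(x)-I_2(x))}}{n\left[z_3-\left[\log\left(\frac{1+x}{2}\right)\right](1-I_1(x)-I_2(x))\right]^{(1-I_1(x)-I_2(x))}} \left(1-\frac{\left[\log\left(\frac{1+x}{2}\right)\right](1-I_1(x)-I_2(x))}{z_3}\right)^{(n-r_1-r_2-c^*-1)},$$

$z_3 > \left[\log\left(\frac{1+x}{2}\right)\right]$; $r_1 + r_2 - 1 < n - c^*$.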
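For a point $x > 1$ both indicators vanish, and the density estimate simplifies to $\frac{1}{1+x}\cdot\frac{(n-r_1-r_2)(n-r_1-r_2-c^*-1)}{n\,(z_3-u)}\left(1-\frac{u}{z_3}\right)^{n-r_1-r_2-c^*-1}$ with $u = \log((1+x)/2)$. The sketch below evaluates only this case; the function name is hypothetical and $z_3 \approx 24.4222$ is an assumed value computed from the earthquake data of Table 1 with $c^* = 1$.

```python
import math


def umvue_density(x, r1, r2, z3, c_star, n):
    """Density estimate at a point x > 1 (continuous part of the model)."""
    if x <= 1:
        raise ValueError("this sketch covers only x > 1")
    u = math.log((1 + x) / 2)
    if u >= z3:
        return 0.0                      # outside the range where the estimate is positive
    n_pos = n - r1 - r2
    m = n_pos - c_star
    factor = n_pos * (m - 1) / (n * (z3 - u))
    return factor * (1 - u / z3) ** (m - 1) / (1 + x)


phi_10 = umvue_density(10.0, r1=4, r2=3, z3=24.4222, c_star=1, n=23)
```

Under these assumptions the value at $x = 10$ is close to 0.0142, which agrees with the Pareto-II entry for the earthquake data in Table 4.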

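Result 6 The UMVUE of the variance of $\phi_x(Z_1,Z_2,Z_3,c^*,n)$ is given by

$$\widehat{\mathrm{var}}[\phi_x(z_1,z_2,z_3,c^*,n)] = \phi_x^2(z_1,z_2,z_3,c^*,n) - \phi_x(z_1,z_2,z_3,c^*,n)\ \phi_x(z_1-C_1(x),\ z_2-C_2(x),\ z_3-C_3(x),\ c^*,\ n-1)$$

$$= \phi_x^2(z_1,z_2,z_3,c^*,n) - \left(\frac{1}{1+x}\right)^{2(1-I_1(x)-I_2(x))} \frac{(r_1)_{2I_1(x)}\,(r_2)_{2I_2(x)}\,(n-r_1-r_2)_{2(1-I_1(x)-I_2(x))}\,(n-r_1-r_2-c^*-1)_{2(1-I_1(x)-I_2(x))}}{n(n-1)\left[z_3-2\left[\log\left(\frac{1+x}{2}\right)\right](1-I_1(x)-I_2(x))\right]^{2(1-I_1(x)-I_2(x))}} \left(1-\frac{2\left[\log\left(\frac{1+x}{2}\right)\right](1-I_1(x)-I_2(x))}{z_3}\right)^{(n-r_1-r_2-c^*-1)},$$

$z_3 > 2\left[\log\left(\frac{1+x}{2}\right)\right]$; $r_1 + r_2 - 1 < n - c^*$.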

Result 7 For a fixed $z = (z_1, z_2, z_3, c^*, n)$, the UMVUE of the survival function $S(x) = P(X > x)$, $x > 0$, is obtained as

$$\tilde{S}(x) = \frac{(r_1)_{I_1(x)}\,(r_2)_{I_2(x)}\,(n-r_1-r_2)_{(1-I_1(x)-I_2(x))}\,(n-r_1-r_2-c^*-1)_{(1-I_1(x)-I_2(x))}}{n\left[(n-r_1-r_2-c^*)-(1-I_1(x)-I_2(x))\right]} \left(z_3-\left[\log\left(\frac{1+x}{2}\right)\right](1-I_1(x)-I_2(x))\right)^{(I_1(x)+I_2(x))} \left(1-\frac{\left[\log\left(\frac{1+x}{2}\right)\right](1-I_1(x)-I_2(x))}{z_3}\right)^{(n-r_1-r_2-c^*-1)},$$

$z_3 > \left[\log\left(\frac{1+x}{2}\right)\right]$; $r_1 + r_2 - 1 < n - c^*$.
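For $x > 1$, integrating the density estimate of Result 5 over $(x, \infty)$ suggests the simple form $\frac{n-r_1-r_2}{n}\left(1-\frac{\log((1+x)/2)}{z_3}\right)^{n-r_1-r_2-c^*-1}$ for the survival estimate. The sketch below evaluates this expression only; it is an illustration under that assumption, with a hypothetical function name and the assumed value $z_3 \approx 24.4222$ from the earthquake data of Table 1 with $c^* = 1$.

```python
import math


def umvue_survival(x, r1, r2, z3, c_star, n):
    """Survival estimate at a point x > 1 (assumed simplified form)."""
    if x <= 1:
        raise ValueError("this sketch covers only x > 1")
    u = math.log((1 + x) / 2)
    if u >= z3:
        return 0.0
    n_pos = n - r1 - r2
    m = n_pos - c_star
    return (n_pos / n) * (1 - u / z3) ** (m - 1)


s_15 = umvue_survival(15.0, r1=4, r2=3, z3=24.4222, c_star=1, n=23)
s_40 = umvue_survival(40.0, r1=4, r2=3, z3=24.4222, c_star=1, n=23)
```

Under these assumptions the value at $x = 15$ is close to 0.2001, and the estimate decreases as $x$ grows.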

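Result 8 For a fixed $z = (z_1, z_2, z_3, c^*, n)$, the UMVUE of $\mathrm{var}(\tilde{S}(x))$ is obtained as

$$\widehat{\mathrm{var}}[\tilde{S}(x)] = \tilde{S}^2(x) - \frac{(r_1)_{2I_1(x)}\,(r_2)_{2I_2(x)}\,(n-r_1-r_2)_{2(1-I_1(x)-I_2(x))}\,(n-r_1-r_2-c^*-1)_{2(1-I_1(x)-I_2(x))}}{n(n-1)\left[(n-r_1-r_2-c^*)-2(1-I_1(x)-I_2(x))\right]\left[(n-r_1-r_2-c^*+1)-2(1-I_1(x)-I_2(x))\right]} \left(z_3-2\left[\log\left(\frac{1+x}{2}\right)\right](1-I_1(x)-I_2(x))\right)^{2(I_1(x)+I_2(x))} \left(1-\frac{2\left[\log\left(\frac{1+x}{2}\right)\right](1-I_1(x)-I_2(x))}{z_3}\right)^{(n-r_1-r_2-c^*-1)},$$

$z_3 > 2\left[\log\left(\frac{1+x}{2}\right)\right]$; $r_1 + r_2 - 1 < n - c^*$.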

V. Real data illustration

In this section, we have considered four inliers prone data set to illustrate our proposed work. The motivation behind considering a different variety of data sets is to show the flexibility of the proposed model in different situations. The detailed description regarding the data sets is given below:

Dataset 1: The data in Table 1 show the loss ratios (yearly data) for earthquake insurance in California from 1971 through 1993. The data are taken from [6] and were also used by [4] in their study. Note that for four years there was no loss for earthquake insurance, and years with a loss of less than 1 billion dollars are coded as 1 for simplicity. The analysis of these data is carried out at the end of this section.

Table 1. California earthquake insurance data

Year 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982

Loss ratios 17.4 0.0 0.6 3.4 0.0 0.0 0.7 1.5 2.2 9.2 0.9 0.0

Year 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993

Loss ratios 2.9 5.0 1.3 9.3 22.8 11.5 129.8 47.0 17.2 12.8 3.2

Dataset 2: The National Family Health Survey (NFHS) is a large-scale, multi-round survey conducted in a representative sample of households throughout India. The First National Family Health Survey (NFHS-1) was conducted in 1992-93, the Second National Family Health Survey (NFHS-2) was conducted in 1998-99 and the Third National Family Health Survey (NFHS-3) was carried out in 2005-06. The survey is based on a sample of households that is representative at the national and state levels. The NFHS-3 fieldwork, conducted by 18 research organizations between December 2005 and August 2006, interviewed women at age 15-49. We consider the data on child's age at death from the woman's questionnaire of NFHS-3. For comprehensive data, one may visit [24]. For Gujarat state, there are 15 stillbirths (the death of a baby before or during the birth after 28 weeks of gestation) considered as observation 0, 37 neonatal deaths (the death of a baby within the first 28 days of life) considered as observation 1 and other observations of age at death in days as: 30, 30, 30, 31, 31, 60, 62, 62, 62, 90, 90, 90, 92, 93, 150, 182, 213, 242, 272, 273, 300, 303, 333, 334, 335, 356, 360, 365, 366, 450, 730, 731, 732, 732 and 1462. This is a perfect data for inliers model with two discrete point at zero and one. Authors of this paper had already modeled this data using exponential and Weibull distribution. The analysis based on Pareto Type II distribution is presented below.

Dataset 3: [23] have analyzed and quantified forest burnt area in India using AWiFS data for the year 2014. The burnt area map from AWiFS data involves Forest type map of 2013 at 56 m resolution prepared as part of the national carbon project. India has a geographical area of about 3,287,263 sq. km. It comprises 29 states and 7 union territories. The country has 21% of the geographical area under forest cover. Forest fires occur in India mainly between January and June. They are more frequent between February and May in different biogeographic zones of India. State/Union Territory-wise analysis of the percentage of forest burnt area (area in sq. km) is available in [23], page 1531. We consider State/Union Territory burnt area from February to May 2014. There are six State/Union Territory (Delhi, Andaman and Nicobar, Chandigarh, Daman and Diu, Lakshadweep and Pondicherry) having burnt area zero, five State/Union Territory (Goa(0.04) , Jammu and Kashmir (0.11), Dadra and Nagar Haveli (0.23), Punjab (0.85) and Himachal Pradesh (0.91)) having percentage burnt area less than 1 sq. Km. conveniently considered here as observation 1, and the remaining 25 State/Union Territory burnt area in sq. Km. are: 6611.86, 102.70, 941.11, 1773.22, 4606.69, 487.81, 1.84, 2587.40, 1920.35, 82.01, 3342.66, 5066.66, 1974.23, 457.50, 421.03, 975.79, 8186.46, 364.17, 2.50, 4275.64, 2955.23, 739.00, 459.07, 42.01 and 386.37. The analysis is reported below.

Dataset 4: These data concern the amount of snowfall in all 50 states of the US. According to the National Climatic Data Center, the data were compiled from the average snowfall over almost three decades, from 1981 to 2010, and are available at [25]. The average amounts of snowfall per year (in inches) for the 50 states are: 5.2, 0.5, 1.6, 74.5, 0.3, 0.0, 19.1, 40.5, 20.2, 0.0, 0.7, 0.0, 19.2, 24.6, 25.9, 34.9, 14.7, 12.5, 0.0, 61.8, 20.2, 43.8, 51.1, 54, 0.9, 17, 38.1, 25.9, 21.8, 60.8, 16.5, 9.6, 123.8, 7.6, 51.2, 27.5, 7.8, 3, 28.2, 33.8, 43.9, 6.3, 1.5, 56.2, 81.2, 10.3, 5.0, 62.0, 50.9 and 91.4. Four states have an average snowfall of zero over these decades, and four states have an average snowfall of less than 1 inch (coded as observation 1).
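The coding rule used for all four data sets (exact zeros stay at 0, positive values below 1 are coded as 1, everything else is kept) can be sketched as follows for the snowfall data. The helper name `code_inliers` is hypothetical.

```python
def code_inliers(values):
    """Keep zeros, code values strictly between 0 and 1 as 1, keep the rest."""
    return [1.0 if 0 < v < 1 else float(v) for v in values]


snowfall = [5.2, 0.5, 1.6, 74.5, 0.3, 0.0, 19.1, 40.5, 20.2, 0.0,
            0.7, 0.0, 19.2, 24.6, 25.9, 34.9, 14.7, 12.5, 0.0, 61.8,
            20.2, 43.8, 51.1, 54, 0.9, 17, 38.1, 25.9, 21.8, 60.8,
            16.5, 9.6, 123.8, 7.6, 51.2, 27.5, 7.8, 3, 28.2, 33.8,
            43.9, 6.3, 1.5, 56.2, 81.2, 10.3, 5.0, 62.0, 50.9, 91.4]

coded = code_inliers(snowfall)
r1 = sum(1 for v in coded if v == 0)      # number of zero observations
r2 = sum(1 for v in coded if v == 1)      # number of observations coded as one
n_pos = sum(1 for v in coded if v > 1)    # observations on the continuous scale
```

The counts `r1` and `r2` are the statistics that enter the estimates of $p_1$ and $p_2$.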

For all the data sets above we have calculated parameter estimates, goodness-of-fit criteria values, goodness-of-fit statistics and corresponding P-values (see Table 2 for details) for positive observations only. It may be noted from the table that for all the considered data sets, the Pareto Distribution fits well (see P-values).

Table 2. Parameter estimates, goodness-of-fit criteria and corresponding p-values for the various datasets (Pareto distribution).

| Data | MLE (SE) of β | MLE (SE) of θ | AIC | BIC | K-S (p-value) | CVM (p-value) | AD (p-value) |
|---|---|---|---|---|---|---|---|
| Earthquake insurance | 19.5743 (19.2742) | 2.0113 (1.4153) | 124.7323 | 126.2778 | 0.1213 (0.9498) | 0.0362 (0.9563) | 0.2901 (0.9448) |
| NFHS-3 | 18557.4806 (34321.4861) | 65.5015 (119.8512) | 470.3576 | 473.4683 | 0.1210 (0.6848) | 0.0898 (0.6400) | 0.6150 (0.6327) |
| Forest burnt area | 3418.3510 (4828.3362) | 2.6249 (2.7363) | 431.6623 | 434.1000 | 0.1446 (0.6214) | 0.0984 (0.5964) | 1.0663 (0.3236) |
| Snowfall | 2907.8650 (8293.9850) | 87.5320 (247.1416) | 383.029 | 386.5043 | 0.1049 (0.7447) | 0.0933 (0.6208) | 0.5532 (0.6922) |

(Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Kolmogorov-Smirnov (K-S) statistic, Cramer-von Mises (CVM) statistic, Anderson-Darling (AD) statistic.)

The plots of the pdf, h(x), and the survival function, S(x), for all four datasets under study are displayed in Figure 1 and Figure 2 respectively, for varying censoring schemes under the Pareto II and Weibull distributions. For the data sets under study, Table 3 summarizes the estimates of the parameters and parametric functions, along with their standard errors (shown in brackets) and 95% confidence intervals, for censoring at the value c*. Table 4 shows the UMVU estimates of the pdf and survival function under the Pareto II and Weibull distributions for varying censoring schemes. It is observed that the Pareto distribution has a heavier tail than the Weibull.

Fig. 1 Density plots for the various data sets censored at value c*

Fig. 2 Survival function plots for the various data sets censored at value c*

Table 3. Summary of estimates of parameters/parametric functions of the Pareto II distribution censored at c*.

| Parameter/Parametric function | Earthquake insurance data (c* = 1) | NFHS-3 data (c* = 5) | Forest fire burnt area data (c* = 1) | Snowfall data (c* = 2) |
|---|---|---|---|---|
| MLE (SE) of p1 | 0.17391 (0.07904) | 0.17241 (0.04050) | 0.16667 (0.06212) | 0.08000 (0.03837) |
| MLE (SE) of p2 | 0.13043 (0.07022) | 0.42529 (0.05300) | 0.13889 (0.05764) | 0.08000 (0.03837) |
| MLE (SE) of θ | 0.61420 (0.15857) | 0.19539 (0.03402) | 0.16667 (0.03380) | 0.38550 (0.06095) |
| 95% CI of p1 | (0.01901, 0.32882) | (0.09304, 0.25179) | (0.04493, 0.28841) | (0.00480, 0.15520) |
| 95% CI of p2 | (0.00000, 0.26807) | (0.32140, 0.52917) | (0.02592, 0.25186) | (0.00480, 0.15520) |
| 95% CI of θ | (0.30648, 0.92191) | (0.12871, 0.26206) | (0.10041, 0.23292) | (0.26651, 0.50449) |
| UMVUE (SE) of p1 | 0.18182 (0.08223) | 0.18293 (0.04269) | 0.17143 (0.06370) | 0.08333 (0.03989) |
| UMVUE (SE) of p2 | 0.13636 (0.07317) | 0.45122 (0.05495) | 0.14286 (0.05915) | 0.08333 (0.03989) |
| UMVUE (SE) of θ | 0.61420 (0.13812) | 0.19539 (0.02791) | 0.16667 (0.02967) | 0.38550 (0.05643) |
| 95% CI of UMVUE of p1 | (0.02065, 0.34299) | (0.09925, 0.26660) | (0.04657, 0.29629) | (0.00515, 0.16152) |
| 95% CI of UMVUE of p2 | (0.00000, 0.27976) | (0.34351, 0.55892) | (0.02693, 0.25879) | (0.00515, 0.16152) |
| 95% CI of UMVUE of θ | (0.34348, 0.88492) | (0.14069, 0.25008) | (0.10850, 0.22483) | (0.27489, 0.49610) |
| ∏ (h_i(θ))^{k_i}, i = 1, 2, 3; k1 = 1, k2 = 1, k3 = 1 | 0.04992 (0.04299) | 8.62541 (4.73007) | 1.240801 (0.89610) | 3.07667 (0.02775) |
| h1(θ) = p1/(θ(1-p1-p2)); k1 = 1, k2 = 0, k3 = 0 | 0.38309 (0.22204) | 2.13250 (0.74236) | 1.38463 (0.66351) | 0.24131 (0.12880) |
| h2(θ) = p2/(θ(1-p1-p2)); k1 = 0, k2 = 1, k3 = 0 | 0.28732 (0.18391) | 5.26018 (1.52329) | 1.15386 (0.58886) | 0.24131 (0.12880) |
| h3(θ) = e^{-θ}; k1 = 0, k2 = 0, k3 = 1 | 0.55693 (0.08844) | 0.82738 (0.00109) | 0.85191 (0.02856) | 0.68545 (0.04162) |
| g(θ); k = 1 | 2.29856 (0.64068) | 12.51070 (2.73276) | 8.53853 (1.92013) | 3.07667 (0.51361) |

Table 4. Summary of estimates of the pdf and reliability function of the various data sets censored at c*. Entries are estimate (SE) at the point x.

| Data (c*) | x | pdf, Pareto-II | pdf, Weibull | Survival function, Pareto-II | Survival function, Weibull |
|---|---|---|---|---|---|
| Earthquake insurance data (c* = 1) | 10 | 0.01415 (0.00185) | 0.02091 (0.00295) | 0.25585 (0.07038) | 0.17398 (0.05400) |
| | 15 | 0.00784 (0.00112) | 0.01317 (0.00113) | 0.20013 (0.07279) | 0.00183 (0.04218) |
| | 40 | 0.00175 (0.00046) | 0.00273 (0.00046) | 0.10957 (0.05592) | 2.139e-06 (0.00953) |
| NFHS-3 data (c* = 5) | 100 | 0.00030 (5.043e-05) | 0.0011 (1.968e-04) | 0.18996 (0.03638) | 0.29777 (0.04249) |
| | 500 | 5.473e-05 (7.032e-06) | 0.0003 (6.096e-05) | 0.13903 (0.03296) | 0.05063 (0.02039) |
| | 1000 | 2.387e-05 (3.185e-06) | 1.851e-05 (1.282e-05) | 0.12136 (0.03136) | 0.00293 (0.02039) |
| Forest fire burnt area data (c* = 1) | 650 | 6.912e-05 (7.524e-06) | 0.00022 (3.235e-05) | 0.27042 (0.07114) | 0.40038 (0.06434) |
| | 1350 | 2.966e-05 (3.199e-06) | 0.00012 (8.101e-06) | 0.23941 (0.06868) | 0.05022 (0.06206) |
| | 2500 | 1.451e-05 (1.624e-06) | 6.421e-05 (3.185e-06) | 0.21592 (0.05811) | 0.01212 (0.05510) |
| Snowfall data (c* = 2) | 25 | 0.00469 (0.00028) | 0.01383 (0.00114) | 0.31647 (0.05347) | 0.43451 (0.05291) |
| | 50 | 0.00186 (0.00013) | 0.00676 (0.00068) | 0.24389 (0.05082) | 0.18368 (0.04572) |
| | 100 | 0.00072 (7.193e-05) | 0.00108 (0.00044) | 0.18692 (0.04659) | 0.02462 (0.01331) |

Appendix A. Distributional properties of C(X)

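Since the moments of $C(X) = (C_1(X), C_2(X), C_3(X))$ are functions of $(p_1, p_2, \theta)$, with $\beta$ assumed known, the statistics $C_i(X)$ are MVUEs of these functions. Hence, in order to find the moments, we differentiate $g(\underline{\theta})$, $\underline{\theta} = (p_1, p_2, \theta)$, partially with respect to $p_1$, $p_2$ and $\theta$ under the regularity conditions and get

\[
G = A V, \qquad |A| \neq 0, \tag{i}
\]

where

\[
G = \begin{pmatrix}
\dfrac{\partial \log g(\underline{\theta})}{\partial p_1} \\[2mm]
\dfrac{\partial \log g(\underline{\theta})}{\partial p_2} \\[2mm]
\dfrac{\partial \log g(\underline{\theta})}{\partial \theta}
\end{pmatrix}
= \begin{pmatrix}
\dfrac{1}{1-p_1-p_2} \\[2mm]
\dfrac{1}{1-p_1-p_2} \\[2mm]
-\dfrac{1}{\theta}
\end{pmatrix},
\qquad
V = \begin{pmatrix} E(C_1(X)) \\ E(C_2(X)) \\ E(C_3(X)) \end{pmatrix}
= \begin{pmatrix}
E(I_1(X)) \\ E(I_2(X)) \\ E\big([\log(1+X) - \log 2]\,(1 - I_1(X) - I_2(X))\big)
\end{pmatrix}
\]

and

\[
A = \begin{pmatrix}
\dfrac{\partial \log h_1(\underline{\theta})}{\partial p_1} & \dfrac{\partial \log h_2(\underline{\theta})}{\partial p_1} & \dfrac{\partial \log h_3(\underline{\theta})}{\partial p_1} \\[2mm]
\dfrac{\partial \log h_1(\underline{\theta})}{\partial p_2} & \dfrac{\partial \log h_2(\underline{\theta})}{\partial p_2} & \dfrac{\partial \log h_3(\underline{\theta})}{\partial p_2} \\[2mm]
\dfrac{\partial \log h_1(\underline{\theta})}{\partial \theta} & \dfrac{\partial \log h_2(\underline{\theta})}{\partial \theta} & \dfrac{\partial \log h_3(\underline{\theta})}{\partial \theta}
\end{pmatrix}
= \begin{pmatrix}
\dfrac{1-p_2}{p_1(1-p_1-p_2)} & \dfrac{1}{1-p_1-p_2} & 0 \\[2mm]
\dfrac{1}{1-p_1-p_2} & \dfrac{1-p_1}{p_2(1-p_1-p_2)} & 0 \\[2mm]
-\dfrac{1}{\theta} & -\dfrac{1}{\theta} & -1
\end{pmatrix}.
\]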

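Equation (i) gives

\[
E(C_i(X)) = \frac{|A_i|}{|A|}, \qquad i = 1, 2 \text{ and } 3,
\]

where $A_i$ is obtained by replacing the $i$th column of $A$ by the elements of $G$. Hence,

\[
\begin{pmatrix} E(C_1(X)) \\ E(C_2(X)) \\ E(C_3(X)) \end{pmatrix}
= \begin{pmatrix} p_1 \\ p_2 \\ \dfrac{1-p_1-p_2}{\theta} \end{pmatrix}. \tag{ii}
\]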
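The step from (i) to (ii) can be checked numerically with exact rational arithmetic. The sketch below is only an illustration under stated assumptions: the entries of `A` and `G` are those implied by $h_1 \propto p_1/[\theta(1-p_1-p_2)]$, $h_2 \propto p_2/[\theta(1-p_1-p_2)]$, $h_3 = e^{-\theta}$ and $g \propto 1/[\theta(1-p_1-p_2)]$ (an assumed form, not restated in this appendix), and the parameter values are arbitrary.

```python
from fractions import Fraction as F


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def replace_col(m, j, col):
    """Copy of m with column j replaced by the vector col."""
    return [[col[r] if c == j else m[r][c] for c in range(3)] for r in range(3)]


def cramer(m, rhs):
    """Solve m v = rhs by Cramer's rule: v_j = |m_j| / |m|."""
    d = det3(m)
    return [det3(replace_col(m, j, rhs)) / d for j in range(3)]


# Illustrative parameter values only (not taken from the paper's data)
p1, p2, theta = F(1, 5), F(3, 10), F(2)
q = 1 - p1 - p2

# Assumed entries: rows = derivative w.r.t. (p1, p2, theta),
# columns = log h1, log h2, log h3
A = [[(1 - p2) / (p1 * q), 1 / q, F(0)],
     [1 / q, (1 - p1) / (p2 * q), F(0)],
     [-1 / theta, -1 / theta, F(-1)]]
# Assumed gradient of log g w.r.t. (p1, p2, theta)
G = [1 / q, 1 / q, -1 / theta]

moments = cramer(A, G)
print(moments)                      # E(C1), E(C2), E(C3)
print(moments == [p1, p2, q / theta])
```

Using `fractions.Fraction` keeps the comparison exact, so any mismatch would indicate an algebraic error rather than rounding.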

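Now the joint moments of $C_1^{k_1}(X)$, $C_2^{k_2}(X)$ and $C_3^{k_3}(X)$ are given as

\[
E\big(C_1^{k_1}(X)\, C_2^{k_2}(X)\, C_3^{k_3}(X)\big)
= \int C_1^{k_1}(x)\, C_2^{k_2}(x)\, C_3^{k_3}(x)\, a(x)\,
\frac{\prod_{i=1}^{3} \big(h_i(\underline{\theta})\big)^{C_i(x)}}{g(\underline{\theta})}\, dx,
\]

which, on differentiating with respect to $p_1$, $p_2$ and $\theta$ and using (i), gives a system of three non-homogeneous equations

\[
G_1 = A V, \qquad |A| \neq 0, \tag{iii}
\]

where

\[
G_1 = \begin{pmatrix}
\dfrac{\partial}{\partial p_1} E\big(C_1^{k_1}(X)\, C_2^{k_2}(X)\, C_3^{k_3}(X)\big) \\[2mm]
\dfrac{\partial}{\partial p_2} E\big(C_1^{k_1}(X)\, C_2^{k_2}(X)\, C_3^{k_3}(X)\big) \\[2mm]
\dfrac{\partial}{\partial \theta} E\big(C_1^{k_1}(X)\, C_2^{k_2}(X)\, C_3^{k_3}(X)\big)
\end{pmatrix}
\]

and

\[
V = \begin{pmatrix}
E\big(C_1^{k_1+1}(X)\, C_2^{k_2}(X)\, C_3^{k_3}(X)\big) - E(C_1(X))\, E\big(C_1^{k_1}(X)\, C_2^{k_2}(X)\, C_3^{k_3}(X)\big) \\[1mm]
E\big(C_1^{k_1}(X)\, C_2^{k_2+1}(X)\, C_3^{k_3}(X)\big) - E(C_2(X))\, E\big(C_1^{k_1}(X)\, C_2^{k_2}(X)\, C_3^{k_3}(X)\big) \\[1mm]
E\big(C_1^{k_1}(X)\, C_2^{k_2}(X)\, C_3^{k_3+1}(X)\big) - E(C_3(X))\, E\big(C_1^{k_1}(X)\, C_2^{k_2}(X)\, C_3^{k_3}(X)\big)
\end{pmatrix}
= \begin{pmatrix} \sigma_{1(1,2,3)} \\ \sigma_{2(1,2,3)} \\ \sigma_{3(1,2,3)} \end{pmatrix}, \text{ (say).}
\]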

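Using Cramer's rule for the solution of the system of linear equations (iii) gives

\[
\sigma_{i(1,2,3)} = \frac{|A_i|}{|A|}, \qquad i = 1, 2 \text{ and } 3,
\]

where $A_i$ is obtained by replacing the $i$th column of $A$ by the elements of $G_1$. For $k_j = 1$ and $k_l = 0$ for all $l \neq j$ ($j, l = 1, 2$ and $3$), $G_1$ is the gradient of $E(C_j(X))$ and we get the covariance between $C_i(X)$ and $C_j(X)$ as

\[
\sigma_{ij} = \frac{|A_i|_{(k_j = 1;\, k_l = 0,\, l \neq j)}}{|A|}, \qquad i, j = 1, 2 \text{ and } 3.
\]

Thus, we have the variance-covariance matrix $V$ as

\[
V = [\sigma_{ij}]_{3 \times 3} = \left[ \frac{|A_i|_{(k_j = 1;\, k_l = 0,\, l \neq j)}}{|A|} \right]_{3 \times 3}.
\]

If $A_{ri}$ is the cofactor of the element $a_{ri}$ of $A$, then

\[
|A_i|_{(k_j = 1;\, k_l = 0,\, l \neq j)}
= A_{1i} \frac{\partial}{\partial p_1} E(C_j(X))
+ A_{2i} \frac{\partial}{\partial p_2} E(C_j(X))
+ A_{3i} \frac{\partial}{\partial \theta} E(C_j(X))
\]

and hence

\[
V = \begin{pmatrix}
p_1(1-p_1) & -p_1 p_2 & -\dfrac{p_1(1-p_1-p_2)}{\theta} \\[2mm]
-p_1 p_2 & p_2(1-p_2) & -\dfrac{p_2(1-p_1-p_2)}{\theta} \\[2mm]
-\dfrac{p_1(1-p_1-p_2)}{\theta} & -\dfrac{p_2(1-p_1-p_2)}{\theta} & \dfrac{1-(p_1+p_2)^2}{\theta^2}
\end{pmatrix}, \tag{iv}
\]

where

\[
|A| = -\frac{1}{p_1 p_2 (1-p_1-p_2)}.
\]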
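The variance-covariance matrix can be checked in the same spirit: for each $j$, solving $A v = \nabla E(C_j(X))$ by Cramer's rule should return the covariances of $C_1, C_2, C_3$ with $C_j$. The sketch below is a hedged illustration, again assuming $h_1 \propto p_1/[\theta(1-p_1-p_2)]$, $h_2 \propto p_2/[\theta(1-p_1-p_2)]$, $h_3 = e^{-\theta}$, first moments $(p_1, p_2, (1-p_1-p_2)/\theta)$, and arbitrary parameter values; the closed form it compares against is the one implied by these assumptions.

```python
from fractions import Fraction as F


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer(m, rhs):
    """Solve m v = rhs by Cramer's rule: v_i = |m_i| / |m|."""
    d = det3(m)
    out = []
    for i in range(3):
        m_i = [[rhs[r] if c == i else m[r][c] for c in range(3)] for r in range(3)]
        out.append(det3(m_i) / d)
    return out


# Illustrative parameter values only (not taken from the paper's data)
p1, p2, theta = F(1, 5), F(3, 10), F(2)
q = 1 - p1 - p2

# Assumed matrix of d(log h_col)/d(param_row), params = (p1, p2, theta)
A = [[(1 - p2) / (p1 * q), 1 / q, F(0)],
     [1 / q, (1 - p1) / (p2 * q), F(0)],
     [-1 / theta, -1 / theta, F(-1)]]

# Gradients of E(C1) = p1, E(C2) = p2, E(C3) = q/theta w.r.t. (p1, p2, theta)
grads = [[F(1), F(0), F(0)],
         [F(0), F(1), F(0)],
         [-1 / theta, -1 / theta, -q / theta ** 2]]

# Column j of V holds Cov(C_i, C_j), i = 1, 2, 3
cols = [cramer(A, g) for g in grads]
V = [[cols[j][i] for j in range(3)] for i in range(3)]

closed_form = [
    [p1 * (1 - p1), -p1 * p2, -p1 * q / theta],
    [-p1 * p2, p2 * (1 - p2), -p2 * q / theta],
    [-p1 * q / theta, -p2 * q / theta, (1 - (p1 + p2) ** 2) / theta ** 2],
]
print(V == closed_form)
print(det3(A) == -1 / (p1 * p2 * q))
```

A symmetric result is a useful sanity check here, since each column of `V` is obtained from a separate linear system.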

Acknowledgements

This work is supported by DST-FIST No. SR/FST/MSI-104/2015. The authors also thank the reviewers for their useful comments.

References

[1] Aban, I. B., Meerschaert, M. M. and Panorska, A. K. (2006). Parameter estimation for the truncated Pareto distribution. Journal of the American Statistical Association, 101(473): 270-277.

[2] Aitchison, J. (1955). On the distribution of a positive random variable having a discrete probability mass at the origin. Journal of the American Statistical Association, 50: 901-908.

[3] Charalambides, C. H. (1974). Minimum variance unbiased estimation for a class of left truncated distributions. Sankhya A, 36: 392-418.

[4] Embrechts, P., Resnick, S. I. and Samorodnitsky, G. (1999). Extreme value theory as a risk management tool. North American Actuarial Journal, 3 (2): 30-41.

[5] Gupta, R. C. (1977). Minimum variance unbiased estimation in modified power series distribution and some of its applications. Communications in Statistics, 6: 977-991.

[6] Jaffe, D. M. and Russell, T. (1996). Catastrophe Insurance, Capital Markets and Uninsurable Risk. Philadelphia: Financial Institutions Center. The Wharton School, pp. 96-112.

[7] Jani, P.N. (1977). Minimum variance unbiased estimation for some left-truncated modified power series distributions. Sankhya, 39: 258-278.

[8] Jani, P. N. (1993). A characterization of one-parameter exponential family of distributions. Calcutta Statistical Association Bulletin, 43 (3-4): 253-256.

[9] Jani, P. N. and Dave, H. P. (1990). Minimum variance unbiased estimation in a class of exponential family of distributions and some of its applications. Metron, 48: 493-507.

[10] Jani, P. N. and Singh, A. K. (1995). Minimum variance unbiased estimation in multi-parameter exponential family of distributions. Metron, 53: 93-106.

[11] Jayade, V. P. and Prasad, M. S. (1990). Estimation of parameters of mixed failure time distribution. Communications in Statistics - Theory and Methods, 19(12): 4667-4677.

[12] Joshi, S. W. and Park, C. J. (1974). Minimum variance unbiased estimation for truncated power series distributions. Sankhya A, 36: 305-314.

[13] Kale, B. K. and Muralidharan, K. (2000). Optimal estimating equations in mixture distributions accommodating instantaneous or early failures. Journal of the Indian Statistical Association, 38: 317-329.

[14] Khatri, C. G. (1959). On certain properties of power series distributions. Biometrika, 46: 486-490.

[15] Muralidharan, K. (2010). Inlier prone models: A review. ProbStat Forum, 3: 38-51.

[16] Muralidharan, K. and Bavagosai, P. (2017). Analysis of lifetime model with discrete mass at zero and one. Journal of Statistical Theory and Practice, 11(4): 670-692.

[17] Muralidharan, K. and Bavagosai, P. (2018). A new Weibull model with inliers at zero and one based on type-II censored samples. Journal of Indian Society of Probability and Statistics, 19: 121-151.

[18] Muralidharan, K. and Lathika, P. (2006). Analysis of instantaneous and early failures in Weibull distribution. Metrika, 64(3): 305-316.

[19] Patel, S. R. (1978). Minimum variance unbiased estimation of multivariate modified power series distribution. Metrika, 25: 155-161.

[20] Patil, G. P. (1963). Minimum variance unbiased estimation and certain problems of additive number theory. Annals of Mathematical Statistics, 34: 1050-1056.

[21] Reed, W. J. and Jorgensen, M. (2004). The double Pareto-lognormal distribution: A new parametric model for size distributions. Communications in Statistics - Theory and Methods, 33(8): 1733-1753.

[22] Reddy, C. S., Jha, C. S., Manaswini, G., Alekhya, V. V. L. P., Pasha, S. V., Satish, K. V., Diwakar, P. G. and Dadhwal, V. K. (2017). Nationwide assessment of forest burnt area in India using Resourcesat-2 AWiFS data. Current Science, 112(7): 1521-1532.

[23] Roy, J. and Mitra, S. K. (1957). Unbiased minimum variance estimation in a class of discrete distributions. Sankhya, 18: 371-378.

[24] http://www.dhsprogram.com/data/dataset/India_Standard-DHS_2006.cfm?flag=0

[25] https://thetoolboss.com/average-snowfall-us-states.
