CC BY

PRINCIPLES OF CLASSIFICATION RELIABILITY STATISTICAL DATA OF THE ELECTRIC EQUIPMENT OF POWER SUPPLY SYSTEMS

E.M.Farhadzadeh, Y.Z.Farzaliyev, A.Z.Muradaliyev

Azerbaijan Scientific-Research and Design-Prospecting Institute of Energetic AZ1012, Ave. H.Zardabi-94, e-mail:[email protected]

ABSTRACT

The result of comparing criteria whose statistics characterize different properties of the random variables of a sample depends on the importance of these properties. In turn, the importance of the properties can change substantially for simulated analogues of the sample.

I. INTRODUCTION

It is well known that the main requirements placed on the solution of numerous operational problems in electric power systems (EPS) are ensuring reliable operation and reducing operating costs [1]. A clear example is the organization of maintenance service and repair (MS&R) of electrical equipment. Ensuring reliability, in turn, requires the ability to compare estimates of the parameters of reliability (PR) of specific items of electrical equipment, i.e. a transition from average PR to parameters of individual reliability. Average PR of electrical equipment are important and traditionally used, for example, when comparing the PR of circuit layouts of a planned switching centre or when estimating the size of the capacity reserve in an EPS. Although the formulas and algorithms are known and fairly simple, specialists meet substantial difficulties when calculating the PR of a specific item of equipment. PR calculated over a population (i.e. average estimates) are unsuitable for solving operational problems. In addition, data on failures and defects of a specific item of equipment are so scarce that they either do not allow PR to be calculated or yield estimates of unacceptable accuracy. As a result, reliability is in practice ensured, unfortunately, mainly at an intuitive level. Automated information systems that provide information support to EPS dispatching personnel make a certain contribution to solving this problem, but objective estimation of the parameters of individual reliability remains a pressing task.

II. FEATURES OF STATISTICAL DATA

It should be noted that, in the reliability analysis of EPS equipment, operational statistical data are classified by one, and sometimes by two, attributes defined by nameplate data and operating conditions. For example, [2] gives the PR of electrical equipment of various voltage classes. Occasionally, PR are given for electrical equipment grouped by purpose, design, service life, manufacturer and other attributes. Classification of statistical data by more than two attributes is not practised. The reasons are the large number of versions of attributes (VA) and the loss of accuracy of the PR estimates (a wider confidence interval). The loss of accuracy arises under the assumption that the statistical data are a random sample from some general population.

In fact:

1. The statistical data describing the reliability of EPS equipment (data on non-operating states) depend on a large number of nameplate and operational data (installation site, voltage class, design, service life, etc.) and therefore can be regarded neither as an analogue of a general population nor as a finite sample of homogeneous data. In mathematics such data are called multivariate. Unfortunately, analytical methods for analysing multivariate data have been developed only under the assumption that the random variables follow a single distribution law, mainly the normal law. This does not correspond at all to the real histograms built from operational statistical data on electrical equipment. As an example, fig.1 shows histograms of the duration of emergency shutdowns (τa) of 300 MW power units [3]. The first histogram characterizes the distribution P*(τa) over eight power units for the period 1992-2006. The second histogram characterizes the distribution P*(τa) over all power units for 2005; the number of emergency shutdowns in this sample fell from 634 to 48. The third histogram shows the distribution P*(τa) for the first power unit in 2005.

Fig.1 Histograms of the duration of emergency shutdowns of 300 MW power units

2. Comparing the shape of these histograms with that of the normal distribution confirms that P*(τa) is unlikely to conform to any single distribution law and, in particular, to the normal law.

3. When multivariate statistical data are classified by a given VA, the sample data are not drawn at random from the finite population of multivariate data. For example, all circuit breakers with a rated voltage of 110 kV are not selected at random. Let us specify this feature. A non-random sample:

- consists of random variables;

- has a number of random variables nv that is itself random and changes over time, for example increases;

- has a distribution over the range of the random variables of the finite population of multivariate data whose features depend on the VA.

4. The type of the distribution law of the finite population of multivariate statistical data is not only unknown; it also changes randomly as statistical data accumulate.

5. The range of a random variable in a sample from the finite population of multivariate statistical data for a given VA is no greater than the range of the random variable in the finite population itself. Recall that, for a general population of a random variable, the standard deviation is always smaller than the standard deviation for any representative sample, and that the estimate of the standard deviation increases as the number of random variables nv in the sample decreases. These features lead to the conclusion that classical methods for analysing samples from a general population must be applied with care when analysing samples from a finite population of multivariate data.

III. ON THE SET OF STATISTICS DESCRIBING THE RANDOM VARIABLES OF A SAMPLE

The most objective approach to the statistical problems that arise when classifying multivariate data is computer simulation of possible samples combined with testing of assumptions (hypotheses) about whether classifying the data by each VA is expedient. The difficulty lies in assessing that expediency. In essence, this is the well-known problem of finding significant VA. We solve it within the framework of the theory of statistical hypothesis testing. Mathematical statistics considers two types of problems connected with the comparison of distribution functions:

1. Testing the assumption that a sample of random variables X is drawn at random from a general population of random variables with a given distribution FS(X).

2. Testing the assumption of homogeneity of two or more samples of random variables drawn at random from the same general population with a known continuous distribution law FS(X).

We propose to assess the expediency of classifying multivariate statistical data by comparing the statistical function of distribution (s.f.d.) of the finite population of multivariate data, F*(X), with the s.f.d. of a sample from this population, FV*(Y). In theory, F*(X) and FV*(Y) are compared using a number of numerical characteristics of one random variable, the vertical divergence between F*(X) and FV*(Y), which we denote by Δ(Y) and calculate by the formula:

$$\Delta(Y_i) = F^*(Y_i) - F_V^*(Y_i) \qquad (1)$$

where $-1 < \Delta(Y_i) < 1$ and $i = 1, \dots, n_v$.

Following established practice, we call these numerical characteristics statistics and denote them S(Δ). The statistics S(Δ) include:

1. The greatest vertical divergence between F*(X) and FV*(Y). It is calculated by the following algorithm:

1.1. The nv realizations of Δ(Y) are arranged in ascending order;

1.2. The absolute values of the first and the nv-th values of the ranked series of Δ(Y) are compared, and the one with the greater absolute value is taken as Δm;

1.3. This value is the greatest vertical deviation between F*(X) and FV*(Y), with its sign;

2. The average value of the vertical divergence. It is calculated by the formula:

$$M^*[\Delta(Y)] = n_v^{-1}\sum_{i=1}^{n_v}|\Delta(Y_i)| = \Delta^*_{av} \qquad (2)$$

Note that

$$M^*[\Delta(Y)] \ne n_v^{-1}\sum_{i=1}^{n_v}\Delta(Y_i) \qquad (3)$$

since the right-hand side of (3) gives the average value of the random variable Δ(Y) rather than the average deviation. The difference between formulas (2) and (3) shows up when the realizations of Δ(Y) include both positive and negative values.

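3. The standard deviation of Δ(Y). It is calculated by the formula:

$$\sigma^*[\Delta(Y)] = \sqrt{\frac{\sum_{i=1}^{n_v}\left[\Delta^*_{av} - |\Delta(Y_i)|\right]^2}{n_v - 1}} \qquad (4)$$

By analogy with item 2,

$$\sigma^*[\Delta(Y)] \ne \sqrt{\frac{\sum_{i=1}^{n_v}\left[\Delta^*_{av} - \Delta(Y_i)\right]^2}{n_v - 1}} \qquad (5)$$

4. The range of dispersion of the random variable, L*V(Δ). It is calculated by the formula:

$$L^*_V(\Delta) = \Delta_{\max} - \Delta_{\min} \qquad (6)$$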

This list could be continued, but the numerical characteristics given above are sufficient to illustrate how the efficiency of criteria for testing assumptions about the nature of the divergence between the s.f.d. F*(X) and FV*(Y) is compared.
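The four statistics listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `divergence_statistics`, the dictionary keys, and the sample values are hypothetical, and the square root in formula (4) is assumed from the term "average quadratic deviation".

```python
import math


def divergence_statistics(delta):
    """Statistics S(delta) for a list of signed vertical divergences.

    Hypothetical helper: the names are illustrative, not taken from the paper.
    """
    n = len(delta)
    ranked = sorted(delta)                      # step 1.1: ascending order
    d_min, d_max = ranked[0], ranked[-1]
    # Steps 1.2-1.3: keep the extreme value with the larger modulus, with its sign.
    delta_m = d_min if abs(d_min) > abs(d_max) else d_max
    # Formula (2): mean of the absolute divergences.
    mean_abs = sum(abs(d) for d in delta) / n
    # Formula (4): spread of the absolute divergences about mean_abs
    # (square root assumed).
    std_abs = math.sqrt(sum((mean_abs - abs(d)) ** 2 for d in delta) / (n - 1))
    # Formula (6): range of the signed divergences.
    spread = d_max - d_min
    return {"delta_m": delta_m, "mean_abs": mean_abs,
            "std_abs": std_abs, "range": spread}


# Made-up divergences with both signs, so that (2) and (3) give different values.
example = [0.10, -0.30, 0.05, -0.15]
stats = divergence_statistics(example)
signed_mean = sum(example) / len(example)       # right-hand side of (3)
print(stats, signed_mean)
```

For these made-up values the mean absolute divergence is 0.15 while the signed mean is -0.075, which is the distinction drawn between formulas (2) and (3).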


IV. ON THE EFFICIENCY OF CRITERIA

Following established practice, the efficiency of a criterion is characterized by its power function W[S(Δ)]. In turn, W[S(Δ)] = 1 − β[S(Δ)], where β[S(Δ)] is the type II error for the statistic S(Δ). The essence of the criteria considered is the same and reduces to comparing the empirical value Se(Δ) with the boundary value of the distribution R[Sα(Δ)] = α[S(Δ)], where α[S(Δ)] is the type I error.

For a fixed value of α[S(Δ)], the criterion whose power function has the greatest value is conventionally considered the more efficient. Thus, to compare the efficiency of criteria it suffices to construct the dependence of W[S(Δ)] on α[S(Δ)] and to compare W[S(Δ)] for 0 < α[S(Δ)] < 1. The algorithm for constructing this dependence reduces to the following calculations:

1. Constructing the s.f.d. of realizations of the statistic S1(Δ) under the initial assumption H1, according to which the distributions F*(X) and FV*(Y) differ randomly. We denote this distribution by F*[S1(Δ)]. The sequence of calculations, the features of simulating realizations of a representative sample, and the results for a number of values of nv are given in [4], using the statistic of the greatest vertical deviation Δm as an example;

2. Constructing the s.f.d. F*[S2(Δ)] of realizations of the statistic S2(Δ) under assumption H2, according to which the distributions F*(X) and FV*(Y) differ non-randomly; the sequence of this construction is also given in [4];

3. Matching the realizations of F*[S1(Δ)] and F*[S2(Δ)] at S1(Δ) = S2(Δ). Because the quantiles of F*[S1(Δ)] are not equal to the quantiles of F*[S2(Δ)], step 3 cannot be carried out directly. Analysis of the ranked quantiles of these distributions shows that some realizations of S1(Δ) and S2(Δ) differ only from the fourth significant digit onwards. If this difference is neglected, the share of equal realizations of S1(Δ) and S2(Δ) reaches 10%. Unfortunately, this is often not enough to characterize the dependence W[S(Δ)] = φ{α[S(Δ)]} fully. The problem was solved by assuming that the s.f.d. changes linearly in the intervals between quantiles.

Since the number of quantiles is in the hundreds, the error this introduces into the calculated probabilities of the corresponding quantiles is smaller than the accuracy with which the quantiles themselves are calculated.
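The construction of W[S(Δ)] as a function of α[S(Δ)] can be sketched as follows. This is a hedged illustration rather than the authors' program: the function `power_curve`, the right-tailed decision rule (large values of the statistic speak against H1), and the uniform test data are all assumptions. `numpy.quantile` interpolates linearly between order statistics by default, which mirrors the linear-interpolation assumption described above; β is read from the step-wise empirical distribution for simplicity.

```python
import numpy as np


def power_curve(s1, s2, alphas):
    """Estimate W = 1 - beta for each alpha from simulated statistics.

    s1: realizations of the statistic under H1 (random divergence).
    s2: realizations of the statistic under H2 (non-random divergence).
    Assumes a right-tailed rule: H1 is rejected when the statistic is large.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.sort(np.asarray(s2, dtype=float))
    # Critical values: (1 - alpha) quantiles of the H1 distribution,
    # linearly interpolated between neighbouring order statistics.
    crit = np.quantile(s1, 1.0 - np.asarray(alphas, dtype=float))
    # beta: share of H2 realizations that do not exceed the critical value.
    beta = np.searchsorted(s2, crit, side="right") / s2.size
    return 1.0 - beta


rng = np.random.default_rng(0)
alphas = np.array([0.05, 0.1, 0.2, 0.5])
s1 = rng.uniform(0.0, 1.0, 20000)
s2_same = rng.uniform(0.0, 1.0, 20000)     # H2 indistinguishable from H1
w_same = power_curve(s1, s2_same, alphas)  # power stays close to alpha
w_far = power_curve(s1, s1 + 10.0, alphas) # fully separated distributions
print(w_same, w_far)
```

When the two simulated distributions coincide, the estimated power stays near α; when they are fully separated, it equals 1 for every α. Real curves for two competing statistics would lie between these extremes.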

Fig.2. Curves:

1. R1[S(Δ)] = 1 − F1[S(Δ)]; M1[S(Δ)]

2. F2[S(Δ)]; M2[S(Δ)] = M1[S(Δ)]

3. F3[S(Δ)]; M3[S(Δ)] = 2.0 M1[S(Δ)]

4. F4[S(Δ)]; M4[S(Δ)] >> M1[S(Δ)]

Fig.3. Typical dependences β[S(Δ)] = φ{α[S(Δ)]}, constructed from R1[S(Δ)] and: 1. F2[S(Δ)]; 2. F3[S(Δ)]; 3. F4[S(Δ)]

Let us consider the features of applying this approach to the analysis of a sample from a finite population of multivariate data for a given VA. Fig.2 shows typical distribution functions of a statistic describing the divergence between F*(X) and FV*(Y). For simplicity, the s.f.d. F*[S(Δ)] are represented by continuous distribution functions. Three variants of sample distributions are shown. Curves 2 and 4 characterize the limiting relationships between the s.f.d. of the finite population of multivariate data, FS(X), and the s.f.d. of the second and fourth samples, FV*2(Y) and FV*4(Y).

The relationship between F*(X) and FV*2(Y) characterizes the case in which the distribution functions [1 − R1[S(Δ)]] and F2[S(Δ)] are practically identical, and the relationship between F*(X) and FV*4(Y) the case in which the divergence between [1 − R1[S(Δ)]] and F4[S(Δ)] is non-random. The relationship between the distribution functions R1[S(Δ)] and F3[S(Δ)] occupies an intermediate position.

As follows from fig.2:

$$\begin{aligned}
R_1[S_{e,1}(\Delta)] &\gg F_2[S_{e,1}(\Delta)], & H &\Rightarrow H_1 \\
R_1[S_{e,2}(\Delta)] &\ll F_2[S_{e,2}(\Delta)], & H &\Rightarrow H_2 \\
R_1[S_{e,3}(\Delta)] &\ll F_3[S_{e,3}(\Delta)], & H &\Rightarrow H_2 \\
R_1[S_{e,2}(\Delta)] &> F_3[S_{e,3}(\Delta)], & H &\Rightarrow H_1
\end{aligned} \qquad (7)$$

For these relationships, the dependences β[S(Δ)] = φ{α[S(Δ)]} are constructed and shown in fig.3: curve 1 from R1[S(Δ)] and F2[S(Δ)], curve 2 from R1[S(Δ)] and F3[S(Δ)], and curve 3 from R1[S(Δ)] and F4[S(Δ)].

V. EXPERIMENTAL STUDIES

In practice, the most widely used test of the assumption that a sample of random variables X is drawn at random from a general population of random variables with a given distribution law FS(X) is the Kolmogorov criterion, based on the statistic Dn [4]. This criterion belongs to the nonparametric group. In other words, it can be used successfully to compare both FS(X) with Fv*(X) and F*(X) with Fv*(X). Formulas and tables for applying this criterion are given in many monographs and manuals. Practically all of these sources note that it is inaccurate to find the greatest vertical divergence between the distributions FS(X) and Fv*(X) as the maximum of the absolute values of the observed Δ. However, none of the many manuals on mathematical statistics that we have consulted explains the reason for this mistake.

Analysis of the statistic Δm

Realizations of the statistic Δm were calculated by the following algorithm [5]:

- Calculate Δi = (ξi − i/nv), i = 1, ..., nv;

- Determine

Δm,1 = min{Δ1, Δ2, ..., Δi, ..., Δnv}

Δm,2 = max{Δ1, Δ2, ..., Δi, ..., Δnv}     (8)

- If |Δm,1| > |Δm,2|, then Δm = Δm,1;

- Otherwise Δm = Δm,2.

Here ξ are random numbers uniformly distributed in the interval [0,1], which simulate the true values of the quantiles of the uniform distribution. The s.f.d. F*(Δm), constructed from 25000 realizations of Δm for a number of values of nv, are shown in fig.4. The importance of these studies lies, first of all, in the fact that the limits of variation of Δm, which describes the greatest vertical divergence between the s.f.d. FS(ξ) and Fv*(ξ), can be established for a given significance level with an accuracy sufficient for practice. This in turn makes it possible to assess the nature of the divergence between any s.f.d. FS(X) and Fv*(X).
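The simulation of Δm described above can be sketched with NumPy. This is an assumption-laden illustration, not the program from [5]: the function name `simulate_delta_m` and the seed are hypothetical, and the random numbers ξ are assumed to be ranked in ascending order within each realization before i/nv is subtracted.

```python
import numpy as np


def simulate_delta_m(n_v, n_iter=25000, seed=0):
    """Simulate realizations of the signed greatest vertical divergence.

    Assumption: within each realization the uniform numbers xi are sorted,
    so that xi_(i) is compared with the sample s.f.d. value i / n_v.
    """
    rng = np.random.default_rng(seed)
    xi = np.sort(rng.uniform(0.0, 1.0, size=(n_iter, n_v)), axis=1)
    delta = xi - np.arange(1, n_v + 1) / n_v
    d_min = delta.min(axis=1)                  # smallest divergence
    d_max = delta.max(axis=1)                  # largest divergence
    # Keep the extreme with the larger modulus, together with its sign.
    return np.where(np.abs(d_min) > np.abs(d_max), d_min, d_max)


delta_m = simulate_delta_m(n_v=2)
share_negative = float(np.mean(delta_m < 0))
print(round(share_negative, 3))
```

Under this assumption the share of negative values for nv = 2 has the exact value 0.875, so the simulated share should land close to it; negative values dominate because the last term ξ(nv) − 1 can never be positive.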

Fig.4. Statistical functions of distribution F*(Δm) for nv = 4 and 29 and number of iterations N = 25000

It was established that the quantiles of the distribution F*(Δm) = α < 0,1 for n > 2 are equal in magnitude and opposite in sign to the quantiles of the distribution F(Dn) = 2α. In other words, although Dn characterizes the greatest divergence between the distributions FS(X) and Fv*(X), it is not the greatest vertical divergence of these distributions. The distribution of Δm is peculiar. Here Δm is treated simply as a numerical value ranked in ascending order. If Δm is treated as the greatest vertical divergence between the s.f.d. FS(ξ) and Fv*(ξ), the function F*(Δm) is not an s.f.d. The reason is the presence of both positive and negative values of Δm. The more negative the value of Δm, i.e. the greater the divergence between FS(ξ) and Fv*(ξ), the greater the probability of accepting hypothesis H1 (FS(ξ) and Fv*(ξ) differ randomly) would be, and the probability of the event Δm = 0 would be substantially greater than zero. The distributions of positive and negative values of Δm for nv = 4 are shown in fig.5, and the ratio of their numbers for a number of values of nv is given in table 1.

Fig.5. Histogram of the distribution of the greatest vertical divergence between FS(ξ) and Fv*(ξ)

Table 1.

Ratio of negative to positive values of Δm

Number of random variables in sample (nv) 2 4 7 11 16 22 29 150

Relative number of negative values of Δm 0,87 0,79 0,73 0,68 0,65 0,63 0,61 0,55

Ratio of negative to positive values of Δm 6,7 3,8 2,7 2,1 1,9 1,7 1,6 1,2

As follows from table 1, the ratio of negative to positive values of Δm decreases as nv increases, but even at nv = 150 it is still not equal to one. Table 2 gives the experimental and calculated values of the quantiles of the distribution Fv*(Δm) for a number of values of nv and probabilities Rv*(Δm) = [1 − Fv*(Δm)] = α. Recall that the experimental values were obtained by computer simulation [5], and the calculated values by the formula:

0.5«) = -kp0L + nv1 ] (9)

Table 2

Experimental and calculated values of the quantiles of the distribution Fv*(Δm) for a number of values of nv and probabilities Rv*(Δm) = [1 − Fv*(Δm)]

Rv*(Δm) Δm Number of random variables in sample (nv)

2 4 6 11 40 90 150

0.025 experiment 0,343 0,377 0,358 0,302 0,185 0,131 0,104

calculated 0,342 0,373 0,356 0,298 0,183 0,130 0,103

0.05 experiment 0,285 0,319 0,303 0,260 0,164 0,116 0,092

calculated 0,285 0,317 0,302 0,262 0,162 0,116 0,092

0.1 experiment 0,184 0,240 0,244 0,216 0,140 0,100 0,079

calculated 0,184 0,244 0,244 0,218 0,139 0,100 0,079

0.2 experiment 0,060 0,160 0,171 0,160 0,112 0,091 0,065

calculated 0,061 0,165 0,171 0,164 0,116 0,081 0,064

0.3 experiment -0,239 -0,173 -0,127 -0,097 0,089 0,067 0,053

calculated -0,027 0,105 0,125 0,128 0,094 0,068 0,055

The data in table 2 show that formula (9) reflects the relationship between the boundary values of the interval of variation of the statistic Δm accurately enough provided that α ≤ 0,25. Let us introduce three statistics based on the random variables of the absolute vertical divergence between the distributions FS(X) and Fv*(X):

- The greatest value of the absolute divergence. The calculation algorithm is:

  - calculate Δi = |ξi − i/nv|, i = 1, ..., nv;

  - determine Bv = max{Δ1, Δ2, ..., Δi, ..., Δnv}     (10)

- The average value of the absolute divergence M*v,j(Δ), with j = 1, ..., N, where N is the number of iterations. It is calculated by formula (3);

- The root-mean-square value of the absolute divergence σ*v,j(Δ). It is calculated by formula (5).

Analysis of the statistic Bv

The distribution F*(Bv) has a substantial advantage over F*(Δm): it characterizes the distribution of the greatest deviation between the distribution functions FS(X) and Fv*(X) without regard to the sign of the deviation, i.e. positive and negative values of the deviation Δm are treated as equivalent. Table 3 gives the quantiles of the distribution of Bv for a number of values of nv and probabilities F*(Bv). Comparing them with the data of table 2 readily reveals a substantial difference in the critical values. For example, at R*(Dn) = R*(Bv) = R*(Δm) = 0,05 and nv = 4 the corresponding quantiles are Dn = 0.624, Bv = 0.570 and Δm = 0,319. Thus, the mistake most often made in practical applications of the Kolmogorov criterion is that the statistic Bv is compared not with the critical value of the distribution R*(Bv) but with the critical value of the Kolmogorov statistic Dn. To sum up, each of the statistics considered, for example Dn, Bv or Δm, must be compared, when testing an assumption, with critical values calculated from its own distribution: F*(Dn), F*(Bv) or F*(Δm) respectively. For illustration, fig.6 shows the statistical distributions R*(Bv) = 1 − F*(Bv) for nv = 4, 22 and 150, based on table 3. As one would expect, the critical values of Bv decrease as nv increases. The shape of the distribution R*(Bv) changes as well.
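Quantiles of Bv of the kind tabulated in table 3 can be approximated by simulation. The sketch below is a minimal illustration under assumptions: the function name `simulate_b_v` and the seed are hypothetical, and the uniform numbers ξ are assumed to be ranked within each realization before the absolute divergence |ξ(i) − i/nv| is taken.

```python
import numpy as np


def simulate_b_v(n_v, n_iter=25000, seed=0):
    """Simulate realizations of B_v = max_i |xi_(i) - i / n_v|.

    Assumption: xi are sorted within each realization.
    """
    rng = np.random.default_rng(seed)
    xi = np.sort(rng.uniform(0.0, 1.0, size=(n_iter, n_v)), axis=1)
    return np.abs(xi - np.arange(1, n_v + 1) / n_v).max(axis=1)


b_v = simulate_b_v(n_v=2)
q95 = float(np.quantile(b_v, 0.95))   # critical value for alpha = 0.05
print(round(q95, 3))
```

Under this assumption the exact 0.95 quantile for nv = 2 is 1 − √0.05 ≈ 0.776, which agrees with the first entry of row 19 in table 3 (0,778) to within simulation error. An empirical Bv would then be compared with such a quantile rather than with the Kolmogorov critical value of Dn.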

Fig.6 Statistical distributions R*(Bv) = 1 − F*(Bv) for a number of values of nv

Table 4 gives the coefficients of the equation Bv = A·nv^(−b), calculated from table 3, together with the coefficient of determination R2.

Table 4

Coefficients of the regression equation Bv = A·nv^(−b)

F*(Bv) A b R2

0.95 1.079 0.459 0.9998

0.9 0.942 0.453 0.9997

0.8 0.774 0.439 0.9986

0.7 0.668 0.430 0.9982

0.6 0.590 0.422 0.9985

0.5 0.518 0.412 0.9975

0.4 0.447 0.396 0.9956

0.3 0.384 0.382 0.9922

0.2 0.317 0.360 0.9862

0.1 0.236 0.321 0.9829

As an example, fig.7 shows how the curve Bv = A·nv^(−b) changes for the significance levels α = 0,05 (F*(Bv) = 0,95) and α = 0,50.

Fig.7 Dependence of the absolute value of the greatest vertical deviation on the number of sample units nv at α = 0,05 and 0,50
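The power-law coefficients can be recovered from the tabulated quantiles by ordinary least squares on logarithms. The sketch below is an assumption about the method (the paper only mentions a standard program); it uses the F*(Bv) = 0,95 row of table 3, and the variable names are illustrative.

```python
import numpy as np

# Row 19 of table 3: quantiles of B_v at F*(B_v) = 0.95.
n_v = np.array([2, 4, 7, 11, 22, 29, 40, 90, 150], dtype=float)
b_v_095 = np.array([0.778, 0.568, 0.442, 0.362, 0.263,
                    0.232, 0.201, 0.136, 0.107])

# B_v = A * n_v**(-b)  <=>  ln B_v = ln A - b * ln n_v,
# so a straight-line fit in log-log coordinates yields both coefficients.
slope, intercept = np.polyfit(np.log(n_v), np.log(b_v_095), 1)
A = float(np.exp(intercept))
b = float(-slope)

# Coefficient of determination of the log-log fit.
fitted = intercept + slope * np.log(n_v)
resid = np.log(b_v_095) - fitted
r2 = 1.0 - float(np.sum(resid ** 2) / np.sum((np.log(b_v_095) - np.log(b_v_095).mean()) ** 2))
print(round(A, 3), round(b, 3), round(r2, 4))
```

The fit gives values close to A = 1.079 and b = 0.459, i.e. the coefficients in the first row of table 4, which suggests that row corresponds to F*(Bv) = 0,95.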

Table 3

Quantiles of the distribution of the statistic Bv for a number of values of nv and probabilities F*(Bv).

No. F*(Bv) Number of sample units (nv)

2 4 7 11 22 29 40 90 150

1 0,05 0,112 0,127 0,116 0,104 0,083 0,075 0,067 0,048 0,038

2 0,1 0,157 0,154 0,136 0,120 0,094 0,084 0,075 0,053 0,042

3 0,15 0,193 0,175 0,151 0,131 0,103 0,092 0,081 0,057 0,045

4 0,2 0,223 0,191 0,164 0,142 0,110 0,098 0,087 0,061 0,048

5 0,25 0,249 0,208 0,177 0,152 0,117 0,104 0,092 0,064 0,051

6 0,3 0,274 0,222 0,189 0,160 0,124 0,110 0,097 0,067 0,053

7 0,35 0,300 0,236 0,201 0,170 0,130 0,115 0,101 0,071 0,056

8 0,4 0,324 0,250 0,213 0,179 0,136 0,121 0,106 0,074 0,058

9 0,45 0,348 0,268 0,225 0,189 0,143 0,127 0,111 0,077 0,061

10 0,5 0,376 0,286 0,236 0,198 0,150 0,132 0,116 0,080 0,063

11 0,55 0,401 0,306 0,249 0,209 0,157 0,139 0,121 0,083 0,066

12 0,6 0,426 0,326 0,262 0,219 0,164 0,145 0,127 0,087 0,069

13 0,65 0,449 0,348 0,277 0,231 0,172 0,152 0,133 0,091 0,072

14 0,7 0,473 0,370 0,294 0,244 0,181 0,160 0,139 0,095 0,075

15 0,75 0,499 0,393 0,313 0,258 0,191 0,169 0,147 0,100 0,079

16 0,8 0,548 0,421 0,334 0,276 0,203 0,179 0,155 0,106 0,083

17 0,85 0,620 0,454 0,358 0,295 0,217 0,191 0,166 0,112 0,088

18 0,9 0,683 0,497 0,391 0,322 0,235 0,206 0,179 0,122 0,096

19 0,95 0,778 0,568 0,442 0,362 0,263 0,232 0,201 0,136 0,107

20 0,99 0,902 0,689 0,538 0,440 0,320 0,283 0,240 0,164 0,129

As is known, the arithmetic mean of random variables is the basic numerical characteristic of their centre of grouping. The geometric mean, the harmonic mean, the mode and the median are also distinguished. Although all these numerical characteristics are united by the concept of the centre of grouping of random variables, each of them has, so to speak, «its own centre» and characterizes only that. Hence «the centre of grouping of random variables» can be regarded as an attribute, and the numerical characteristics listed above as its versions.

Each VA will characterize features of the difference between the distributions FS(X) and Fv*(X) that are peculiar to it alone. Formulas for estimating these numerical characteristics are given below.

The arithmetic mean M*v(Δ) is calculated by formula (2), the geometric mean by the formula:

$$G^*_v(\Delta) = \sqrt[n_v]{\prod_{i=1}^{n_v}|\Delta_i|} \qquad (11)$$

and the harmonic mean by the formula:

$$H^*_v(\Delta) = \left[\frac{1}{n_v}\sum_{i=1}^{n_v}\frac{1}{\Delta_i}\right]^{-1} \qquad (12)$$

The mode is estimated from the histogram as the midpoint of the interval into which the random variables of the sample fall with the greatest probability.

The median is estimated by finding the 0,5nv-th value of the ranked series of random variables of the sample if nv is even, and by the formula

$$\Delta_{med} = 0.5\left[\Delta_{\frac{n_v+1}{2}} + \Delta_{\frac{n_v-1}{2}}\right] \qquad (13)$$

if nv is odd.
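The five versions of the centre of grouping can be computed side by side, as sketched below. This is an illustration under assumptions: the function name `centres_of_grouping`, the number of histogram bins, and the sample values are hypothetical; the inputs are taken to be positive absolute divergences; and the median follows the rule exactly as worded in the text (for at least two values), which differs from the usual textbook median returned by `statistics.median`.

```python
import statistics

import numpy as np


def centres_of_grouping(values, bins=5):
    """Centre-of-grouping estimates for positive absolute divergences.

    Hypothetical helper; expects at least two values. The bin count for
    the mode is an arbitrary illustrative choice.
    """
    ranked = sorted(values)
    n = len(ranked)
    # Mode: midpoint of the histogram interval with the most observations.
    counts, edges = np.histogram(ranked, bins=bins)
    k = int(np.argmax(counts))
    mode = float(0.5 * (edges[k] + edges[k + 1]))
    # Median as worded in the text (1-based ranks converted to 0-based).
    if n % 2 == 0:
        median = ranked[n // 2 - 1]
    else:
        median = 0.5 * (ranked[(n + 1) // 2 - 1] + ranked[(n - 1) // 2 - 1])
    return {
        "arithmetic": statistics.fmean(ranked),
        "geometric": statistics.geometric_mean(ranked),
        "harmonic": statistics.harmonic_mean(ranked),
        "mode": mode,
        "median": median,
    }


centres = centres_of_grouping([0.1, 0.2, 0.4, 0.8])
print(centres)
```

For these made-up values the five estimates all differ (0.375, about 0.283, about 0.213, 0.17 and 0.2), which illustrates the point that each version has its own centre.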

According to the data classification algorithm, in order to reduce computation time, the expediency of classifying the data is checked for the sample with the greatest value of the statistic. It is assumed that if, for this sample, the divergence between FS(X) and Fv*(X) can be accepted as random with minimal risk of an erroneous decision, then the divergence of all other samples for the given VA is random as well.

Table 5 gives some results of calculating the quantiles of the distribution of the statistic M*v(Δ) for a number of values of nv and probabilities F*[M*v(Δ)] at N = 25000.

For illustration, fig.8 shows the distributions R*[M*v(Δ)] = {1 − F*[M*v(Δ)]} for nv = 4, 10 and 50, based on table 5, and fig.9 shows the dependence of the statistic M*v(Δ) on nv for R*[M*v(Δ)] = α = 0.05 and 0.5.

Fig.8. Behaviour of R*[M*v(Δ)]

Fig.9. Dependence of the critical values of M*v(Δ) on the number of sample units

Table 5

Quantiles of the distribution of the statistic M*v(Δ) for a number of values of nv and probabilities F*[M*v(Δ)]

No. F*[M*v(Δ)] Number of sample units (nv)

2 4 7 11 22 29 40 90 150

1 0,05 0,077 0,0714 0,056 0,045 0,032 0,028 0,024 0,016 0,013

2 0,1 0,111 0,085 0,065 0,052 0,037 0,032 0,027 0,018 0,014

3 0,15 0,136 0,097 0,072 0,057 0,040 0,035 0,030 0,020 0,015

4 0,2 0,158 0,106 0,078 0,062 0,044 0,038 0,032 0,021 0,016

5 0,25 0,177 0,115 0,084 0,067 0,046 0,040 0,034 0,023 0,017

6 0,3 0,194 0,124 0,090 0,071 0,049 0,043 0,036 0,024 0,019

7 0,35 0,209 0,133 0,096 0,075 0,052 0,045 0,038 0,025 0,020

8 0,4 0,224 0,142 0,102 0,080 0,055 0,048 0,041 0,027 0,021

9 0,45 0,238 0,152 0,109 0,085 0,058 0,050 0,043 0,028 0,022

10 0,5 0,250 0,163 0,116 0,090 0,062 0,053 0,045 0,030 0,023

11 0,55 0,276 0,173 0,123 0,095 0,065 0,056 0,048 0,032 0,024

12 0,6 0,303 0,185 0,131 0,101 0,069 0,059 0,051 0,033 0,026

13 0,65 0,331 0,198 0,140 0,107 0,073 0,063 0,054 0,035 0,027

14 0,7 0,361 0,213 0,150 0,115 0,078 0,068 0,058 0,038 0,029

15 0,75 0,395 0,232 0,162 0,123 0,084 0,073 0,062 0,040 0,031

16 0,8 0,433 0,254 0,176 0,134 0,091 0,078 0,067 0,043 0,034

17 0,85 0,475 0,281 0,194 0,147 0,099 0,085 0,073 0,048 0,037

18 0,9 0,526 0,314 0,217 0,164 0,112 0,095 0,081 0,053 0,041

19 0,95 0,591 0,363 0,254 0,192 0,130 0,111 0,095 0,062 0,048

20 0,99 0,68 0,448 0,318 0,250 0,168 0,145 0,121 0,080 0,062

As follows from fig.8, the s.f.d. of the sum of random variables uniformly distributed in the interval [0,1] is asymmetric even for nv = 150. Consequently, the critical values of the quantiles of these distributions cannot be calculated from the arithmetic mean M*v,0.05(Δ) and the standard deviation σ*[M*v(Δ)]. The analysis shows that, for a fixed value R*[M*v(Δ)] = α (fig.9), the relationship between M*v(Δ) and nv can be represented by the power function M*v(Δ) = A·nv^(−b) with a coefficient of determination R2 > 0.99.

Table 6 gives the constant coefficients of this equation, calculated with a standard power-regression program for a number of values of R*[M*v(Δ)] = α.

Table 6

Estimates of the constant coefficients of the regression equations and of the coefficient of determination

No. R*[M*v(Δ)] = α A b R2

1 0.05 0.81 0.58 0.9961

2 0.1 0.71 0.58 0.9949

3 0.2 0.57 0.58 0.9941

4 0.3 0.48 0.57 0.995

5 0.4 0.41 0.56 0.996

6 0.5 0.35 0.55 0.998

7 0.6 0.30 0.54 0.998

8 0.7 0.26 0.53 0.998

9 0.8 0.22 0.52 0.999

10 0.9 0.16 0.49 0.999

Thus, to find the critical value of the statistic M*v(Δ) for, say, α = 0,05 and nv = 5, one only needs to calculate M*v,0.05(Δ) = 0.81·5^(−0.58). If the empirical value of the statistic, M*v,e(Δ), is then compared with M*v,0.05(Δ) and M*v,e(Δ) < M*v,0.05(Δ), it can be asserted that the sample is, with high probability, homogeneous with the finite population of multivariate data. In other words, classification of the data by the given VA is inexpedient.
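The use of the regression coefficients can be sketched as a lookup plus one power evaluation. This is an illustration only: the names `TABLE_6`, `critical_mean_divergence` and `is_homogeneous` are hypothetical, the coefficients are copied from table 6, and the example assumes the α = 0.05 row (A = 0.81, b = 0.58) with nv = 5. The empirical value 0.20 is made up.

```python
# Coefficients (A, b) of M = A * n_v**(-b), copied from table 6,
# keyed by the significance level alpha.
TABLE_6 = {
    0.05: (0.81, 0.58), 0.1: (0.71, 0.58), 0.2: (0.57, 0.58),
    0.3: (0.48, 0.57), 0.4: (0.41, 0.56), 0.5: (0.35, 0.55),
    0.6: (0.30, 0.54), 0.7: (0.26, 0.53), 0.8: (0.22, 0.52),
    0.9: (0.16, 0.49),
}


def critical_mean_divergence(alpha, n_v):
    """Critical value of the mean absolute divergence for a tabulated alpha."""
    a, b = TABLE_6[alpha]
    return a * n_v ** (-b)


def is_homogeneous(empirical, alpha, n_v):
    """True when the empirical statistic is below the critical value,
    i.e. the sample is treated as homogeneous with the finite population."""
    return empirical < critical_mean_divergence(alpha, n_v)


crit = critical_mean_divergence(0.05, 5)
decision = is_homogeneous(0.20, 0.05, 5)   # 0.20 is a made-up empirical value
print(round(crit, 4), decision)
```

The critical value comes out at about 0.318, which lies between the nv = 4 and nv = 7 entries of row 19 in table 5 (0,363 and 0,254), so the regression reproduces the tabulated quantiles reasonably well.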

Analysis of the statistic σ*v(Δ).

In the analysis of EPS operational statistical data, the degree of dispersion of nv realizations of the random variable Δ about the centre of grouping M*v(Δ) is most often characterized by the standard deviation σ*v(Δ). The coefficient of variation is used less often, and the range of dispersion, calculated by the formula L*(Δ) = (Δmax − Δmin), is not used at all. Practice in classifying multivariate data shows that a sample of random variables X for a significant VA is concentrated in some interval [Xj; Xj+n], which, according to the recommended algorithm, lies in the upper part of the range of the finite population of multivariate data, since M*(Δ) < M*v(Δ), and substantially so.

Table 7 gives some results of calculating the quantiles of the distribution of the statistic σ*v(Δ) for a number of values of nv and probabilities F*[σ*v(Δ)] with a step of 0,05, for N = 25000 iterations. For illustration, fig.10 shows the s.f.d. F*[σ*v(Δ)] for a number of values of nv as continuous curves. Fig.11 shows the curves σ*v(Δ) = φ(nv) obtained from table 7 with a standard power-regression program.

Fig.10 Behaviour of the s.f.d. F*[σ*v(Δ)] depending on nv

Fig.11. Behaviour of the critical values of the statistic σ*v(Δ) depending on nv

Table 8 gives the coefficients of the regression equation, calculated with a standard power-regression program for a number of values of R*[σ*v(Δ)].

Table 8

Estimates of the constant coefficients of the regression equations and of the coefficient of determination

No. R*[σ*v(Δ)] = α A b R2

1 0.05 0,428 0,54 0,997

2 0.1 0,385 0,55 0,996

3 0.2 0,322 0,545 0,996

4 0.3 0,276 0,54 0,998

5 0.4 0,237 0,53 0,999

6 0.5 0,197 0,50 0,999

7 0.6 0,163 0,48 0,995

8 0.7 0,129 0,46 0,991

9 0.8 0,094 0,43 0,996

10 0.9 0,061 0,40 0,992

VI. SOME RESULTS OF COMPARING CRITERIA

The results of analysing the behaviour of the s.f.d. of the statistics Bv, M*v(Δ) and σ*v(Δ) made it possible to estimate the probabilities F*[Bv,e], F*[M*v(Δe)] and F*[σ*v(Δe)], where the index «e» denotes the "experimental" value of the probability of occurrence of each statistic. Since these statistics characterize particular properties of the random values of the vertical divergence between the distributions FS(X) and Fv*(X), the probability of occurrence of a statistic in effect characterizes the importance of that property. In other words, the attribute of the divergence between FS(X) and Fv*(X) is the vertical distance between these distributions, and the versions of the attribute are the statistics.

When comparing these statistics, the first question of interest is how substantially the probabilities of occurrence of each statistic, calculated for the same sample from the general population, differ. Some results of the calculations are given in table 9.

Table 7

Quantiles of the distribution of the statistic σ*v(Δ) for a number of values of nv and probabilities F*[σ*v(Δ)] at N = 25000

No. F*[σ*v(Δ)] Number of sample units (nv)

2 4 7 11 22 29 40 90 150

1 0,05 0,012 0,038 0,037 0,031 0,023 0,020 0,017 0,012 0,009

2 0,1 0,025 0,050 0,044 0,036 0,026 0,023 0,019 0,013 0,010

3 0,15 0,037 0,059 0,048 0,039 0,028 0,025 0,021 0,014 0,011

4 0,2 0,049 0,066 0,053 0,042 0,030 0,026 0,023 0,015 0,012

5 0,25 0,062 0,073 0,056 0,045 0,032 0,028 0,024 0,016 0,012

6 0,3 0,075 0,079 0,060 0,048 0,034 0,030 0,025 0,017 0,013

7 0,35 0,089 0,085 0,064 0,051 0,036 0,031 0,027 0,018 0,014

8 0,4 0,103 0,091 0,067 0,053 0,038 0,033 0,028 0,019 0,014

9 0,45 0,118 0,096 0,071 0,056 0,040 0,034 0,029 0,019 0,015

10 0,5 0,130 0,102 0,075 0,059 0,041 0,036 0,031 0,020 0,016

11 0,55 0,150 0,107 0,079 0,062 0,044 0,038 0,032 0,021 0,017

12 0,6 0,168 0,114 0,084 0,065 0,046 0,040 0,034 0,022 0,017

13 0,65 0,186 0,120 0,088 0,069 0,048 0,042 0,036 0,024 0018

14 0,7 0,204 0,128 0,094 0,073 0,051 0,044 0,038 0,025 0,019

15 0,75 0,224 0,137 0,100 0078 0,054 0,047 0,040 0,026 0,021

16 0,8 0,244 0,147 0,107 0,083 0,058 0,050 0,043 0,028 0,022

17 0,85 0,267 0,159 0,116 0,090 0,062 0,054 0,046 0,030 0,024

18 0,9 0,292 0,175 0,127 0,098 0,068 0,059 0,051 0,033 0,026

19 0,95 0,320 0,199 0,145 0,112 0,078 0,068 0,058 0,038 0,030

20 0,99 0,347 0,244 0,178 0,140 0,097 0,084 0,072 0,048 0,037

Table 9

Relationship between R*[M*v(Δ)] and R*[σ*v(Δ)], calculated for the same samples with nv=4

N     Sample values                      R*[M*v(Δ)]   R*[σ*v(Δ)]
      1       2       3       4
1     0.399   0.363   0.688   0.524      0.32         0.64
2     0.945   0.781   0.225   0.848      0.81         0.43
3     0.429   0.488   0.724   0.682      0.68         0.36
4     0.921   0.812   0.913   0.432      0.29         0.69
5     0.778   0.459   0.402   0.1        0.25         0.77

The examples given in table 9 show that the probabilities R*[M*v(Δ)] and R*[σ*v(Δ)] can differ substantially. The reasons for this difference are known: the average value of the realizations of the vertical deviation between the s.f.d. FΣ(X) and F*v(X) can be fairly small while their standard deviation is large, and vice versa. In other words, the examples of table 9 show that comparing the efficiency of hypothesis-testing criteria that use different statistics is not always justified: first, because the result of the comparison depends on the distribution F*v(X), i.e. the result is not a general rule, and second, because the statistics of the criteria can have different physical meaning, for example M*v(Δ) and σ*v(Δ).

Fig.12. Correlation field of the interrelation of the probabilities of occurrence of the realizations of the statistics: a) F*[Bv] and F*[M*v(Δ)] at nv=3; b) F*[Bv] and F*[σ*v(Δ)]; c) F*[M*v(Δ)] and F*[σ*v(Δ)] at nv=3

Moreover, since statistics can be independent, as M*v(Δ) and σ*v(Δ) are, a random character of the divergence of FΣ(X) from F*v(X) by the criterion with the statistic M*v(Δ) does not yet mean that the divergence of FΣ(X) from F*v(X) by the criterion with the statistic σ*v(Δ) will also prove random. For illustration, Fig. 12 shows the correlation field of the interrelation of the probabilities of occurrence of the realizations of the statistics Bv, M*v(Δ) and σ*v(Δ), calculated for the same samples of nv=3 random variables.

The calculations were carried out in the following sequence:

- For each sample of nv random variables uniformly distributed in the interval [0,1], the realizations of Bv, M*v(Δ) and σ*v(Δ) are calculated. The calculations are repeated N times, where N is the number of iterations. The results are entered in table A, whose form is shown in Fig. 13.

N     Bv     M*v(Δ)     σ*v(Δ)     F*(Bv)     F*[M*v(Δ)]     F*[σ*v(Δ)]

Fig.13. Table of initial data

- The realizations of Bv in the table are ranked in increasing order of their numerical values. The corresponding values M*v,i(Δ) and σ*v,i(Δ) are moved together with Bv,i;

- The probabilities Fi*[Bv] = i/N, with i = 1, ..., N, are calculated and entered in column F*(Bv) of table A;

- In table B, which is a copy of table A, the realizations of M*v(Δ) are ranked, and the probabilities Fi*[M*v(Δ)] = i/N corresponding to M*v,i(Δ) are calculated;

- For each value of the statistic M*v(Δ) in table A, the equal value in table B is found, and the corresponding probability F*[M*v(Δ)] is entered in column F*[M*v(Δ)] of table A;

- In table B the realizations of σ*v(Δ) are ranked, and the probabilities Fi*[σ*v(Δ)] = i/N corresponding to σ*v,i(Δ) are calculated;

- For each value of the statistic σ*v(Δ) in table A, the equal value in table B is found, and the corresponding probability F*[σ*v(Δ)] is entered in column F*[σ*v(Δ)] of table A.
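The sequence above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the definitions of the statistics are assumptions (Δi is taken as the difference between the i-th ordered sample value and i/nv, Bv as the largest |Δi|, M*v(Δ) as the mean of |Δi| and σ*v(Δ) as their sample standard deviation), and the function names are hypothetical. Instead of looking values up in a second table B, the sketch keeps the row index of each realization while ranking, which should give the same probabilities.

```python
import random
import statistics


def divergence_statistics(sample):
    """Return (Bv, M, sigma) for one sample; the definitions are assumptions."""
    n = len(sample)
    # Assumed vertical divergence: uniform c.d.f. value minus empirical i/n.
    abs_deltas = [abs(x - (i + 1) / n) for i, x in enumerate(sorted(sample))]
    return max(abs_deltas), statistics.mean(abs_deltas), statistics.stdev(abs_deltas)


def empirical_probabilities(values):
    """Rank values in increasing order and give the i-th smallest F_i = i/N."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    probs = [0.0] * n
    for rank, k in enumerate(order, start=1):
        probs[k] = rank / n
    return probs


def simulate(nv=3, n_iter=1000, seed=0):
    """Return one (F*[Bv], F*[M], F*[sigma]) triple per simulated sample."""
    rng = random.Random(seed)
    rows = [
        divergence_statistics([rng.random() for _ in range(nv)])
        for _ in range(n_iter)
    ]
    b, m, s = zip(*rows)
    return list(
        zip(
            empirical_probabilities(b),
            empirical_probabilities(m),
            empirical_probabilities(s),
        )
    )


if __name__ == "__main__":
    for triple in simulate(nv=3, n_iter=10, seed=1):
        print(tuple(round(p, 2) for p in triple))
```

Plotting the columns of the result pairwise would give correlation fields of the kind shown in Fig. 12.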

As follows from the figures, a substantial interrelation is observed between the probabilities F*[Bv] and F*[M*v(Δ)] or F*[σ*v(Δ)]. This interrelation has a clear physical interpretation: as Bv grows, M*v(Δ) and σ*v(Δ) grow on average.

Figure 12 characterizes fairly fully the weak interrelation between M*v(Δ) and σ*v(Δ). Therefore the answer to the question of whether it is enough to check the character of the divergence of FΣ(X) from F*v(X) by the single statistic Bv is ambiguous, and priority is given to using all the statistics in the decision.

VII. RECOGNIZING THE EXPEDIENCY OF CLASSIFYING MULTIVARIATE DATA

The above shows that the assumption that classification of multivariate data is expedient has to be checked by criteria reflecting the basic properties of the random variables of the vertical divergence of the distributions FΣ(X) and F*v(Y). The conditions recommended by the authors for checking the possible assumptions are:

$$
\begin{cases}
\text{if } S_e(\Delta) < S_{0.5}(\Delta), & \text{then } H_1; \\
\text{if } S_e(\Delta) > S_{0.05}(\Delta), & \text{then } H_2; \\
\text{if } S_{0.5}(\Delta) < S_e(\Delta) < S_{0.05}(\Delta), & \text{then } H_3,
\end{cases}
\tag{14}
$$

where S(Δ) is one of the possible statistics of the random variable Δ and Se(Δ) is its experimental value; S0.05(Δ) and S0.5(Δ) are the critical values of the statistic at significance levels of 0.05 and 0.5, respectively; H1, H2 and H3 are the assumptions, respectively, that the divergence of FΣ(X) and F*v(Y) is random and classification of the data is inexpedient; that the divergence of FΣ(X) and F*v(Y) is not random and classification of the data is expedient; and that it is expedient to estimate and compare the risk of an erroneous decision for H1 and H2.
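The conditions (14) can also be expressed through the probability R*[Se(Δ)] rather than through critical values. The sketch below assumes that R* is the probability of exceeding the experimental value, so that Se(Δ) > S0.05(Δ) is equivalent to R* < 0.05 and Se(Δ) < S0.5(Δ) is equivalent to R* > 0.5; the function name `decide` and its parameters are hypothetical.

```python
def decide(r_e, low=0.05, high=0.5):
    """Map the probability R*[S_e] to one of the assumptions H1, H2, H3.

    Assumes R* decreases as the statistic grows, so a small R* means the
    experimental value lies beyond the critical value S_0.05.
    """
    if r_e < low:
        return "H2"  # divergence is not random, classification is expedient
    if r_e > high:
        return "H1"  # divergence is random, classification is inexpedient
    return "H3"      # compare the risks of erroneous decisions for H1 and H2


if __name__ == "__main__":
    for r in (0.02, 0.03, 0.27, 0.64):
        print(r, decide(r))
```

With this reading, the probabilities 0.02, 0.03 and 0.27 listed in table 10 map to H2, H2 and H3, the same decisions as in the table.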

We consider the algorithm for estimating the expediency of classifying multivariate data on the following example. Let a sample with nv=4 be given: {0.151, 0.341, 0.259, 0.120}. The random numbers are generated programmatically, are called pseudo-random, and have a uniform distribution in the interval [0;1]. The basic assumption is that the random numbers of the sample have a uniform distribution in the interval [0;1]. To check this assumption, which is identical to the assumption that classification of the final population of statistical data is inexpedient, we calculate the realizations of the vertical divergence of the distribution FΣ(X) from F*v(X). It is easy to verify that they are equal to: Δ1 = -0.349; Δ2 = -0.660; Δ3 = -0.498; Δ4 = -0.130. The results of calculating Δm,e; Bv,e; M*v,e(Δ) and σ*v,e(Δ) by formulas (8), (10), (2) and (4) are given in table 10, together with the probabilities R*(Δm,e); R*(Bv,e); R*[M*v,e(Δ)] and R*[σ*v,e(Δ)].

Table 10

Results of calculations for estimating the expediency of classification of statistical data

N     Statistic      Estimate of the statistic    R*[Se(Δ)]    Decision
1     Δm,e           -0.660                       0.02         H2
2     Bv,e           0.660                        0.02         H2
3     M*v,e(Δ)       0.407                        0.03         H2
4     σ*v,e(Δ)       0.224                        0.27         H3
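The estimates in table 10 can be approximately reproduced with the short sketch below. Formulas (8), (10), (2) and (4) are not restated here, so the definitions in the code are assumptions inferred from the numbers of the example: Δi is the i-th ordered sample value minus i/nv, Δm is the divergence with the largest absolute value (sign kept), Bv is that absolute value, M*v(Δ) is the mean of |Δi| and σ*v(Δ) is their sample standard deviation with divisor nv-1. All function names are hypothetical.

```python
import statistics


def vertical_divergences(sample):
    """Assumed definition: ordered value minus empirical c.d.f. value i/n."""
    n = len(sample)
    return [x - (i + 1) / n for i, x in enumerate(sorted(sample))]


def summary_statistics(sample):
    """Return the four statistics of table 10 under the assumed definitions."""
    deltas = vertical_divergences(sample)
    abs_deltas = [abs(d) for d in deltas]
    return {
        "delta_m": max(deltas, key=abs),          # largest divergence, sign kept
        "b_v": max(abs_deltas),                   # largest absolute divergence
        "mean": statistics.mean(abs_deltas),      # mean absolute divergence
        "sigma": statistics.stdev(abs_deltas),    # sample std (divisor n - 1)
    }


if __name__ == "__main__":
    stats = summary_statistics([0.151, 0.341, 0.259, 0.120])
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the sketch gives values close to those of table 10 (about -0.659, 0.659, 0.407 and 0.224). It yields 0.259 - 0.75 = -0.491 for the third divergence rather than the -0.498 listed in the text; the tabulated mean of 0.407 agrees with -0.491.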

As follows from table 10, three of the four criteria indicate that the given sample is unrepresentative and that classification is expedient (H2). Only the standard deviation of the realizations of the vertical divergence of FΣ(X) from F*v(X) has a probability of 0.27; as the value of a type II error, it indicates the need to take this property into account and to bring hypothesis H1 into the check.


Thus, some criteria may indicate that classification of the data is expedient, while others, on the contrary, indicate that the compared data are homogeneous. It is therefore necessary to answer the question: are there properties of the compared random variables with respect to which they cannot be accepted as homogeneous? In our example there are. Both the question and the character of the answer are natural.

CONCLUSION

1. The expediency of classifying a final population of multivariate data, in other words the presence of a significant VA, is established on the basis of the theory of statistical hypothesis testing.

2. The criterion for testing a hypothesis is the condition that the empirical value of the statistic does not exceed its critical value. Kolmogorov's Dn statistic is most often used as the statistic. However, the random variables of the vertical divergence of FΣ(X) from F*v(X) are also characterized by:

- the vertical divergence Δm that is largest in absolute value, taken with its sign;

- the largest absolute value of the divergences, Bv;

- the arithmetic mean of the absolute divergence, M*v(|Δ|);

- the standard deviation of the absolute divergence, σ*v(|Δ|).

This list could be continued. The main point is that each of the statistics considered above characterizes a property of the random variables Δ that is distinct from the other properties, and has its own distribution R*[Si(Δ)]. The importance of the properties is defined by the relation Bv = A · n^(-b), which is calculated on a concrete sample. The smaller M*(Δ) is, the higher the importance of the VA, and vice versa.

3. Comparing these criteria by power is, of course, possible. As a result of the comparison we would find that the greatest power belongs to the criterion whose statistic characterizes the property that, among all the properties of the random variables of the sample, has the greatest importance. For the next sample, with considerable probability, the importance of the properties of the random variables can change substantially.

REFERENCES

1. Dyakov A.F., Isamuhamedov Y.Sh. The current state of the electric power industry of Russia and factors of decrease in the reliability of power supply. Methodical questions of research of reliability of large power systems. Problems of reliability of power systems in market conditions. Baku, 2013, p. 7-15.

2. Power transformers. Reference book. S.D.Lizunova, A.K.Lohanina. M.: Energyatomizdat, 2004, 616 p.

3. Farhadzadeh E.M., Muradaliyev A.Z., Farzaliyev Y.Z. Increase of accuracy of an estimation and reliability of comparison of parameters of individual reliability of power units of a state district power station. M.: Electricity, No.9, 2008, p. 10-17.

4. Ryabinin I.A. Basics of the theory and calculation of reliability of ship electric power systems. "Shipbuilding", 1971, 453 p.

5. Farhadzadeh E.M., Muradaliyev A.Z., Farzaliyev Y.Z. Decrease in risk of erroneous classification of the multivariate statistical data describing the technical condition of the equipment of power supply system. Journal: «Reliability: Theory&applications», R&RATA (Vol.8 No.2 (29)), 2013, June, USA, p. 55-64.
