UDC 519.21
On Distribution of Sums of Random Variables with Invariant Links and their Modeling
Sergey V. Chebotarev*
Altai State Pedagogical University, Molodezhnaya, 55, Barnaul, 656015
Russia
Received 13.02.2019, received in revised form 10.06.2019, accepted 14.07.2019
A general form of the distribution of a sum of a finite number of absolutely continuous random variables is obtained. Examples of constructing and modeling sequences with averaged links (with invariant links), based on the distribution of the sum of these random variables, are considered.
Keywords: sequences of random variables, sum of a finite number of random variables, sum of dependent random variables, distribution of sums of absolutely continuous random variables.
DOI: 10.17516/1997-1397-2019-12-5-628-636.
Introduction
In paper [1] the sums $S_n(\xi_{(n)}) = \sum_{t=1}^{n} \xi_t$ of finite sequences $\xi_{(n)} = (\xi_t)_{t \in I_n}$, $I_n = \{1, 2, \dots, n\}$, of Rademacher, lattice and real random variables were investigated. For Rademacher random variables $\xi_t \in \{-1, 1\}$, $t \in I_n$, the relationship between the finite-dimensional probability distribution of these sequences and the values of their mixed moments was shown, and on this basis expressions for the distribution of the sums were obtained. The same paper introduced and used sequences with averaged links $\bar{\xi}_{(n)} = (\bar{\xi}_t)_{t \in I_n}$, $I_n = \{1, 2, \dots, n\}$, constructed from the distribution of the sum $S_n(\xi_{(n)})$ of the random variables of the original sequence (in short: sequences with averaged links, or sal). For these sequences

$$P(\bar{\xi}_1, \bar{\xi}_2, \dots, \bar{\xi}_n) = \frac{P(S_n(\xi_{(n)}) = 2k - n)}{C_n^k} \quad \forall (\bar{\xi}_1, \bar{\xi}_2, \dots, \bar{\xi}_n) \text{ such that } \sum_{t=1}^{n} \bar{\xi}_t = 2k - n.$$

Among the properties of such sequences we note that all random variables of a sequence are identically distributed and that the joint probabilities of any sets of these random variables are invariant with respect to the replacement of random variables. That is,

$$P(\bar{\xi}_{i_1} = x_1, \bar{\xi}_{i_2} = x_2, \dots, \bar{\xi}_{i_m} = x_m) = P(\bar{\xi}_{j_1} = x_1, \bar{\xi}_{j_2} = x_2, \dots, \bar{\xi}_{j_m} = x_m)$$

is valid for any sets $(i_1, i_2, \dots, i_m)$, $(j_1, j_2, \dots, j_m) \subset I_n$, for any $1 \le m \le n$ and for any sets $(x_1, x_2, \dots, x_m)$, where $x_i \in \{-1, 1\}$.
All sequences for which the invariance property holds are defined as a class of sequences with invariant links. Further on, these concepts and results are extended to the case of sequences of lattice and real random variables. In particular, for absolutely continuous random variables in [1], an expression was obtained for a finite-dimensional distribution of random variables with averaged links, constructed from the distribution of sums of the original sequence. But, in contrast to the case of Rademacher and lattice random variables, the general form of distribution of sums of such random variables was not found. In this paper, we find the general form of distribution of a sum of a finite number of absolutely continuous random variables and consider some examples of modeling sequences with averaged links (with invariant links).
*[email protected] © Siberian Federal University. All rights reserved
Distribution of sums of random variables
We consider the problem of finding the general form of distribution of a sum of a finite number of centered absolutely continuous random variables having as their sum an absolutely continuous random variable with a nontrivial distribution.
This problem is similar to the problems that were solved in [2,3]. Therefore, we use the results of these works. In particular, in [2] it is shown (Theorem 3) that for a sequence of Rademacher random variables $\gamma = (\gamma_t)_{t \in N}$, where $\gamma_t \in \{-1, 1\}$, satisfying the conditions:
1. $\frac{1}{n} \sum_{t=1}^{n} M\gamma_t \to 0$ as $n \to \infty$,
2. there exists a weak limit $S_{1/2}(\gamma)$ of the sequence of normalized sums, with a nondegenerate distribution,
$$S_{1/2}(\gamma_{(n)}) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \gamma_t \Rightarrow S_{1/2}(\gamma), \quad n \to \infty,$$
the limiting random variable $S_{1/2}(\gamma)$ has the following distribution density:
$$\mu(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \sum_{m=0}^{\infty} \nu_m(\gamma) \cdot h_m(x). \qquad (1)$$
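We use this result to solve the stated problem. First, similarly to [3], we approximate the random variables $\xi_t$ of the original sequence by lattice random variables $\eta_{t,s}$. For this we divide the set of real numbers $R$ as follows:
$$\Delta x_s(k) = \left( \frac{2k - s - 1}{\sqrt{s}}, \frac{2k - s + 1}{\sqrt{s}} \right] \quad \text{for } k = 1, \dots, s - 1,$$
$$\Delta x_s(0) = \left( -\infty, -\frac{s - 1}{\sqrt{s}} \right], \qquad \Delta x_s(s) = \left( \frac{s - 1}{\sqrt{s}}, \infty \right).$$
Set
$$P\left( \eta_{t,s} = \frac{2k - s}{\sqrt{s}} \right) = P(\xi_t \in \Delta x_s(k)) = P_{\xi_t}(\Delta x_s(k)), \quad k = 0, 1, \dots, s.$$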
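The lattice approximation can be illustrated numerically. The sketch below is only an illustration under stated assumptions: $\xi_t$ is taken to be standard normal (a hypothetical choice, not from the paper), the cell edges are read as $(2k - s \pm 1)/\sqrt{s}$, and the names `values`, `probs` and `Phi` are introduced here.

```matlab
% Illustrative sketch: lattice approximation eta_{t,s} of a random variable xi_t.
% Assumptions (not from the paper): xi_t ~ N(0,1); cell edges (2k - s +/- 1)/sqrt(s).
s = 16;                                            % lattice parameter, cells k = 0..s
k = 0:s;                                           % cell indices
values = (2*k - s) / sqrt(s);                      % lattice points of eta_{t,s}
upper = [(2*k(1:end-1) - s + 1) / sqrt(s), Inf];   % right edge of each cell
lower = [-Inf, upper(1:end-1)];                    % left edge of each cell
Phi = @(z) 0.5 * (1 + erf(z / sqrt(2)));           % standard normal distribution function
probs = Phi(upper) - Phi(lower);                   % P(eta_{t,s} = values(k)) = P(xi_t in cell k)
```

The cells cover the whole real line, so the probabilities sum to one for any choice of distribution function `Phi`.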
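We also approximate the sum $S(\xi_{(n)}) = \sum_{t=1}^{n} \xi_t$ of the random variables of the investigated sequence by the sums $S(\eta_s) = \sum_{t=1}^{n} \eta_{t,s}$ of the lattice random variables $\eta_{t,s}$. For this we divide the set of real numbers $R$ as follows:
$$\Delta x_{ns}(k) = \left( \frac{2k - ns - 1}{\sqrt{ns}}, \frac{2k - ns + 1}{\sqrt{ns}} \right] \quad \text{for } k = 1, \dots, ns - 1,$$
$$\Delta x_{ns}(0) = \left( -\infty, -\frac{ns - 1}{\sqrt{ns}} \right], \qquad \Delta x_{ns}(ns) = \left( \frac{ns - 1}{\sqrt{ns}}, \infty \right).$$
In this case we set
$$P\left( S(\eta_s) = \frac{2k - ns}{\sqrt{ns}} \right) = P(S(\xi_{(n)}) \in \Delta x_{ns}(k)) = P_{S(\xi_{(n)})}(\Delta x_{ns}(k)), \quad k = 0, 1, \dots, ns.$$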
For $\eta_s$, the existence of a finite sequence with averaged links that has the same distribution of sums is shown in [4] (see Theorem 2.4). The same article, in turn, shows the existence of a finite sequence with averaged links of Rademacher type $\gamma_{(sn)}$ such that
$$\eta_{t,s} = \frac{1}{\sqrt{s}} \sum_{i=0}^{s-1} \gamma_{(t-1)s + i + 1}. \qquad (2)$$
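For this sequence we have
$$F_{S(\eta_s)}(x) = F_{S(\gamma_{(sn)})}(x) \quad \forall x \in R, \quad \text{where } S(\gamma_{(sn)}) = \frac{1}{\sqrt{s}} \sum_{t=1}^{sn} \gamma_t.$$
Passing to the limit as $s \to \infty$, we get
$$F_{S(\xi_{(n)})}(x) = F_{S(\gamma)}(x) \quad \forall x \in R, \quad \text{where } S(\gamma) = \lim_{s \to \infty} \frac{1}{\sqrt{s}} \sum_{t=1}^{sn} \gamma_t = \lim_{s \to \infty} \frac{\sqrt{n}}{\sqrt{sn}} \sum_{t=1}^{sn} \gamma_t. \qquad (3)$$
Comparing the limiting random variables in (1) and (3), we see that $S(\xi_{(n)}) = S(\gamma) = \sqrt{n}\, S_{1/2}(\gamma)$ in distribution, and we can formulate a statement regarding the density of $S(\xi_{(n)})$: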
Theorem 1. Let a sequence of centered absolutely continuous random variables $\xi_{(n)} = (\xi_t)_{t \in I_n}$ be given on a measurable space $(R^{(n)}, B^{(n)})$, the sum of which $S_n = \sum_{t=1}^{n} \xi_t$ is a non-degenerate absolutely continuous random variable with a distribution density $\mu(x)$. Then the distribution density of the sum of these random variables is as follows:
$$\mu(x) = \frac{1}{\sqrt{2\pi n}} e^{-\frac{x^2}{2n}} \sum_{m=0}^{\infty} \nu_m(\gamma) \cdot h_m\left( \frac{x}{\sqrt{n}} \right), \qquad (4)$$
where $\nu_m(\gamma)$ are the mixed moments of the sequence $\gamma = (\gamma_t)_{t \in N}$.
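The passage from (1) to (4) is a change of scale. As a sketch, write $\mu_{1/2}$ for the density in (1) (a label introduced here only to keep the two densities apart) and use the fact that $S_n$ and $\sqrt{n}\, S_{1/2}(\gamma)$ have the same distribution:

```latex
\begin{aligned}
P(S_n \le x) &= P\left( S_{1/2}(\gamma) \le \frac{x}{\sqrt{n}} \right)
              = \int_{-\infty}^{x/\sqrt{n}} \mu_{1/2}(u)\, du, \\
\mu(x) &= \frac{d}{dx} P(S_n \le x)
        = \frac{1}{\sqrt{n}}\, \mu_{1/2}\left( \frac{x}{\sqrt{n}} \right)
        = \frac{1}{\sqrt{2\pi n}}\, e^{-\frac{x^2}{2n}}
          \sum_{m=0}^{\infty} \nu_m(\gamma)\, h_m\left( \frac{x}{\sqrt{n}} \right).
\end{aligned}
```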
The proof follows from the above.
Corollary 1. Let a sequence of centered absolutely continuous random variables $\xi_{(n)} = (\xi_t)_{t \in I_n}$ be given on a measurable space $(R^{(n)}, B^{(n)})$, the sum of which $S_n = \sum_{t=1}^{n} \xi_t$ is a non-degenerate absolutely continuous random variable with a distribution density $\mu(x)$. Then there exists a finite sequence with averaged links $\bar{\xi}_{(n)} = (\bar{\xi}_t)_{t \in I_n}$ defined on the same measurable space such that its joint distribution function satisfies the following relation:
$$F_{\bar{\xi}_{(n)}}(x_1^0, \dots, x_n^0) = \frac{1}{(\sqrt{2\pi})^n} \int_{-\infty}^{x_1^0} \cdots \int_{-\infty}^{x_n^0} e^{-\frac{1}{2} \sum_{t=1}^{n} x_t^2} \sum_{m=0}^{\infty} \nu_m(\gamma) \cdot h_m\left( \frac{x}{\sqrt{n}} \right) dx_1 \cdots dx_n, \qquad (5)$$
where $x = \sum_{t=1}^{n} x_t$.
Proof. It suffices to substitute in (6) expression (4) as $\mu(x)$. □
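The substitution can be written out at the level of the integrands. The sketch below assumes that the integrand of (6) is read as $\frac{\sqrt{n}}{(\sqrt{2\pi})^{n-1}} e^{-\frac{1}{2}(\sum_t x_t^2 - x^2/n)} \mu(x)$, a reading that agrees with the calculation in Example 1; inserting (4) for $\mu(x)$ then gives the integrand of (5):

```latex
\frac{\sqrt{n}}{(\sqrt{2\pi})^{n-1}}\,
e^{-\frac{1}{2}\left( \sum_{t=1}^{n} x_t^2 - \frac{x^2}{n} \right)}
\cdot \frac{1}{\sqrt{2\pi n}}\, e^{-\frac{x^2}{2n}}
\sum_{m=0}^{\infty} \nu_m(\gamma)\, h_m\left( \frac{x}{\sqrt{n}} \right)
= \frac{1}{(\sqrt{2\pi})^{n}}\,
e^{-\frac{1}{2} \sum_{t=1}^{n} x_t^2}
\sum_{m=0}^{\infty} \nu_m(\gamma)\, h_m\left( \frac{x}{\sqrt{n}} \right).
```

The factors $\sqrt{n}$ and $e^{\pm x^2/(2n)}$ cancel, which is why (5) no longer contains $n$ in the exponent.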
Construction and modeling of sequences with invariant links. (Examples)
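We use Theorem 3.3 from [1] to construct and model a sequence with averaged links based on the sum of random variables of the original sequence. Recall that this theorem gives an expression for the $n$-dimensional distribution function of a sequence with averaged links $\bar{\xi}_{(n)} = (\bar{\xi}_t)_{t \in I_n}$ constructed from the sums of the original sequence of centered absolutely continuous random variables $\xi_{(n)} = (\xi_t)_{t \in I_n}$, where the sum of the original random variables $S_n = \sum_{t=1}^{n} \xi_t$ is a non-degenerate absolutely continuous random variable with a distribution density $\mu(x)$, $x = \sum_{t=1}^{n} x_t$. The $n$-dimensional distribution function then satisfies the following relation:
$$F_{\bar{\xi}_{(n)}}(x_1^0, \dots, x_n^0) = \frac{1}{(\sqrt{2\pi})^n} \int_{-\infty}^{x_1^0} \cdots \int_{-\infty}^{x_n^0} e^{-\frac{1}{2} \sum_{t=1}^{n} x_t^2} \frac{\mu(x)}{\varphi_n(x)}\, dx_1 \cdots dx_n = \frac{\sqrt{n}}{(\sqrt{2\pi})^{n-1}} \int_{-\infty}^{x_1^0} \cdots \int_{-\infty}^{x_n^0} e^{-\frac{1}{2} \left( \sum_{t=1}^{n} x_t^2 - \frac{x^2}{n} \right)} \mu(x)\, dx_1 \cdots dx_n, \qquad (6)$$
where $x = \sum_{t=1}^{n} x_t$, $\varphi_n(x) = \frac{1}{\sqrt{2\pi n}} e^{-\frac{x^2}{2n}}$.
Note that for $\bar{\xi}_{(n)}$ the following expression is true:
$$F_{S_n(\bar{\xi})}(x) = F_{S_n(\xi)}(x) \quad \forall x \in R. \qquad (7)$$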
Let us consider some examples of constructing a sequence with invariant links by the distribution of the sum of these random variables.
Example 1. Let $\xi_{(n)} = (\xi_t)_{t \in I_n} \in R^{(n)}$. Let the sum of these random variables have a probability distribution density $\mu(x) = \frac{1}{\sqrt{2\pi n}} e^{-\frac{x^2}{2n}}$, that is, it is a random variable with a normal distribution with parameters $MS_n = 0$, $DS_n = n$. Then the $n$-dimensional distribution density of a sequence with averaged links is
$$p_{\bar{\xi}_{(n)}}(x_1, \dots, x_n) = \frac{\sqrt{n}}{(\sqrt{2\pi})^{n-1}} e^{-\frac{1}{2} \left( \sum_{t=1}^{n} x_t^2 - \frac{x^2}{n} \right)} \mu(x) = \frac{\sqrt{n}}{(\sqrt{2\pi})^{n-1}} e^{-\frac{1}{2} \left( \sum_{t=1}^{n} x_t^2 - \frac{x^2}{n} \right)} \frac{1}{\sqrt{2\pi n}} e^{-\frac{x^2}{2n}} = \frac{1}{(\sqrt{2\pi})^{n}} e^{-\frac{1}{2} \sum_{t=1}^{n} x_t^2} = \prod_{t=1}^{n} \varphi(x_t),$$
where $\varphi(x_t)$ is the density of the standard normal distribution. In this case, the sequence of random variables with invariant links is a sequence of independent normally distributed random variables, and simulating such a sequence reduces to simulating the required number of standard normal random variables, which causes no additional difficulties.
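The identity of Example 1 can be checked numerically at a single point. The sketch below is only an illustration: the dimension `n` and the test point `xt` are arbitrary assumed values, and the variable names are introduced here.

```matlab
% Illustrative check of Example 1 at one point (n and xt are assumed values).
n = 4;
xt = [0.3, -1.2, 0.5, 0.8];              % a point (x_1, ..., x_n)
x = sum(xt);                             % x = x_1 + ... + x_n
mu = exp(-x^2 / (2*n)) / sqrt(2*pi*n);   % density of the sum, normal with variance n
% density of the sequence with averaged links built from mu
pAveraged = sqrt(n) / sqrt(2*pi)^(n-1) * exp(-(sum(xt.^2) - x^2/n) / 2) * mu;
% product of standard normal densities
pIndependent = prod(exp(-xt.^2 / 2) / sqrt(2*pi));
relErr = abs(pAveraged - pIndependent) / pIndependent;   % relative difference of the two
```

Under these assumptions `relErr` is at the level of rounding error, in line with the statement that the averaged-links sequence is a sequence of independent standard normal variables.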
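Example 2. Let a sequence $\xi_{(2)} = (\xi_t)_{t \in I_2} \in R^{(2)}$ be given, and let the sum of these random variables have a uniform distribution with a density $\mu(x) = \frac{1}{2} I_{\{x \in (-1, 1)\}}$. Here $I$ is the indicator function of the set $\{x \in (-1, 1)\}$. Then the two-dimensional distribution density of a sequence with averaged links will be
$$p_{\bar{\xi}_1, \bar{\xi}_2}(x_1, x_2) = \frac{\sqrt{2}}{\sqrt{2\pi}} e^{-\frac{1}{2} \left( x_1^2 + x_2^2 - \frac{(x_1 + x_2)^2}{2} \right)} \cdot \frac{1}{2} I_{\{x_1 + x_2 \in (-1, 1)\}} = \frac{1}{2\sqrt{\pi}} e^{-\frac{1}{4}(x_1 - x_2)^2} I_{\{x_1 + x_2 \in (-1, 1)\}}.$$
From here the distribution of the random variables of the sequence can be found from the relation
$$p_{\bar{\xi}_1}(x) = p_{\bar{\xi}_2}(x) = p(x) = \frac{1}{2\sqrt{\pi}} \int_{x + x_2 \in (-1, 1)} e^{-\frac{1}{4}(x_2 - x)^2} dx_2 = \frac{1}{2\sqrt{\pi}} \int_{-1-x}^{1-x} e^{-\frac{1}{4}(x_2 - x)^2} dx_2 = \frac{1}{\sqrt{2\pi}} \int_{\frac{-1-2x}{\sqrt{2}}}^{\frac{1-2x}{\sqrt{2}}} e^{-\frac{1}{2}u^2} du = F_{N_{0,1}}\left( \frac{1 - 2x}{\sqrt{2}} \right) - F_{N_{0,1}}\left( -\frac{1 + 2x}{\sqrt{2}} \right).$$
Here $F_{N_{0,1}}(x)$ is the value of the distribution function of a normal random variable $\xi$ with parameters $M\xi = 0$, $D\xi = 1$.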
Both random variables have the same distribution, but they are dependent. To calculate the marginal density of the distributions of $\bar{\xi}_1, \bar{\xi}_2$, we can use the MatLab package, or rather its package of symbolic calculations:

    syms x y
    int(exp(-(x-y)^2/4), x, -1-y, 1-y)
    ans = -pi^(1/2)*(erf(y - 1/2) - erf(y + 1/2))

As a result, we get
$$p(y) = -\frac{1}{2\sqrt{\pi}} \cdot \sqrt{\pi} \cdot (\mathrm{erf}(y - 1/2) - \mathrm{erf}(y + 1/2)) = -\frac{1}{2} \left( \mathrm{erf}\left( \frac{2y - 1}{2} \right) - \mathrm{erf}\left( \frac{2y + 1}{2} \right) \right).$$
Taking into account that
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} dt \quad \text{and} \quad \mathrm{erf}\left( \frac{x}{\sqrt{2}} \right) = 2 F_{N_{0,1}}(x) - 1,$$
we arrive at the same expression as above.
Let us check, using symbolic calculations in MatLab, that the mathematical expectation of the obtained random variables is $M\xi = 0$:

    int(y*(erf(y + 1/2) - erf(y - 1/2)), y, -inf, inf)
    ans = 0

We also calculate the variance:

    int(y^2*(erf(y + 1/2) - erf(y - 1/2)), y, -inf, inf)
    ans = 7/6

As a result, we get
$$D\xi = \frac{1}{2} \int_{-\infty}^{\infty} y^2 \cdot (\mathrm{erf}(y + 1/2) - \mathrm{erf}(y - 1/2))\, dy = \frac{1}{2} \cdot \frac{7}{6} = \frac{7}{12}.$$
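The symbolic results can be cross-checked by numerical quadrature without the symbolic package. This is a minimal sketch that assumes only the base functions `erf` and `integral`; the names `p`, `total`, `meanVal` and `variance` are introduced here.

```matlab
% Numerical cross-check of the marginal density p(y) = (erf(y+1/2) - erf(y-1/2))/2.
p = @(y) (erf(y + 0.5) - erf(y - 0.5)) / 2;            % marginal density
total    = integral(p, -Inf, Inf);                     % total probability
meanVal  = integral(@(y) y .* p(y), -Inf, Inf);        % mathematical expectation
variance = integral(@(y) y.^2 .* p(y), -Inf, Inf);     % variance (the mean is zero)
```

The three values should agree, up to quadrature error, with 1, 0 and 7/12 respectively.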
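Consider the relationship between these random variables; namely, we calculate their covariance. Let $A = \{(x_1, x_2) \mid (x_1 + x_2) \in (-1, 1)\}$. Then
$$\nu_2(x_1, x_2) = \mathrm{cov}(x_1, x_2) = M x_1 \cdot x_2 = \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2 e^{-\frac{1}{4}(x_1 - x_2)^2} I_A\, dx_1 dx_2 = \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} x_1 dx_1 \int_{-\infty}^{\infty} x_2 e^{-\frac{1}{4}(x_1 - x_2)^2} I_A\, dx_2 =$$
$$= \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} x_1 dx_1 \int_{-\infty}^{\infty} (x_2 - x_1) e^{-\frac{1}{4}(x_1 - x_2)^2} I_A\, dx_2 + \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} x_1 dx_1 \int_{-\infty}^{\infty} x_1 e^{-\frac{1}{4}(x_1 - x_2)^2} I_A\, dx_2. \qquad (8)$$
Consider these two integrals separately:
1. $$\frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} x_1 dx_1 \int_{-\infty}^{\infty} x_1 e^{-\frac{1}{4}(x_1 - x_2)^2} I_A\, dx_2 = \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} x_1^2 dx_1 \int_{-\infty}^{\infty} e^{-\frac{1}{4}(x_1 - x_2)^2} I_A\, dx_2 = \int_{-\infty}^{\infty} x_1^2 p(x_1)\, dx_1 = Dx_1 = \frac{7}{12}.$$
2. With the substitution $u = (x_2 - x_1)/\sqrt{2}$, so that $(x_2 - x_1)\, dx_2 = 2u\, du$,
$$\frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} x_1 dx_1 \int_{-\infty}^{\infty} (x_2 - x_1) e^{-\frac{1}{4}(x_1 - x_2)^2} I_A\, dx_2 = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} x_1 dx_1 \int_{\frac{-1-2x_1}{\sqrt{2}}}^{\frac{1-2x_1}{\sqrt{2}}} u e^{-\frac{1}{2}u^2} du.$$
Taking into account that
$$\int_{\frac{-1-2x_1}{\sqrt{2}}}^{\frac{1-2x_1}{\sqrt{2}}} u e^{-\frac{1}{2}u^2} du = e^{-\frac{(1 + 2x_1)^2}{4}} - e^{-\frac{(1 - 2x_1)^2}{4}},$$
we get
$$\frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} x_1 \left( e^{-\frac{(1 + 2x_1)^2}{4}} - e^{-\frac{(1 - 2x_1)^2}{4}} \right) dx_1 = \frac{1}{\sqrt{\pi}} \left( -\frac{\sqrt{\pi}}{2} - \frac{\sqrt{\pi}}{2} \right) = -1.$$
As a result, we have
$$\nu_2(x_1, x_2) = \mathrm{cov}(x_1, x_2) = -1 + \frac{7}{12} = -\frac{5}{12} \approx -0.4167.$$
This value agrees with the distribution of the sum: $D(x_1 + x_2) = 2 \cdot \frac{7}{12} + 2 \cdot \left( -\frac{5}{12} \right) = \frac{1}{3}$, which is the variance of the uniform distribution on $(-1, 1)$.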
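The second moments can also be cross-checked by two-dimensional quadrature over the strip $|x_1 + x_2| < 1$. The sketch below assumes the base function `integral2` with function-handle limits for the inner variable; the truncation bound `L` and all variable names are assumptions introduced here. A useful consistency test is that $D(x_1 + x_2) = 2 M x_1^2 + 2 M x_1 x_2$ must equal $1/3$, the variance of the uniform distribution on $(-1, 1)$.

```matlab
% Numerical cross-check of the second moments of Example 2 (illustrative sketch).
p2 = @(x1, x2) exp(-(x1 - x2).^2 / 4) / (2*sqrt(pi));   % joint density on the strip
L  = 8;                    % assumed truncation of the x1 range; the tails are negligible
lo = @(x1) -1 - x1;        % lower limit of x2 for a given x1
hi = @(x1)  1 - x1;        % upper limit of x2 for a given x1
mass   = integral2(p2, -L, L, lo, hi);                               % total probability
m11    = integral2(@(x1, x2) x1 .* x2 .* p2(x1, x2), -L, L, lo, hi); % M(x1*x2), the covariance
m20    = integral2(@(x1, x2) x1.^2 .* p2(x1, x2), -L, L, lo, hi);    % M(x1^2), the variance
varSum = 2*m20 + 2*m11;                                              % D(x1 + x2)
```

Since both variables are centered, `m11` is the covariance and `m20` the variance of each variable.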
Consider the process of modeling a sequence of random variables with invariant links from this example.
We shall proceed from
$$p(x) = F_{N_{0,1}}\left( \frac{1 - 2x}{\sqrt{2}} \right) - F_{N_{0,1}}\left( \frac{-1 - 2x}{\sqrt{2}} \right)$$
and use the inverse function method to generate random values.
First, we obtain the values of the distribution density and the distribution function of the random variables:

    dx = 0.001;
    x = -20:dx:20;
    % distribution density calculation
    p = normcdf((1 - 2*x)/sqrt(2), 0, 1) - normcdf((-1 - 2*x)/sqrt(2), 0, 1);
    % calculation of the distribution function values (cumulative rectangle sums)
    F = cumsum(p) * dx;
    % value check: F(\infty) = 1 ?
    F(length(x))
Performing the above calculations in MatLab, we obtain the values of the distribution density and of the distribution function of the random variables.
Further on, using the inverse function method and the obtained values of the distribution function, we generate the values of the 1st random variable (Fig. 1).
Fig. 1. Distribution density of $\bar{\xi}_1, \bar{\xi}_2$

    % We generate n independent random numbers with the distribution
    % function F(x) using the inverse function method
    n = 20;
    g = rand(2, n);
    slv1 = zeros(1, n);
    for i = 1:n
        k = 1;
        % first grid point at which F reaches the uniform draw
        while k < length(x) && F(k) < g(1, i)
            k = k + 1;
        end
        slv1(i) = x(k);
    end
As a result of these calculations, we obtain n samples, each with a generated value of the first random variable.
Since the two random variables are dependent, the value of the second random variable in each sample is generated from the conditional distribution density of the second random variable given the value of the first one obtained in that sample.

    p2 = zeros(n, length(x));
    % we obtain n conditional distribution densities of the 2nd random variable
    for i = 1:n
        for k = 1:length(x)
            p2(i, k) = exp(-(slv1(i) - x(k))^2/4) * Ind(slv1(i), x(k)) / (2*sqrt(pi));
        end
        % normalize so that the conditional density integrates to 1
        p2(i, :) = p2(i, :) / (sum(p2(i, :)) * dx);
    end
    % Calculation of the distribution function values for the 2nd random
    % variable using conditional densities
    F2 = zeros(n, length(x));
    for i = 1:n
        F2(i, 1) = p2(i, 1) * dx;
        for k = 2:length(x)
            F2(i, k) = F2(i, k-1) + p2(i, k) * dx;
        end
    end
    % Generation of 2nd random variable values by the inverse function method
    slv2 = zeros(1, n);
    for i = 1:n
        k = 1;
        while k < length(x) && F2(i, k) < g(2, i)
            k = k + 1;
        end
        slv2(i) = x(k);
    end
Examples of conditional distribution densities of $\bar{\xi}_2$ are shown in Fig. 2.
Fig. 2. Conditional distribution density of $\bar{\xi}_2$ in the 1st and 2nd samples
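The grid-based procedure above can be cross-checked with a direct sampler. In the variables $s = x_1 + x_2$ and $d = x_1 - x_2$ (Jacobian $1/2$) the joint density of Example 2 factorizes as $\frac{1}{2} I_{\{s \in (-1, 1)\}} \cdot \frac{1}{2\sqrt{\pi}} e^{-d^2/4}$, so $s$ is uniform on $(-1, 1)$ and $d$ is an independent normal variable with variance 2. The sketch below uses this factorization; it is an alternative route, not the method used in the paper, and the sample size and variable names are assumptions.

```matlab
% Alternative sketch: direct sampling of Example 2 via s = x1 + x2 and d = x1 - x2.
rng(1);                          % fixed seed so that the sketch is reproducible
N = 100000;                      % assumed sample size
s = 2*rand(1, N) - 1;            % sum, uniform on (-1, 1)
d = sqrt(2) * randn(1, N);       % difference, normal with mean 0 and variance 2
x1 = (s + d) / 2;                % first random variable
x2 = (s - d) / 2;                % second random variable
sampleVar1   = var(x1);          % compare with the variance 7/12
sampleVar2   = var(x2);          % the two variables are identically distributed
sampleVarSum = var(x1 + x2);     % compare with 1/3, the variance of U(-1, 1)
```

Samples produced this way can be compared with `slv1` and `slv2` from the grid-based procedure, for instance through their empirical distribution functions.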
Verification of the obtained results using the Kolmogorov criterion at the significance level 0.01 showed the consistency of the modeled data with the theoretical distributions:

    % positional call as in the original run; newer releases use
    % kstest(slv1, 'CDF', cdf, 'Alpha', 0.01)
    cdf = [x' F'];
    [H,P,KSSTAT,CV] = kstest(slv1, cdf, 0.01)
    H = 0
    P = 0.0179
    KSSTAT = 0.3326
    CV = 0.3524

    [H,P,KSSTAT,CV] = kstest(slv2, cdf, 0.01)
    H = 0
    P = 0.1991
    KSSTAT = 0.2317
    CV = 0.3524

    sm = slv1 + slv2;
    y = unifcdf(x, -1, 1);
    cdf = [x' y'];
    [H,P,KSSTAT,CV] = kstest(sm, cdf, 0.01)
    H = 0
    P = 0.4014
    KSSTAT = 0.1920
    CV = 0.3524
Simulation results are given in Tab. 1: the sample numbers are shown in the columns, and the values of the random variables in the rows.
Table 1.

    N                                  1        2        3        4        5        6        7
    $\bar{\xi}_1$                 0.6860  -0.8730   0.2590  -0.4500   1.3150  -0.7690   1.3130
    $\bar{\xi}_2$                 0.1920   1.4570  -0.9850   0.3390  -0.3440   1.6200  -0.8460
    $\bar{\xi}_1 + \bar{\xi}_2$   0.8780   0.5840  -0.7260  -0.1110   0.9710   0.8510   0.4670

    N                                  8        9       10       11       12       13       14
    $\bar{\xi}_1$                 0.6450  -0.1510   0.6230   0.3070   0.7900   0.3560   0.5000
    $\bar{\xi}_2$                -1.1310   0.9340   0.3230  -1.1960   0.1310   0.2410  -0.4810
    $\bar{\xi}_1 + \bar{\xi}_2$  -0.4860   0.7830   0.9460  -0.8890   0.9210   0.5970   0.0190

    N                                 15       16       17       18       19       20
    $\bar{\xi}_1$                 0.3070   0.4150  -0.4540  -0.9930   0.3900   1.2580
    $\bar{\xi}_2$                -0.8380  -1.3040  -0.4750   1.2400  -0.5670  -1.9750
    $\bar{\xi}_1 + \bar{\xi}_2$  -0.5310  -0.8890  -0.9290   0.2470  -0.1770  -0.7170
Above, in the process of calculations, we used the function Ind(x,y), the indicator function of the set $|x + y| < 1$.

    function Ixy = Ind(x, y)
    % indicator function of the set |x + y| < 1
    if abs(x + y) < 1
        Ixy = 1;
    else
        Ixy = 0;
    end
References
[1] S.V.Chebotarev, On the equivalence of finite sums of random variables, Vestnik BGPU, series: natural and exact sciences, 4(2004), 108-116 (in Russian).
[2] S.V.Chebotarev, About limit distribution of sums of random variables, Journal of Siberian Federal University. Mathematics & Physics, 9(2016), no. 1, 17-29.
[3] S.V.Chebotarev, On the limit distribution of sums of real random variables, Journal of Siberian Federal University. Mathematics & Physics, 10(2017), no. 3, 310-313.
[4] S.V.Chebotarev, About sequences of random variables with averaged relationships, Vestnik AltSPA, seriya: estestvenye i tochnye nauki, 7(2011), 28-37 (in Russian).