Magnetic Resonance in Solids. Electronic Journal

Volume 21, Special Issue 3, Paper No 19308, 1-18 pages, 2019

ISSN 2072-5981, doi: 10.26907/mrsej

doi: 10.26907/mrsej-19308

http://mrsej.kpfu.ru, http://mrsej.ksu.ru

Established and published by Kazan University. Endorsed by International Society of Magnetic Resonance (ISMAR). Registered by Russian Federation Committee on Press (#015140), August 2, 1996. First Issue appeared on July 25, 1997.

© Kazan Federal University (KFU)*

"Magnetic Resonance in Solids. Electronic Journal" (MRSej) is a peer-reviewed, all electronic journal, publishing articles which meet the highest standards of scientific quality in the field of basic research of a magnetic resonance in solids and related phenomena.

Indexed and abstracted by Web of Science (ESCI, Clarivate Analytics, from 2015), Scopus (Elsevier, from 2012), RusIndexSC (eLibrary, from 2006), Google Scholar, DOAJ, ROAD, CyberLeninka (from 2006), SCImago Journal & Country Rank, etc.

Editor-in-Chief Boris Kochelaev (KFU, Kazan)

Honorary Editors

Jean Jeener (Universite Libre de Bruxelles, Brussels) Raymond Orbach (University of California, Riverside)

Executive Editor

Yurii Proshin (KFU, Kazan) mrsej@kpfu.ru

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

This is an open access journal which means that all content is freely available without charge to the user or his/her institution. This is in accordance with the BOAI definition of open access.

Editors

Vadim Atsarkin (Institute of Radio Engineering and Electronics, Moscow)
Yurij Bunkov (CNRS, Grenoble)
Mikhail Eremin (KFU, Kazan)
David Fushman (University of Maryland, College Park)
Hugo Keller (University of Zürich, Zürich)
Yoshio Kitaoka (Osaka University, Osaka)
Boris Malkin (KFU, Kazan)
Alexander Shengelaya (Tbilisi State University, Tbilisi)
Jörg Sichelschmidt (Max Planck Institute for Chemical Physics of Solids, Dresden)
Haruhiko Suzuki (Kanazawa University, Kanazawa)
Murat Tagirov (KFU, Kazan)
Dmitrii Tayurskii (KFU, Kazan)
Valentine Zhikharev (KNRTU, Kazan)

Technical Editors of Issue

Maxim Avdeev (KFU) Alexander Kutuzov (KFU)

* In Kazan University the Electron Paramagnetic Resonance (EPR) was discovered by Zavoisky E.K. in 1944.

Short cite this: Magn. Reson. Solids 21, 19308 (2019)

doi: 10.26907/mrsej-19308

New approach in correlation analysis

R.R. Nigmatullin

Kazan National Research Technical University (KAI), Kazan 420111, Russia

E-mail: [email protected]

(Received March 16, 2019; revised March 20, 2019; accepted March 31, 2019; published April 19, 2019)

In this paper, a new method for restoration of the desired correlations is proposed. It allows one to evaluate the "content" of the external factors ($l = 1, 2, \ldots, L$), given in the form of data arrays $y_m^{(l)}(x)$ ($m = 1, 2, \ldots, M$), inside the given function $Y_m(x)$ that is supposed to be subjected to the influence of these factors. In contrast to the conventional correlation analysis, the proposed method allows finding the "influence" functions $b_l(x)$ ($l = 1, 2, \ldots, L$) and evaluating the "remnant" array $G_m(x)$ that remains as a "quasi-independent" part, free from the influence of the factors $y_m^{(l)}(x)$. The general expression works as a specific "balance" and reproduces the well-known cases: when $b_l(x) = C_l$ it reduces to the linear least square method with $G_m(x) \cong 0$, and it coincides with the remnant function, $Y_m(x) \cong G_m(x)$, when the influence functions become negligible ($b_l(x) \cong 0$). The available data show that the proposed method allows one to extract a small signal from the "pattern" background and that it conserves its stability/robustness in the presence of random fluctuations/noise. The method is rather flexible and allows one to consider the cases of strong correlations, when the external factors act successively, forming cause-and-effect chains. It can be generalized to expressions containing bonds expressed in the form of memory functions. The proposed method adds new quantitative ties, expressed in the form of the desired functions, to the conventional correlation relationships expressed in the form of correlation coefficients forming, in turn, correlation matrices. The new relationships allow a deeper understanding of the existing correlations and make them more informative, especially in the detection of the desired deterministic and stable bonds/laws that can be hidden inside.

PACS: 02.50.-r, 02.60.-x, 02.70.Rr, 05.40.Ca.

Keywords: correlations, functional least-square method, extraction of a small signal, influence and remnant functions.

I would like to dedicate this original paper to my Teacher, Prof. Dr. Boris I. Kochelaev

Preface

When I was a PhD student (1970-1973), my Teacher left me alone with my problem, because he was again invited to spend his "sabbatical leave" in the USA. This time was very difficult on the one hand and instructive for me on the other.

Having graduated from the Physics Faculty of Kazan State University with a "red diploma" (with honors) and become a PhD student, I had to prove to myself that I was able to solve a difficult problem on my own. Over two years, I solved a problem of relaxation in liquid $^3$He-$^4$He mixtures and obtained the desired expressions. However, exactly the same result was obtained by another physicist from the Krasnoyarsk Institute of Physics in the frame of Zubarev's NSO approach, and I was forced to change the subject to the processes of relaxation in paramagnets at low temperatures. Within one year, I solved this other problem and received the desired degree in 1974.

I will never forget these "testing" years, and I would like to express my gratitude to my Teacher for the "lesson" that I received at that time. These years convinced me that I am able to work alone and solve the problems that I meet in my life. Now I work at the Technical University (KNRTU-KAI) and participate in the solution of engineering problems. I consider them a new challenge that tests my ability and competence. I can consider myself a successful scientist because my Hirsch index is 27 and the number of citations exceeds 3000 (you can visit the site http://expertcorps.ru/science/whoiswho/ci86 for more details). Thanks to my Teacher, who left me alone, I had the possibility to test my fitness for the chosen scientific career.

A few words about the original paper that I want to offer to an attentive reader. I have written more than 250 papers, and I consider each published paper my "scientific child". Some of them met a happy fate; some of them were forgotten. I consider myself an "armorer" who proposes new methods for attacking difficult problems that need to be "defeated" with the help of new "arms". At present I work actively in three "hot" areas: (a) dielectric spectroscopy, (b) fractals and fractional calculus and (c) development of new statistical methods for the extraction of deterministic information from random signals. The original paper presented below belongs to the last, but not the least, area (c). It solves one interesting and general problem of mathematical statistics, and it will be useful for a wide circle of engineers and experimentalists who deal with the treatment of different data. I also hope that this paper will attract the attention of specialists who work in NMR/EPR applications, especially in the detection of weak signals affected by "noise".

1. Introduction and formulation of the problem

If any reader has a look at the wide set of books and papers meant for practitioners [1-10] related to correlation analysis, then he has a right to pose a reasonable question: what kind of new elements is the ambitious author of this paper trying to introduce in order to convince a skeptical expert that these innovative elements are really new and will be useful for practical applications? If a reader types "correlation and regression analysis" into his computer, he will find about 25 million results (!) associated with this subject. This huge "information wall" can serve as a real obstacle in attempts to suggest some new and original elements that "go beyond" the conventional Pearson correlation analysis and its existing generalizations.

Nevertheless, the analysis of the basic literature [5-15] related to establishing the desired relationships between two correlated random sets of data allows finding new and important elements that can be helpful in many applications. In order to be more specific, let us formulate the problem in detail, including some necessary definitions.

Let the set $\{x_j\}$ ($j = 1, 2, \ldots, N$) coincide with the input variable, and let $Y_m(x_j)$ determine the rectangular matrix of the corresponding responses (outputs), where the index $m = 1, \ldots, M$ ($M \le N$, $M > 1$) numbers the successive measurements related to the given input set.

In accordance with the general theory related to the quasi-reproducible experiment (GTQRE) [16-17], one can derive a general functional dependence for the set of $Y_m(x_j)$, and this dependence can be expressed in the form of the generalized fitting function related to the Prony function. In many practical applications, it is necessary to establish the correlation relationships between measurements that were obtained for the same set of input variables and that correspond to similar experimental conditions. We define these output variables as $y_m^{(l)}(x_j)$, $l = 1, 2, \ldots, L$, where the index $l$ numbers the external factors. Mathematically, the problem that we are going to solve can be formulated as follows: is it possible to relate, at least approximately, the set of the functions $Y_m(x_j)$ to the other functions $y_m^{(l)}(x_j)$ in order to remove their correlation dependence and separate the "remnant", or almost independent, part?

We try to find an original solution of this problem that will have wide applications to various kinds of data.

2. Description of the proposed algorithm

In many cases, it is impossible to express the single-valued variable $x_j$ through the chosen function $y_m^{(l)}(x_j)$ (except for the simplest dependences), because the output $y_m^{(l)}(x_j)$ for any fixed $m$ is expressed in the form of a multi-valued function and the analytical transformation $x_j \to F(y_m^{(l)})$ becomes impossible. Usually, in this case one can use the statistics of the fractional moments [18,19] and calculate the complete correlation factor that generalizes the conventional Pearson correlation coefficient over the whole space of the fractional moments. However, in many cases, it is necessary to know the "share" of correlations of one random variable with respect to another one in the form of an "influence" function, and the problem in this formulation, as far as we know, has not been solved. Mathematically, this correlation relationship (in the case when different external factors are considered independently from each other) can be expressed approximately in the form of the following linear combination:

$$Y_m(x_j) = \sum_{l=1}^{L} \langle b_l(x_j) \rangle \, y_m^{(l)}(x_j) + G_m(x_j), \tag{1}$$

where the sets $b_l(x_j)$, $G_m(x_j)$ determine the unknown functions that need to be evaluated. In expression (1) we assume that the "influence" functions are expressed in the averaged sense, $b_{m,l}(x) \cong \langle b_l(x) \rangle$ and $G_m(x) \cong \langle G(x) \rangle$ (for simplicity we omit the index $j$), which implies that they do not depend on the index $m$. We suppose also that the variables $Y_m(x_j)$, $y_m^{(l)}(x_j)$ can be close in the statistical sense and, therefore, are correlated with each other to some extent. For their evaluation, one can use the functional linear least square (FLLS) method proposed in papers [16-17], requiring that the functional dispersion between the functions figuring in (1) achieve a minimal value:

$$\sigma(x) = \min \left\{ \frac{1}{M} \sum_{m=1}^{M} \left[ Y_m(x) - \sum_{s=1}^{L} \langle b_s(x) \rangle \, y_m^{(s)}(x) - \langle G(x) \rangle \right]^2 \right\}. \tag{2}$$

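Taking the functional derivatives with respect to the unknown functions $\langle b_l(x) \rangle$, $\langle G(x) \rangle$, we obtain:

$$\begin{aligned}
\frac{\delta \sigma(x)}{\delta \langle b_l(x) \rangle} &= -\frac{2}{M} \sum_{m=1}^{M} y_m^{(l)}(x) \left[ Y_m(x) - \sum_{s=1}^{L} \langle b_s(x) \rangle \, y_m^{(s)}(x) - \langle G(x) \rangle \right] = 0, \\
\frac{\delta \sigma(x)}{\delta \langle G(x) \rangle} &= -\frac{2}{M} \sum_{m=1}^{M} \left[ Y_m(x) - \sum_{s=1}^{L} \langle b_s(x) \rangle \, y_m^{(s)}(x) - \langle G(x) \rangle \right] = 0.
\end{aligned} \tag{3}$$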

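Here we apply the averaging procedure over all admissible measurements ($m = 1, 2, \ldots, M$), supposing that the unknown functions $\langle b_l(x) \rangle$ do not depend on the current index $m$. Defining the pair correlation functions

$$\begin{aligned}
Q_l(x) &= \frac{1}{M} \sum_{m=1}^{M} Y_m(x) \, y_m^{(l)}(x), & q_{s,l}(x) &= \frac{1}{M} \sum_{m=1}^{M} y_m^{(s)}(x) \, y_m^{(l)}(x), \\
\langle Y(x) \rangle &= \frac{1}{M} \sum_{m=1}^{M} Y_m(x), & \langle y^{(l)}(x) \rangle &= \frac{1}{M} \sum_{m=1}^{M} y_m^{(l)}(x), \quad s, l = 1, 2, \ldots, L,
\end{aligned} \tag{4}$$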

one can obtain the system of linear equations with respect to the unknown functions $\langle b_s(x) \rangle$ figuring in (3):

$$\begin{aligned}
&\sum_{s=1}^{L} q_{s,l}(x) \langle b_s(x) \rangle + \langle G(x) \rangle \langle y^{(l)}(x) \rangle = Q_l(x), \\
&\sum_{s=1}^{L} \langle b_s(x) \rangle \langle y^{(s)}(x) \rangle + \langle G(x) \rangle = \langle Y(x) \rangle.
\end{aligned} \tag{5}$$

From linear system (5) one can calculate the desired functions $\langle b_s(x) \rangle$ and $\langle G(x) \rangle$ and, finally, find the solution of equation (1). Note that the calculation of the functions $\langle b_s(x) \rangle$ requires the value of $\langle G(x) \rangle$; however, from (1) one can restore the set of desired functions $G_m(x)$ directly for any fixed $m$. Equation (5) can also be simplified by expressing the unknown function $\langle G(x) \rangle$ from the second line and inserting this expression into the first line of (5). After simple algebraic manipulations, one obtains the following expression:

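$$\begin{aligned}
&\sum_{s=1}^{L} \tilde{q}_{s,l}(x) \langle b_s(x) \rangle = \tilde{Q}_l(x), \\
&\tilde{q}_{s,l}(x) = q_{s,l}(x) - \langle y^{(s)}(x) \rangle \langle y^{(l)}(x) \rangle, \\
&\tilde{Q}_l(x) = Q_l(x) - \langle Y(x) \rangle \langle y^{(l)}(x) \rangle, \\
&\langle G(x) \rangle = \langle Y(x) \rangle - \sum_{s=1}^{L} \langle b_s(x) \rangle \langle y^{(s)}(x) \rangle.
\end{aligned} \tag{6}$$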
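The elimination step between (5) and (6) can be written out explicitly; the following lines only restate the algebra implied by the text. The second line of (5) gives

$$\langle G(x) \rangle = \langle Y(x) \rangle - \sum_{s=1}^{L} \langle b_s(x) \rangle \langle y^{(s)}(x) \rangle .$$

Substituting this into the first line of (5),

$$\sum_{s=1}^{L} q_{s,l}(x) \langle b_s(x) \rangle + \left[ \langle Y(x) \rangle - \sum_{s=1}^{L} \langle b_s(x) \rangle \langle y^{(s)}(x) \rangle \right] \langle y^{(l)}(x) \rangle = Q_l(x),$$

and collecting the terms with $\langle b_s(x) \rangle$ on the left-hand side, one arrives at

$$\sum_{s=1}^{L} \left[ q_{s,l}(x) - \langle y^{(s)}(x) \rangle \langle y^{(l)}(x) \rangle \right] \langle b_s(x) \rangle = Q_l(x) - \langle Y(x) \rangle \langle y^{(l)}(x) \rangle ,$$

which is the first line of (6) with the centred quantities $\tilde{q}_{s,l}(x)$ and $\tilde{Q}_l(x)$.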

These expressions solve the problem of calculating the set of quasi-independent functions $G_m(x)$ from (1), based on the set of "influence" functions $\langle b_s(x) \rangle$ calculated from (6). It is instructive to write down the expressions for the simplest cases, which are of great practical importance.
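As an illustration, system (6) can be solved numerically at every point $x$. The block below is a minimal NumPy sketch under stated assumptions, not the author's implementation: the function name `influence_functions` and the array layout (`Y` of shape `(M, N)`, `y` of shape `(L, M, N)`) are hypothetical choices made for this example.

```python
import numpy as np

def influence_functions(Y, y):
    """Pointwise solution of system (6) (hypothetical helper).

    Y : array (M, N), measured responses Y_m(x_j)
    y : array (L, M, N), factor arrays y_m^{(l)}(x_j)
    Returns b (L, N), G_mean (N,), and the remnant array G (M, N).
    """
    Y = np.asarray(Y, dtype=float)
    y = np.asarray(y, dtype=float)
    M = Y.shape[0]

    # Deviations from the means over the measurement index m
    dY = Y - Y.mean(axis=0)
    dy = y - y.mean(axis=1, keepdims=True)

    # Centred correlation functions of Eq. (6), one L x L matrix per point x
    q = np.einsum('smn,lmn->nsl', dy, dy) / M   # shape (N, L, L)
    Q = np.einsum('mn,lmn->nl', dY, dy) / M     # shape (N, L)

    # Solve the L x L linear system independently at every x
    b = np.linalg.solve(q, Q[..., None])[..., 0].T   # shape (L, N)

    # Last line of Eq. (6) and the remnant array from Eq. (1)
    G_mean = Y.mean(axis=0) - np.einsum('ln,ln->n', b, y.mean(axis=1))
    G = Y - np.einsum('ln,lmn->mn', b, y)
    return b, G_mean, G
```

The matrices $\tilde{q}_{s,l}(x)$ become singular wherever a factor does not vary over $m$ or two factors vary identically, so a practical implementation would need some regularization at such points; the sketch does not attempt this.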

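1. The influence of one factor (L = 1). For this case system (6) has the following solution:

$$\begin{aligned}
\langle b_1(x) \rangle &= \frac{\tilde{Q}_1(x)}{\tilde{q}_{1,1}(x)}, \qquad \langle G(x) \rangle = \langle Y(x) \rangle - \langle b_1(x) \rangle \langle y^{(1)}(x) \rangle, \\
\tilde{q}_{1,1}(x) &= \frac{1}{M} \sum_{m=1}^{M} \left( \Delta y_m^{(1)}(x) \right)^2, \qquad \Delta y_m^{(1)}(x) = y_m^{(1)}(x) - \langle y^{(1)}(x) \rangle, \\
\tilde{Q}_1(x) &= \frac{1}{M} \sum_{m=1}^{M} \Delta Y_m(x) \, \Delta y_m^{(1)}(x), \qquad \Delta Y_m(x) = Y_m(x) - \langle Y(x) \rangle.
\end{aligned} \tag{7}$$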
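For a single factor, expressions (7) translate almost literally into array operations. The following is a short sketch under the assumption that the data are stored as arrays of shape `(M, N)`; the name `single_factor` is hypothetical.

```python
import numpy as np

def single_factor(Y, y):
    """Expressions (7) for L = 1 (hypothetical helper).

    Y, y : arrays (M, N) holding Y_m(x_j) and y_m^{(1)}(x_j).
    Returns <b_1(x)> and <G(x)>, each of shape (N,).
    """
    Y = np.asarray(Y, dtype=float)
    y = np.asarray(y, dtype=float)

    dY = Y - Y.mean(axis=0)          # Delta Y_m(x)
    dy = y - y.mean(axis=0)          # Delta y_m^{(1)}(x)

    q11 = (dy ** 2).mean(axis=0)     # centred q_{1,1}(x)
    Q1 = (dY * dy).mean(axis=0)      # centred Q_1(x)

    # Assumes q11 > 0, i.e. the background varies over m at every x
    b1 = Q1 / q11
    G_mean = Y.mean(axis=0) - b1 * y.mean(axis=0)
    return b1, G_mean
```

At each $x$ this is the ordinary slope and intercept of $Y_m(x)$ regressed on $y_m^{(1)}(x)$ over the measurement index $m$.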

It is useful to consider the limiting cases. (a) Let us suppose that

$$\Delta Y_m(x) \cong A \, \Delta y_m^{(1)}(x). \tag{8}$$

In this case (after averaging (8) over all measurements $m = 1, 2, \ldots, M$) we have

$$\langle \Delta Y(x) \rangle = A \left\langle \Delta y^{(1)}(x) \right\rangle + e(x), \tag{9}$$

and the calculation of the unknown parameter $A$ is reduced to the linear least square method (LLSM). The mean values in (9) are replaced by numbers that are calculated over the input variable $x$. If $A = 1$ (which implies the complete coincidence of the function $Y_m(x)$ with $y_m^{(1)}(x)$), then $G_m(x) = 0$. For $A \neq 1$, the influence of the function $G(x)$ becomes negligible, and it is replaced in (9) by a fitting error function $e(x)$.

(b) Let us suppose that the number of measurements is small, $Y_m(x) \cong \langle Y(x) \rangle$ (this corresponds to the case of an "ideal" experiment considered in [16-17]). In this case, the influence of correlation with the function $\langle y^{(1)}(x) \rangle$ becomes negligible ($\langle b_1(x) \rangle \cong 0$). In practice, this condition can be tested easily if we analyze the condition $\langle Y(x) \rangle \cong \langle G(x) \rangle$ with the help of the Pearson correlation coefficient. If this coefficient (PC) is located in the interval $0.9 < PC < 1$, then one concludes that the arrays $Y_m(x)$ and $y_m^{(1)}(x)$ are not correlated with each other or, in other words, they are practically independent of each other. Therefore, expression (7) covers the cases known earlier.

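2. The influence of two factors (L = 2).

For this case, from system (6) one obtains the following expressions for $\langle b_{1,2}(x) \rangle$:

$$\langle b_1(x) \rangle = \frac{\tilde{Q}_1(x) \tilde{q}_{2,2}(x) - \tilde{Q}_2(x) \tilde{q}_{1,2}(x)}{\tilde{q}_{1,1}(x) \tilde{q}_{2,2}(x) - \left( \tilde{q}_{1,2}(x) \right)^2}, \qquad \langle b_2(x) \rangle = \frac{\tilde{Q}_2(x) \tilde{q}_{1,1}(x) - \tilde{Q}_1(x) \tilde{q}_{1,2}(x)}{\tilde{q}_{1,1}(x) \tilde{q}_{2,2}(x) - \left( \tilde{q}_{1,2}(x) \right)^2}. \tag{10}$$

Here the matrix components are defined by the following expressions:

$$\begin{aligned}
\tilde{q}_{s,l}(x) &= \frac{1}{M} \sum_{m=1}^{M} \Delta y_m^{(s)}(x) \, \Delta y_m^{(l)}(x), \qquad \tilde{Q}_l(x) = \frac{1}{M} \sum_{m=1}^{M} \Delta Y_m(x) \, \Delta y_m^{(l)}(x), \\
\Delta f_m(x) &= f_m(x) - \langle f(x) \rangle, \qquad s, l = 1, 2.
\end{aligned} \tag{11}$$

The "remnant" function (the weakly correlated part of $Y_m(x)$) is given by the expression:

$$G_m(x) = Y_m(x) - \sum_{l=1}^{2} \langle b_l(x) \rangle \, y_m^{(l)}(x). \tag{12}$$

Expressions (10)-(12) are considered as the final expressions for the case of the influence of two additive factors. In every case considered above, it is supposed that the number of similar measurements $M$ exceeds the number of external factors $L$ ($M > L$).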
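The closed-form expressions (10)-(12) can be checked on synthetic data. The sketch below assumes arrays of shape `(M, N)`; the function name `two_factor` is hypothetical and the code is only an illustration of the formulas, not the author's program.

```python
import numpy as np

def two_factor(Y, y1, y2):
    """Closed-form solution (10)-(12) for L = 2 (hypothetical helper).

    Y, y1, y2 : arrays (M, N).
    Returns <b_1(x)>, <b_2(x)> of shape (N,) and the remnant G_m(x) of shape (M, N).
    """
    def centred(f):
        # Delta f_m(x) = f_m(x) - <f(x)>, Eq. (11)
        f = np.asarray(f, dtype=float)
        return f - f.mean(axis=0)

    dY, d1, d2 = centred(Y), centred(y1), centred(y2)

    q11 = (d1 * d1).mean(axis=0)
    q22 = (d2 * d2).mean(axis=0)
    q12 = (d1 * d2).mean(axis=0)
    Q1 = (dY * d1).mean(axis=0)
    Q2 = (dY * d2).mean(axis=0)

    # Determinant in Eq. (10); assumed non-zero (factors not collinear over m)
    det = q11 * q22 - q12 ** 2
    b1 = (Q1 * q22 - Q2 * q12) / det
    b2 = (Q2 * q11 - Q1 * q12) / det

    G = np.asarray(Y, dtype=float) - b1 * y1 - b2 * y2   # Eq. (12)
    return b1, b2, G
```

The determinant vanishes when the two factors are collinear over the measurement index, which is one way to see why the number of measurements $M$ must exceed the number of factors $L$.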

The previous calculations were associated with the case when all external factors act independently/additively. This case is expressed mathematically by expression (1). Is it possible to consider the case when the factors are strongly correlated and act successively, i.e. when one dominant factor (cause) evokes the appearance of another factor (effect)? In this case, placing the cause-and-effect relations in the right order, one can write the following relationship:

$$Y_m(x) = G_m(x) \prod_{s=1}^{L} \left( y_m^{(s)}(x) \right)^{\langle b_s(x) \rangle}. \tag{13}$$

Taking the natural logarithm of both sides, we obtain

$$\ln \left( Y_m(x_j) \right) = \sum_{l=1}^{L} \langle b_l(x_j) \rangle \ln \left( y_m^{(l)}(x_j) \right) + \ln \left( G_m(x_j) \right). \tag{14}$$
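Equation (14) has the same structure as expression (1), so the additive formulas can be reused on logarithms. The sketch below covers one factor and strictly positive data only; the name `multiplicative_single_factor` is hypothetical, and the treatment of zero or negative values is deliberately left out.

```python
import numpy as np

def multiplicative_single_factor(Y, y):
    """One-factor version of Eq. (14) for strictly positive data (hypothetical helper).

    Y, y : arrays (M, N) with Y > 0 and y > 0.
    Returns the exponent <b_1(x)> and the mean of ln G_m(x), each of shape (N,).
    """
    lnY = np.log(np.asarray(Y, dtype=float))
    lny = np.log(np.asarray(y, dtype=float))

    dlnY = lnY - lnY.mean(axis=0)
    dlny = lny - lny.mean(axis=0)

    # Same ratio as in the additive single-factor case, applied to logarithms
    b1 = (dlnY * dlny).mean(axis=0) / (dlny ** 2).mean(axis=0)
    lnG_mean = lnY.mean(axis=0) - b1 * lny.mean(axis=0)
    return b1, lnG_mean
```

For data generated exactly as $Y_m = G \, (y_m)^{b}$ with constant $G$ and $b$, the function recovers $b$ and $\ln G$ up to rounding.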

Formally, equation (14) is similar to expression (1); however, one additionally has to define the values of the logarithmic functions where the function $y_m^{(l)}(x_j)$ becomes negative or equals zero. With this proviso, all calculations considered above are applicable to this case too. The proposed method also allows combining these two cases and considering a more complex problem. In order to show its generality, we rewrite expression (13) in the following form:

$$F_m^{(l)}(x) = G_m^{(l)}(x) \prod_{s=1}^{l} \left( y_m^{(s)}(x) \right)^{\langle w_s(x) \rangle}, \quad s = 1, 2, \ldots, l. \tag{15}$$

Equation (15) allows finding the set of the unknown functions $G_m^{(l)}(x)$ and the weighting factors $\langle w_s(x) \rangle$ with the help of equation (14). After that, one can apply equation (1) again, representing it in the form:

$$Y_m(x) = \sum_{l=1}^{L} \langle b_l(x) \rangle \, G_m^{(l)}(x) + R_m(x). \tag{16}$$

From this equation, it is possible to find the additive weighting factors $\langle b_l(x) \rangle$ and the remnant function $R_m(x)$, which is defined again as the weakly correlated part of $Y_m(x)$.

In the case of cause-and-effect relations, special attention should be paid to the negative values of the functions figuring in expressions (13) and (15). For negative values of $f(x)$, the power-law function in the space of real numbers can be defined as

$$\mathrm{Re} \left[ \left( f(x) \right)^{\pm a(x)} \right] = \mathrm{Re} \left[ \left( -|f(x)| \right)^{\pm a(x)} \right] = |f(x)|^{\pm a(x)} \cos \left( a(x) \pi \right). \tag{17}$$
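Expression (17) can be verified numerically against the principal branch of the complex power. The sketch below uses only the Python standard library; the function name `real_power_of_negative` is hypothetical, and the principal-branch interpretation of the power is an assumption of this example.

```python
import math

def real_power_of_negative(f_abs, a):
    """Right-hand side of Eq. (17): |f|**a * cos(a*pi) (hypothetical helper).

    f_abs : magnitude |f(x)| of a negative value f(x) = -|f(x)|
    a     : signed exponent (+a(x) or -a(x)); the cosine is even in a
    """
    return abs(f_abs) ** a * math.cos(a * math.pi)

# Direct evaluation on the principal branch: -|f| = |f| * exp(i*pi)
direct = (complex(-2.0, 0.0) ** 0.3).real
via_formula = real_power_of_negative(2.0, 0.3)
```

Both `direct` and `via_formula` should agree to rounding error, because $(-|f|)^{a} = |f|^{a} e^{i a \pi}$ on the principal branch.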

The region where $f(x) = 0$ requires some regularization procedure in each particular case when $a(x)$ becomes negative or close to zero ($a(x) \cong 0$). To conclude section 2, we show another case where the proposed algorithm could be applied. Let us suppose that the measured function $F_m(t)$ is formed by a linear combination of correlated functions related to the initial one by means of the convolution operation, i.e.:

$$F_m(t) = \sum_{l=1}^{L} \int_0^t \langle \mu_l(t - \tau) \rangle \, f_m^{(l)}(\tau) \, d\tau + G_m(t). \tag{18}$$

In expression (18) the set of the functions $\langle \mu_l(t) \rangle$ determines the so-called "correlation memory" that arises between the function $f_m^{(l)}(t)$ evoked by the factor $l$ ($l = 1, 2, \ldots, L$) and the initial function $F_m(t)$ that can be affected by this factor. These correlations are important in cases when they are expressed in the form of the Riemann-Liouville fractional integral [20]. Applying the Laplace transform to both sides of (18), we obtain:

$$\begin{aligned}
&\bar{F}_m(p) = \sum_{l=1}^{L} \langle \bar{\mu}_l(p) \rangle \, \bar{f}_m^{(l)}(p) + \bar{G}_m(p), \\
&\bar{J}(p) = \int_0^{\infty} J(t) \exp(-pt) \, dt.
\end{aligned} \tag{19}$$

The first line in (19) is similar to expression (1) and, therefore, one can apply the algorithm developed above for this nontrivial case. From our point of view, expression (19) can be useful in analysis of different dielectric "mixtures", when the total complex permittivity can be presented approximately as an additive combination of different complex permittivities associated with different dielectric materials.

In order to outline the limits of the proposed algorithm, one should answer the following question: what is the minimal number of measurements $M$ necessary for its application? The analysis shows that for the calculation of the mean values given by expressions (4) only two repetitions of the same measurement ($M = 2$) are necessary. However, two successive measurements cannot form a statistically significant sampling and, therefore, it is necessary to collect a "rich" sampling with a number of measurements $M > 10$ as a minimum. How can this be done artificially if a researcher cannot obtain a representative sampling? In this case, one can suggest the approach that is described below in section 3.

3. Creation of a representative sampling

Let us suppose that we definitely know only two measurements coinciding with the limits of a possible sampling, $Y_{dn}(x)$ and $Y_{up}(x)$. These functions define the "down" and "up" limits of the desired sampling, respectively. Let us define a variable from the interval [0,1]:

$$v_m = \frac{m - 1}{M - 1}, \quad m = 1, \ldots, M. \tag{20}$$

Then we define a function on the same interval, $F(v_m) = 3 (v_m)^2 - 2 (v_m)^3$. This specific choice is motivated by the fact that $F(0) = 0$, $F(1) = 1$, $F(1/2) = 1/2$ and, therefore, the desired array can be obtained as:

$$\begin{aligned}
&Y_m(x_j) = Y_{dn}(x_j) + F(v_m) \left( Y_{up}(x_j) - Y_{dn}(x_j) \right), \\
&Y_{dn}(x) \le Y_m(x) \le Y_{up}(x), \quad j = 1, 2, \ldots, N, \quad m = 1, \ldots, M.
\end{aligned} \tag{21}$$
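Expressions (20) and (21) amount to a few lines of array code. The sketch below is one possible NumPy rendering; the function names `smoothstep` and `make_sampling` are hypothetical and chosen only for this example.

```python
import numpy as np

def smoothstep(v):
    """F(v) = 3 v^2 - 2 v^3, with F(0) = 0, F(1/2) = 1/2, F(1) = 1."""
    return 3.0 * v ** 2 - 2.0 * v ** 3

def make_sampling(Y_dn, Y_up, M):
    """Artificial array Y_m(x_j) from Eqs. (20)-(21) (hypothetical helper).

    Y_dn, Y_up : arrays (N,), the "down" and "up" limits
    M          : number of generated curves, M >= 2
    Returns an array of shape (M, N).
    """
    Y_dn = np.asarray(Y_dn, dtype=float)
    Y_up = np.asarray(Y_up, dtype=float)
    v = np.arange(M) / (M - 1)           # Eq. (20): (m - 1)/(M - 1), m = 1..M
    F = smoothstep(v)[:, None]           # column vector, broadcast over x
    return Y_dn[None, :] + F * (Y_up - Y_dn)[None, :]
```

The first and the last generated curves coincide with $Y_{dn}(x)$ and $Y_{up}(x)$, and all intermediate curves lie between them because $0 \le F(v) \le 1$ on [0,1].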

Let us suppose that three functions are known a priori: $Y_{dn}(x)$, $Y_{up}(x)$ and some intermediate function $f_1(x)$. We look for the function $F(v)$ in the form of a polynomial of the second order:

$$F(v) = a v + b v^2. \tag{22}$$

To find the unknown functions $a(x)$ and $b(x)$, we take into account the following conditions: $F(1) = 1$, $F(1/2) = w_1(x) = \left( f_1(x) - Y_{dn}(x) \right) / \left( Y_{up}(x) - Y_{dn}(x) \right)$. After simple algebraic transformations, we obtain again the structure (21) with the function $F(v_m, w_1(x))$ from (22), where the functions $a(x)$ and $b(x)$ are defined as

$$a(x) = 4 w_1(x) - 1, \quad b(x) = 2 - 4 w_1(x). \tag{23}$$
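The "simple algebraic transformations" behind (23) can be spelled out. The two conditions on the polynomial (22) read

$$F(1) = a + b = 1, \qquad F(1/2) = \frac{a}{2} + \frac{b}{4} = w_1 .$$

Multiplying the second condition by 4 gives $2a + b = 4 w_1$; subtracting the first condition yields $a = 4 w_1 - 1$, and then $b = 1 - a = 2 - 4 w_1$. As a consistency check, for $w_1 = 1/2$ (the intermediate function lies exactly halfway between the limits) one gets $a = 1$, $b = 0$, i.e. the linear interpolation $F(v) = v$.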

By complete analogy with this result, one can obtain the desired expressions for two known intermediate functions $f_k(x)$ ($k = 1, 2$). We subject the function $F(v)$ to the following conditions: $F(1) = 1$, $F(1/3) = w_1(x)$ and $F(2/3) = w_2(x)$ with

$$w_k(x) = \left( f_k(x) - Y_{dn}(x) \right) / \left( Y_{up}(x) - Y_{dn}(x) \right), \quad k = 1, 2,$$

and consider a polynomial of the third order:

$$F(v) = a v + b v^2 + c v^3. \tag{24}$$

The desired functions are found from the linear system of equations

$$\begin{aligned}
a(x) + b(x) + c(x) &= 1, \\
\tfrac{1}{3} a(x) + \tfrac{1}{9} b(x) + \tfrac{1}{27} c(x) &= w_1(x), \\
\tfrac{2}{3} a(x) + \tfrac{4}{9} b(x) + \tfrac{8}{27} c(x) &= w_2(x).
\end{aligned} \tag{25}$$

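The system admits the solution:

$$\begin{aligned}
a(x) &= \tfrac{1}{2} \left( 2 + 18 w_1(x) - 9 w_2(x) \right), \\
b(x) &= -\tfrac{9}{2} \left( 1 + 5 w_1(x) - 4 w_2(x) \right), \\
c(x) &= \tfrac{9}{2} \left( 1 + 3 w_1(x) - 3 w_2(x) \right).
\end{aligned} \tag{26}$$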

Finally, we obtain the same mathematical expression (21), where the function $F(v)$ should be replaced by expression (24) with the functions from (26). More complex cases with intermediate functions $f_k(x)$ ($k = 1, 2, \ldots$) can be considered in the same way and, therefore, the problem of generating an artificial array can be solved similarly in the general case.
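The coefficients (26) can be checked against the interpolation conditions they are meant to satisfy. The sketch below is a small illustration; the function names `cubic_coeffs` and `F_cubic` are hypothetical.

```python
import numpy as np

def cubic_coeffs(w1, w2):
    """Coefficients a, b, c of Eq. (26) for F(v) = a v + b v^2 + c v^3."""
    a = 0.5 * (2.0 + 18.0 * w1 - 9.0 * w2)
    b = -4.5 * (1.0 + 5.0 * w1 - 4.0 * w2)
    c = 4.5 * (1.0 + 3.0 * w1 - 3.0 * w2)
    return a, b, c

def F_cubic(v, w1, w2):
    """Cubic interpolating function (24) with the coefficients from (26)."""
    a, b, c = cubic_coeffs(w1, w2)
    return a * v + b * v ** 2 + c * v ** 3
```

By construction, `F_cubic(1/3, w1, w2)` returns `w1`, `F_cubic(2/3, w1, w2)` returns `w2` and `F_cubic(1, w1, w2)` returns 1 up to rounding; for `w1 = 1/3`, `w2 = 2/3` the cubic degenerates into the straight line $F(v) = v$. Since the expressions are plain arithmetic, `w1` and `w2` may also be NumPy arrays over $x$.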

Expression (21) generates a set of the functions from the interval [Ydn(x), Yup(x)]. If necessary, one can add the intermediate functions fk(x) (k = 1,2,...) and use more complicated expressions (22) and (24) shown above. These expressions can be applied for generation of the desired data in cases, when the creation of the arrays becomes impossible. This situation is observed in economics, meteorology and biological/medical sciences.

Let us suppose that we fixed the random curve $Y_{dn}(t_0)$ at the moment $t_0$ and that by the moment $t_M$ it has taken the form $Y_{up}(t_M)$. Then, introducing the normalized temporal variable

$$v_m = \frac{t_m - t_0}{t_M - t_0}, \quad t_m = t_0 + \frac{m}{M} (t_M - t_0), \quad m = 0, 1, \ldots, M, \tag{27}$$

one can create an artificial data array with the help of expression (21) and its generalizations given above. Actually, expression (27) can be considered as a transformation of one random function into another one, and this transformation can be useful in many practical applications.

To conclude this section, we would like to show how to obtain quantitative/numerical estimates for the influence functions $\langle b_l(x) \rangle$ and the remnant function $G_m(x)$. For this purpose, we take the average values of the functions $Y_m(x)$, $y_m^{(l)}(x)$ and $G_m(x)$,

$$\langle Y(x) \rangle = \frac{1}{M} \sum_{m=1}^{M} Y_m(x), \quad \langle y^{(l)}(x) \rangle = \frac{1}{M} \sum_{m=1}^{M} y_m^{(l)}(x), \quad \langle G(x) \rangle = \frac{1}{M} \sum_{m=1}^{M} G_m(x), \tag{28}$$

and then in expression (1) we replace the functions $\langle b_l(x) \rangle \to b_l$ by a set of unknown constants and $\langle G(x) \rangle$ by the constant $g$. As a result of this transformation we obtain

$$\langle Y(x) \rangle = \sum_{l=1}^{L} b_l \langle y^{(l)}(x) \rangle + g. \tag{29}$$

The set of the desired constants $b_l$ ($l = 1, 2, \ldots, L$) and $g$ can be evaluated from (29) with the help of the linear least square method. In particular, for $L = 1$ the constants $b_1$ and $g$ coincide with the values of the slope and intercept, respectively.
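The fit of expression (29) is an ordinary linear least-squares problem over the points $x_j$. A minimal sketch with `numpy.linalg.lstsq` is given below; the function name `constant_influence` and the input layout are assumptions of this example.

```python
import numpy as np

def constant_influence(Y_mean, y_means):
    """Least-squares estimate of the constants b_l and g in Eq. (29).

    Y_mean  : array (N,), the mean <Y(x_j)>
    y_means : array (L, N), the means <y^{(l)}(x_j)>
    Returns b of shape (L,) and the scalar g.
    """
    Y_mean = np.asarray(Y_mean, dtype=float)
    y_means = np.asarray(y_means, dtype=float)

    # Design matrix: one column per factor plus a column of ones for g
    A = np.column_stack([*y_means, np.ones_like(Y_mean)])
    coef, *_ = np.linalg.lstsq(A, Y_mean, rcond=None)
    return coef[:-1], coef[-1]
```

For $L = 1$ the returned pair reduces to the slope and intercept of a straight-line fit of $\langle Y(x) \rangle$ against $\langle y^{(1)}(x) \rangle$.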

4. Examples

4.1. Verification of expressions (7) on mimic data

Let us consider the following problem. From some measurements we have a "pattern" background that is described mathematically by an array $y_m^{(1)}(x)$. This background is distorted by a "small" signal Sgn(x) that did not appear in the background earlier. Is it possible to "notice" the presence of the function Sgn(x) inside the function $Y_m(x)$ and extract this small signal in the presence of noise? We should stress here that this common problem arises in many branches of the natural/technical sciences:

1. Detection of a small additive x(c) on the "pattern" background - electrochemistry, microbiology, chemical technology etc.

2. Detection of a small signal Sgn(t) on the radio-interference background.

3. The general problem "friend-or-foe" identification in military science.

For demonstration purposes, we choose the following functions:

(a) The "pattern" array is identified with $y_m^{(1)}(x)$ and generated by expression (21) with the limiting functions $Y_{dn}(x) = \sin(0.4x - \pi/8)$, $Y_{up}(x) = \cos(0.67x - \pi/8) + 3$; the array size: $M = 100$; the number of discrete points: $j = 0, 1, \ldots, N = 500$; the interval of the input variable: $x_{min} = 10^{-3}$, $x_{max} = 10$, $x_j = x_{min} + (j/N)(x_{max} - x_{min})$.

(b) The tested array $Y_m(x)$ is generated again with the help of (21) and located in the interval between $Ys_{dn}(x) = Y_{dn}(x) + Sgn_1(x)$ and $Ys_{up}(x) = Y_{up}(x) + Sgn_2(x)$, where $Sgn_{1,2}(x) = a / [(x - x_{1,2})^2 + \gamma^2]$, $a = 0.01$, $x_{1,2} = 6, 5$, $\gamma^2 = 0.05$, and $Ys_{up}(x, w_1) = w_1 \times Ys_{up}(x) + w_2 (Yrnm(x) + 3)$, where, in turn, $w_1 + w_2 = 1$ and $Yrnm(x)$ denotes a generator of uniform random numbers from the interval $[-1, 1]$.

Having these functions, it is natural to solve the problem of extracting the useful signals $Sgn_{1,2}(x)$ from the array functions $Y_m(x)$ at different values of the parameter $w_1$, which controls the "admixture" of noise ($w_1 = 1$ corresponds to the absence of noise, $w_1 = 0.9$ to a small influence of noise). Figures 1-4 demonstrate the results of this simple analysis. In Fig. 1, we show the location of the two small signals on the corresponding background in the absence of noise ($w_1 = 1$). In Fig. 2 we show the extraction of the two weak signals with the help of expressions (7). If the background remains stable ($\langle b_1(x) \rangle$ is located near the unit value, $\langle G(x) \rangle \cong 0$), then the weak signal is easily detectable. The influence of the random fluctuations is shown in Figs. 3a, 3b. Even an admixture of small noise, $w_2 = 0.1$ (Fig. 3a), distorts $Sgn_2(x)$. When the parameter is increased to $w_2 = 0.5$, $Sgn_2(x)$ completely disappears, whereas $Sgn_1(x)$ conserves its initial location (see Fig. 3b). Therefore, having a stable and "pure" background that serves as a specific detector, one can notice the presence of a small "entity" comparable with the noise, or it can be detected as a signal.
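The noise-free mimic experiment ($w_1 = 1$) can be reproduced in a few lines. The following is a sketch under stated assumptions rather than the author's code: the peak positions are taken as $x_1 = 6$ for $Sgn_1$ and $x_2 = 5$ for $Sgn_2$, and the helper name `lorentz` and all variable names are hypothetical.

```python
import numpy as np

N, M = 500, 100
x = 1e-3 + (np.arange(N + 1) / N) * (10.0 - 1e-3)   # grid x_j, j = 0..N

Y_dn = np.sin(0.4 * x - np.pi / 8)
Y_up = np.cos(0.67 * x - np.pi / 8) + 3.0

def lorentz(x, x0, a=0.01, gamma2=0.05):
    """Small test signal a / [(x - x0)^2 + gamma^2]."""
    return a / ((x - x0) ** 2 + gamma2)

S1 = lorentz(x, 6.0)   # assumed position of Sgn1
S2 = lorentz(x, 5.0)   # assumed position of Sgn2

# Eq. (20)-(21): interpolation weights F(v_m) as a column vector
v = np.arange(M) / (M - 1)
F = (3.0 * v ** 2 - 2.0 * v ** 3)[:, None]

y = Y_dn + F * (Y_up - Y_dn)                          # "pattern" background
Y = (Y_dn + S1) + F * ((Y_up + S2) - (Y_dn + S1))     # tested array, no noise

# Expressions (7) for one factor
dy = y - y.mean(axis=0)
dY = Y - Y.mean(axis=0)
b1 = (dY * dy).mean(axis=0) / (dy ** 2).mean(axis=0)
G_mean = Y.mean(axis=0) - b1 * y.mean(axis=0)
```

For this construction the result can be written in closed form: with $D(x) = Y_{up}(x) - Y_{dn}(x)$ one finds $\langle b_1 \rangle = 1 + (Sgn_2 - Sgn_1)/D$ and $\langle G \rangle = Sgn_1 - (Sgn_2 - Sgn_1) Y_{dn}/D$, so $\langle b_1 \rangle$ stays near unity and $\langle G \rangle$ stays near zero except around the two peaks.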

Figure 1. Here we demonstrate the location of two artificially created small signals. The array was created with the help of expression (21). M = 100. (Horizontal axis: 0 < x < 10.)

Figure 2. For this "ideal" background, two small signals (where the dependence $\langle G(x) \rangle = 0$ serves as a specific "ruler") are detected easily with the help of expressions (7). (Panel title: Detection of the weak signals with the help of $\langle G(x) \rangle$. The plot shows the behavior of G(x) with the marked signals Sgn1(x) and Sgn2(x) relative to the level "0"; horizontal axis: 0 < x < 10.)

Figure 3a. The presence of "noise" (even in a small amount, $w_2 = 0.1$) in the chosen sampling creates certain difficulties in the detection of a small signal. (Panel title: The influence of random fluctuations (w = 0.1); horizontal axis: 0 < x < 10.)

Figure 3b. The view of the distorted signals under the influence of noise. The form of the signals is distorted, while their locations are still conserved. The distorted influence function $\langle b_1(x) \rangle$ is shown in this figure. (Panel title: Behavior of $\langle b(x) \rangle$ under the influence of the factor w = 0.1; horizontal axis: 0 < x < 10.)

Figure 3c. A further increase of the random fluctuations ($w_2 = 0.5$) leads to the complete disappearance of the small signals and their locations. (Panel title: Behavior of $\langle G(x) \rangle$ under the influence of a noise factor w = 0.5; horizontal axis: 0 < x < 10.)

Figure 4. Here we demonstrate the selected sampling of the VAGs corresponding to 100% of olive oil. (Panel title: Partial sampling corresponding to 100% of olive oil; horizontal axis: the normalized data points, 0 < x < 1.)

4.2. Verification of expressions (7) on real data

As one can notice from the previous analysis, the creation of a stable "pattern"/ideal background plays a key role in any original research. If the reliable and sensitive background can be created then a researcher will obtain a sensitive instrument for detection of a "strange" signal/reality that can appear in the given background. I asked doctor of chemical science Artem V. Sidelnikov from Ufa Petroleum Institution to send me some electrochemical data for short analysis. He asked his colleagues to perform a typical electrochemical experiment: to choose a background solute and add some "x" substance for its detection. Actually, he and his colleagues sent me VAGs of two experiments:

Case 1. The background contained 100 measured voltammograms (VAGs) corresponding to 100% pure olive oil. The second set of measurements corresponded to a mixture of 80% olive oil and 20% rape oil. Other parameters were not important for the analysis. Only one basic question remains: is it possible to notice the difference/direct correlations between the pure olive oil chosen as a background (in our notation, the array ym(x)) and the mixture Ym(x), in which 20% of the olive oil was replaced by rape (rapeseed) oil?

Case 2. As a background, the VAGs of the conventional buffer solution were measured with the selected carbon electrode near the Faraday region. Then, after 100 measurements, a small amount of water (about 1%) was added, and the "modified" background with water was measured again.

The two experiments were performed under the same experimental conditions. It implies that surrounding parameters such as temperature (T), humidity (H), and pressure (P) keep their given values.

Is it possible to notice the changes evoked by the presence of the "x" substance inside the given background if each experiment is repeated M = 100 times? This repetition is necessary for the creation of a representative array.

In this real experiment we associate the background array with the functions ym(x) (L = 1), where ym = Jm is the registered current and x denotes the normalized data points from the interval [0,1], and we associate the arrays Ym(x) for cases 2 and 3 with the affected arrays containing the "x" substance. For this case, it is sufficient to use the simple expressions (7), as in the mimic experiment explained above. For both experiments, we used the normalized VAGs, which are defined as

$$
y_m(x) = \frac{J_m(x) - \langle J(x) \rangle}{\mathrm{stdev}(J_m(x))}, \qquad \langle J(x) \rangle = \frac{1}{M} \sum_{m=1}^{M} J_m(x). \tag{30}
$$
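The normalization in expression (30) can be illustrated with a minimal sketch. It assumes that the M measured VAGs are stored as rows of an array of shape (M, N) and that stdev(Jm(x)) is taken along x for each measurement m; the text does not specify the axis, so this choice is an assumption. The function name `normalize_vags` and the synthetic input are hypothetical and stand in for the real electrochemical data.

```python
import numpy as np

def normalize_vags(J):
    """Normalize an array of voltammograms in the form of expression (30).

    J has shape (M, N): M repeated measurements with N data points each.
    """
    J = np.asarray(J, dtype=float)
    mean_curve = J.mean(axis=0)              # <J(x)>, averaged over m = 1..M
    # Assumption: the standard deviation is computed along x for each m.
    spread = J.std(axis=1, keepdims=True)
    return (J - mean_curve) / spread

# Synthetic stand-in for measured currents (not the data used in the paper).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
J = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal((100, x.size))
y = normalize_vags(J)
```

If the standard deviation is instead meant to be taken over the M measurements at each x, only the `axis` argument of `std` changes.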

Figures 4-8 demonstrate the details that are explained in the corresponding captions. The most important figures, from our point of view, are Figures 6 and 8, where we show two different behaviors of the functions ⟨G(x)⟩ and the mean values ⟨Y(x)⟩ averaged over the arrays containing the "x" substance. For the first experiment, we observe weak correlations between the two samplings compared, while for the second experiment we observe strong correlations between them. The proposed approach forces one to reconsider experimental results obtained from an independent analysis in which a competing theoretical hypothesis is used for fitting the experimental data. We omit the figures demonstrating the corresponding arrays Gm(x) because they are unimportant for this case.

[Figure 5. Plot title: "The partial sampling corresponding to the mixture 80% (olive oil) + 20% (rape oil)". Horizontal axis: 0.1 < x < 1.0.]

Figure 5. For comparison with the previous figure, we show the VAGs corresponding to the mixture of 80% olive oil and 20% rapeseed oil. All VAGs are normalized in accordance with expression (30).

[Figure 6a. Plot legend: b = 0.79, g = 0.001. Horizontal axis: 0 < x < 1.]

Figure 6a. As one can notice from this central figure, the curve ⟨G(x)⟩ (black) is close to the mean curve ⟨Y(x)⟩ (red). It means that the correlation between the VAGs corresponding to 100% olive oil and the mixture is weak (these oils can be considered as independent of each other).

[Figure 6b. Plot title: "Behavior of ⟨b(x)⟩". Horizontal axis: 0 < x < 1.]

Figure 6b. This is also confirmed by the weak dependence of the influence function ⟨b1(x)⟩ depicted in this figure.

[Figure 7a. Plot title: "Set of the selected and normalized VAGs measured near the Faraday region (without water)". Horizontal axis: 0 < x < 1.]

Figure 7a. Here we demonstrate the selected samplings corresponding to the measured VAGs in the given background solute for the Faraday region without water.

[Figure 7b. Plot title: "Set of the selected and normalized VAGs measured near the Faraday region (1% H2O)". Horizontal axis: 0 < x < 1.]

Figure 7b. The VAGs distorted by the addition of a small amount of water (about 1%) are shown in this figure. An inexperienced eye notices only some small distortions and concludes that the difference between these two arrays (compare with the neighboring Fig. 7a) is small.

[Figure 8a. Plot legend values: 0.994998 and 0.00025. Horizontal axis: 0 < x < 1.]

Figure 8a. This figure can be considered as the central one in the frame of the second experiment. Here we show the small signal corresponding to "water" influence. It is small; however, it can be detected.

[Figure 8b. Horizontal axis: 0 < x < 1.]

Figure 8b. The behavior of the influence function ⟨b1(x)⟩, reflecting the strong correlations with the initial array ym(x), is shown in this figure. Comparison with the functions Ym(x) and ⟨Y(x)⟩ depicted in Fig. 7b shows that they are different and, therefore, these two arrays are strongly correlated with each other.

5. Analysis of the obtained results

In conclusion, we want to stress the advantages of the proposed approach:

1. We generalized the existing expressions related to the analysis of correlations and, for the first time, obtained a solution in explicit form that allows one to separate directly the correlated factor (taken in the form of a background array) from a "mixture" of other factors and to obtain the desired uncorrelated/remnant part Gm(x) in the form of an array.

2. Expression (1) can be generalized to consider factors having a temporal memory (18) and nonlinear factors that are related by cause-and-effect relations (expressions (15) and (16)).

3. In the limiting cases, the general expression (1) reduces to the linear least-squares method (when ⟨bl(x)⟩ can be replaced by constant values) and, in the other case (when approximately Ym(x) ≅ ⟨Y(x)⟩), it restores directly the uncorrelated part Gm(x), when the degree of correlations becomes negligible (y_m^(l)(x) ≅ 0).

4. The examples based on mimic and real data show and confirm some important details that can be obtained with the proposed approach. Special attention should be paid to obtaining a stable/robust background, which serves as a specific detector for the registration of any small signal that could appear as a result of an unexpected "x" perturbation. This registered perturbation can help in the detection of the desired "signal" (the amount of water in the second experiment) and separates this new "quality" from the random fluctuations contained in the "pattern" background.

5. The new approach can be helpful in deciding whether an array can be associated with a correlated cluster or not. Indeed, if we obtain the result ⟨bl(x)⟩ ≅ 1, ⟨G(x)⟩ ≅ 0, then it implies that the array Ym(x) is strongly correlated with the compared arrays y_m^(l)(x), while in the opposite case, ⟨bl(x)⟩ ≅ 0, ⟨G(x)⟩ ≅ ⟨Y(x)⟩, these arrays can be considered as almost independent of each other. This simple comparison adds more definite information to the conventional analysis based on the Pearson correlation analysis and the statistics of the fractional moments [16,18,19].
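The two limiting cases named in item 5 can be illustrated with a simplified sketch. It is not the paper's expressions (7), which are not reproduced in this section. It assumes a single factor (L = 1) and fits b(x) by ordinary least squares over the M realizations, separately at each x, taking the remnant as Gm(x) = Ym(x) - b(x) ym(x). The function name `influence_and_remnant` and the synthetic arrays are hypothetical.

```python
import numpy as np

def influence_and_remnant(Y, y):
    """Pointwise least-squares split Y_m(x) = b(x) * y_m(x) + G_m(x).

    Y, y: arrays of shape (M, N). Returns b with shape (N,) and G with shape (M, N).
    """
    Y = np.asarray(Y, dtype=float)
    y = np.asarray(y, dtype=float)
    # Minimize sum_m (Y_m(x) - b(x) * y_m(x))**2 separately at every x.
    b = (Y * y).sum(axis=0) / (y * y).sum(axis=0)
    G = Y - b * y
    return b, G

rng = np.random.default_rng(1)
M, N = 100, 50
y = rng.standard_normal((M, N))

# Strongly correlated case: Y is a copy of y, so b(x) = 1 and G_m(x) = 0.
b_same, G_same = influence_and_remnant(y.copy(), y)

# Nearly independent case: Y is generated separately from y,
# so b(x) stays close to 0 and G_m(x) stays close to Y_m(x).
Y_indep = rng.standard_normal((M, N))
b_indep, G_indep = influence_and_remnant(Y_indep, y)

print(b_same.mean(), np.abs(G_same).max())
print(b_indep.mean(), np.abs(G_indep - Y_indep).mean())
```

In this simplified setting, the values of b(x) and the size of the remnant play the role of the comparison described in item 5. The functional least-square method of the paper may weight or average the data differently.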

It is also interesting to note that the first experiment demonstrates an unexpected result: the VAGs obtained for the pure olive oil and for the mixture of two oils do not correlate with each other. It means that the oil mixture forms an independent chemical substance that differs from a simple mechanical mixture of two oils. For the second experiment, we also obtain an important result: the proposed approach allows one to extract a "signal" corresponding to a small amount of water (Fig. 8) and proves that the VAGs obtained for both cases are strongly correlated.

Acknowledgments

I would like to express my gratitude to Dr. Artem V. Sidelnikov from the Technology Department of Ufa State Petroleum Technological University (USPTU) for the electrochemical data that were sent to me for this analysis.

References

1. Haghighat M., Abdel-Mottaleb M., Alhalabi W. IEEE Transactions on Information Forensics and Security 11, 1984 (2016)

2. Sen P.K. Journal of the American Statistical Association 81(394), 560 (1986)

3. Schervish M.J. Statistical Science 4, 396 (1987)

4. Anderson T.W. An Introduction to Multivariate Statistical Analysis, Wiley, New York (1958)

5. Mardia K.V., Kent J.T., Bibby J.M. Multivariate Analysis, Academic Press (1979)

6. Feinstein A.R. Multivariable Analysis, New Haven, CT: Yale University Press (1996)

7. Hair J.F.Jr. Multivariate Data Analysis with Readings, 4th ed., Prentice Hall (1995)

8. Johnson R.A., Wichern D.W. Applied Multivariate Statistical Analysis, 6th ed., Prentice Hall (2007)

9. Schafer J.L. Analysis of Incomplete Multivariate Data, CRC Press (1997); Vaseghi S.V. Advanced Digital Signal Processing and Noise Reduction, 2nd ed., John Wiley & Sons, Ltd (2001)

10. Haghighat M., Abdel-Mottaleb M., Alhalabi W. IEEE Transactions on Information Forensics and Security 11(9), 1984 (2016)

11. Haghighat M., Abdel-Mottaleb M., Alhalabi W. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1866 (2016)

12. Lee Y.W., Cheatham T.P.Jr., Wiesner J.B. Proceedings of the IRE 38(10), 1165 (1950)

13. Buciu I., Gacsadi A. IEEE, Proceedings ELMAR, Zadar, 21 (2011)

14. Boashash B. IEEE Transactions on Acoustics, Speech, and Signal Processing 36(9), 1518 (1988)

15. Boashash B. Proceedings of the IEEE 80(4), 519 (1992)

16. Nigmatullin R.R., Maione G., Lino P., Saponaro F., Zhang W. Communications in Nonlinear Science and Numerical Simulation 42, 324 (2017)

17. Nigmatullin R.R. in Complex Motions and Chaos in Nonlinear Systems, Nonlinear Systems and Complexity, edited by V. Afraimovich et al., Vol. 15, Ch. 1, p. 1, Springer (2016)

18. Nigmatullin R.R., Toboev V.A., Lino P., Maione G. Chaos, Solitons & Fractals 76, 166 (2015)

19. Nigmatullin R.R., Ceglie C., Maione G., Striccoli D. Nonlinear Dynamics 80(4), 1869 (2015)

20. Uchaikin V.V. The Method of the Fractional Derivatives, Chapter 5, "Artishok" Publ. House (2008) (in Russian)
