UDK 519.2
PREDICTIVE MODELS OF THE STATE OF PRODUCTION PROCESSES BASED ON TIME SERIES USING COGNITIVE INFORMATION
VEREVKIN ALEXANDER PAVLOVICH
Professor of the Department of Industrial Process Automation, Ufa State Petroleum Technological University, Ufa, Russian Federation
MURTAZIN TIMUR MANSUROVICH
Associate Professor of the Department of Industrial Process Automation, Ufa State Petroleum Technological University, Ufa, Russian Federation
Annotation. Advanced process control (APC) systems are based on process models that allow rapid prediction of changes in technological parameters (TP) and product quality indicators (QI). The structure and parameters of such models are usually identified from statistical information, i.e. the results of passive experiments. Operating data in control systems are archived in databases as time sequences without any guarantee of their reliability or uniformity. When the time series are non-stationary and the data heterogeneous, the quality of the calculated models drops sharply, so, as a rule, such data cannot be used for model development without preliminary preparation. The preparation is aimed at ensuring the stationarity of the series and the adequacy of the models and includes selecting clusters (fragments) of data and filtering them; situational models are then formed for these fragments.
The article discusses methods of bringing data from automated control system archives to a common format and of filtering and clustering them. The filtering and clustering methods are based on so-called exemplary cognitive models and on the cross-correlation coefficients of TP and QI.
Keywords: regression model, time series, cross-correlation, cognitive approach, normalization.
I. INTRODUCTION
Models are widely used in building systems for predictive analytics, control and diagnostics. Statistical information is usually used to identify their structure and parameters [1-6]. Statistical data take the form of time sequences for which, as a rule, the conditions of uniformity and stationarity are not met.
When the data are non-stationary and heterogeneous, the quality of the models obtained from them drops sharply, up to unstable and non-physical (contradicting the physical meaning) values of the model parameters. Known approaches to data preparation [7-11, 15] aim to identify clusters (fragments) of data and to filter them in order to ensure stationarity of the series fragments and to obtain situational models later [1].
The problems that arise here are common to forecasting tasks based on time series in different areas, but production processes have a specific feature: cognitive information about the relationship between the input and output parameters of the models is available, and this information can be used to regularize the solutions [20].
The purpose of this article is to present some methods of modeling objects and systems based on the use of non-stationary time series and cognitive information about the process.
II. FEATURES OF MODELING OBJECTS WITH NON-STATIONARY CHARACTERISTICS
A typical plan for analyzing statistical data and time series [7-11, 15] begins with clarifying the composition of the series: the presence of a trend, periodic components, outliers (interventions) and the properties of the random remainder, primarily its stationarity.
The non-stationarity of the control object can be caused both by the influence of non-measurable factors and by changes in the characteristics of the object (system) over time, for example due to the dynamics of the control processes, which creates fundamental difficulties in determining the structure and parameters of the model. Applying traditional approaches to obtaining models in the form of regression equations under object non-stationarity can lead to coefficient estimates that contradict the laws of the process both in the signs of the coefficients and in their values (instability of the model). Therefore, when preparing data and building models, in particular when filtering and clustering information, it is proposed to use statistical analysis techniques, cognitive information and cognitive models together [12-16, 17].
III. THE ORDER OF PROCESSING TIME SERIES TO OBTAIN THE STRUCTURE AND PARAMETERS OF THE MODELS
In [15], two separate methods were proposed for filtering and clustering data. The first is based on comparing the measured values with a control model that uses restrictions on the relationships of technological parameters for a specific technological process. The second is based on the cross-correlation coefficients of the input and output parameters.
The generalization of this approach consists in using a cognitive model (CM) of the process whose parameters are determined up to interval values assigned by experts or derived from "strict" models. The cognitive model may be unbalanced and have low adequacy with respect to real data, but it must qualitatively correctly reflect the relationships of the variables. Approximation models obtained for certain combinations and ranges of variable variation, i.e. from data fragments, may have high prediction accuracy on individual samples, but the physicality conditions on the parameters may be violated, and the models remain phenomenological.
The technology of cognitive analysis and modeling [12, 16, 17] is based on cognitive (cognitive-target) structuring of knowledge about an object and its external environment, where the object and the external environment are distinguished only "fuzzily".
The purpose of such structuring is to identify the most significant (basic) factors characterizing the "boundary" layer of interaction between the object and the external environment and to establish qualitative (causal) relationships between them, i.e. what mutual influences the factors exert on each other in the course of their change. The mutual influences of factors (concepts) are displayed using a cognitive map (model structure), which is usually a signed (weighted) signal graph.
The modeling task is proposed to be solved in several stages.
At the first stage, an expert (cognitively) determines the set of parameters (concepts) that, in the expert's opinion, should be included in the cognitive model, and the cognitive model M(R, T, Q, A) is formed, where R is the set of control signals, T is the set of technological parameters (state parameters), Q is the set of dependent variables, for example product quality indicators (QI), and A is the matrix of model coefficients. Additionally, based on the results of cross-correlation analysis, variables with a high correlation coefficient with the parameters of the sets Q and R or T can be included among the concepts if this does not contradict the laws of the process. Thus, the generalized structure of the process cognitive model takes into account both heuristic and empirical information. Note that when performing cross-correlation analysis of time series, the transport delays (lags) between the series, determined by known methods [7-11], should be taken into account.
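A minimal sketch of such lag estimation by cross-correlation is given below; it is an illustration only, not a procedure from the cited sources, and the data and function name are assumptions.

```python
import numpy as np

def best_lag(x, y, max_lag):
    """Estimate the transport delay between input series x and output series y
    as the lag with the largest cross-correlation modulus."""
    best, best_r = 0, 0.0
    for lag in range(0, max_lag + 1):
        # shift x forward by `lag` samples and correlate with y
        xs, ys = x[:len(x) - lag], y[lag:]
        r = np.corrcoef(xs, ys)[0, 1]
        if abs(r) > abs(best_r):
            best, best_r = lag, r
    return best, best_r

# illustrative data: y reacts to x with a delay of 5 samples plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, 5) + 0.1 * rng.normal(size=500)
print(best_lag(x, y, max_lag=20))   # the estimated lag is close to 5
```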
At the second stage, clustering of the time series (sets) R(t), T(t), Q(t) as functions of time t is performed. Fragments of the time series are selected so that the average values of the variables
for different fragments differ markedly. The criterion for starting a new fragment may be that the difference of the average values for neighboring fragments exceeds 3 values of the standard deviation σ (root-mean-square deviation) on the previous time fragment:
| M[t_r; t_(r+1)] - M[t_(r-1); t_r] | > 3σ[t_(r-1); t_r],

where σ[t_(r-1); t_r] is the standard deviation on the time interval of the fragment [t_(r-1); t_r]; M[t_r; t_(r+1)] and M[t_(r-1); t_r] are the averages (as estimates of the mathematical expectation) on the time intervals of the fragments r...(r+1) and (r-1)...r respectively; r = 2, 3, ..., R is the fragment number, R is the number of fragments.
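A minimal sketch of this fragmentation rule, assuming a fixed candidate window length and synthetic data (both assumptions for illustration only):

```python
import numpy as np

def split_fragments(series, min_len=15):
    """Split a series into fragments: a new fragment starts when the mean of the
    candidate window differs from the mean of the previous fragment by more
    than 3 standard deviations of that previous fragment."""
    fragments, start = [], 0
    for t in range(min_len, len(series) - min_len, min_len):
        prev = series[start:t]
        nxt = series[t:t + min_len]
        if abs(nxt.mean() - prev.mean()) > 3 * prev.std(ddof=1):
            fragments.append((start, t))
            start = t
    fragments.append((start, len(series)))
    return fragments

# illustrative series with a level shift in the middle
rng = np.random.default_rng(1)
s = np.concatenate([rng.normal(10.0, 0.2, 200), rng.normal(12.0, 0.2, 200)])
print(split_fragments(s))   # a fragment boundary is found near the shift at index 200
```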
Before this, outliers (anomalous observations) should be removed from each fragment [7]. Usually, single values are considered outliers if their deviation, referred to the mathematical expectation, exceeds 10-20 %.
The number of observations at each discreteness interval should be at least 10-15.
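One possible reading of the 10-20 % rejection rule, sketched with illustrative data (the exact threshold value is an assumption):

```python
import numpy as np

def drop_outliers(fragment, rel_threshold=0.15):
    """Remove single anomalous observations whose deviation from the fragment's
    mathematical expectation, referred to that expectation, exceeds the threshold."""
    fragment = np.asarray(fragment, dtype=float)
    m = fragment.mean()
    mask = np.abs(fragment - m) <= rel_threshold * abs(m)
    return fragment[mask]

print(drop_outliers([7.6, 7.8, 7.7, 12.5, 7.5]))   # the value 12.5 is rejected
```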
At the third stage, graphs of the changes in the mathematical expectations over the time fragments are constructed; these are smoothed time series.
The time series are shifted in time taking into account the lags determined at the first stage (the origin is set by the output variable).
The filtering of the mathematical expectations for the purpose of calculating the model parameters reduces to selecting those fragments of the time series (data clusters) that meet three criteria (a minimal check of criteria 1 and 3 is sketched after the list):
1. The sign of the cross-correlation coefficient determined for a pair from Q and T or R must coincide with the sign of the "weight" of the arc (or sequence of arcs) of the cognitive model;
2. The estimates of the partial regression coefficients of the factors should lie in the range of the corresponding coefficients of the cognitive model; to verify this condition, regression equations written on a standardized scale can be used [19], but another normalization method is proposed below, based on the values of the mathematical expectations of the variables under consideration;
3. The modulus of the cross-correlation coefficient should exceed some boundary value that allows the relationship to be considered strong enough.
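A minimal sketch of checking criteria 1 and 3 for a single pair of series; the CM arc sign, the boundary value of |CC| and the data are illustrative assumptions:

```python
import numpy as np

def passes_criteria(x, y, cm_sign, cc_threshold=0.3):
    """Criteria 1 and 3: the cross-correlation of the pair (x, y) must have the
    same sign as the cognitive-model arc weight and a modulus above the threshold."""
    cc = np.corrcoef(x, y)[0, 1]
    return (np.sign(cc) == np.sign(cm_sign)) and (abs(cc) >= cc_threshold), cc

# illustrative fragment data and an illustrative CM arc weight of -1 (inverse influence)
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = -0.8 * x + 0.3 * rng.normal(size=100)
print(passes_criteria(x, y, cm_sign=-1))
```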
Certain difficulties arise in ensuring the second criterion, since some variables may not change over certain time intervals, and then the pair correlation coefficients turn out to be close to zero.
The following methodology for estimating partial (for parameter pairs) regression coefficients is proposed:
1. Calculate the covariance coefficients separately for each pair of variables "input factor - output factor". To do this, aggregates of fragments are formed on which the values of the mathematical expectation of both the input and the output factor are close. At the same time, the values of the correlation coefficients for the remaining "input factor - output factor" pairs should be small.
2. The covariance coefficients are normalized by the values of the mathematical expectations of the corresponding variables for the given set of fragments:
NCC(R, Q) = Cov(R, Q) / (M(R) × M(Q)),   NCC(T, Q) = Cov(T, Q) / (M(T) × M(Q)),
where M(R), M(T), M(Q) are mathematical expectations for data from elements of sets R, T, Q.
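A minimal sketch of computing this normalized coefficient for one pair of variables; the fragment means used below are purely illustrative and are not meant to reproduce any table of the article:

```python
import numpy as np

def ncc(x_means, q_means):
    """Normalized covariance coefficient: Cov(X, Q) / (M(X) * M(Q)),
    computed here on fragment-wise mathematical expectations."""
    x, q = np.asarray(x_means, float), np.asarray(q_means, float)
    cov = np.cov(x, q, ddof=1)[0, 1]
    return cov / (x.mean() * q.mean())

# illustrative fragment means of an input parameter and of a quality indicator:
# the input decreases while the output grows, so the NCC comes out negative
x_means = [0.0171, 0.0165, 0.0162]
q_means = [7.62, 8.45, 8.91]
print(ncc(x_means, q_means))
```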
In contrast to normalizing the deviations of the variables by standard deviations, in this case the value of the normalized covariance coefficient (NCC) can serve as a measure of the partial regression coefficient of the input variable under consideration, expressed in relative deviations. In addition, the absolute values of the NCC carry information about the ratio of the purely random component to the level of information useful for modeling these variables. Using the values of the correlation coefficients (CC) and the NCC, additional information can be extracted.
For simplicity, we assume that CC and NCC can take two fuzzy values: small or large. The normalized values of the deviations have the meaning of relative values, which makes it possible to compare them with the connection coefficients of cognitive models expressed in fuzzy scales [12].
Then we have 4 possible combinations:
1. CC - large, NCC - large;
2. CC - large, NCC - small;
3. CC - small, NCC - large;
4. CC - small, NCC - small.
In the first case, the statistics make it possible to draw a definite conclusion that there is a strong connection between the input and output factors. If, at the same time, the obtained NCC does not fall within the range of variation of the cognitive model coefficients, the correctness of assigning the cognitive model coefficients should be reconsidered.
In the second case, despite the high value of the CC, the calculated value of the regression coefficient should be treated with caution, since it was obtained under a small level of the random component of at least one variable. In this case, the NCC estimate may be unstable, and its comparison with the value in the cognitive model should be made taking into account additional information.
In the third case, we can say that the variance of at least one variable is large or its mathematical expectation is small, and the obtained value of the NCC as an estimate of the regression coefficient is also considered unstable.
In the fourth case, it can be argued that the relationship between the variables is weak and it makes no sense to include the corresponding input variable in the regression equation.
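A minimal sketch of classifying the four combinations; the "large/small" boundaries are illustrative assumptions and should be chosen for the specific process:

```python
def classify_link(cc, ncc, cc_big=0.5, ncc_big=0.01):
    """Classify a pair (CC, NCC) into the four combinations discussed above."""
    cc_large, ncc_large = abs(cc) >= cc_big, abs(ncc) >= ncc_big
    if cc_large and ncc_large:
        return "strong link: use the NCC as the partial coefficient estimate"
    if cc_large and not ncc_large:
        return "CC large, NCC small: unstable estimate, extra information needed"
    if not cc_large and ncc_large:
        return "CC small, NCC large: large variance or small mean, unstable estimate"
    return "weak link: exclude the input variable from the regression"

print(classify_link(cc=0.7, ncc=0.02))
print(classify_link(cc=0.1, ncc=0.001))
```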
The procedure for calculating covariance coefficients is repeated sequentially for all fragments of independent variables.
As a result, fragments of lagged and smoothed series are selected, which are suitable for analyzing the stationarity of the object and calculating regression coefficients.
At the fourth stage, the time series prepared at the third stage are analyzed for stationarity and for the presence of trends or periodic components, and the coefficients of the regression model are determined.
In this case, the following types of behavior of the mathematical expectations of the prepared time series are possible.
Case 1. There are pairs of fragments of the output and of individual input variables for which both CC and NCC are large. This means that there is no trend on such fragments, and the partial coefficients of the regression model can be calculated from these data. It may turn out that there are several pairs of fragments with different values of CC or NCC. In this case, the regression coefficients are averaged over the set of fragments with "weights" determined through the NCC. The regression coefficients are calculated for the pairs Q and T or R (excluding missing data) and "weighted" for each parameter, taking into account the NCC of the corresponding fragment:
a(R) = Σ_(i=1..k) a_i(R) × c_i(R),   a(T) = Σ_(j=1..l) a_j(T) × c_j(T),

c_i(R) = Cov_i(R, Q) / Σ_(i=1..k) Cov_i(R, Q),   c_j(T) = Cov_j(T, Q) / Σ_(j=1..l) Cov_j(T, Q),

where n, m are the numbers of independent variables of the model from the sets R and T respectively; k, l are the numbers of fragments for the variables of the sets R and T respectively; a_i(R), a_j(T) are the per-fragment coefficients of the regression model, and the resulting a(R), a(T) are elements of the matrix A.
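A minimal sketch of this convolution for one input variable; the per-fragment coefficients and |NCC| values below are illustrative numbers, not taken from the article's tables:

```python
import numpy as np

def weighted_coefficient(frag_coeffs, frag_nccs):
    """Convolve per-fragment partial regression coefficients into a single
    coefficient, weighting each fragment by the modulus of its NCC;
    the weights are normalized so that they sum to one."""
    frag_coeffs = np.asarray(frag_coeffs, float)
    w = np.abs(np.asarray(frag_nccs, float))
    w = w / w.sum()
    return float(np.dot(frag_coeffs, w))

# illustrative per-fragment coefficients and |NCC| values for one input variable
print(weighted_coefficient([-1400.0, -1250.0, -215.0, -520.0],
                           [0.001, 0.013, 0.053, 0.025]))
```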
Case 2. There are sections of the prepared time series with the combinations "CC large, NCC small" or "CC small, NCC large". This means that either there are factors not taken into account by the cognitive model structure adopted at stage 1, or there is a relationship (multicollinearity) between variables. An additional analysis of the process should be carried out to identify such factors. For oil refining and petrochemical processes, one such factor is often a change in the composition of the raw materials.
Case 3. There is a monotonic change in the NCC over time. This means that the modeled system is characterized by a change in the dependent parameter over time, for example due to degradation of equipment or catalyst. In this case, the time parameter should additionally be included in the model, and the model should be obtained, for example, in the form of a difference equation.
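A minimal sketch of including the time parameter as an additional regressor, on synthetic data with a linear drift (an assumption made only for illustration; the difference-equation form is not shown here):

```python
import numpy as np

# illustrative drifting object: q depends on an input x and degrades linearly in time
rng = np.random.default_rng(3)
t = np.arange(300, dtype=float)
x = rng.normal(size=300)
q = 2.0 * x - 0.01 * t + 0.1 * rng.normal(size=300)

# include time as an additional regressor and fit by least squares
X = np.column_stack([x, t, np.ones_like(t)])
coeffs, *_ = np.linalg.lstsq(X, q, rcond=None)
print(coeffs)   # approximately [2.0, -0.01, intercept]
```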
IV. EXAMPLE OF OBTAINING A MODEL BASED ON TIME SERIES ANALYSIS
Let us consider the application of the proposed approach to modeling an industrial facility: an adsorption column for petrochemical raw materials [1].
At the first stage, four technological parameters are phenomenologically identified as determining variables:
1. FF - ratio of the reflux (irrigation) flow rate to the raw material flow rate in the column;
2. PD - pressure drop along the height of the column;
3. LC - level in the reflux (phlegm) tank;
4. TA - temperature in the column bottom.
The purpose of the simulation is to calculate the solvent concentration in the column top product (ACET).
The signed cognitive model (CM) has the form (Fig. 1):
Fig. 1. Cognitive model for calculating the parameter
At the second stage, fragments (generally different for each of the variables) are allocated for each of the input parameters according to the methodology above, and the mathematical expectations are calculated for them. Table 1 shows the values of the mathematical expectations of all variables when fragmenting by the FF parameter. Similar tables are obtained for each input variable.
Table 1. The values of the mathematical expectations for the time fragments selected by the FF parameter
No.  FF      PD      LC     TA      ACET
1 0.0171 0.021 45.42 135.41 7.62
2 0.0165 0.0175 59.36 135.72 8.45
3 0.0162 0.0158 57.99 134.84 8.91
4 0.0188 0.0359 44.74 134.98 6.46
5 0.0203 0.0200 48.66 134.13 5.50
6 0.0205 0.0279 56.28 134.65 8.77
7 0.0246 0.0361 50.62 135.29 5.97
8 0.0232 0.0293 49.81 134.93 5.14
At the third stage, normalized covariance matrices are calculated from the values of the mathematical expectations for those sets of fragments (see Table 2) that are characterized by a small difference in the values of the mathematical expectations. Recall that they show the relative variation of each variable on the fragments under consideration. There are three such sets for Table 1 (1-3, 5-6, 7-8).
Table 2. NCC for a set of fragments by parameter FF
Fragments FF PD LC TA
1-3 -0.0014 -0.00764 0.0066 -8.7E-05
5-6 0.0011 0.03771 0.0166 0.00043
7-8 0.0021 0.00774 0.0006 9.94E-05
At the fourth stage, for the parameters for which the sign of the NCC corresponds to the sign in the CM and its modulus exceeds a certain threshold value (highlighted in color in Table 2), the linear regression coefficient is calculated for the pairs "ACET - P", where P is one of the input parameters of the model (see Table 3). Table 3 shows the regression coefficients (CR) for the fragments, the moduli of the corresponding NCC values, and the regression coefficient for the independent parameter of the model obtained as a result of convolution by "weighing" with the NCC values.
Table 3. Values of the partial regression coefficients and the result of calculating the "weighted" CR

Par.  CR/NCC №1     CR/NCC №2     CR/NCC №3      CR/NCC №4      CR/NCC №5   Final CR
FF    -1423/0.001   -1243/0.013   -214.3/0.053   -521.7/0.025   -/-         -458.1
A model with coefficients is obtained:
ACET = -458.1·FF - 112.6·PD + 0.109·LC + 0.491·TA + a0.
Table 4. Statistical characteristics of the model

Amount of data: 508
R² (coefficient of determination): 0.22
CC: 0.47
Standard deviation: 1.97
Index (ratio of the standard deviation to the ACET variance): 0.9
Average value of ACET: 6.79
Average value of ACET by the model: 6.39
Standard deviation of ACET: 2.89
Standard deviation of ACET by the model: 2.51
The free term of the model is selected so that the average value of the calculated dependent parameter is close to its average in the training sample.
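A minimal sketch of this selection of the free term; the model coefficients are taken from the equation above and the ACET mean 6.79 from Table 4, while the training-sample means of FF, PD, LC and TA are assumed values used only for illustration:

```python
import numpy as np

def free_term(y_mean, coeffs, x_means):
    """Pick the free term a0 so that the average model output equals the average
    of the dependent parameter in the training sample."""
    return y_mean - float(np.dot(coeffs, x_means))

coeffs  = [-458.1, -112.6, 0.109, 0.491]      # coefficients of the ACET model
x_means = [0.0196, 0.0254, 51.6, 135.0]       # assumed average operating point (illustrative)
a0 = free_term(y_mean=6.79, coeffs=coeffs, x_means=x_means)
print(a0)
```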
The results of testing the model are shown in Table 4. The results of QI forecasting in comparison with the laboratory data of the indicator are shown in Fig. 2.
(Legend of Fig. 2: ACET_LAB, laboratory values; ACET_CALC, values calculated by the model.)
Fig. 2. The results of predicting the parameter by the model on the test sample
Note that, even with a relatively low coefficient of determination, the resulting model can be successfully used for predicting product quality in control tasks. The accuracy of the model can be improved by dividing the considered time interval into subsets and obtaining several situational models.
REFERENCES
1. Verevkin A.P., Kiryushin O.V. Avtomatizatsiya tekhnologicheskikh protsessov i proizvodstv v neftepererabotke i neftekhimii. Ufa: Izd-vo UGNTU, 2005. - p. 71.
2. Dozortcev, E.L. Itckovich, D.V. Kneller. Usovershenstvovannoye upravleniye tekhnologicheskimi protsessami (ARS): 10 let v Rossii. //Avtomatizatsiya v promyshlennosti. - 2013. - № 1. - pp. 12-19.
3. Terrence Blevins, Willy K. Wojsznis, Mark Nixon. Advanced Control Foundation: Tools, Techniques and Applications. ISA. 2012. - 556 p.
4. Ansari R.M., Bawardi K.M. Multivariable control and advanced monitoring: Applications to hydrocracking process. Saudi Aramco Journal of Technology, June 2006. - pp. 33-37.
5. Campos, M., Teixeira, H., Liporace, F., Gomes, M. Challenges and problems with advanced control and optimization technologies (Conference Paper).7th IFAC International Symposium on Advanced Control of Chemical Processes, ADCHEM'09; Istanbul; Turkey; 12 July 2009 through 15 July 2009; Code 85828. Volume 7, Issue Part 1, 2009. - pp. 1-8.
6. Kadlec P., Gabrys B. and Strandt S. Data-driven soft sensors in the process industry. Computers and Chemical Engineering. 2009. Vol. 33. - pp. 795-814.
7. Orlova I.V., Polovnikov V.A. Ekonomiko-matematicheskiye metody i modeli: komp'yuternoye modelirovaniye /Ucheb.posobiye. - M.: 2007, p. 365.
8. J. Box, G. Jenkins. Analiz vremennykh ryadov, prognoz i upravleniye: Per. s angl. // Pod red. V.F. Pisarenko. - M.: Mir, 1974, kh. 1. - p. 406.
9. R. Otnes, L. Enoxon Prikladnoy analiz vremennykh ryadov. - M.: Mir, 1982, p. 428.
10. Tyurin U.N., Makarov A.A. Analiz dannykh na komp'yutere. Izd. 3-ye, pererab. i dop./Pod red. V. E. Figurnova — M.: INFRA-M, 2002. - p. 528.
11. Dombrovsky V.V. Ekonometrika. URL: http://sun.tsu.ru/mminfo/2016/Dombrovski/book/chapter-5/chapter-5-4.htm
12. Vasilyev V.I., Ilyasov B.G. Intellektual'nyye sistemy upravleniya. Teoriya i praktika: uchebnoye posobiye. - M.: Radiotekhnika, 2009. - p. 329.
13. Verevkin A.P., et al. Postroyeniye matematicheskoy modeli trubchatoy ustanovki piroliza dlya tselevykh rezhimov i diagnostiki progarov zmeyevika /A.P. Verevkin, D.S. Matveyev, M.KH. Khusniyarov, A.V. Chikurov // Neftegazovoye delo. - 2010. - T.8, №1. - pp. 70-73.
14. Verevkin A.P., Kalashnik D.V, Khusniyarov M.H. Modelirovaniye operativnogo opredeleniya indeksa rasplava dlya upravleniya protsessom polietilena. //Bashkirskiy khimicheskiy zhurnal. Ufa: UGNTU. 2013. Tom 20. № 1. - pp. 69-74
15. Verevkin A.P., Murtazin N.M., Denisov S.V., Ustyuganin K.U. Podgotovka dannykh dlya postroyeniya nastraivayemykh analizatorov v zadachakh usovershenstvovannogo upravleniya.//Avtomatizatsiya v promyshlennosti, №3, 2019. - pp. 12 - 17.
16. Verevkin A.P.Kognitivnyye modeli v fondakh iskusstvennogo intellekta: tseli i metody postroyeniya // Integratsiya nauki i obrazovaniya v vuzakh neftegazovogo profilya-2016 /Materialy mezhdunarodnoy nauchno-metodicheskoy konferentsii, posvyashchennoy 60-letiyu filiala UGNTU v g. Salavate. - pp.167 - 170.
17. Verevkin A.P., Murtazin T.M., Grigoryeva U.L. Kognitivnoye modelirovaniye protsessov neftepererabotki s uproshchennoy protseduroy adaptatsii modeley. //Territoriya Neftegaz. - 2018. -№7-8 - pp. 14 - 18.
18. Gaidamak A.V., Verevkin A.P. Diagnostika i povysheniye vysokoy otsenki pokazateley za schet ucheta povyshennoy izbytochnosti (pri vozniknovenii bloka ustanovki kataliticheskogo riforminga). // «Voprosy krupnykh i matematicheskikh nauk»: materialy mezhdunarodnoy zaochnoy nauchno-prakticheskoy konferentsii. (27 maya 2013 g.) - Novosibirsk: Izd. «SibAK»,2013.-pp. 154.
19. Available: https://scask.ru/g_book_mkor.php?id=30.pdf
20. Tikhonov A.N., Arsenin V.Ya. Metody resheniya nekorrektnykh zadach. - M.: Nauka, 1979. - p. 283.