UDC 517.9
Sibirskii Gosudarstvennyi Aerokosmicheskii Universitet imeni Akademika M. F. Reshetneva. Vestnik Vol. 17, No. 4, P. 878-882
ON METHODS FOR THE SELECTION OF INFORMATIVE FEATURES USING SELF-ADJUSTING NEURAL NETWORK CLASSIFIERS AND THEIR ENSEMBLES
E. D. Loseva1*, R. B. Sergienko2
1Reshetnev Siberian State Aerospace University 31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037, Russian Federation 2Institute of Communications Engineering, Ulm University, 43, Albert Einstein Allee, Ulm, 89081, Germany *E-mail: rabota_lena_19@mail.ru
Using feature selection procedures based on filters is useful at the pre-processing stage when solving data analysis tasks in different domains, including the aerospace industry. However, it is a complicated problem due to the absence of class labels that would guide the search for relevant information. Feature selection with the "wrapper" approach requires a learning algorithm (function) to evaluate the candidate feature subsets. These two approaches, however, are usually applied separately from each other. In this paper we propose two-stage methods in which the supervised and unsupervised forms are performed together, based on a developed scheme using three criteria for feature estimation ("filter") and multi-criteria genetic programming with self-adjusting neural network classifiers and their ensembles ("wrapper"). The proposed approach was compared with different feature selection methods on three audio corpora in German, English and Russian for speaker emotion recognition. The obtained results showed that the developed feature selection technique increases the accuracy of emotion recognition.
Keywords: emotion recognition, neural network classifiers, multi-criteria genetic programming, feature selection.
Вестник СибГАУ Том 17, № 4. С. 878-882
О МЕТОДАХ ОТБОРА ИНФОРМАТИВНЫХ ПРИЗНАКОВ С ПРИВЛЕЧЕНИЕМ САМООРГАНИЗУЮЩИХСЯ НЕЙРОСЕТЕВЫХ КЛАССИФИКАТОРОВ И ИХ АНСАМБЛЕЙ
Е. Д. Лосева1*, Р. Б. Сергиенко2
1Сибирский государственный аэрокосмический университет имени академика М. Ф. Решетнева Российская Федерация, 660037, г. Красноярск, просп. им. газ. «Красноярский рабочий», 31 2Институт телекоммуникационной инженерии, Ульмский университет Германия, 89081, г. Ульм, аллея Альберта Эйнштейна, 43 *E-mail: rabota_lena_19@mail.ru
Применение методов отбора признаков на основе фильтров является эффективным на этапе предобработки данных для анализа в различных предметных областях, включая аэрокосмическую отрасль. Но такое применение также является сложным ввиду того, что отбор проводится без использования функции, определяющей качество признака. Отбор признаков с использованием методов свёртки использует для обучения функцию качества признаков или наборов признаков. Однако оба этих метода обычно используются отдельно друг от друга. Предложен подход, который может быть представлен в виде последовательной кооперации обоих методов. Этот метод основан на разработанной схеме с использованием трех критериев эффективности для оценки качества признаков (фильтры) и многокритериального генетического программирования с привлечением самоорганизующихся нейросетевых классификаторов (свёртки). Проведено сравнение разработанного двухэтапного метода отбора признаков с существующими методами. Сравнение проводилось с использованием трех баз данных, содержащих акустические характеристики голосов людей на немецком, английском и русском языках для распознавания эмоций человека. Полученные результаты показали эффективность разработанных методов отбора признаков для повышения точности распознавания эмоционального состояния человека.
Ключевые слова: распознавание эмоций, нейросетевые классификаторы, многокритериальное генетическое программирование, отбор признаков.
Introduction. Nowadays, the aerospace industry in Russia demonstrates potential for growth. Currently, there are 99 Russian satellites in orbit (70 % of them are military or dual-use). Some of them are new versions of satellites (the new early warning, communications and surveillance system "Meridian"), as well as satellites of the GLONASS navigation system, whose number has increased to 24 [1]. For successful performance in the aerospace industry, it is necessary to pay attention not only to the physical condition of cosmonauts, but also to their psycho-emotional state. According to research on the physical condition in weightlessness, there are factors, such as tremendous overloads, which affect the condition of cosmonauts, including their psycho-emotional state. Analysis of the cosmonauts' psycho-emotional state makes it possible to choose the most appropriate method of rehabilitation and to improve their performance. In this case, advanced technologies for emotion recognition using dialogue systems are useful. Emotion recognition is a complex task because spoken dialogue systems work in real time. Real-time recognition can be accompanied by noise, outliers and missing data. Therefore, it is necessary to develop approaches that increase the recognition effectiveness of such systems. Most systems which work with speech have modules for processing the audio signal. Those modules implement methods for pre-processing or processing features of the signal. However, some existing methods do not demonstrate high effectiveness in most cases. To solve these problems, it is necessary to identify irrelevant features (attributes) in data sets (feature selection). An alternative way of feature selection is the application of methods based on evolutionary techniques, which are effective for high-dimensional and poorly structured problems.
Various classifiers (support vector machine, linear discriminant analysis, naive Bayes, decision tree, multilayer perceptron) for speaker state recognition problems were compared in [2] on the Berlin, UUDB and LEGO databases of emotional speech. The results showed the highest precision when ANN classifiers and a one-criterion genetic algorithm (OGA) were used for feature selection [2]. Results of emotion recognition with feature selection have also been presented in [3]. The authors achieved high precision on the Berlin, UUDB and LEGO databases with different feature selection methods, such as the one-criterion genetic algorithm (OGA) [4], Principal Component Analysis (PCA) [5], Information Gain Ratio (IGR) as in [3], and SPEA [6], using a Multilayer Perceptron (MLP) as the classifier. In that research the authors noted that the feature set reduced with the SPEA method was half the size of the original dimensionality.
In this paper we propose a hybrid method for feature selection based on the cooperation of a "filter" method and a "wrapper" method [7]. The developed method consists of two stages. The first stage is data pre-processing using the developed estimation scheme ("filter"). The second stage is based on multi-criteria genetic programming (GP) using self-adjusting artificial neural network (ANN) classifiers or their ensembles ("wrapper"). It is known that applying evolutionary algorithms requires setting up many different parameters. For that reason, a self-adjusting procedure is used in this paper, which allows the most effective settings of the evolution operators (EO) to be chosen automatically [8].
Representation of the developed two-stage methods.
In the first stage the PS (Preprocessing with Sort) algorithm is applied.
Estimation of feature relevance is implemented using several criteria: the level of variation and the Fisher score [9]. At this stage, the assessment is carried out without the involvement of classifiers, by calculating the density of class locations and statistical estimates of the available data (an "unsupervised" process). For the selection of features into an intermediate subset the following scheme is used:
1. Calculate the fitness function values by formulas (2) and (3) for all features from the initial set.
2. Calculate the average efficiency value of the features from the initial feature set by formula (1):
Value = \frac{\sum_{r=1}^{R} \sum_{f=1}^{F} Fit_r^f}{R},   (1)
where K is the number of classes; R is the number of features; Fit_r^f is the value of the f-th fitness function for the r-th feature, f = 1, ..., F; F is the number of fitness functions.
3. Identify the r-th feature as "1" (effective) if \sum_{f=1}^{F} Fit_r^f > Value, and as "0" (not effective) if \sum_{f=1}^{F} Fit_r^f \le Value.
4. Select the features ranked "1" into the intermediate feature set.
The fitness functions are as follows:
1. The first criterion is the variation level. The fitness function is calculated by formula (2):
FitGA1_r = \frac{1}{1 + \delta_r^2},   (2)
where \delta_r^2 is the dispersion of the r-th feature.
2. The second criterion is the Fisher score [9]. The fitness function is calculated by formula (3):
FitGA2_r = F_r,   (3)
where F_r is calculated by formula (4):
F_r = \frac{\sum_{k=1}^{K} p_k (\mu_k^r - \mu^r)^2}{\sum_{k=1}^{K} p_k (\delta_k^r)^2},   (4)
where p_k is the number of objects in the k-th class; \mu_k^r is the mean value of the r-th feature in the k-th class and \mu^r is its mean over all objects; (\delta_k^r)^2 is the dispersion of the r-th feature in the k-th class, k = 1, ..., K; r is the index of the current feature.
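To make the PS stage concrete, a minimal sketch in Python/NumPy is given below. It is an illustration only: the function and variable names are ours, and the threshold test follows our reading of step 3 (features with above-average combined fitness are kept).

```python
import numpy as np

def ps_filter(X, y):
    """PS filter stage: a minimal illustrative sketch (names are ours).

    X -- (M, R) array of M objects described by R features; y -- class labels.
    Returns indices of the features selected into the intermediate subset.
    """
    M, R = X.shape
    classes, p = np.unique(y, return_counts=True)

    # Criterion 1, formula (2): variation level, FitGA1_r = 1 / (1 + dispersion_r).
    fit1 = 1.0 / (1.0 + X.var(axis=0))

    # Criterion 2, formulas (3)-(4): Fisher score of each feature.
    overall_mean = X.mean(axis=0)
    num = np.zeros(R)
    den = np.zeros(R)
    for c, p_k in zip(classes, p):
        Xk = X[y == c]
        num += p_k * (Xk.mean(axis=0) - overall_mean) ** 2
        den += p_k * Xk.var(axis=0)
    fit2 = num / (den + 1e-12)        # small constant only to avoid division by zero

    # Formula (1): average combined fitness over all features.
    combined = fit1 + fit2
    value = combined.sum() / R

    # Step 3 (our reading): rank "1" (effective) for above-average combined fitness.
    return np.where(combined > value)[0]
```

The returned indices define the intermediate feature subset that the second (MCGP) stage then works with.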
In the second stage, MCGP (Multi-Criteria Genetic Programming) applies multi-criteria genetic programming to the formation of neural network models, using the self-adjusting procedure to choose the most effective EO. In this stage, the feature selection process is implemented on the intermediate set created in the first stage. After the MCGP algorithm has been applied, all found features form the final feature set.
In the MCGP, the ANN classifiers are used as the learning algorithm. In our evolutionary procedure we use genetic programming operating with trees (tree encoding). The ANN model is encoded into a tree. A tree is a directed graph that consists of nodes and terminal vertices (leaves). Each node contains one operator from the set F = {+, <}, and the leaves contain objects from the set T = {IN1, IN2, IN3, ..., INn – input neurons (feature subsets); F1, F2, F3, F4, ..., Fn – activation functions (neurons)} [8]. Each input neuron corresponds to one feature. The operator "+" from the set F denotes the formation of neurons into one layer, and the operator "<" denotes the formation of layers into the ANN.
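As an illustration of this encoding (not the authors' implementation; class and function names are ours), the sketch below decodes such a tree into an ordered list of layers:

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Leaf:
    name: str              # "IN3" (input neuron / feature) or "F1" (activation function)

@dataclass
class Node:
    op: str                # "+" joins its arguments into one layer, "<" stacks layers
    children: List[Union["Node", Leaf]]

def decode(tree) -> List[List[str]]:
    """Decode a GP tree into an ordered list of layers (lists of leaf names)."""
    if isinstance(tree, Leaf):
        return [[tree.name]]
    parts = [decode(child) for child in tree.children]
    if tree.op == "+":
        # merge all leaves of the subtrees into a single layer
        return [[name for part in parts for layer in part for name in layer]]
    if tree.op == "<":
        # concatenate the layers of the subtrees in order
        return [layer for part in parts for layer in part]
    raise ValueError(f"unknown operator: {tree.op!r}")

# (IN1 + IN4) < (F1 + F2) < F3  ->  [['IN1', 'IN4'], ['F1', 'F2'], ['F3']]
ann = Node("<", [Node("+", [Leaf("IN1"), Leaf("IN4")]),
                 Node("+", [Leaf("F1"), Leaf("F2")]),
                 Leaf("F3")])
print(decode(ann))
```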
For feature estimation at this stage, three fitness functions are used:
1. The first fitness function: Pair correlation level:
FitGP1 = \frac{1}{1 + measure} \rightarrow \max,   (5)
where "measure" is a maximum of pair correlation values between input neurons in ANN:
measure = max(cort), (6)
where corrt is pair correlation value of two (X, y) ANN input neurons, t = 1,T, T - amount of all possible pairs of ANN input neurons. Corrt by formula (10) is calculated:
corr_t = \frac{\sum_{m=1}^{M} (x_m - \bar{x})(y_m - \bar{y})}{\sqrt{\sum_{m=1}^{M} (x_m - \bar{x})^2 \sum_{m=1}^{M} (y_m - \bar{y})^2}},   (7)
where M is the number of objects (values of x and y).
2. The second fitness function: Classification accuracy:
FitGP2 = \frac{P}{V},   (8)
where P is the number of correctly classified objects; V is the total number of classified objects.
3. The third fitness function: Complexity of ANN structure.
FitGP3 = n \cdot N_1 + \sum_{i=1}^{L-1} N_i N_{i+1} + N_L \cdot 1,   (9)
where n is the number of input neurons; N_i is the number of neurons in the i-th hidden layer; L is the number of hidden layers in the ANN; 1 is the number of output neurons of the ANN.
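The three criteria can be summarised as follows (a sketch with our own naming; the trained network's predictions are taken as given, and the use of the absolute correlation value is our assumption):

```python
import numpy as np

def fitgp1_pair_correlation(X_selected):
    """Formulas (5)-(7): 1 / (1 + max pairwise correlation of the selected inputs)."""
    X_selected = np.asarray(X_selected, dtype=float)
    if X_selected.shape[1] < 2:
        return 1.0                                    # a single input has no pair correlation
    corr = np.corrcoef(X_selected, rowvar=False)      # (n, n) Pearson correlation matrix
    n = corr.shape[0]
    measure = np.abs(corr[~np.eye(n, dtype=bool)]).max()   # |corr| is our assumption
    return 1.0 / (1.0 + measure)

def fitgp2_accuracy(y_true, y_pred):
    """Formula (8): share of correctly classified objects."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def fitgp3_complexity(n_inputs, hidden_layer_sizes, n_outputs=1):
    """Formula (9): number of connections of a fully connected ANN."""
    layers = [n_inputs] + list(hidden_layer_sizes) + [n_outputs]
    return sum(a * b for a, b in zip(layers[:-1], layers[1:]))
```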
The developed approach (MCGP) works as follows:
Step 1. Create a population of individuals. Each individual is a tree representing an ANN.
Step 2. Optimize the neural network weights with the OGA. The stopping criterion of the OGA is the maximum value of classification accuracy.
Step 3. At this step all combinations of EO have equal probabilities of being selected. At later steps, the probability values are recalculated for new combinations of EO. All combinations of different operator types were formed: two types of selection operators (tournament, proportional), two types of mutation operators (strong, weak) and one type of recombination (one-point) were used.
Step 4. Estimate criteria values for all individuals from the current population.
Step 5. Form the new population:
1) select two individuals for recombination by the VEGA (Vector Evaluated Genetic Algorithm) method [10] (a sketch of this selection is given after the step list);
2) recombine the two selected individuals to create a new descendant;
3) mutate the descendant;
4) evaluate the new descendant;
5) compile the new population (solutions) from the created descendants.
Step 6. Choose a new combination of EO by recalculating the probability values. For the recalculation, the effectiveness of each EO combination is estimated by formula (10) over the descendants created by that EO combination (a sketch of this probability update is given after the step list):
Fit\_Oper_p = \frac{1}{I_p} \cdot \sum_{d=1}^{I_p} \sum_{f=1}^{F} FitGP_d^f,   (10)
where FitGP_d^f is the fitness of the d-th descendant by the f-th criterion; I_p is the number of descendants created by the chosen variant of the EO combination.
The number of summed fitness functions may differ; it depends on the algorithm. After the Fit_Oper_p values have been compared, the EO variant with the highest value is called the "priority" variant. The EO combination with the lowest probability value is replaced by the "priority" variant. The probabilities are recalculated at every iteration of the algorithm. If all combinations have been replaced by a "priority" option, all probability values are reset and new variants of EO combinations are generated again.
Step 7. Check the stop criterion: if it is satisfied, stop the MCGP and select the most effective individual (ANN representation) from the population; otherwise continue from the second step. The chosen best ANN is the model with the relevant set of features, which equals the set of input neurons of the ANN. In the PS + MCGP_Ens algorithm, which uses ensembles of neural networks, the sets of input signals of all network models are combined into the final feature set (repeated signals are excluded).
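As referenced in step 5, a simplified sketch of VEGA selection is given below: the mating pool is filled by selecting separately on each criterion, so every objective contributes an equal share of parents (tournament selection is used here for brevity; names and parameters are ours).

```python
import random

def vega_select(population, criteria, pool_size):
    """criteria[f][i] is the f-th criterion value of individual i (all maximised here).
    The mating pool is filled per criterion, so each objective gets an equal share."""
    share = max(1, pool_size // len(criteria))
    pool = []
    for fit in criteria:
        for _ in range(share):
            i, j = random.sample(range(len(population)), 2)   # tournament of size 2
            pool.append(population[i] if fit[i] >= fit[j] else population[j])
    return pool[:pool_size]
```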
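The probability update of step 6 and the paragraph above can be sketched as follows (assumed data structures and our own naming; the exact rule in [8] may differ in detail):

```python
def update_eo_probabilities(probs, descendants_fitness):
    """probs: {eo_combination: selection probability};
    descendants_fitness: {eo_combination: list of per-descendant lists of FitGP values}."""
    # Formula (10): average summed fitness of the descendants of each EO combination.
    fit_oper = {p: sum(sum(crits) for crits in fits) / len(fits)
                for p, fits in descendants_fitness.items() if fits}
    if not fit_oper:
        return probs
    priority = max(fit_oper, key=fit_oper.get)    # the "priority" variant
    worst = min(probs, key=probs.get)             # combination with the lowest probability
    if worst != priority:
        probs[priority] += probs[worst]           # its probability mass moves to the priority variant
        probs[worst] = 0.0
    return probs
```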
To form the general ensemble decision, the developed modified Scheme ED 2 was applied [11].
The ensemble classification accuracy is calculated by formula (11):
Accuracy = (P / V) \cdot 100 \%,   (11)
where P/V is the ratio of correctly classified objects to all classified objects.
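Scheme ED 2 itself is described in [11]; purely as an illustration, the sketch below forms the ensemble decision by simple majority voting, computes the accuracy of formula (11), and builds the final feature set of PS + MCGP_Ens as the union of the member networks' input neurons:

```python
from collections import Counter

def ensemble_predict(member_predictions):
    """Majority vote over the member networks' label lists (stand-in for Scheme ED 2 [11])."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*member_predictions)]

def ensemble_accuracy(y_true, y_pred):
    """Formula (11): percentage of correctly classified objects."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

def final_feature_set(member_input_features):
    """Union of the input neurons of all ensemble members, repeated signals excluded."""
    return sorted(set().union(*member_input_features))
```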
Research and results. The research of the effectiveness of the developed algorithms was performed on three databases, Berlin [12], LEGO [13] and RSDB [2], with several classifiers. The considered databases in German, English and Russian are described in tab. 1. For feature extraction and creation of the databases from the records, the software Praat (scripts) [14], Notepad++ and Excel were used. The classifiers are the following: Decision Table, HyperPipe, VFI, LWL, k-NN, JRip [15].
Table 1
Database description

Database | Language | Database size (number of objects) | Number of features | Number of classes
Berlin | German | 535 | 45 | 7
LEGO | English | 4827 | 29 | 5
RSDB | Russian | 800 | 20 | 3
Table 2
Average effectiveness of the developed methods on three databases (accuracy, %)

Database | Features before selection | Developed algorithm | Features after selection | LWL | JRip | k-NN | Naive Bayes | VFI
Berlin | 45 | PS + MCGP | 22 | 22,8 | 58,5 | 76,9 | 74,9 | 54,7
Berlin | 45 | PS + MCGP_Ens | 28 | 36,2 | 49,9 | 74,8 | 70,3 | 56,1
LEGO | 29 | PS + MCGP | 18 | 73,9 | 70,8 | 81,3 | 76,6 | 67,0
LEGO | 29 | PS + MCGP_Ens | 23 | 72,1 | 75,6 | 80,5 | 74,1 | 68,9
RSDB | 20 | PS + MCGP | 13 | 91,4 | 95,3 | 88,6 | 87,6 | 65,7
RSDB | 20 | PS + MCGP_Ens | 16 | 93,7 | 85,7 | 96,2 | 92,2 | 95,6
Table 3
Average accuracy (%) of emotion recognition using the developed and existing feature selection methods

Database | k-NN using PCA | k-NN using OGA | k-NN using PS + MCGP | k-NN using PS + MCGP_Ens
Berlin | 56,9 | 55,1 | 76,9 | 74,8
LEGO | 70,3 | 60,7 | 81,3 | 80,5
RSDB | 73,8 | 74,4 | 88,6 | 96,2
Table 4
Average accuracy (%) of emotion recognition using the PS + MCGP_Ens method and the k-NN classifier

Number of ANNs in ensemble | Berlin | LEGO | RSDB
3 | 64,8 | 70,9 | 50,2
5 | 65,7 | 45,5 | 64,8
7 | 35,6 | 28,9 | 40,8
Below, the results of the comparative analysis of the developed and existing methods are presented. For the comparison, the existing methods PCA (Principal Component Analysis) [5] and the one-criterion genetic algorithm (OGA) [4] are used, as in [3]. Those results are presented in tab. 3. Tab. 2 shows the results of the comparative analysis of the two developed methods on the three databases with the described classifiers. Tab. 4 shows the results of using PS + MCGP_Ens with different numbers of ANNs in the ensemble.
Application of PS + MCGP can significantly reduce the set of features. According to the results given in tab. 2, the most significant result is observed for the PS + MCGP_Ens method on the RSDB database (96,2 %). Based on the results presented in tab. 2, it should also be noted that the recognition accuracy of the PS + MCGP method is high: 76,9 % on the Berlin database and 81,3 % on LEGO. According to the results in tab. 2, 3 and 4, in most of the experiments the use of the proposed methods improves the selection of informative features in terms of classification accuracy and, therefore, demonstrates their efficiency.
Conclusion. The use of the proposed algorithms PS + MCGP and PS + MCGP_Ens makes it possible to select relevant features and to improve the accuracy of recognition (classification). Thus, there is a clear reduction of the feature space without loss of recognition accuracy on the three databases. The results presented in tab. 3 show that in most cases the use of the developed algorithms leads to a significant increase in recognition accuracy. The results in tab. 4, obtained with the PS + MCGP_Ens method and different numbers of neural networks in the ensemble, show that neither a too large nor a too small number of neural networks in the ensemble improves accuracy. Therefore, the optimal number of neural networks in the ensemble may vary with the type of task and the database size. It should also be noted that the effectiveness of the
selected classifiers is improved by the proposed approaches. For example, the highest accuracy values with the k-NN, Naive Bayes and Decision Table classifiers are observed with the proposed methods (see tab. 2).
References
1. Ministry of Defence of the Russian Federation. Available at: http://eng.mil.ru/en/structure/forces/cosmic.htm/ (accessed 02.6.2016).
2. Loseva E. D. [Application of ten algorithms for optimization of the support vector machine parameters and for optimization of the feature selection process in the task of recognition of a human's gender and age]. Materialy XII Mezhdunarodnoy nauch.-prakt. konf. "Sovremennyye kontseptsii nauchnykh issledovaniy" [Proc. of the XII International scientific and practical conference] (27-28 March 2015, Moscow). Moscow, 2015, Part 7, No. 3, P. 132-136 (In Russ.).
3. Sidorov M., Brester C., Semenkin E., Minker W. Speaker State Recognition with Neural Network-based Classification and Self-adaptive Heuristic Feature Selection. International Conference on Informatics in Control (ICINCO). 2014, P. 699-703.
4. Holland J. H. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975, P. 18-35.
5. Shlens J. A Tutorial on Principal Component Analysis. Cornell University Library. 2014, P. 1-12.
6. Zitzler E., Thiele L. Multiobjective evolutionary algorithms: A comparative case study and the strength pareto approach. Evolutionary Computation (IEEET). 1999, No. 3(4), P. 257-271.
7. Minhas S., Javed M. Y. Iris Feature Extraction Using Gabor Filter. International Conference on Emerging Technologies. 2009, P. 252-255.
8. Loseva E. D., Lipinsky L. V. [Ensemble of networks with application of multi-objective self-configurable genetic programming]. Vestnik SibGAU. Vol. 17, No. 1, 2016, P. 67-72 (in Russ.).
9. He X., Cai D., Niyogi P. Learning a Laplacian Score for Feature Selection. Advances in Neural Information Processing Systems 18 (NIPS). 2005, P. 1-8.
10. Ashish G., Satchidanada D. Evolutionary Algorithm for Multi-Criterion Optimization: A Survey. International Journal of Computing & Information Science. 2004, Vol. 2, No. 1, P. 43-45.
11. Loseva E. D. [Decision-making Scheme for Ensembles of Neural Networks in the Task of Intellectual Data Analysis]. Materialy XXV Mezhdunar. nauch. konf. "Aktual'nyye problemy v sovremennoy nauke i puti ikh resheniya" [Materials of the XXV International scientific conference "Actual problems in modern science and ways to solve them"] (28 April 2016, Moscow). Moscow, 2016, No. 4 (25), P. 20-26 (In Russ.).
12. Burkhardt F., Paeschke A., Rolfes M., Sendlmeier W. F., Weiss B. A database of German emotional speech. Interspeech. 2005, P. 1517-1520.
13. Schmitt A., Ultes S., Minker W. A parameterized and annotated corpus of the cmu let's go bus information system. International Conference on Language Resources and Evaluation (LREC). 2012, P. 208-217.
14. Boersma P. Praat, a system for doing phonetics by computer. Glot International. 2002, No. 5 (9/10), P. 341-345.
15. Akthar F., Hahne C. Rapid Miner 5 Operator Reference. Rapid-I, Dortmund, 2012, P. 25-55.
© Loseva E. D., Sergienko R. B., 2016