WINE QUALITY ASSESSMENT BASED ON PHYSICOCHEMICAL CHARACTERISTICS

Starostina V.S.; Kaplina S.E.


UDC 004.8

Starostina V.S.

2nd-year master's degree student, Transbaikal State University, Chita, Russia

Scientific supervisor: Kaplina S.E., Doctor of Pedagogical Sciences, Professor, Transbaikal State University, Chita, Russia

WINE QUALITY ASSESSMENT BASED ON PHYSICOCHEMICAL CHARACTERISTICS

Abstract

In this study, the quality of wines was assessed on the basis of their physicochemical properties in order to determine how the two are related and to identify the best parameters for predicting the target indicator. The study showed a significant correlation between several characteristics and wine quality, most notably alcohol content. Models for predicting wine quality were built using several data analysis methods: linear regression, random forest, and the k-nearest neighbors method. The results clarify which physicochemical parameters affect wine quality and make it possible to assess that quality more accurately from the available data.

Keywords:

artificial intelligence, machine learning, forecasting, data analysis, wine

Wine quality assessment is an important task in winemaking and wine sales. Traditionally, wine quality is determined at the end of production through tasting by experts, which costs considerable money and time. With the development of machine learning technologies, automated quality assessment is becoming increasingly attractive. This article considers an approach to assessing the quality of white and red wine from its physicochemical characteristics using machine learning algorithms.

To perform the analysis, a publicly available dataset was used that contains the physicochemical characteristics of various wines together with their quality scores. A dataset is a structured collection of data on a specific topic; it requires preprocessing to eliminate errors and inaccuracies that may distort the results of the analysis. Wine quality assessment has already been studied with machine learning on various datasets. In this study, the dataset was obtained from Kaggle [1]. It covers red and white Portuguese "Vinho Verde" wines and contains 11 physicochemical characteristics: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol. The output variable, based on sensory data, is quality (a score between 0 and 10).

These features are the key physicochemical characteristics of wine and can affect its quality. The quality data take the form of scores that reflect the experts' overall impression of each wine. Before the data were used for analysis and model building, they were preprocessed to eliminate possible errors and outliers and to bring them to a uniform format.

In order to ensure the reliability and accuracy of the analysis, preprocessing steps were undertaken to cleanse the dataset of any inconsistencies or anomalies that could potentially skew the results. This involved techniques such as handling missing data, outlier detection and removal, and normalization or standardization of features to bring them to a consistent scale.
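The preprocessing steps named above can be illustrated with a minimal sketch in plain Python. The study does not state which rules were actually applied, so the interquartile-range (IQR) outlier rule, the z-score standardization, the function names, and the toy rows below are all assumptions made for illustration.

```python
import statistics

def drop_missing(rows):
    """Keep only rows in which every value is present (not None)."""
    return [row for row in rows if all(value is not None for value in row)]

def iqr_bounds(values, k=1.5):
    """Return (low, high) fences; values outside them are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread

def remove_outliers(rows, column, k=1.5):
    """Drop rows whose value in `column` lies outside the IQR fences."""
    low, high = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if low <= row[column] <= high]

def standardize(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return [(value - mean) / std for value in values]

# Hypothetical toy rows of (alcohol, pH); one missing value, one extreme value.
rows = [(9.4, 3.51), (9.8, 3.20), (None, 3.26),
        (10.0, 3.16), (9.5, 3.30), (30.0, 3.28)]

clean = remove_outliers(drop_missing(rows), column=0)
alcohol_z = standardize([row[0] for row in clean])
```

In this toy example the row with the missing alcohol value and the row with the implausible value 30.0 are both discarded, and the remaining alcohol values are brought to a common scale.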

Overall, the preprocessing and exploratory analysis stages are crucial steps in the data analysis pipeline, laying the foundation for the subsequent application of machine learning algorithms to predict wine quality based on its physicochemical characteristics [2].

To gain a deeper insight into the distribution of each variable in the dataset, descriptive statistics were computed for every variable. Table 1 gives an overview of these statistics for each feature.

Table 1

Descriptive statistics of the dataset variables

variable              count   mean    std     min    25%     50%     75%     max
fixed acidity         6487    8.32    1.74    4.60   7.10    7.90    9.20    15.90
volatile acidity      6487    0.53    0.18    0.12   0.39    0.52    0.64    1.58
citric acid           6487    0.27    0.19    0.00   0.09    0.26    0.42    1.00
residual sugar        6487    2.54    1.41    0.90   1.90    2.20    2.60    15.50
chlorides             6487    0.09    0.05    0.01   0.07    0.08    0.09    0.61
free sulfur dioxide   6487    15.87   10.46   1.00   7.00    14.00   21.00   72.00
total sulfur dioxide  6487    46.47   32.90   6.00   22.00   38.00   62.00   289.00
density               6487    1.00    0.00    0.99   1.00    1.00    1.00    1.00
pH                    6487    3.31    0.15    2.74   3.21    3.31    3.40    4.01
sulphates             6487    0.66    0.17    0.33   0.55    0.62    0.73    2.00
alcohol               6487    10.42   1.07    8.40   9.50    10.20   11.10   14.90
quality               6487    5.64    0.81    3.00   5.00    6.00    6.00    8.00
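The statistics reported in Table 1 (count, mean, standard deviation, minimum, quartiles, maximum) can be computed for a single variable as in the sketch below. The `describe` helper and the sample values are hypothetical, and the quartile interpolation method is an assumption, since the study does not name the tool it used.

```python
import statistics

def describe(values):
    """Summary statistics for one variable, in the column order of Table 1."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),   # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }

# Hypothetical alcohol values, not taken from the dataset.
alcohol = [9.4, 9.8, 9.8, 9.8, 9.4, 10.0, 9.5, 10.5]
summary = describe(alcohol)
```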

A heat map is a graphical representation of data in which each value is shown as a color that reflects its magnitude. Applied to the correlation matrix, it makes strong and weak relationships between the features of the dataset easy to see [3].

A high positive correlation means that an increase in one feature is accompanied by an increase in another, while a high negative correlation indicates an inverse relationship. This analysis helps determine which characteristics of the wine are most strongly related to its quality and identify the most significant features for subsequent research. The heat map shows that wine quality is most strongly associated with alcohol, sulphates and citric acid, while the pH level has the least effect (figure 1).
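The values behind such a heat map are pairwise correlation coefficients. The sketch below computes a Pearson correlation matrix in plain Python. The study does not state which coefficient was used, so Pearson is an assumption, and the column values are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy columns, not taken from the dataset.
columns = {
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0],
    "volatile acidity": [0.70, 0.66, 0.50, 0.40, 0.31],
    "quality": [5, 5, 6, 6, 7],
}
matrix = correlation_matrix(columns)
```

In this toy data `matrix["alcohol"]["quality"]` is positive and `matrix["volatile acidity"]["quality"]` is negative, which are the two kinds of relationship described above.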

The analysis shows that wine quality depends strongly on alcohol content. To examine this relationship, the quality score is converted into a binary variable that takes only two values, 0 and 1, where 0 is a low score (range 0-5) and 1 is a high score (range 6-10).
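The binary conversion can be written in a few lines. The function name and the sample scores are hypothetical; the threshold of 6 follows the ranges given in the text.

```python
def binarize_quality(score, threshold=6):
    """Map a 0-10 quality score to 1 (high, score >= threshold) or 0 (low)."""
    return 1 if score >= threshold else 0

scores = [3, 5, 6, 8]                                # hypothetical quality scores
labels = [binarize_quality(score) for score in scores]
```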

[Heat map image: pairwise correlation coefficients between the dataset variables. Axis labels recovered from the extracted text: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide. The cell values are not recoverable.]

Figure 1 - Heat map of the features

A plot of the quality ratings of red and white wine against alcohol content shows that the lowest ratings go to wines with a low alcohol content, while the wines with the highest alcohol content receive the highest ratings (figure 2).

Figure 2 - The relationship between alcohol content and quality assessment

To verify the performance of the models adequately, the data were divided into training and test sets: 80% of the data formed the training set and the remaining 20% the test set [4].

This approach allows the model to be trained on enough data to capture patterns and regularities, while some of the data is held back in the test set to evaluate performance on new, previously unseen data.

Separating the data in this way helps to detect overfitting (adapting too closely to the training data) and gives a more objective assessment of how well the model generalizes and makes predictions on new data.
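An 80/20 split of this kind can be sketched as follows. The shuffling step, the fixed seed, and the function name are assumptions; the study only states the proportions.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))                     # stand-in for 100 wine samples
train_rows, test_rows = train_test_split(rows)
```

Shuffling before the split keeps any ordering in the file (for example, all red wines first) from leaking into one of the two sets.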

Linear regression predicts a quantitative value as a linear combination of the input features. Here it was used to predict the wine quality score from the physicochemical characteristics. The accuracy of the linear regression was 0.2. Such a low value may indicate that a linear model cannot capture nonlinear dependencies in the data.
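For illustration, the sketch below fits a least-squares line with a single input feature and evaluates it with the coefficient of determination R². Both choices are assumptions: the study uses all 11 features and does not say which metric its reported value refers to. The data are invented and exactly linear.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data that follow quality = 0.5 * alcohol + 0.5 exactly.
alcohol = [9.0, 10.0, 11.0, 12.0]
quality = [5.0, 5.5, 6.0, 6.5]

slope, intercept = fit_line(alcohol, quality)
score = r_squared(alcohol, quality, slope, intercept)
```

On real wine data the points scatter widely around any such line, which is one way a score as low as the one reported can arise.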

A random forest is an ensemble of decision trees, each trained on a random subset of the data. The predictions of all the trees are then averaged (for regression) or combined by majority vote (for classification). The accuracy of the random forest for wine quality assessment was 0.83. This high accuracy of the Random Forest Classifier model indicates that it captures the complex relationships between the physicochemical characteristics of wine and its quality effectively.
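The idea of bootstrap sampling plus voting can be shown with a deliberately simplified toy: a forest of depth-one trees (decision stumps), each fitted on a bootstrap sample and one randomly chosen feature. This is not the model used in the study, which would normally come from a library and grow much deeper trees. All names and data below are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Fit a depth-one tree: pick the threshold on `feature` with the fewest errors."""
    values = sorted({row[feature] for row in rows})
    # Candidate thresholds: the smallest value plus midpoints between neighbours.
    candidates = [values[0]] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in candidates:
        for high_label in (0, 1):   # label predicted when value >= threshold
            errors = sum(
                (high_label if row[feature] >= threshold else 1 - high_label) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, high_label)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, high_label = stump
    return high_label if row[feature] >= threshold else 1 - high_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample with replacement
        feature = rng.randrange(len(rows[0]))            # random feature choice
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical (alcohol, volatile acidity) rows with binary quality labels.
rows = [(9.0, 0.70), (9.2, 0.65), (9.4, 0.60), (9.5, 0.72),
        (12.0, 0.30), (12.3, 0.35), (12.5, 0.28), (11.8, 0.32)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
```

With this toy data, `predict_forest(forest, (12.2, 0.30))` votes for the high-quality class and `predict_forest(forest, (9.1, 0.68))` for the low-quality class.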

The k-nearest neighbors method classifies an object according to the classes of the k training objects closest to it [5]. The accuracy of the K-neighbors classifier model was 0.69. Although this is lower than the accuracy of the random forest model, it still shows that the method can classify wine adequately. The accuracy of this model could be improved by additional hyperparameter tuning.
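A minimal sketch of the k-nearest neighbors rule is given below. Euclidean distance and k = 3 are assumptions, as the study reports neither its distance metric nor its value of k, and the data are invented.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label) for point, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (alcohol, volatile acidity) points with binary quality labels.
train_x = [(9.0, 0.70), (9.2, 0.65), (9.4, 0.60),
           (12.0, 0.30), (12.3, 0.35), (12.5, 0.28)]
train_y = [0, 0, 0, 1, 1, 1]

high = knn_predict(train_x, train_y, (12.1, 0.31))
low = knn_predict(train_x, train_y, (9.1, 0.68))
```

Because the method depends on distances, features measured on different scales are usually standardized first. The value of k is the most obvious hyperparameter to tune.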

The analysis shows that random forest is the most effective of the algorithms compared for predicting wine quality from its chemical characteristics: its accuracy (0.83) is higher than that of k-nearest neighbors (0.69) and linear regression (0.2).

The research has revealed several key factors influencing the determination of wine quality:

- Alcohol: a higher alcohol content is associated with higher wine quality, which confirms its important role in forming taste characteristics.

- Sulphates: a high sulphate content is also associated with better wine quality, which indicates their importance for aroma and taste.

- Citric acid: a higher citric acid content likewise plays an important role, indicating its effect on wine quality.

On the other hand:

- Volatile acidity: low volatile acidity is considered a sign of a good wine, which indicates that this parameter has a negative effect on quality.

- Sulfur dioxide: a high sulfur dioxide content is associated with lower wine quality, which emphasizes the importance of controlling this parameter when producing high-quality wine.

Thus, the analysis identifies specific criteria for assessing wine quality that will be useful to winemakers and other specialists in producing and selecting high-quality wines.

References

1. Piyush Bhardwaj, Parul Tiwari, Kenneth Olejar, Wendy Parr, Don Kulasiri. A machine learning application in wine quality prediction // Machine Learning with Applications. URL: https://doi.org/10.1016/j.atech.2023.100202 (accessed 20.05.2024).

2. Bradley A. King, Krista C. Shellie. A crop water stress index based internet of things decision support system for precision irrigation of wine grape // Smart Agricultural Technology. URL: https://doi.org/10.3390/beverages7040078 (accessed 22.05.2024).

3. K. R. Dahal, J. N. Dahal, H. Banjade, S. Gaire. Prediction of Wine Quality Using Machine Learning Algorithms // Open Journal of Statistics. 2021. No. 2, Pp. 278-289.

4. Harrison Fuller, Chris Beaver, James Harbertson. Alcoholic Fermentation Monitoring and pH Prediction in Red and White Wine by Combining Spontaneous Raman Spectroscopy and Machine Learning Algorithms // Beverages. 2021. No. 7, Pp. 11-22.

5. Quanyue Xie. Machine Learning on Wine Quality: Prediction and Feature Importance Analysis // Highlights in Science, Engineering and Technology. 2023. No. 41, Pp. 170-174.

UDC 62

Agabaev N., Senior Lecturer, Turkmen State Architecture and Construction Institute, Ashgabat, Turkmenistan

Dzhumakhanov A., Senior Lecturer, Turkmen State Architecture and Construction Institute, Ashgabat, Turkmenistan

ROAD CONSTRUCTION MACHINERY AND ARTIFICIAL INTELLIGENCE: THE FUTURE OF THE CONSTRUCTION INDUSTRY

Abstract

Road construction machinery and artificial intelligence are two technological fields that are in growing demand in the modern world with every passing year.

Keywords:

construction and road construction, intelligent transport system, computer science, artificial intelligence.

Construction and road construction are important industries that underpin the development of a country's infrastructure and economy. With the growth of technology and digitalization, however, artificial intelligence is beginning to play an ever larger role in these industries.

Artificial intelligence (AI) is a field of computer science concerned with designing and building intelligent systems capable of performing tasks that require human reasoning and decision-making. In recent years, thanks to advances in machine learning and neural networks, AI has become increasingly accessible and applicable in various fields, including construction and road construction.

One of the main tasks of AI in construction and road construction is to optimize processes and make work more efficient. Using machine learning algorithms, AI can analyze large volumes of data and predict the best way to carry out work, taking into account factors such as weather, geographical features, and the technical characteristics of the machines.

In addition, artificial intelligence can be used to manage and monitor the operation of road construction machinery. With the help of sensors and cameras, AI can track the condition and performance of machines and warn of possible breakdowns or malfunctions. This allows machine operators to respond to problems quickly and minimize equipment downtime.

Artificial intelligence can also be applied to create virtual models of structures and sites, which allows construction work to be planned and designed more precisely. AI can likewise help optimize spending on materials and resources, which is important for cost-effective construction.
