UDC 004.891:625 DOI: 10.25513/2222-8772.2018.3.82-90
FUZZY IDEAS EXPLAIN A COMPLEX HEURISTIC ALGORITHM FOR GAUGING PAVEMENT CONDITIONS
Edgar Daniel Rodriguez Velasquez1,2
Instructor, e-mail: [email protected], [email protected]
Carlos M. Chang Albitres2 Ph.D. (Engr.), Associate Professor, e-mail: [email protected]
Vladik Kreinovich2 Ph.D. (Phys.-Math.), Professor, e-mail: [email protected]
1Universidad de Piura in Peru (UDEP) 2University of Texas at El Paso, El Paso, Texas 79968, USA
Abstract. To gauge pavement conditions, researchers have come up with a complex heuristic algorithm that combines several expert estimates of pavement characteristics into a single index — which is well correlated with the pavement's durability and other physical characteristics. While empirically, this algorithm works well, it lacks physical or mathematical justification beyond being a good fit for the available data. This lack of justification decreases our confidence in the algorithm's results — since it is known that often, empirically successful heuristic algorithms need change when the conditions change. To increase the practitioners' confidence in the resulting pavement condition estimates, it is therefore desirable to come up with a theoretical justification for this algorithm. In this paper, we show that by using fuzzy techniques, it is possible to come up with the desired justification.
Keywords: gauging pavement conditions, fuzzy logic.
1. Formulation of the Problem
It is important to gauge pavement conditions. Most roads are heavily used. Heavy traffic stresses the pavement. As a result, after several years, it is necessary to maintain — or sometimes even repair — the roads.
Road repairs are expensive. It is therefore important to adequately gauge pavement conditions, so that we can correctly decide which road segments need maintenance or repair and which can wait a few more years.
This is especially important since it is known that proper maintenance can make a road last much longer and thus drastically decrease the need for expensive road repairs.
How pavement conditions are gauged now. One of the most frequently used techniques for gauging pavement conditions is based on visual inspection of the pavement.
Visual inspection enables the inspectors to detect different types of problems — known as distresses. We can have buckling, we can have potholes, we can have cracks, etc. For each type of distress, inspectors:
• measure the area affected by this type of distress (or the length, for linear distresses like linear cracks), and
• use the results of these measurements to evaluate the severity of the corresponding distress.
The resulting data is then combined into a single pavement condition index (PCI).
The combination rules used in the computation of the PCI are selected so as to provide the most accurate prediction of the pavement durability. To improve the predictive quality, more and more complex algorithms are used; see, e.g., the latest international standard [1].
Problem. The problem is that the existing algorithm for gauging the pavement condition is heuristic. It has been selected purely empirically and has no physical or mathematical justification beyond being a good fit for the available data.
In general, heuristic methods often work well, but they are usually less reliable than theoretically justified algorithms: they rely solely on past experience, and when the situation changes, we may need to change the algorithm as well. To increase the users' confidence in the PCI algorithm, it is thus desirable to come up with a theoretical justification for this algorithm.
What we do in this paper. In this paper, we provide the desired theoretical justification for the current state-of-the-art complex heuristic algorithm for gauging pavement conditions.
In this justification, we take into account the fact that this algorithm combines somewhat subjective inspector observations, which include information described not in numerical terms but in terms of imprecise ("fuzzy") words from natural language, such as "high", "low", and "medium". Thus, to analyze this problem, it is reasonable to use techniques specifically designed for translating such knowledge into precise numbers, namely, fuzzy techniques; see, e.g., [3,8-10,12,13].
These techniques are what we will use in our justification.
2. The Current State-of-the-Art Algorithm for Gauging Pavement Conditions: A Brief Reminder
What we start with. For each road segment, this algorithm starts with the numbers $x_1, \ldots, x_n$ that describe the relative areas (or relative lengths) of the distresses within this segment.
First step: a non-linear transformation. First, an appropriate non-linear transformation $f_i(x_i)$ is applied to each value $x_i$, resulting in so-called deduct values $s_i = f_i(x_i)$ ranging from 0 to 100 (or, equivalently, from 0 to 1). These non-linear transformations $f_i(x_i)$ are selected so that the resulting PCI has the largest correlation with the pavement's durability.
The deduct values are selected in such a way that larger deduct values correspond to more severe distresses:
• the value 100 (or 1) corresponds to the most severe distress, while
• the value 0 corresponds to the absence of distress.
Second step: sorting the deduct values. The deduct values corresponding to distresses of different types are then sorted in decreasing order, from the most severe to the least severe: $s_{(1)} \ge s_{(2)} \ge \ldots$
Third step: deciding how many deduct values to use. Based on the largest deduct value, we then decide how many deduct values to use. This number m of used deduct values is found from the formula
$$m = 1 + \frac{9}{98} \cdot (100 - s_{(1)}). \quad (1)$$
We then use only the values $s_{(1)} \ge s_{(2)} \ge \ldots \ge s_{(m)}$.
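As a rough illustration of the second and third steps, the following Python sketch sorts the deduct values and applies formula (1). The function names are hypothetical, and truncating a fractional m to an integer count is an assumption made here for simplicity; the description above does not say how a fractional m is handled.

```python
import math


def allowed_count(largest_deduct):
    """Formula (1): m = 1 + (9/98) * (100 - s_(1))."""
    return 1 + 9 * (100 - largest_deduct) / 98


def select_deduct_values(deducts):
    """Sort deduct values in decreasing order and keep the first m.

    Assumption (not from the paper): a fractional m is truncated.
    """
    ordered = sorted(deducts, reverse=True)
    if not ordered:
        return []
    m = math.floor(allowed_count(ordered[0]))
    return ordered[:m]


print(allowed_count(51))                                 # 5.5
print(select_deduct_values([10, 51, 3, 25, 7, 40, 1]))   # [51, 40, 25, 10, 7]
```

Note that the larger the most severe deduct value, the fewer values are kept: for a largest deduct value of 100, formula (1) gives m = 1.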
Final step: combining deduct values. To combine the values $s_{(1)}, \ldots, s_{(m)}$, we do the following:
• first, we compute the sum of the largest deduct value $s_{(1)}$ and of $m - 1$ small values (each equal to 2); we apply an appropriate non-linear transformation to transform this sum into the interval [0, 100]; thus, we get the first combined deduct value $c_1$;
• then, we compute the sum of the two largest deduct values and of $m - 2$ values of 2, and apply a different non-linear transformation to the resulting sum; thus, we get the second combined deduct value $c_2$;
• after that, we compute the sum of the three largest deduct values and of $m - 3$ values of 2, and apply yet another non-linear transformation to the resulting sum; thus, we get the third combined deduct value $c_3$;
• then we repeat the same procedure for the 4 largest deduct values, for the 5 largest deduct values, etc., until we have combined all m deduct values.
As a result, we get m combined deduct values $c_1, c_2, \ldots, c_m$.
After that, we take the largest of the resulting combined deduct values $c \stackrel{\text{def}}{=} \max_i c_i$. The PCI is simply 100 minus this largest value: $\text{PCI} \stackrel{\text{def}}{=} 100 - c$.
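The final step can be sketched in Python as follows, with (m - i) values of 2 added to the i-th sum. The actual non-linear transformations are not given in this paper, so `correction` is a hypothetical stand-in (here, simply capping the sum at 100) that would have to be replaced by the transformations from the standard [1]. The function names and the value returned for an empty list are likewise assumptions.

```python
def combined_deduct_values(selected, correction):
    """Compute c_1, ..., c_m from the m selected deduct values.

    `selected` must be sorted in decreasing order.
    `correction(total, i)` is a placeholder for the i-th non-linear
    transformation; its form is an assumption, not taken from the paper.
    """
    m = len(selected)
    combined = []
    for i in range(1, m + 1):
        # Sum of the i largest deduct values plus (m - i) values equal to 2.
        total = sum(selected[:i]) + 2 * (m - i)
        combined.append(correction(total, i))
    return combined


def pavement_condition_index(selected, correction):
    """PCI = 100 - max_i c_i."""
    if not selected:
        return 100  # assumption: no recorded distress means a perfect score
    return 100 - max(combined_deduct_values(selected, correction))


def capped_sum(total, i):
    """Hypothetical transformation: cap the sum at 100 (ignores i)."""
    return min(100, total)


print(combined_deduct_values([30, 20, 10], capped_sum))    # [34, 52, 60]
print(pavement_condition_index([30, 20, 10], capped_sum))  # 40
```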
The resulting combination of somewhat subjective estimates is indeed well-correlated with physical properties. The algorithm has been selected so as to provide the largest correlation with the pavement durability and other physical characteristics. For example, it has been shown that PCI is strongly correlated with the International Roughness Index that measures the passing vehicle's vibrations caused by the pavement's imperfections; see, e.g., [8].
Towards reformulating the final step. Our ultimate goal is to decide when a road segment needs maintenance or repair. This decision is made by comparing the PCI estimated for this segment with a certain threshold $t_0$: the segment is in acceptable condition when its PCI is at least $t_0$. The condition that
$$100 - c \ge t_0$$
is equivalent to $c \le 100 - t_0$. In its turn, the condition that $c = \max_i c_i \le 100 - t_0$ is equivalent to requiring that $c_i \le 100 - t_0$ for each i. Each value $c_i$ is obtained from the sum $s_{(1)} + \ldots + s_{(i)}$:
• by adding $(m - i)$ values of 2 and
• by applying an appropriate non-linear transformation to the resulting sum.
Assuming, as is natural for deduct values, that each of these transformations is increasing, the condition $c_i \le 100 - t_0$ is equivalent to requiring that the sum $s_{(1)} + \ldots + s_{(i)}$ is less than or equal to some threshold $t_i$. Thus, we can reformulate the final step as follows.
Reformulation of the final step. To decide whether the given road segment needs repairs or maintenance, we check, for each i from 1 to m, whether
$$s_{(1)} + s_{(2)} + \ldots + s_{(i)} \le t_i$$
for the corresponding threshold $t_i$. The segment needs repairs or maintenance as soon as one of these inequalities fails.
What needs explanation. Natural questions are:
• Why should we use sum and not any other combination function?
• Why should we consider the sum of a few largest distress values and not of all these values?
• Why should we consider several sums instead of just one?
• Where does the formula for the number m of considered deduct values come from?
There can be many other questions, since the above procedure, with its emphasis on sorting and maxima, does not look like any physical formula — physics formulas very rarely use maxima.
3. Why Should We Use Sum and Not Any Other Combination Function: An Explanation
Let us start analyzing the problem. The road segment is good if there are not too many distresses of each type, i.e., if there are: few distresses of the first type and few distresses of the second type, etc. In other words, the pavement is good if:
• the first value x1 is small and
• the second value x2 is small, etc.
This looks like a typical phrase to be analyzed by fuzzy techniques. Namely, this phrase is an "and"-combination of simpler phrases like "the value $x_1$ is small", "the value $x_2$ is small", etc. To assign a numerical value to the validity of this phrase, it makes sense:
• first, to estimate the degree to which each simple statement "$x_i$ is small" is true, and then
• to combine these degrees of confidence into a single degree.
This is exactly what we will do.
We need different membership functions for different i. In accordance with the usual fuzzy techniques, for each i and for each $x_i$, we need to come up with a number $d_i$ describing to what extent the given value $x_i$ is small. Let us denote this number by $\mu_i(x_i)$. In fuzzy techniques, the corresponding function $\mu_i(x_i)$ is known as the membership function corresponding to the notion "small".
In the traditional application of fuzzy techniques, when we have several occurrences of the same word like "small", we use the same membership function. However, most fuzzy textbooks emphasize that this is not necessarily the case: for example, when transforming the size in meters into a number, "small" means two different things when referring to cats or to people: a cat the size of a small human being is, by cats' standards, a giant.
This is exactly the case here. For example, if $x_1$ describes the relative area of severe distress, then $x_1$ should really be small for this distress to be acceptable and not to require any maintenance. However, for a low-severity distress $x_2$, even if this distress takes a significant part of the road segment, by itself, this may not necessarily trigger any need for maintenance. Thus, in our case, we need different membership functions $\mu_i(x_i)$ for different i.
How to combine the degrees. In general, the problem of combining the degrees is as follows:
• we know the degrees a and b to which statements A and B are true, and
• we want to use these values a and b to estimate the degree to which a composite statement A & B is true.
In fuzzy logic, the corresponding estimate is called an "and"-operation (or, for historical reasons, a t-norm); let us denote it by f&(a,b).
In these terms, the desired degree of confidence that the road segment is good is equal to
$$f_{\&}(\mu_1(x_1), \mu_2(x_2), \ldots). \quad (2)$$
Natural conditions on an "and"-operation. The "and"-operation should satisfy several conditions. First, since A & B and B & A mean the same, it is reasonable to expect that the corresponding estimates for their degrees should be the same, i.e., that we should have f&(a,b) = f&(b,a) for all a and b. In other words, the "and"-operation should be commutative.
Similarly, since A & (B & C) and (A & B) & C mean the same, we expect that the estimates of the degrees of these two statements should be the same, i.e., that for all a, b, and c, we should have f&(a, f&(b,c)) = f&(f&(a,b),c). In other words, an "and"-operation should be associative.
There are several other reasonable properties; see, e.g., [3,8-10,12,13]. An "and"-operation that satisfies all these properties is usually what is called a t-norm.
Structure of a generic t-norm. Some t-norms have the form
$$f_{\&}(a, b) = g^{-1}(g(a) + g(b)) \quad (3)$$
for some strictly monotonic function g(z), where $g^{-1}(z)$ denotes the inverse function, for which $g^{-1}(g(z)) = z$. Such t-norms are known as Archimedean.
For example, for the probability-inspired operation $f_{\&}(a, b) = a \cdot b$, we get this form with $g(z) = -\ln(z)$, a decreasing function. A more traditional way of representing Archimedean t-norms is by reducing them to the product, as $f_{\&}(a, b) = h^{-1}(h(a) \cdot h(b))$; this can be reduced to the above sum-based representation if we take $g(a) = -\ln(h(a))$.
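The sum-based representation can be checked numerically. The following Python sketch builds an "and"-operation from a generator and its inverse and instantiates it with g(z) = -ln(z); the helper name `archimedean_and` is hypothetical, and the degrees are assumed to be strictly positive so that the logarithm is defined.

```python
import math


def archimedean_and(g, g_inv):
    """Build f_&(a, b) = g_inv(g(a) + g(b)) from a generator g."""
    def f_and(a, b):
        return g_inv(g(a) + g(b))
    return f_and


# Generator of the product t-norm: g(z) = -ln(z), with inverse exp(-s).
product_and = archimedean_and(lambda z: -math.log(z), lambda s: math.exp(-s))

a, b, c = 0.5, 0.4, 0.8
print(product_and(a, b))  # close to a * b = 0.2, up to rounding
# Commutativity and associativity hold up to rounding error.
print(math.isclose(product_and(a, b), product_and(b, a)))
print(math.isclose(product_and(a, product_and(b, c)),
                   product_and(product_and(a, b), c)))
```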
It is known (see, e.g., [5]) that for every t-norm $f_{\&}(a, b)$ and for every $\varepsilon > 0$, there exists an Archimedean t-norm $f'_{\&}(a, b)$ which is $\varepsilon$-close to $f_{\&}(a, b)$, i.e., for which
$$|f_{\&}(a, b) - f'_{\&}(a, b)| \le \varepsilon$$
for all a and b. Since the expert's degrees of confidence are always approximate, and $\varepsilon$ can be arbitrarily small, in practice, we can safely replace the original t-norm with an $\varepsilon$-close Archimedean one, as long as $\varepsilon$ is small enough. Thus, without losing generality, we can safely assume that the t-norm $f_{\&}(a, b)$ is Archimedean.
This explains why in gauging pavement conditions, we use sum. Indeed, the degree of confidence that the road segment is good is determined by the formula (2). As we have discussed, we can safely assume that the corresponding t-norm is Archimedean, i.e., that it is described by the formula (3).
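Substituting the expression (3) into the formula (2), we conclude that the desired degree d has the form $d = g^{-1}(g(\mu_1(x_1)) + g(\mu_2(x_2)) + \ldots)$, i.e., equivalently, the form $d = g^{-1}(s)$, where $s = s_1 + s_2 + \ldots$, $s_i = f_i(x_i)$, and $f_i(z) \stackrel{\text{def}}{=} g(\mu_i(z))$.
In particular, since the function g(z) is strictly monotonic, the condition that the road is good enough, i.e., that $d \ge d_0$ for some threshold $d_0$, can be equivalently reformulated as a threshold condition on the sum s. For a decreasing generator such as $g(z) = -\ln(z)$, for which the values $s_i$ are non-negative and grow with the severity of the distress, just like the deduct values, this condition takes the form $s \le t_0 = g(d_0)$. In other words, we get $s_1 + s_2 + \ldots \le t_0$. This is exactly the sum-based formula used to estimate the desired degree, which is thus explained by fuzzy ideas.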
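To illustrate the equivalence between the threshold on d and the threshold on the sum, here is a minimal Python sketch. It assumes the product t-norm with the decreasing generator g(z) = -ln(z) and hypothetical exponential membership functions mu_i(x) = exp(-k_i * x); neither the steepness values k_i nor the threshold d0 come from the paper.

```python
import math

# Hypothetical steepness k_i for each distress type: a larger k_i means
# that "small" is a stricter notion for that type (e.g., severe distress).
STEEPNESS = [8.0, 3.0, 0.5]


def membership_small(i, x):
    """Assumed membership function mu_i(x) = exp(-k_i * x) for "x_i is small"."""
    return math.exp(-STEEPNESS[i] * x)


def generator(z):
    """Generator g(z) = -ln(z) of the product t-norm."""
    return -math.log(z)


def deduct_sum(xs):
    """s = s_1 + s_2 + ..., where s_i = g(mu_i(x_i))."""
    return sum(generator(membership_small(i, x)) for i, x in enumerate(xs))


def degree_good(xs):
    """d = g^{-1}(s), with g^{-1}(s) = exp(-s)."""
    return math.exp(-deduct_sum(xs))


xs = [0.01, 0.05, 0.2]   # relative areas of three distress types
d0 = 0.7                 # hypothetical acceptance threshold on d
t0 = generator(d0)       # corresponding threshold on the sum s
# Both comparisons give the same verdict.
print(degree_good(xs) >= d0, deduct_sum(xs) <= t0)
```

With these assumed membership functions, each s_i equals k_i * x_i, so a stricter notion of "small" simply becomes a larger weight in the sum.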
4. Why Should We Consider the Sum of a Few Largest Distress Values And Not of All These Values?
Analysis of the problem: analyzed road segments are reasonably good. The whole procedure makes sense when roads are reasonably well maintained and are in reasonable condition. If a road is in a clearly bad condition, there is no need to accurately gauge its quality; we just need to repair it.
The need for an accurate estimate of the road's quality occurs when we have several segments of reasonably good quality, and we need to find a way to maintain them and make them even better.
In such situations, most distress values $x_i$ are small. When a distress value is very small, it does not affect the overall quality of a road segment.
Computational consequences of this analysis. Since small distress values do not affect the quality of a road segment, taking them into account would be a waste of computational resources.
To avoid this waste, it makes sense to ignore these very small values and to consider only a few largest distress values. This is exactly what practitioners do: instead of taking the sum of all the values $s_1 + s_2 + \ldots$, they only consider the sum of the m largest values $s_{(1)} + s_{(2)} + \ldots + s_{(m)}$.
5. Why Should We Consider Several Sums Instead of Just One?
General idea. If, based on the largest distress, we know that the road segment needs repair or maintenance, there is no need to consider all other distresses. In this case, taking other distresses into account would be a waste of computational resources.
If, based on the first distress, we cannot make a definite conclusion, it is reasonable to also consider the second distress, etc.
Thus, instead of always taking all m distresses into account, it makes sense to first check just the largest distress, then the two largest, then the three largest, etc.
This is exactly what is done in practice.
This is a fuzzy analog of lazy logical operations. In classical 2-valued logic, if we want to find the truth value of a statement A & B and we know that A is false, there is no need to find the truth value of B — we can already conclude that the composite statement A & B is also false.
This simple observation saves us computation time. The corresponding operation is known as a lazy "and". This is the most commonly used "and"-operation in programming languages such as C or Java.
What we are describing here is the fuzzy analogue of such lazy "and"-operations. Indeed, when the first values $s_{(1)}, s_{(2)}, \ldots$ are already large, which corresponds to close-to-false (0) values of the corresponding degrees $\mu_i(x_i)$, there is no need to compute any further terms: we know that the road segment needs repair or maintenance.
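This lazy evaluation can be sketched in Python as an early-exit loop over the sorted deduct values. The thresholds t_i are hypothetical inputs (the paper does not list them), the function name is an assumption, and we assume that the segment needs attention as soon as a partial sum exceeds its threshold.

```python
def needs_maintenance(sorted_deducts, thresholds):
    """Lazy check of the partial sums s_(1) + ... + s_(i) against t_i.

    Returns (verdict, number of deduct values actually examined).
    """
    partial_sum = 0
    examined = 0
    for s, t in zip(sorted_deducts, thresholds):
        partial_sum += s
        examined += 1
        if partial_sum > t:
            # Threshold exceeded: later deduct values cannot change the verdict.
            return True, examined
    return False, examined


print(needs_maintenance([70, 20, 10, 5], [60, 75, 85, 90]))  # (True, 1)
print(needs_maintenance([30, 20, 10, 5], [60, 75, 85, 90]))  # (False, 4)
```

In the first call, the largest deduct value alone settles the question, so the remaining three values are never examined, just as a lazy "and" never evaluates B once A is known to be false.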
6. Where Does the Formula for the Number m of Considered Deduct Values Come From?
Analysis of the problem. Suppose that we know the largest distress $s_{(1)}$. Let us denote, by $S_0$, the overall distress level after which the road segment needs repairs or maintenance.
Let us denote, by $s_0$, the smallest value of an individual distress that is still worth taking into account, so that values smaller than $s_0$ can be safely set to 0. Then, if, in addition to the largest distress, we take into account $m - 1$ other non-zero distresses, we get the overall value of at least $s_{(1)} + (m - 1) \cdot s_0$. If this value is already larger than or equal to the threshold $S_0$, this means that there is no need to consider any additional distresses: we already know that the road segment needs repairs or maintenance.
On the other hand, if among the m largest distresses, the smallest is already below $s_0$ and can hence be safely ignored, this means that all smaller distresses can also be ignored. So, considering more than m distresses also does not make sense.
Thus, in all possible cases, the largest number of distresses to be considered is the smallest m for which $s_{(1)} + s_0 \cdot (m - 1) \ge S_0$. In terms of m, this inequality can be reformulated in the equivalent form
$$m \ge 1 + \frac{1}{s_0} \cdot (S_0 - s_{(1)}).$$
So, the smallest possible value m that satisfies this property has the form
$$m = 1 + \frac{1}{s_0} \cdot (S_0 - s_{(1)}). \quad (4)$$
This analysis explains the formula for the number m of considered deduct values. Indeed, (4) is exactly the formula used to estimate how many deduct values we need to take into account.
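To make the match between (4) and (1) explicit, one can compare the two formulas numerically. In the Python sketch below, the values S0 = 100 and s0 = 98/9 are obtained by matching the coefficients of the two formulas; they are not stated in the paper, and the function names are hypothetical.

```python
import math


def count_from_thresholds(largest_deduct, overall_threshold, min_relevant):
    """Formula (4): m = 1 + (S_0 - s_(1)) / s_0."""
    return 1 + (overall_threshold - largest_deduct) / min_relevant


def count_from_standard(largest_deduct):
    """Formula (1): m = 1 + (9/98) * (100 - s_(1))."""
    return 1 + 9 * (100 - largest_deduct) / 98


# Parameter values inferred by matching coefficients (an assumption).
S0 = 100
s0 = 98 / 9

for s1 in (2, 25, 51, 80, 100):
    assert math.isclose(count_from_thresholds(s1, S0, s0),
                        count_from_standard(s1))
print(round(s0, 2))  # 10.89
```

Under this reading, an individual deduct value below roughly 10.9 is treated as too small to matter on its own.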
Acknowledgments
This work was supported in part by the National Science Foundation grant HRD-1242122 (Cyber-ShARE Center of Excellence).
References
1. ASTM International. Standard Practice for Roads and Parking Lots Pavement Condition Index Surveys. International Standard D6433-18.
2. Belohlavek R., Dauben J.W., Klir G.J. Fuzzy Logic and Mathematics: A Historical Perspective. Oxford University Press, New York, 2017.
3. Klir G., Yuan B. Fuzzy Sets and Fuzzy Logic. Prentice Hall, Upper Saddle River, New Jersey, 1995.
4. Mendel J.M. Uncertain Rule-Based Fuzzy Systems: Introduction and New Directions. Springer, Cham, Switzerland, 2017.
5. Nguyen H.T., Kreinovich V., Wojciechowski P. Strict Archimedean t-norms and t-conorms as universal approximators // International Journal of Approximate Reasoning. 1998. V. 18, Nos. 3-4. P. 239-249.
6. Nguyen H.T., Walker E.A. A First Course in Fuzzy Logic. Chapman and Hall/CRC, Boca Raton, Florida, 2006.
7. Novak V., Perfilieva I., Mockor J. Mathematical Principles of Fuzzy Logic. Kluwer, Boston, Dordrecht, 1999.
8. Park K., Thomas N.E., Lee K.W. Applicability of the International Roughness Index as a predictor of asphalt pavement condition // Journal of Transportation Engineering. 2007. V. 133, No. 12. P. 706-709.
9. Zadeh L.A. Fuzzy sets // Information and Control. 1965. V. 8. P. 338-353.
Received by the editors: 30.06.2018