Вычислительные технологии (Computational Technologies)
Vol. 17, No. 1, 2012
Extreme distributions on intervals
M. Chiangpradit1, W. Panichkitkosolkul1, H. T. Nguyen2, V. Kreinovich3
1 King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
2 New Mexico State University, Las Cruces, USA
3 University of Texas at El Paso, USA
e-mail: {monchaya_c, wararit_tu}@hotmail.com, hunguyen@nmsu.edu, vladik@utep.edu
One of the main tasks of interval computation is to analyze situations in which we only know the lower and upper bounds on the desired quantity, i.e., we only know an interval that contains this quantity. One of the objectives of such analysis is to make decisions. According to decision theory, a consistent decision making procedure is equivalent to assigning probabilities to different values within each interval. Thus, we arrive at the problem of describing a natural probability distribution on an interval. In this paper, we describe such a distribution for the practically important case of the "weakest link" arrangement, when the collapse of each link is catastrophic for a system. This situation occurs in fracture mechanics, when a fracture in one of the areas makes the whole plane inoperable, in economics, when the collapse of one large bank or one country can have catastrophic consequences, etc.
Keywords: extreme distributions, interval, decision making, symmetries.
Introduction
Need to make decisions. One of the main practical objectives of science and engineering is to make decisions, i.e., to select an alternative which is the best for the decision maker.
How to describe preferences of a decision maker: the notion of utility. A standard way to describe preferences of a decision maker is to use the notion of utility; see, e.g., [7, 8, 10, 12, 14]. To describe the utility of an outcome A, we need to select two extreme outcomes: a very unfavorable alternative A- and a very favorable outcome A+.
We assume that all outcomes A in which we are interested are better than A- and worse than A+. If we denote the relation "the decision maker prefers A' to A" by A < A', then we can describe this assumption as A- < A < A+.
Then, for each probability p ∈ [0, 1], we can consider a lottery L(p) in which we get A+ with probability p and A- with the remaining probability 1 − p.
For p = 1, the lottery L(p) coincides with A+, so we have A < L(1). For p = 0, the lottery L(p) coincides with A-, so we have L(0) < A. The larger p, i.e., the larger the probability of the beneficial event A+, the more beneficial the lottery L(p) is for the decision maker. So, if p < q, then L(p) < L(q).
Let po be the infimum (greatest lower bound) of the set of all the values p for which A < L(p). Then:
• When p < p0, then for p̃ = (p + p0)/2, we have p̃ < p0 and thus, by definition of the infimum, we cannot have A < L(p̃). Thus, we have L(p̃) < A. Since p < p̃, we have L(p) < L(p̃) < A and thus, L(p) < A.
• When p > p0, then, since p0 is the greatest lower bound, p is not a lower bound, i.e., there exists a value p̃ for which A < L(p̃) and p̃ < p. Since p̃ < p, we have L(p̃) < L(p), hence A < L(p).
Thus, we have the value p0 that has the following property:
• when p < p0, the corresponding lottery is worse than the event A:
L(p) < A;
• when p > p0, the corresponding lottery is better than the event A:
L(p) > A.
This threshold value p0 is called the utility of the event A. The utility is usually denoted by u(A).
We can simplify the above somewhat complicated relation between A and p0 by saying that the event L(p0) is equivalent to A. We will denote this equivalence by A ~ L(p0).
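To make this threshold definition concrete, here is a minimal Python sketch (ours, not from the paper): it assumes a hypothetical preference oracle prefers_lottery(p) that reports whether the decision maker prefers the lottery L(p) to the outcome A, and approximates the threshold p0 = u(A) by bisection.

```python
# A minimal sketch of utility elicitation by bisection, assuming a
# hypothetical oracle `prefers_lottery(p)` that answers True when the
# decision maker prefers the lottery L(p) to the fixed outcome A.

def elicit_utility(prefers_lottery, tol=1e-3):
    """Approximate the threshold p0 = u(A) by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0          # L(0) ~ A- is worse, L(1) ~ A+ is better
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prefers_lottery(mid):   # L(mid) > A: the threshold is below mid
            hi = mid
        else:                      # L(mid) < A: the threshold is above mid
            lo = mid
    return (lo + hi) / 2

# Toy example: a decision maker whose true utility for A is 0.7.
print(elicit_utility(lambda p: p > 0.7))   # ~0.7
```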
How to describe the utility of an action with uncertain consequences. In practice, we can rarely predict the exact consequences of each decision. The consequences depend on the circumstances. For example, if we decide whether to take an umbrella or not, the consequences of this decision depend on whether it will rain or not. In the ideal situation, we know the probabilities p1, ..., pn of different possible consequences E1, ..., En. In other words, the action leads to E1 with probability p1, to E2 with probability p2, ..., and to En with probability pn.
By definition of the utility, the event E1 is equivalent to a lottery L(u(E1)) in which we get A+ with probability u(E1), the event E2 is equivalent to a lottery L(u(E2)) in which we get A+ with probability u(E2), etc. Thus, the original action is equivalent to the composite lottery, in which:
• with probability p1, we get a lottery that results in A+ with probability u(E1), and in A- otherwise;
• with probability p2, we get a lottery that results in A+ with probability u(E2), and in A- otherwise;
• ...
• with probability pn, we get a lottery that results in A+ with probability u(En), and in A- otherwise.
In this composite lottery, we get either A+ or A-, and the probability of getting A+ can be easily computed as

u ≝ p1 · u(E1) + p2 · u(E2) + ... + pn · u(En).
Thus, the original action is equivalent to the lottery L(u). By definition of the utility, this means that the utility of the action is equal to u.
From the mathematical viewpoint, u is the expected value of the utility of different consequences, so we can conclude that the utility of an action is the expected value of utilities of its consequences.
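As a small illustration of this formula, the sketch below computes the expected utility of an action; the umbrella probabilities and utility values are made-up numbers for illustration only.

```python
# Expected utility of an action: u = p1*u(E1) + ... + pn*u(En).
# Probabilities and utilities below are made-up illustration values.

def action_utility(probs, utils):
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * u for p, u in zip(probs, utils))

# Umbrella example: rain with probability 0.3.
#   E1 = "carry umbrella, it rains":  u(E1) = 0.8
#   E2 = "carry umbrella, no rain":   u(E2) = 0.6
print(action_utility([0.3, 0.7], [0.8, 0.6]))   # 0.66
```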
What if we do not know the probabilities of different consequences? In many practical situations, we do not know the exact values of the probabilities of different consequences. For each event Ei, we can estimate its subjective probability ps(Ei) as the probability pi for which the lottery L(pi) (in which we get A+ with probability pi, otherwise we get A-) is equivalent to the new "lottery" L(Ei) in which we get A+ if Ei occurs, otherwise we get A-.
In other words, we determine the subjective probability ps(Ei) as the utility u(L(Ei)) of the new lottery L(Ei).
In practice, it is sometimes difficult to ask experts. The traditional utility theory approach — that we described above — is to elicit, from the experts, all the information about their preferences and their subjective probabilities.
In some applications, e.g., when we undertake a large project, this is possible and reasonable. For example, decision theory has been used to select a location for a major airport. With such a long-term billion-dollar investment that affects many potential users, it makes sense to spend a certain amount of time and resources to get a clear picture of user preferences.
However, often, we face decisions which need to be made fast and which are not that critical. In such situations, we do not have time to elicit all the values of subjective probabilities, and, even when we have some time for such an elicitation, we may end up spending more resources on this elicitation than we gain from knowing these probability values.
Need to get reasonable probability distributions. A typical situation is when the consequences of an action depend on some quantity a, we do not know the actual probabilities of different values of this quantity, and we have no time and/or resources to elicit subjective probabilities of different values a.
In this case, we need to come up with a reasonable probability distribution for a, a distribution that will be used in decision making.
Need to assign distributions on an interval. Usually, we know some bounds on each quantity, i.e., we know that the value a is always larger than or equal to some value a̲ and always smaller than or equal to some value ā. In other words, we know that the value a belongs to the interval [a̲, ā].
In this case, we need a natural way to assign probabilities on an interval.
What we do: consider weakest link case. In this paper, we consider a practically important case of the "weakest link" arrangement.
Informally, this means that we have a multi-link system, and the collapse of each link is catastrophic for a system. Such situations are typical in economics, when the collapse of one large bank or one country can have catastrophic consequences. They are also typical in fracture mechanics, when a fracture in one of the areas makes the whole structure (e. g., an airplane wing) inoperable.
1. Analysis of the problem
Weakest link: a usual mathematical description. The weakest link situation is usually described as follows: the quality of each link i is characterized by a value vi, and the quality of the system as a whole is determined by the smallest of the corresponding values vi: v ≝ min_i vi.
It is reasonable to assume that the values vi are independent random variables. In mathematical terms, this means that we are looking for the distribution of the minimum min_i vi of several independent variables. When the number n of these variables is large, we get close to the limits of such distributions.
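The following Monte Carlo sketch (ours, not part of the original derivation) illustrates the weakest-link setup: it samples the minimum of n independent link qualities and compares the empirical survival probability Prob(v > v0) with its closed form; the uniform per-link distribution is an arbitrary illustration choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weakest-link simulation: each of n links has an i.i.d. quality v_i;
# the system quality is v = min_i v_i.  The per-link distribution
# (uniform on [0, 1]) is an arbitrary illustration choice.
n, trials = 50, 100_000
v = rng.uniform(0.0, 1.0, size=(trials, n)).min(axis=1)

# Empirical survival function Prob(v > v0); for uniform links it
# should be close to (1 - v0)**n.
for v0 in (0.01, 0.02, 0.05):
    print(v0, (v > v0).mean(), (1 - v0) ** n)
```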
Limit distributions of the minimum or maximum of independent identically distributed random variables (properly centered and normalized) are usually called extreme value distributions; see, e.g., [1, 5, 6, 9, 15]. There is a known description of such distributions; they include the Fréchet, Gumbel, and Weibull distributions.
A usual mathematical description of the weakest link distributions: limitation. Alas, none of the known extreme value distributions describes a random variable which is located on a finite interval.
What we plan to do. Our objective is to describe extreme value distributions located on an interval. To describe such distributions, we will use the symmetries approach.
In order to be able to do that, let us first show that the known extreme value distributions can also be derived from the appropriate symmetries. We will then show how this derivation can be extended to the new case — when a random variable is limited to an interval.
Known extreme value distributions: main ideas behind the derivation based on symmetries. We want to find the probability distribution of the extreme values. Traditionally, a probability distribution is described by a cumulative distribution function F(v0) ≝ Prob(v ≤ v0) that describes the probability that a random variable does not exceed a given number v0. However, from the practical viewpoint, we are interested in probabilities of rare events, i.e., in the probabilities that v exceeds the given value v0. Thus, for extreme value distributions, it is more convenient to use the corresponding function G(v0) ≝ 1 − F(v0) = Prob(v > v0).
In deriving the types of distributions, it is usually taken into account that the numerical value of a physical quantity v depends:
• on the choice of a measuring unit v → a · v (e.g., 1.7 m = 170 cm), and
• on the choice of the starting point v → v + b (e.g., A.D. or since the French Revolution).
Under these transformations, the original function G(v0) turns into a new re-scaled function G(a · v0 + b). It is therefore reasonable, instead of looking for a single function G(v0), to look for a family G of distributions {G(a · v0 + b)}_{a,b} obtained from some function G(v0).
By definition of an extreme distribution as a minimum, if n independent identically distributed variables vi are distributed according to the extreme value distribution, then their minimum v' ≝ min_i vi is the minimum of minimums, so it should also be distributed according to the extreme value distribution. Clearly, v' > v0 ⇔ (v1 > v0 & ... & vn > v0); so, since the vi are independent, we conclude that

G'(v0) = Prob(v' > v0) = Prob(v1 > v0) · ... · Prob(vn > v0) = (G(v0))^n.

Thus, the desired family G should contain, with each function G(v0), also the function G'(v0) = (G(v0))^n.
Similarly, for the minimum v'' of a · n values, we conclude that the function G''(v0) = (G(v0))^(a·n) belongs to the family G; note that G''(v0) = (G'(v0))^a.
It is therefore reasonable to conclude that if G(v0) ∈ G, then (G(v0))^a ∈ G for all a. By definition of the family G, this means that for every a, there exist a(a) and b(a) such that (G(v0))^a = G(a(a) · v0 + b(a)).
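The multiplicativity property G'(v0) = (G(v0))^n is easy to check numerically; here is a small sketch (ours) using exponential variables, an arbitrary test distribution with survival function G(v0) = exp(−v0).

```python
import numpy as np

rng = np.random.default_rng(1)

# Numerical check of G'(v0) = (G(v0))**n for the minimum of n independent
# copies, using exponential variables (survival G(v0) = exp(-v0)) as an
# arbitrary test distribution.
n, trials, v0 = 5, 200_000, 0.3
mins = rng.exponential(1.0, size=(trials, n)).min(axis=1)

print((mins > v0).mean())   # empirical Prob(min > v0)
print(np.exp(-v0) ** n)     # (G(v0))**n = exp(-n*v0)
```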
Extreme value distributions: symmetry-based derivation of the known formulas. The above functional equation can be simplified if we consider an auxiliary function g(v0) ≝ −ln(G(v0)). For this auxiliary function, the above formula takes the form a · g(v0) = g(a(a) · v0 + b(a)).
When a = 1, we have a(1) = 1 and b(1) = 0. Differentiating both sides of the above formula with respect to a and taking a = 1, we get

g = (dg/dv0) · (α · v0 + β),  i.e.,  dg/g = dv0/(α · v0 + β),

where we denoted α ≝ a'(1) and β ≝ b'(1).
When α = 0, integration leads to ln(g) = v0/β + c, so g(v0) = exp(v0/β + c) and

G(v0) = exp(−exp(v0/β + c)).

When α ≠ 0, for v ≝ v0 + Δv with Δv ≝ β/α, we get dg/g = dv/(α · v), hence ln(g) = (1/α) · ln(v) + c, so g = C · v^(1/α) = C · (v0 + Δv)^(1/α) (where C ≝ e^c), hence

G(v0) = exp(−C · (v0 + Δv)^(1/α)).
Comment. Actually, we get two different types of distributions depending on whether α > 0 or α < 0.
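For readers who want to double-check the integration, the two cases of dg/g = dv0/(α · v0 + β) can be verified symbolically; below is a SymPy sketch in our notation (not part of the paper).

```python
import sympy as sp

# Symbolic check of the two cases of dg/g = dv0/(alpha*v0 + beta),
# in the notation of the text (g = -ln G).
v, alpha, beta = sp.symbols('v alpha beta', positive=True)
g = sp.Function('g')

# Case alpha != 0: g is a power of the shifted variable alpha*v + beta.
print(sp.dsolve(sp.Eq(g(v).diff(v), g(v) / (alpha * v + beta)), g(v)))
# -> g(v) = C1*(alpha*v + beta)**(1/alpha)

# Case alpha == 0: g is exponential in v, which gives the
# double-exponential (Gumbel-type) form G(v0) = exp(-exp(v0/beta + c)).
print(sp.dsolve(sp.Eq(g(v).diff(v), g(v) / beta), g(v)))
# -> g(v) = C1*exp(v/beta)
```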
Not all linear transformations are physically meaningful. The above derivations are based on the assumption that we have linear symmetries v0 → a · v0 + b.
For some quantities like time or temperature, all values are possible, so we have both shift- and scale-invariance.
For other quantities, only some values are possible. For example, height can only take non-negative values, i.e., possible values are limited to the set [0, ∞). In this case, only linear transformations that preserve this set make physical sense. In other words, we only consider scalings v0 → a · v0.
How to extend this analysis to distributions on an interval: discussion. For quantities whose values are limited to a fixed interval [v̲, v̄], it also makes sense to restrict ourselves to linear transformations that preserve this set [v̲, v̄] of possible values. However, the only linear transformation that preserves this interval is the identity.
Our solution: to go beyond linear symmetries, to more general (non-linear) symmetries.
Basic nonlinear symmetries: reminder. Sometimes, a system also has nonlinear symmetries. How can we describe the set of such symmetries?
If a system is invariant under transformations f (x) and g(x), then:
• it is invariant under their composition f o g, and
• it is invariant under the inverse transformation f-1.
In mathematical terms, this means that symmetries form a group.
In general, we may have transformation groups that require infinitely many parameters: for example, the group of all possible transformations, i. e., all possible one-to-one functions from the real line to itself. However, in practice, at any given moment of time, we can only store and describe finitely many parameters. Thus, it is reasonable to restrict ourselves to transformation groups whose elements can be described by finitely many parameters, i.e., to finite-dimensional groups.
Thus, we arrive at the following problem: describe all finite-dimensional transformation groups that contain all linear transformations. This question was first formulated by N. Wiener, the father of cybernetics, in [17]. For a Euclidean space of arbitrary dimension n, such Lie groups have been classified in [16]. In particular, for our case n = 1, the only such groups are the group of linear mappings and the group of all fractionally-linear mappings

f(x) = (a · x + b)/(c · x + d).    (1)
Since we are interested in non-linear re-scalings, we should therefore consider re-scalings of the type (1).
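A convenient way to see why mappings of type (1) form a finite-dimensional group: each fractionally-linear map corresponds to a 2 × 2 coefficient matrix, and composition of maps corresponds to matrix multiplication. A small numerical sketch (ours):

```python
import numpy as np

# Fractionally-linear maps f(x) = (a*x + b)/(c*x + d) compose like 2x2
# matrices: applying M1 and then M2 corresponds to the product M2 @ M1.
# This is why these maps form a finite-dimensional (3-parameter) group.

def apply_flt(M, x):
    (a, b), (c, d) = M
    return (a * x + b) / (c * x + d)

M1 = np.array([[2.0, 1.0], [1.0, 3.0]])
M2 = np.array([[1.0, -1.0], [0.5, 2.0]])
x = 0.7

print(apply_flt(M2, apply_flt(M1, x)))   # composition, step by step
print(apply_flt(M2 @ M1, x))             # single map from the matrix product
```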
Resulting idea. In the linear case, we required that (G(v0))^a is equal to the result G(a(a) · v0 + b(a)) of applying a linear transformation v0 → a(a) · v0 + b(a) to v0.
Now, we similarly require that (G(v0))^a is equal to the result of applying a fractionally-linear transformation, i.e., that for every a, there exist a(a), b(a), c(a), and d(a) for which

(G(v0))^a = G((a(a) · v0 + b(a))/(c(a) · v0 + d(a)))

for some transformation

v0 → (a(a) · v0 + b(a))/(c(a) · v0 + d(a))

that preserves the set [v̲, v̄] of possible values.
Side observation: symmetries explain the basic formulas of Neural Networks.
Fractionally-linear transformations have been actively used; for example, they were used to explain the empirically successful form f(x) = 1/(1 + e^(−x)) of the activation function, i.e., a function that is used in describing how the output y of a neuron is related to its inputs x1, ..., xn: y = f(x), where x ≝ w1 · x1 + ... + wn · xn − w0, for some real numbers wi.
The details of this explanation are given, e.g., in [13]. The main idea behind this explanation is as follows. The input x is only determined modulo the choice of a starting point. If we change the starting point for measuring the inputs xi, then the original value x changes into x + s. When we apply the activation function f(x) to this changed input, we get the value f(x + s).
In other words, changing the starting point means that we replace the original activation function f (x) with a new activation function f (x + s). It is reasonable to require that the new output f (x + s) is equivalent to the original output f (x) modulo an appropriate transformation. We have already shown that all appropriate transformations are fractionally linear. Thus, we conclude that for every s, there exist values a(s), b(s), c(s), and d(s) for which we have:
f(x + s) = (a(s) · f(x) + b(s))/(c(s) · f(x) + d(s)).
Differentiating both sides of this functional equation by s and equating s to 0, we get a differential equation for f (x). Its known solution is the above activation function — which can thus be explained by symmetries.
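The fractionally-linear shift law is easy to verify for the sigmoid directly: simple algebra gives f(x + s) = y/(e^(−s) + (1 − e^(−s)) · y) with y = f(x), i.e., a(s) = 1, b(s) = 0, c(s) = 1 − e^(−s), d(s) = e^(−s). A quick numerical check (ours):

```python
import math

# Check that the sigmoid f(x) = 1/(1 + e^{-x}) satisfies the
# fractionally-linear shift law f(x + s) = y/(e^{-s} + (1 - e^{-s})*y),
# where y = f(x); the coefficients follow by direct algebra.

def f(x):
    return 1.0 / (1.0 + math.exp(-x))

for x, s in [(0.3, 1.2), (-2.0, 0.5), (1.5, -0.7)]:
    y = f(x)
    lhs = f(x + s)
    rhs = y / (math.exp(-s) + (1.0 - math.exp(-s)) * y)
    print(x, s, lhs, rhs)   # the last two columns agree
```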
2. Extreme distributions on an interval: derivation and the main result
Reduction to [0, 1]. Before we start our derivation, let us observe that every interval can be linearly reduced to the interval [0, 1]. Thus, it is sufficient to consider the case when [v̲, v̄] = [0, 1].
Fractionally-linear transformations that preserve the interval [0, 1]. According to our idea, we must describe all fractionally-linear transformations

f(x) = (a · x + b)/(c · x + d)

that preserve the interval [0, 1].
First, dividing both the numerator and the denominator of the fractionally-linear formula by d, and using a/d, b/d, and c/d instead of the original values of a, b, and c, we get a simplified expression

f(x) = (a · x + b)/(1 + c · x).
For a monotonic transformation to preserve [0, 1], we must have f(0) = 0 and f(1) = 1. Substituting the above formula for f(x) into the equation f(0) = 0, we conclude that b = 0. Substituting the above expression (with b = 0) into the equation f(1) = 1, we conclude that a/(1 + c) = 1, hence c = a − 1 and

f(x) = a · x/(1 + (a − 1) · x).
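These maps not only fix the endpoints 0 and 1: a direct computation shows that f_a(f_b(x)) = f_{a·b}(x), so they form a one-parameter group, which is what lets us use them as symmetries. A short symbolic check of both facts (our sketch, with SymPy):

```python
import sympy as sp

# Check that f_a(x) = a*x/(1 + (a - 1)*x) fixes the endpoints of [0, 1]
# and composes as f_a(f_b(x)) = f_{a*b}(x) (a one-parameter group).
x, a, b = sp.symbols('x a b', positive=True)

def f(a, x):
    return a * x / (1 + (a - 1) * x)

print(sp.simplify(f(a, 0)), sp.simplify(f(a, 1)))   # 0 and 1: endpoints fixed
print(sp.simplify(f(a, f(b, x)) - f(a * b, x)))     # 0: group property
```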
Resulting reformulation of our problem. Now, we can reformulate our problem as follows: for every a, there exists a(a) for which

(G(v0))^a = G(a(a) · v0/(1 + (a(a) − 1) · v0)).
Derivation of the formula. By taking minus logarithms of both sides of the above formula (recall that g(v0) ≝ −ln(G(v0))), we get

a · g(v0) = g(a(a) · v0/(1 + (a(a) − 1) · v0)).
When a = 1, there is no transformation, so a(1) = 1.
Differentiating both sides of the above equation with respect to a and substituting a = 1, we get the differential equation

g = (dg/dv0) · (α · v0 − α · v0^2),

where α ≝ a'(1). Moving all the terms depending on g to the left-hand side and all the terms depending on v0 to the right-hand side, we conclude that

dg/g = dv0/(α · v0 − α · v0^2).
The fraction on the right-hand side can be represented as the sum of two simpler fractions:

dg/g = (1/α) · (1/v0 + 1/(1 − v0)) · dv0.

Now, we can explicitly integrate both sides. As a result, we get the following formula:

ln(g) = (1/α) · (ln(v0) − ln(1 − v0)) + c = (1/α) · ln(v0/(1 − v0)) + c,

hence

g(v0) = β · ((1 − v0)/v0)^C

for some parameters β and C (namely, C = −1/α and β = e^c).
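One can verify by direct differentiation that this g(v0) indeed solves the differential equation above; here is a symbolic check (ours, with SymPy; not part of the paper):

```python
import sympy as sp

# Symbolic check that g(v0) = beta*((1 - v0)/v0)**C solves
# g = (dg/dv0)*(alpha*v0 - alpha*v0**2) when C = -1/alpha.
v0, alpha, beta = sp.symbols('v0 alpha beta', positive=True)
C = -1 / alpha
g = beta * ((1 - v0) / v0) ** C

# The ratio of the right-hand side to g must simplify to 1.
ratio = g.diff(v0) * (alpha * v0 - alpha * v0 ** 2) / g
print(sp.simplify(ratio))   # 1
```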
For a general interval [v̲, v̄], we get

g(v0) = β · ((v̄ − v0)/(v0 − v̲))^C.
Exponentiating, we get G(v0) = exp(−g(v0)), hence we arrive at the following result:
Result. For variables on an interval [v̲, v̄], the general extreme value distribution has the following form:

G(v0) = Prob(v > v0) = exp(−β · ((v̄ − v0)/(v0 − v̲))^C).
Discussion. These distributions were empirically found in fracture mechanics by A. Chudnovsky and B. Kunin [2-4, 11].
For the specific case of C = 0, we get a uniform distribution — a usual distribution on an interval.
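To illustrate the result, the sketch below (ours; all parameter values are arbitrary) samples from the derived distribution by inverting G and checks its characteristic min-stability: the minimum of n independent copies again belongs to the same family, with β replaced by n · β, since (G(v0))^n simply multiplies β by n.

```python
import numpy as np

rng = np.random.default_rng(2)

# The family G(v0) = exp(-beta*((vbar - v0)/(v0 - vlow))**C) is closed
# under minima: for the min of n i.i.d. copies, G**n is the same formula
# with beta -> n*beta.  Parameter values are arbitrary; note C < 0, so
# that G decreases from 1 at vlow to 0 at vbar.
vlow, vbar, beta, C, n = 0.0, 1.0, 1.5, -2.0, 4

def sample(beta, size):
    u = rng.uniform(size=size)              # u plays the role of G(v)
    t = (-np.log(u) / beta) ** (1.0 / C)    # t = (vbar - v)/(v - vlow)
    return (vbar + vlow * t) / (1.0 + t)    # invert G to get v

mins = sample(beta, (100_000, n)).min(axis=1)
direct = sample(n * beta, 100_000)          # same family, beta -> n*beta
print(np.quantile(mins, [0.25, 0.5, 0.75]))
print(np.quantile(direct, [0.25, 0.5, 0.75]))   # should be close
```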
Acknowledgments
This work was supported in part by the National Science Foundation grants HRD-0734825 and DUE-0926721, by Grant 1 T36 GM078000-01 from the National Institutes of Health, by Grant MSM 6198898701 from MSMT of Czech Republic, and by Grant 5015 "Application of fuzzy logic with operators in the knowledge based systems" from the Science and Technology Centre in Ukraine (STCU), funded by European Union. The work was partly done when Monchaya Chiangpradit and Wararit Panichkitkosolkul were visiting researchers with New Mexico State University. M. Chiangpradit was supported by the Thailand Commission on Higher Education Strategic Scholarships for Frontier Research Network program.
The authors are very thankful to Sa-aat Niwitpong for his encouragement and support, to all the participants of the 14th GAMM-IMACS International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics SCAN'2010 (Lyon, France, September 27-30, 2010) for useful discussions, and to the anonymous referees for helpful suggestions.
References
[1] Beirlant J., Goegebeur Y., Segers J. et al. Statistics of Extremes: Theory and Applications. N.Y.: Wiley, 2004.
[2] Chudnovsky A., Botsis J., Kunin B. The role of microdefects in fracture propagation process // Cracking and Damage. Strain Localization and Size Effect / Eds. J. Mazars, Z. Bažant. Amsterdam: Elsevier, 1988. P. 140-149.
[3] Chudnovsky A., Kunin B. A probabilistic model of brittle crack formation // J. Appl. Phys. 1987. Vol. 62. P. 4124-4129.
[4] Chudnovsky A., Kunin B. On applications of probability in fracture mechanics // Computational Mechanics of Probabilistic and Reliability Analysis / Eds. W.K. Liu, T. Belytschko. Lausanne, Switzerland: Elmepress Intern., 1989. P. 396-415.
[5] Coles S. An Introduction to Statistical Modeling of Extreme Values. Berlin, Heidelberg, N.Y.: Springer, 2001.
[6] Embrechts P., Klüppelberg C., Mikosch T. Modelling Extremal Events: For Insurance and Finance. Berlin, Heidelberg, N.Y.: Springer, 2010.
[7] Fishburn P.C. Utility Theory for Decision Making. N.Y.: John Wiley & Sons Inc., 1969.
[8] Fishburn P.C. Nonlinear Preference and Utility Theory. Baltimore, Maryland: The Johns Hopkins University Press, 1988.
[9] Gumbel E.J. Statistics of Extremes. N.Y.: Dover, 2004.
[10] Keeney R.L., Raiffa H. Decisions with Multiple Objectives. N.Y.: John Wiley and Sons, 1976.
[11] Kunin B. A Probabilistic Model for Predicting Scatter in Brittle Fracture. PhD Dissertation. Department of Mathematics, Statistics, and Computer Science. Univ. of Illinois at Chicago, 1992.
[12] Luce R.D., Raiffa H. Games and Decisions: Introduction and Critical Survey. N.Y.: Dover, 1989.
[13] Nguyen H.T., Kreinovich V. Applications of Continuous Mathematics to Computer Science. Dordrecht: Kluwer, 1997.
[14] Raiffa H. Decision Analysis. Reading, Massachusetts: Addison-Wesley, 1970.
[15] Resnick S.I. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Berlin, Heidelberg, N.Y.: Springer, 2006.
[16] Singer I.M., Sternberg S. Infinite groups of Lie and Cartan. Part 1 // J. d'Analyse Math. 1965. Vol. XV. P. 1-113.
[17] Wiener N. Cybernetics, or Control and Communication in the Animal and the Machine. Cambridge, Massachusetts: MIT Press, 1962.
Received for publication 11 January 2011