CC BY
MEAN TIME TO FAILURE FOR PERIODIC FAILURE RATE

Christian Tanguy

Orange Labs, CORE/TPN, Issy-les-Moulineaux, France e-mail: christian.tanguy@orange-ftgroup.com

ABSTRACT

The paper is concerned with the determination of the Mean Time To Failure (MTTF) in configurations where the failure rate is periodical. After solving two configurations exactly, we show that when the period of the failure rate oscillations is small with respect to the average failure rate, the MTTF is essentially given by the inverse of the average failure rate, give or take corrections that can be expressed analytically. This could be helpful in the description of systems the environment of which is subject to changes.

1 INTRODUCTION

The occurrence of failures in systems is often described by using well-known distributions (see for instance (Kuo and Zuo 2003, Pham 2006, Rausand and Høyland 2004)) such as exponential, Weibull, etc. However, most of these distributions have associated failure rates which are constant or monotonic. This cannot realistically describe many real-life situations. To quote but a few examples, the probability of hurricanes is highest during the "right" seasons, and computing systems exhibit different activity levels during the day (there is also a weekly dependence because of weekends). The efficiency of cooling units for electronic equipment in telecommunication networks depends on the ambient temperature; problems may arise in summer.

Quite naturally, the possibility of a periodic failure rate has been raised. Castillo and Siewiorek (1981) considered the reliability of computing systems and presented data clearly showing that hard disk failures seem to follow the workload. The influence of this workload can be taken into account quite satisfactorily by the addition of a (periodical) failure rate. A few fundamental mathematical studies have also been devoted to the issue of periodic random environments (Dimitrov, Chukova, and Green 1997, Prakasa Rao 1997), the emphasis being laid on time distributions, nonstationary Poisson processes and other probability properties such as the "almost lack of memory". Semi-Markov processes have also been used to model failure rates; a beautiful analytic expression has been found for the reliability in the case of a Furry-Yule process (Grabski 2002). More practical considerations have emerged again, as witnessed by recent work on high-performance computing systems such as grids (Kang and Grimshaw 2007, Schroeder and Gibson 2006). To quote Kang and Grimshaw (2007):

"Accurate failure prediction in Grids is critical for reasoning about QoS guarantees such as job completion time and availability"

Another recent practical paper (Andrews 2005) considers a problem that could (somehow) ring a bell with all of us: what is the life expectancy of our mobile phones? In these electronic devices, the temperature of specific parts of the circuits may substantially increase during operations such as finding the next antenna or working in conditions of heavy traffic. It has been recognized for many decades that some processes ultimately responsible for hardware failures in electronic components have a temperature dependence which obeys the Arrhenius law (Baker 1972), used in many accelerated life test procedures. While the universality of this law is to be considered very carefully, there is no doubt that even a small increase in temperature may lead to surges in the failure rate. Should we consider the worst-case (meaning: temperature) scenario, or the most-of-the-time situation, knowing that these two hypotheses lead to mean times to failure (MTTF) differing by orders of magnitude? A review of the potential problems linked to temperature can be found in (Parry, Rantala, and Lasance 2002).

For this reason, we have tried to answer the following question: is there some way to perform a quick and not so dirty evaluation of the MTTF of a system subject to a periodic failure rate? What are the important parameters?

Our paper is organized as follows: in section 2, we recall the well-known general expressions for the reliability and the MTTF, and compute the latter in two exactly solvable cases: in the first one, the failure rate takes two possible (constant) values; in the second one, we add a sinusoidal contribution to an otherwise constant failure rate. We show that when the oscillation period $T$ of the added failure process is small compared with the otherwise expected lifetime, what really matters is merely the failure rate $\bar{\lambda}$ averaged over one period $T$ (see equation (3) below). We confirm this assertion in section 3 in the general case, and provide the corrections to this asymptotic result in equation (4); a visual interpretation of the result is also provided. We conclude with a brief discussion of possible extensions of this work.

2 TWO EXACTLY SOLVABLE CASES

2.1 Link between MTTF and reliability

The reliability may be written quite generally as

$$R(t) = \exp\left(-\int_0^t \lambda(\tau)\,d\tau\right), \qquad (1)$$

and the MTTF is given by

$$\mathrm{MTTF} = \int_0^\infty t\,\bigl(-R'(t)\bigr)\,dt = \int_0^\infty R(t)\,dt . \qquad (2)$$
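Equation (2) can be checked numerically for any failure rate. The following is a minimal sketch, not part of the original derivation: the function name `mttf_numeric`, the trapezoidal rule, and the truncation of the integral after a fixed number of periods are all assumptions made for illustration.

```python
import math

def mttf_numeric(rate, period, n_periods=200, steps_per_period=400):
    """Approximate MTTF = integral of R(t) dt with the trapezoidal rule.

    rate   : failure rate function lambda(t), assumed non-negative
    period : period of the failure rate (only used to choose the step)
    The integral is truncated at n_periods * period, which assumes that
    R(t) is already negligible there.
    """
    dt = period / steps_per_period
    cumulative = 0.0      # running value of integral_0^t lambda(tau) dtau
    mttf = 0.0
    r_prev = 1.0          # R(0) = 1
    lam_prev = rate(0.0)
    for k in range(1, n_periods * steps_per_period + 1):
        lam = rate(k * dt)
        cumulative += 0.5 * (lam_prev + lam) * dt
        r = math.exp(-cumulative)          # reliability, equation (1)
        mttf += 0.5 * (r_prev + r) * dt    # area under R(t), equation (2)
        r_prev, lam_prev = r, lam
    return mttf

# Hypothetical example: rate oscillating around 1 with a short period of 0.1.
example = mttf_numeric(lambda t: 1.0 + 0.5 * math.cos(2 * math.pi * t / 0.1),
                       period=0.1)
```

For a constant rate the routine should return the inverse of that rate, which gives a quick sanity check before it is applied to a periodic rate.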

Let us now turn to two cases where the MTTF may be exactly computed.

2.2 Bimodal failure rate

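We assume that the failure rate $\lambda$ takes two values: $\lambda_+$ if $0 < t < \alpha T$, and $\lambda_-$ if $\alpha T < t < T$ (see Figure 1). After considering the successive intervals $[nT, (n+\alpha)T]$ and $[(n+\alpha)T, (n+1)T]$, and summing the easy-to-integrate exponentials, we eventually get

$$\mathrm{MTTF} = \frac{1}{\lambda_+} + \left(\frac{1}{\lambda_-} - \frac{1}{\lambda_+}\right) \frac{e^{-\alpha\lambda_+ T}\left(1 - e^{-(1-\alpha)\lambda_- T}\right)}{1 - e^{-(\alpha\lambda_+ + (1-\alpha)\lambda_-)T}} .$$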

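This relatively cumbersome expression of $\lambda_+$, $\lambda_-$, and $T$ is actually very simple when considered in the $T \to 0$ limit, that is, when the period of the oscillation is small compared with $1/\lambda_+$ and $1/\lambda_-$. We obtain

$$\mathrm{MTTF} \to \frac{1}{\lambda_+} + \left(\frac{1}{\lambda_-} - \frac{1}{\lambda_+}\right) \frac{(1-\alpha)\lambda_-}{\alpha\lambda_+ + (1-\alpha)\lambda_-} = \frac{1}{\alpha\lambda_+ + (1-\alpha)\lambda_-},$$

which is nothing but the inverse of the average of the failure rate in the time period $[0, T]$.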
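The bimodal result and its short-period limit can be illustrated with a few lines of code. This is a sketch under stated assumptions: the function name `mttf_bimodal` is hypothetical, and the formula coded here is the integral of the reliability over one period divided by one minus the survival factor of a full period, which is algebraically the same quantity as the closed form of this section.

```python
import math

def mttf_bimodal(lam_plus, lam_minus, alpha, period):
    """MTTF for a rate equal to lam_plus on (0, alpha*T) and lam_minus on
    (alpha*T, T), repeated with period T. Both rates must be positive."""
    a = alpha * lam_plus * period             # exponent gained in the first phase
    b = (1.0 - alpha) * lam_minus * period    # exponent gained in the second phase
    # Integral of R(t) over the first period, phase by phase.
    numerator = (-math.expm1(-a) / lam_plus
                 + math.exp(-a) * (-math.expm1(-b)) / lam_minus)
    # Geometric sum over all periods: 1 / (1 - exp(-(a + b))).
    denominator = -math.expm1(-(a + b))
    return numerator / denominator

# Hypothetical values: 25% of each period at rate 3, the rest at rate 1.
lam_bar = 0.25 * 3.0 + 0.75 * 1.0
short_period = mttf_bimodal(3.0, 1.0, 0.25, period=1e-4)   # close to 1 / lam_bar
long_period = mttf_bimodal(3.0, 1.0, 0.25, period=1e3)     # close to 1 / lam_plus
```

With a very long period the system almost surely fails during the first phase, so the MTTF tends to the inverse of the first rate rather than to the inverse of the average rate.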

Figure 1. Simple variation of the failure rate.

Following a suggestion of Prof. O. Hryniewicz, we have also considered the case in which the system still spends a time $\alpha T$ in the "$\lambda_+$" state and $(1-\alpha)T$ in the "$\lambda_-$" state, but in which the "$\lambda_-$" state appears randomly at times $t_i + iT$ during the $i$th period $[iT, (i+1)T]$ ($0 < t_i < T$ for all $i$'s). The calculations are straightforward and give

$$\mathrm{MTTF} = \frac{1}{\lambda_+} + \left(\frac{1}{\lambda_-} - \frac{1}{\lambda_+}\right) \left\langle e^{-\lambda_+ t_i} \right\rangle \frac{1 - e^{-(1-\alpha)\lambda_- T}}{1 - e^{-(\alpha\lambda_+ + (1-\alpha)\lambda_-)T}},$$

where $\langle e^{-\lambda_+ t_i} \rangle$ denotes the statistical average over the $t_i$'s and closely resembles the moment generating function of their distribution. When $t_i$ is always equal to $\alpha T$, we recover the result mentioned previously. $\langle e^{-\lambda_+ t_i} \rangle$ may of course be larger than $e^{-\lambda_+ \alpha T}$, and the MTTF is correspondingly modified. However, the asymptotic $T \to 0$ limit remains the same.

2.3 Sinusoidal failure rate

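We now assume that the failure rate is given by $\lambda(t) = \lambda_0 + \lambda_1 \cos \omega t$ (see Figure 2), so that

$$R(t) = \exp\left(-\lambda_0 t - \frac{\lambda_1}{\omega} \sin \omega t\right)$$

($T = 2\pi/\omega$ is the period of the failure rate oscillations). The gist of the MTTF calculation is to expand the factor $\exp\left(-\frac{\lambda_1}{\omega} \sin \omega t\right)$ in the integral as a power series in $\lambda_1$. Each contribution is then (somewhat tediously) assessed. After some work, it is possible to show that even powers of $\lambda_1$ contribute to a hypergeometric function ${}_1F_2$ defined by

$${}_1F_2(\alpha; \beta, \gamma; z) = \sum_{n=0}^{\infty} \frac{\Gamma(n+\alpha)}{\Gamma(\alpha)}\, \frac{\Gamma(\beta)}{\Gamma(n+\beta)}\, \frac{\Gamma(\gamma)}{\Gamma(n+\gamma)}\, \frac{z^n}{n!},$$

where $\Gamma(z)$ is the Euler gamma function. A similar conclusion is reached for the odd powers of $\lambda_1$.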

Figure 2. Sinusoidal variations of the failure rate: $\lambda_1 = \lambda_0$ (red) and $\lambda_1 = \lambda_0/3$ (orange).

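Finally, we obtain ($i$ is such that $i^2 = -1$)

$$\mathrm{MTTF} = \frac{1}{\lambda_0} \left[ {}_1F_2\left(1;\, 1 - \frac{i\lambda_0}{2\omega},\, 1 + \frac{i\lambda_0}{2\omega};\, \frac{\lambda_1^2}{4\omega^2}\right) - \frac{\lambda_0 \lambda_1}{\lambda_0^2 + \omega^2}\; {}_1F_2\left(1;\, \frac{3}{2} - \frac{i\lambda_0}{2\omega},\, \frac{3}{2} + \frac{i\lambda_0}{2\omega};\, \frac{\lambda_1^2}{4\omega^2}\right) \right] .$$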
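The two hypergeometric series can be summed term by term without any special-function library. The sketch below rests on one assumption: the even and odd powers of the oscillating amplitude are integrated with the standard Laplace transforms of powers of the sine, which turns each series into a sum of real products. The names `mttf_sinusoidal`, `lam0`, `lam1` and `omega` are illustrative only.

```python
def mttf_sinusoidal(lam0, lam1, omega, tol=1e-15, max_terms=10_000):
    """MTTF for the rate lam0 + lam1*cos(omega*t), with lam0 > 0, omega > 0.

    even term n: lam1**(2n)   / prod_{k=1..n} (lam0**2 + (2k)**2   * omega**2)
    odd  term n: lam1**(2n+1) / prod_{k=0..n} (lam0**2 + (2k+1)**2 * omega**2)
    MTTF = (sum of even terms) / lam0 - (sum of odd terms)
    """
    even_term = 1.0
    odd_term = lam1 / (lam0**2 + omega**2)
    even_sum, odd_sum = even_term, odd_term
    for n in range(1, max_terms):
        # Each new term is the previous one times a simple ratio.
        even_term *= lam1**2 / (lam0**2 + (2 * n * omega) ** 2)
        odd_term *= lam1**2 / (lam0**2 + ((2 * n + 1) * omega) ** 2)
        even_sum += even_term
        odd_sum += odd_term
        if (abs(even_term) <= tol * abs(even_sum)
                and abs(odd_term) <= tol * abs(odd_sum)):
            break
    return even_sum / lam0 - odd_sum

# Hypothetical values: fast oscillation and slow oscillation.
fast = mttf_sinusoidal(1.0, 0.5, 100.0)   # close to 1 / lam0
slow = mttf_sinusoidal(1.0, 0.5, 1e-3)    # close to 1 / (lam0 + lam1)
```

The ratio between successive terms decreases like the inverse square of the term index, so a handful of terms is usually enough when the oscillation is fast.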

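The prefactor $1/\lambda_0$ indicates that the MTTF will be linked to the inverse of the "average" failure rate. It is indeed the exact result when $\lambda_1 = 0$, as expected. However, when $\lambda_1 > 0$, there are corrections to the simple result $1/\lambda_0$. If we assume that the two failure rates $\lambda_0$ and $\lambda_1$ are small with respect to $\omega$, then, expanding the hypergeometric functions ${}_1F_2$ and keeping the first two orders of the expansion, we find

$$\mathrm{MTTF} \approx \frac{1}{\lambda_0} \left[ 1 + \frac{\lambda_1^2}{\lambda_0^2 + 4\omega^2} - \frac{\lambda_0 \lambda_1}{\lambda_0^2 + \omega^2} \right]$$

or

$$\mathrm{MTTF} \approx \frac{1}{\lambda_0} \left[ 1 - \frac{\lambda_0 \lambda_1}{\omega^2} + \frac{\lambda_1^2}{4\omega^2} \right] .$$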
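The quality of these two approximations can be gauged against a brute-force integration of the reliability. The following sketch assumes a plain trapezoidal quadrature truncated at a multiple of the mean lifetime; all function names and the numerical values are hypothetical.

```python
import math

def mttf_sin_approx(lam0, lam1, omega):
    """Approximation keeping the full denominators."""
    return (1 + lam1**2 / (lam0**2 + 4 * omega**2)
            - lam0 * lam1 / (lam0**2 + omega**2)) / lam0

def mttf_sin_large_omega(lam0, lam1, omega):
    """Approximation expanded for omega much larger than lam0 and lam1."""
    return (1 - lam0 * lam1 / omega**2 + lam1**2 / (4 * omega**2)) / lam0

def mttf_sin_quadrature(lam0, lam1, omega, t_max_factor=40.0, steps=200_000):
    """Trapezoidal integral of exp(-lam0*t - (lam1/omega)*sin(omega*t)),
    truncated at t_max_factor / lam0 (assumed large enough)."""
    t_max = t_max_factor / lam0
    dt = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * dt
        r = math.exp(-lam0 * t - (lam1 / omega) * math.sin(omega * t))
        total += r if 0 < k < steps else 0.5 * r   # half weight at both ends
    return total * dt

# Hypothetical values with omega twenty times larger than lam0.
reference = mttf_sin_quadrature(1.0, 0.5, 20.0)
first_form = mttf_sin_approx(1.0, 0.5, 20.0)
second_form = mttf_sin_large_omega(1.0, 0.5, 20.0)
```

For these values both approximations should agree with the quadrature to several digits, the first form being slightly closer because it keeps the rate in the denominators.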

We have displayed in Figure 3 the value of the MTTF as a function of $\omega$, for two different values of $(\lambda_0, \lambda_1)$, using $\lambda_0$ as a scaling parameter. We see that both curves are monotonic and that the asymptotic limits are quickly reached after initial, steep increases.

When the period of the oscillations is large, we would expect the MTTF to be the inverse of the "initial" failure rate, i.e., $\frac{1}{\lambda_0 + \lambda_1}$. This is indeed observed in Figure 3 when $\omega \to 0$. Similar curves could be drawn for higher moments of the failure time distribution. It should be noted, however, that the variation of the average of $t^2$ is not necessarily monotonic anymore, as shown in Figure 4.

Figure 3. MTTF as a function of $\omega = 2\pi/T$ for the configuration of Figure 2 ($\lambda_0 + \lambda_1 = 2$).

Figure 4. Same as Figure 3 for the average value of $t^2$.

3 GENERAL CASE WHEN THE OSCILLATION PERIOD IS SMALL

It may be satisfying to obtain an analytical solution for a few configurations, but this is unfortunately not possible in general. The question is now to establish whether the MTTF may be evaluated by averaging a few quantities and, if so, whether the result is reasonably accurate. Recall that in many real situations, the period of the oscillations may be one day or one week (one year or more in the context of climatological studies) and therefore much shorter than expected failure times.

3.1 Calculation

Let us now consider the general case when the oscillation period is $T$. We can define an average failure rate

$$\bar{\lambda} = \frac{1}{T} \int_0^T \lambda(t)\,dt . \qquad (3)$$

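Going back to the expression of the MTTF, we see that

$$\int_0^t \lambda(\tau)\,d\tau = \bar{\lambda} T \left\lfloor \frac{t}{T} \right\rfloor + \int_{T \lfloor t/T \rfloor}^{t} \lambda(\tau)\,d\tau = \bar{\lambda} t + \int_{T \lfloor t/T \rfloor}^{t} \left(\lambda(\tau) - \bar{\lambda}\right) d\tau ,$$

where $\lfloor x \rfloor$ is the integer part of $x$. Turning now to the MTTF expression (see (1) and (2)), we have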

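$$\mathrm{MTTF} = \sum_{n=0}^{\infty} \int_{nT}^{(n+1)T} \exp\left(-n\bar{\lambda}T - \int_{nT}^{t} \lambda(\tau)\,d\tau\right) dt = \sum_{n=0}^{\infty} \exp(-n\bar{\lambda}T) \int_0^T \exp\left(-\int_0^u \lambda(\tau)\,d\tau\right) du = \frac{\int_0^T \exp\left(-\int_0^t \lambda(\tau)\,d\tau\right) dt}{1 - \exp(-\bar{\lambda}T)} \equiv \frac{N}{D} .$$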
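The reduction to a single period is convenient numerically, since only one period has to be integrated whatever the lifetime. Here is a minimal sketch of the ratio of numerator to denominator, assuming a trapezoidal rule on a uniform grid; the name `mttf_one_period` is hypothetical.

```python
import math

def mttf_one_period(rate, period, steps=2000):
    """MTTF = N / D for a failure rate of the given period, where
    N = integral over one period of exp(-integral_0^t rate) and
    D = 1 - exp(-(average rate) * period)."""
    dt = period / steps
    cumulative = 0.0       # integral_0^t rate(tau) dtau
    numerator = 0.0
    lam_prev = rate(0.0)
    r_prev = 1.0
    for k in range(1, steps + 1):
        lam = rate(k * dt)
        cumulative += 0.5 * (lam_prev + lam) * dt
        r = math.exp(-cumulative)
        numerator += 0.5 * (r_prev + r) * dt
        lam_prev, r_prev = lam, r
    # After the loop, cumulative approximates (average rate) * period.
    denominator = -math.expm1(-cumulative)
    return numerator / denominator

# Hypothetical example: sinusoidal rate with omega = 20.
omega = 20.0
value = mttf_one_period(lambda t: 1.0 + 0.5 * math.cos(omega * t),
                        period=2 * math.pi / omega)
```

The geometric sum over periods is what makes the denominator appear; using `math.expm1` keeps the denominator accurate when the period is very short.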

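When the oscillation period is small ($T \to 0$, or the failure rate is assumed to be too small to matter during $T$), the expressions of the numerator $N$ and denominator $D$ may be expanded. Up to second order, we get

$$N = \int_0^T \left[ 1 - \int_0^t \lambda(\tau)\,d\tau + \frac{1}{2} \left( \int_0^t \lambda(\tau)\,d\tau \right)^2 + \cdots \right] dt = T + \cdots ,$$

while

$$D = \bar{\lambda} T \left[ 1 - \frac{1}{2} \bar{\lambda} T + \frac{1}{6} \left( \bar{\lambda} T \right)^2 + \cdots \right] .$$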

From the expressions of $N$ and $D$, the leading order of the MTTF's expansion gives $\mathrm{MTTF} \approx 1/\bar{\lambda}$.

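Using integrations by parts and $\tilde{\lambda}(t) = \lambda(t) - \bar{\lambda}$, we can easily obtain the corrections to the $T \to 0$ limit. After simplification, they give the main result of this paper

$$\mathrm{MTTF} \approx \frac{1}{\bar{\lambda}} \left[ 1 + \frac{1}{T} \int_0^T t\,\tilde{\lambda}(t)\,dt + \frac{1}{2T} \int_0^T \left( \int_0^t \tilde{\lambda}(\tau)\,d\tau \right)^2 dt + \frac{\bar{\lambda}}{T} \int_0^T \left( \int_0^t \tilde{\lambda}(\tau)\,d\tau \right) \left( t - \frac{T}{2} \right) dt + \cdots \right] . \qquad (4)$$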

Depending on the actual form of $\lambda(t)$, the first-order correction $\frac{1}{T} \int_0^T t\,\tilde{\lambda}(t)\,dt$ may cancel (this is actually the case in Example 2.3, where the corrections to 1 are 0, $\frac{\lambda_1^2}{4\omega^2}$, and $-\frac{\lambda_0 \lambda_1}{\omega^2}$ respectively) or be finite (in Example 2.2, it is $-\frac{\alpha(1-\alpha)}{2}(\lambda_+ - \lambda_-)T$, in agreement with the expansion of the exact result).
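The corrections to the leading-order result can be evaluated numerically for any smooth periodic rate. The sketch below is an illustration under assumptions, not the author's procedure: it takes the three corrections to be the period average of t times the deviation of the rate from its mean, half the period average of the squared running integral of that deviation, and the mean rate times the period average of that running integral weighted by (t - T/2). Function and variable names are hypothetical.

```python
import math

def small_period_corrections(rate, period, steps=4000):
    """Return (lam_bar, (c1, c2, c3), mttf_estimate) with
    mttf_estimate = (1 + c1 + c2 + c3) / lam_bar."""
    dt = period / steps
    times = [k * dt for k in range(steps + 1)]
    lam = [rate(t) for t in times]

    def trapz(values):
        # Trapezoidal rule on the uniform grid.
        return (sum(values) - 0.5 * (values[0] + values[-1])) * dt

    lam_bar = trapz(lam) / period
    dev = [x - lam_bar for x in lam]            # deviation from the mean rate
    running = [0.0]                             # integral_0^t of the deviation
    for k in range(1, steps + 1):
        running.append(running[-1] + 0.5 * (dev[k - 1] + dev[k]) * dt)

    c1 = trapz([t * d for t, d in zip(times, dev)]) / period
    c2 = trapz([v * v for v in running]) / (2 * period)
    c3 = lam_bar * trapz([v * (t - 0.5 * period)
                          for t, v in zip(times, running)]) / period
    return lam_bar, (c1, c2, c3), (1 + c1 + c2 + c3) / lam_bar

# Hypothetical check on the sinusoidal rate 1 + 0.5*cos(20 t).
omega = 20.0
lam_bar, (c1, c2, c3), estimate = small_period_corrections(
    lambda t: 1.0 + 0.5 * math.cos(omega * t), period=2 * math.pi / omega)
```

For the sinusoidal rate the first correction should vanish and the other two should reproduce the large-frequency terms of Section 2.3. A rate with jumps, such as the bimodal one, would need a grid aligned with the jump for the trapezoidal rule to remain accurate.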

3.2 Visual interpretation

It is possible to understand visually why $\mathrm{MTTF} \approx 1/\bar{\lambda}$. We can indeed plot the "effective" reliability corresponding to the exponential distribution characterized by the average failure rate $\bar{\lambda}$, which is displayed on the left of Figure 5. As a reminder of the result of equation (2), we have shaded the area under the curve, which is nothing but the MTTF. Because of an added, periodic contribution, the true reliability $R(t)$ oscillates around $\exp(-\bar{\lambda} t)$ (see the right of Figure 5: the green-shaded areas correspond to an MTTF increase, while the red ones correspond to an MTTF decrease). If the oscillation period is short with respect to $1/\bar{\lambda}$, we expect the plus and minus contributions to compensate. It might nonetheless be very difficult to use such a graphical construction to provide upper or lower bounds for the MTTF in the general case.

Figure 5. Comparison between the MTTFs for two different failure rates: their difference is provided by the sum of all green areas minus the sum of red areas.

4 WHAT ABOUT TEMPERATURE EFFECTS?

We mentioned in the Introduction that temperature is an important issue in the reliability of electronic components. Some data on the failure rates may be found in hardware catalogs, under operating conditions (at a given temperature, mainly 20 or 25 °C). In some cases, estimates of the failure rate at higher temperatures are also given. It might therefore be more suitable to express the instantaneous failure rate as $\lambda(T(t))$. Assuming that the Arrhenius law is valid for a given physical process, we would have

$$\lambda(T) \propto e^{-E_a/(kT)},$$

where $E_a$ is the activation energy of the process, $k$ the Boltzmann constant and $T$ the absolute temperature (not the oscillation period of the preceding sections).

The calculations of the preceding sections would have to take this further cause of variation into account. We would expect the MTTF to be weighted by the times spent in the higher temperature regimes.
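As a rough illustration of this weighting, the Arrhenius law can be combined with a periodic temperature profile to obtain a period-averaged failure rate. Everything specific in the sketch below is an assumption made for the example: the function names, the midpoint averaging, the activation energy of 0.7 eV, the reference rate and the daily temperature cycle are hypothetical and are not taken from the paper or from any catalog.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K

def arrhenius_factor(temp_k, ref_temp_k, activation_energy_ev):
    """Ratio rate(temp_k) / rate(ref_temp_k) under the Arrhenius law."""
    return math.exp(activation_energy_ev / BOLTZMANN_EV_PER_K
                    * (1.0 / ref_temp_k - 1.0 / temp_k))

def average_rate(ref_rate, ref_temp_k, activation_energy_ev,
                 temperature, period, steps=1000):
    """Average over one period of rate(T(t)), using the midpoint rule."""
    dt = period / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += ref_rate * arrhenius_factor(
            temperature(t), ref_temp_k, activation_energy_ev)
    return total * dt / period

# Hypothetical daily cycle (period in hours): 25 C on average, +/- 10 K swing.
def daily_temperature(t_hours):
    return 298.15 + 10.0 * math.sin(2 * math.pi * t_hours / 24.0)

lam_bar = average_rate(ref_rate=1e-6, ref_temp_k=298.15,
                       activation_energy_ev=0.7,
                       temperature=daily_temperature, period=24.0)
mttf_estimate = 1.0 / lam_bar   # leading-order estimate when the period is short
```

Because the Arrhenius factor is convex in this temperature range, the hot part of the cycle weighs more than the cool part, so the averaged rate exceeds the rate at the mean temperature.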

5 CONCLUSION AND OUTLOOK

We have provided simple analytical results for the MTTF with a periodical failure rate, which may prove helpful when evaluating the lifetime of various kinds of components operating in environments for which the workload may induce failures to occur in a periodic manner.

Generalizations of the present results would of course include the assessment of the variation of the MTTF, when the initial distribution is not exponential, but a more realistic one.

ACKNOWLEDGMENTS

The author would like to express his very warm appreciation of the Third Summer Safety and Reliability Seminars (SSARS 2009) held in Gdansk-Sopot, at which a first version of this work was presented: Professors K. Kolowrocki, F. Grabski, R. Bris and H.-P. Berg made it a thoroughly enjoyable and fruitful experience. I would like to thank many participants for discussions and suggestions that have been taken into account in the present version: M. F. Milazzo, J. Soszynka, G. Albeanu, J. H. Cha, S. Guze, O. Hryniewicz, K. T. Kosmowski, T. Nowakowski, Y. Rahim, and D. Vallis, to mention only a few.

REFERENCES

Andrews, C. 2005 The next generation reliability standard for PAs. Wireless Design & Development, 1 November 2005.

Baker, E. 1972 Some effects of temperature on material properties and device reliability. IEEE Trans. on Parts, Hybrids, and Packaging PHP-8 (4), 4-14.

Castillo, X. & Siewiorek, D. P. 1981 Workload, performance, and reliability of digital computing systems. Proc. of FTCS11, 84-89.

Dimitrov, B., Chukova, S. & Green, D., Jr. 1997 Probability distributions in periodic random environment and their applications. SIAM J. Appl. Math. 57 (2), 501-517, and references therein.

Grabski, F. 2002 Semi-Markov models of systems reliability and operations. Systems Research Institute, Polish Academy of Sciences, Warsaw.

Kang, W. & Grimshaw, A. 2007 Failure prediction in computational grids. Proc. of the 40th Annual Simulation Symposium (ANSS07) 6, 93-116.

Kuo, W. & Zuo, M. J. 2003 Optimal Reliability Modeling. Hoboken: John Wiley & Sons, Inc.

Parry, John D., Rantala, J. & Lasance, C. J. M. 2002 Enhanced Electronic System Reliability—Challenges for Temperature Prediction IEEE Trans. on Components and Packaging Technologies 25 (4), 533-538.

Pham, H. 2006 Basic Statistical Concepts, in Handbook of Engineering Statistics, Pham, H. ed. Springer, New York.

Prakasa Rao, B. L. S. 1997 On distributions with periodic failure rate and related inference problems, in Advances in Statistical Decision Theory and Applications, Panchapakesan, S., Balakrishnan, N. & Gupta, S. S., eds, Birkhäuser, Boston, chapter 22, and references therein.

Rausand, M. & Høyland, A. 2004 System Reliability Theory, 2nd edition. Hoboken: John Wiley & Sons, Inc.

Schroeder, B. & Gibson, G. A. 2006 A large-scale study of failures in high-performance computing systems. Proc. of the International Conference on Dependable Systems and Networks (DSN2006), Philadelphia, USA, June 25-28, 2006.
