BOOTSTRAPPING TIME SERIES WITH APPLICATION TO RISK MANAGEMENT
G. Albeanu & H. Madsen
Technical University of Denmark, IMM, Lyngby, Denmark
B. Burtschy
Ecole Nationale Supérieure des Télécommunications, INFRES, Paris, France
Fl. Popentiu-Vladicescu
University of Oradea, UNESCO Chair in Information Technologies, Oradea, Romania
Manuela Ghica
Spiru Haret University, Faculty of Mathematics and Informatics, Bucharest, Romania
ABSTRACT: The bootstrap is a computationally intensive approach, based on Monte Carlo simulation, for analysing random samples and time series. It is a powerful tool, especially when only a small data set is available to predict the behaviour of systems or processes. This paper presents the results of an investigation into bootstrap resampling (of different types: uniform, importance-based, block-structured, etc.) for time series arising in the software life cycle (mainly the testing and debugging phases), in economics, and in the environment (air pollution generated by cement plants). The aim is to support staff working on risk management for software projects, on financial risk management, and on environmental risk management.
1. INTRODUCTION
Risk, understood as the possibility of losses, is an important concern for many organizations, not only in financial markets but also in industry. An important principle of general scientific knowledge in the area of risk states that "it is impossible to manage the risk without quantitative measurement and analysis of risk" (Solojentsev, 2005).
Risk management has various aspects, depending on the field under consideration. As a consequence, different methodologies (models) have been developed (Aven 2003, Solojentsev 2005, Kontio 1997, Higuera & Haimes 1996, Entrop et al. 2007, Todinov 2006, etc.).
This paper addresses the use of the bootstrap for analysing time series that arise when measurements are recorded during the evolution of a process. The bootstrap has proved valuable for a large class of applications (Efron & Tibshirani 1993, Albeanu et al. 2007), including time series analysis (see the references cited in Section 2.2).
The remainder of the paper is organized as follows. Section 2 gives a short introduction to the general bootstrap approach, presents algorithmic aspects of bootstrapping time series together with the related model selection problems, and briefly reviews risk management. Section 3 discusses three case studies covering different fields of activity: software risk management, inflation rate forecasting and cement plant pollution.
The concluding section (Section 4) summarizes the most important challenges to deal with when bootstrapping time series.
2. PROBLEM DEFINITION AND BACKGROUND INFORMATION
2.1. Bootstrap methodology
The bootstrap is a simple but powerful Monte Carlo method for assessing statistical accuracy or estimating a distribution from sample statistics. Bootstrap methods suit any level of modelling, being useful for fully parametric, semi-parametric and completely nonparametric analysis. They are used not only by statisticians but wherever statistics can be applied: life sciences, business, social sciences, econometrics, reliability, etc. For the purposes of this paper we outline the basic bootstrap principle (see Efron 1979, Efron & Tibshirani 1986) and the application of bootstrap sampling to time series, in order to support staff working on risk management.
Let $X$ be a random variable and $F$ its cumulative distribution function. The bootstrap method, introduced by Efron (1979), is useful at least for estimating: a) the distribution of a random variable $R(X, F)$; b) a functional $V(F)$; or c) the accuracy of a statistic $s$ computed from a sample $(X_1, X_2, \ldots, X_n)$ of size $n$ from $X$ (the accuracy describes the variability of $s$ when independent estimates $s^{(1)}, s^{(2)}, \ldots$ of the statistic $s$ are obtained by resampling).
The bootstrap technique uses the sample $(X_1, X_2, \ldots, X_n)$ to build the empirical cumulative distribution function $F_n$, which replaces the true cumulative distribution function $F$:
$$F_n(x) = \frac{1}{n}\,\mathrm{card}\{\, i : X_i \le x,\ 1 \le i \le n \,\}.$$
Bootstrap samples $X^* := (X_1^*, X_2^*, \ldots, X_n^*)$ are repeatedly simulated from $F_n$ with random number generators, in the Monte Carlo manner. Then, for each bootstrap sample, one recomputes: a) the distribution function of the random variable $R(X^*, F_n)$; b) the functional $V(F_n)$ or $V(F_n^*)$; and c) the statistic $s^*(\cdot)$. The accuracy of the statistic $s$ can then be derived from an appropriate statistical inference study of the sequence of replicates $s^*(\cdot)$.
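A minimal sketch of the uniform bootstrap just described, written in Python with NumPy (our choice; the paper gives no code). The helper names `bootstrap_replicates` and `bootstrap_se` are hypothetical, and the standard deviation of the replicates $s^*$ is used as the accuracy measure, i.e. the standard-error case.

```python
import numpy as np


def bootstrap_replicates(x, stat, B=2000, seed=0):
    """Draw B bootstrap samples X* from F_n (uniform resampling with
    replacement) and return the replicates s*(1), ..., s*(B)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    # Row b holds the n indices of the b-th bootstrap sample X*.
    idx = rng.integers(0, n, size=(B, n))
    return np.array([stat(x[row]) for row in idx])


def bootstrap_se(x, stat=np.mean, B=2000, seed=0):
    """Bootstrap estimate of the standard error of the statistic s."""
    return bootstrap_replicates(x, stat, B, seed).std(ddof=1)


# Usage: standard error of the median of a small (simulated) sample.
sample = np.random.default_rng(1).exponential(scale=2.0, size=25)
print(bootstrap_se(sample, np.median))
```

For the sample mean the result should be close to the plug-in value $\hat\sigma/\sqrt{n}$, which gives a quick check; for statistics such as the median no simple formula exists, which is where the bootstrap is most useful.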
Bootstrap resampling can be carried out in various ways; uniform resampling and importance resampling are the most commonly used. A common example of uniform resampling is the bootstrap algorithm for estimating standard errors. However, when some observations are more important than others, importance resampling can lead to conclusions closer to reality. When resampling is driven by importance weights, the bootstrap estimates are re-weighted so that they correspond to uniform resampling.
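To illustrate importance resampling and the re-weighting step, the following sketch (an illustration, not the procedure used in the case studies) estimates the bootstrap tail probability $P^*(\bar X^* > c)$ in two ways: by plain uniform resampling, and by drawing observations with exponentially tilted probabilities $p_i \propto \exp(\lambda(x_i - \bar x))$ and weighting each replicate by the likelihood ratio $\prod_j (1/n)/p_{I_j}$. The tilting scheme, the value of $\lambda$ and the function names are assumptions made for the example.

```python
import numpy as np


def tail_prob_uniform(x, c, B=20000, seed=0):
    """Uniform-resampling estimate of P*(mean(X*) > c)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    idx = rng.integers(0, x.size, size=(B, x.size))
    return float(np.mean(x[idx].mean(axis=1) > c))


def tail_prob_importance(x, c, lam, B=20000, seed=0):
    """Importance-resampling estimate of the same probability.

    Assumption: exponential tilting p_i ~ exp(lam * (x_i - mean(x))), which
    favours large observations. Each replicate is re-weighted by
    prod_j (1/n) / p_{I_j}, so the estimate refers to uniform resampling.
    Returns (estimate, mean weight); the mean weight should be close to 1.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    p = np.exp(lam * (x - x.mean()))
    p /= p.sum()
    idx = rng.choice(n, size=(B, n), replace=True, p=p)
    # Likelihood-ratio weights, computed in log space for numerical stability.
    w = np.exp(-np.log(n * p[idx]).sum(axis=1))
    hit = x[idx].mean(axis=1) > c
    return float(np.mean(w * hit)), float(np.mean(w))


# Usage: a threshold 1.5 standard errors above the mean, with a tilt chosen
# (heuristically) so that the tilted resample mean lies near the threshold.
x = np.random.default_rng(4).normal(size=20)
se = x.std(ddof=0) / np.sqrt(x.size)
c = x.mean() + 1.5 * se
lam = 1.5 / (x.std(ddof=0) * np.sqrt(x.size))
print(tail_prob_uniform(x, c), tail_prob_importance(x, c, lam))
```

Both functions target the same uniform-resampling probability; the tilted draws place more replicates in the tail, and the weights undo the distortion.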
2.2. Bootstrapping time series
Time series play an important role in modelling, analysing and forecasting the behaviour of systems (Cochrane 2005, Hamilton 1994, Burtschy et al. 1997, Madsen 2007). Time series have numerous aspects; only the models and algorithms required by our case studies are described below.
Let $\{x_t;\ t = 1, 2, \ldots, T\}$ be a time series and $L$ the lag operator: $Lx_t = x_{t-1}$, $t > 1$. The ARMA model of order $(p, q)$ is given by
$$\varphi(L)\,x_t = \theta(L)\,u_t, \qquad (1)$$
where
$$\varphi(L)x_t = (a_0 L^0 + a_1 L^1 + \cdots + a_p L^p)x_t = a_0 x_t + a_1 x_{t-1} + \cdots + a_p x_{t-p}, \quad p > 0,\ a_p \neq 0,$$
$$\theta(L)u_t = (b_0 L^0 + b_1 L^1 + \cdots + b_q L^q)u_t = b_0 u_t + b_1 u_{t-1} + \cdots + b_q u_{t-q}, \quad q > 0,\ b_q \neq 0,$$
and $\{u_t\}$ is an uncorrelated process with zero mean and finite variance.
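Equation (1) can be turned into a simulator by solving it recursively for $x_t$. A minimal sketch, assuming zero values before the start of the series and coefficient vectors `a` $= (a_0, \ldots, a_p)$ and `b` $= (b_0, \ldots, b_q)$ with the signs used in (1); the function name `arma_filter` is hypothetical.

```python
import numpy as np


def arma_filter(a, b, u):
    """Solve phi(L) x_t = theta(L) u_t for x_t, t = 0, 1, ..., as in eq. (1).

    a = [a_0, ..., a_p], b = [b_0, ..., b_q]; values of x and u before the
    start of the series are taken as zero (an assumption of this sketch).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    u = np.asarray(u, dtype=float)
    x = np.zeros_like(u)
    for t in range(u.size):
        # theta(L) u_t = b_0 u_t + b_1 u_{t-1} + ... + b_q u_{t-q}
        rhs = sum(b[k] * u[t - k] for k in range(b.size) if t - k >= 0)
        # Move the lagged terms a_1 x_{t-1} + ... + a_p x_{t-p} to the right side.
        rhs -= sum(a[k] * x[t - k] for k in range(1, a.size) if t - k >= 0)
        x[t] = rhs / a[0]
    return x


# Usage: ARMA(1,1) x_t - 0.5 x_{t-1} = u_t + 0.3 u_{t-1}.
u = np.random.default_rng(0).normal(size=200)
x = arma_filter([1.0, -0.5], [1.0, 0.3], u)
```

With this sign convention, the familiar form $x_t = 0.5\,x_{t-1} + u_t$ corresponds to `a = [1, -0.5]`.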
The ARMA bootstrap algorithm proceeds as follows:
1 Determine the order $(p, q)$ of the ARMA process.
2 Estimate the parameters $\hat\varphi(L)$, $\hat\theta(L)$.
3 Resample from the residuals $\hat u_t = \hat\theta(L)^{-1}\hat\varphi(L)\,x_t$ (after re-centring them around zero).
4 Choose a large positive integer $\tau$, set $x_t^* = 0$ for $t < -\tau$, and generate iid draws $u_t^*$ from the re-centred residuals, for $t = -\tau, \ldots, T$.
5 Generate pseudo-data $x_t^* = \hat\varphi(L)^{-1}\hat\theta(L)\,u_t^*$ for $t = -\tau, \ldots, T$, and retain the last $T$ values of $x_t^*$.
6 Calculate the bootstrap parameter estimates $\hat\varphi^*(L)$, $\hat\theta^*(L)$.
7 Repeat steps 3-6 many times and build up the empirical distribution, from which the functional of interest is obtained or the required statistics are analysed.
The ARMA parameters can be estimated using different methods including maximum likelihood (ML) algorithms, as presented by Boaz (1994) and others.
As Berkowitz & Kilian (2000) point out, the bootstrap can perform well when the parametric model provides a good approximation to the true model. In practice, for a sample of size $T$, both the model and the order $(p, q)$ are unknown, so several candidate models have to be considered and a selection procedure applied. According to Alonso et al. (2004), the most widely used procedures are the final prediction error (FPE), the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the corrected Akaike information criterion (AICC). For the investigation presented in this paper, the AICC was used:
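$$\mathrm{AICC} = -2\ln \Lambda(\hat a, \hat b, \hat\sigma^2) + \frac{2T(p + q + 1)}{T - (p + q) - 2}, \qquad (2)$$
where $\hat a$ is the vector of estimated AR parameters, $\hat b$ the vector of estimated MA parameters, $\hat\sigma^2$ the estimated variance of the white noise, and $\Lambda(\cdot,\cdot,\cdot)$ the likelihood of the data under the Gaussian ARMA model.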
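As a sketch of AICC-based order selection, the code below evaluates (2) for AR(p) models ($q = 0$), $p = 0, \ldots, p_{\max}$. It assumes that the Gaussian likelihood is approximated by the conditional likelihood of least-squares fits on a common sample (the first $p_{\max}$ observations are set aside), so $T$ in (2) becomes the effective sample size. The function names are hypothetical.

```python
import numpy as np


def aicc_ar_orders(x, p_max):
    """AICC of eq. (2) with q = 0 for AR(p), p = 0..p_max.

    Assumption: conditional Gaussian likelihood of least-squares fits on the
    common sample t = p_max+1..T, with sigma^2 set to its ML estimate.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = x[p_max:]
    n = y.size                                   # effective sample size
    scores = {}
    for p in range(p_max + 1):
        if p == 0:
            resid = y
        else:
            X = np.column_stack([x[p_max - k:len(x) - k] for k in range(1, p + 1)])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ coef
        sigma2 = np.mean(resid ** 2)             # ML estimate of noise variance
        neg2loglik = n * (np.log(2 * np.pi * sigma2) + 1)
        scores[p] = neg2loglik + 2 * n * (p + 1) / (n - p - 2)
    return scores


def select_order(x, p_max):
    """Order with the smallest AICC."""
    scores = aicc_ar_orders(x, p_max)
    return min(scores, key=scores.get)


# Usage: a simulated AR(2) series x_t = 0.5 x_{t-1} - 0.4 x_{t-2} + u_t.
rng = np.random.default_rng(0)
u = rng.normal(size=600)
x = np.zeros(600)
for t in range(2, 600):
    x[t] = 0.5 * x[t - 1] - 0.4 * x[t - 2] + u[t]
print(select_order(x[100:], p_max=6))
```

The correction term $2T(p+q+1)/(T-(p+q)-2)$ grows faster than the plain AIC penalty as the order approaches the sample size, which is why the AICC is preferred for short series.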
As Hardle et al. (2003) point out, the block bootstrap is the best-known method for bootstrapping time series. The method "consists of dividing the data into blocks of observations and sampling the blocks randomly with replacement." For the time series considered above, with non-overlapping blocks of length $l$, the first block is $\{x_j;\ j = 1, 2, \ldots, l\}$, the second block is $\{x_{l+j};\ j = 1, 2, \ldots, l\}$, and so forth. With overlapping (moving) blocks of length $l$, the first block is $\{x_j;\ j = 1, 2, \ldots, l\}$, the second block is $\{x_{j+1};\ j = 1, \ldots, l\}$, and so forth. In both cases the blocks are drawn with replacement. The block bootstrap with random block lengths is called the stationary bootstrap, because the resampled series it produces is stationary.
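The resampling schemes just described can be sketched as follows. The stationary bootstrap shown here (geometric block lengths with mean `mean_block`, circular reading of the series) is one common construction and is an assumption of the example, as are the function names.

```python
import numpy as np


def block_bootstrap(x, l, overlapping=True, rng=None):
    """Concatenate blocks of length l drawn with replacement, trimmed to len(x).

    overlapping=True  -> moving blocks (x_{i+1}, ..., x_{i+l}), i = 0..T-l
    overlapping=False -> disjoint blocks (x_{jl+1}, ..., x_{(j+1)l})
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    T = x.size
    step = 1 if overlapping else l
    starts = np.arange(0, T - l + 1, step)       # 0-based first index of each block
    n_blocks = -(-T // l)                         # ceil(T / l) blocks cover T values
    chosen = rng.choice(starts, size=n_blocks, replace=True)
    return np.concatenate([x[s:s + l] for s in chosen])[:T]


def stationary_bootstrap(x, mean_block, rng=None):
    """Random block lengths: a new block starts with probability 1/mean_block,
    so lengths are geometric with mean mean_block; the series is read circularly."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    T = x.size
    out = np.empty(T, dtype=x.dtype)
    pos = rng.integers(T)
    for t in range(T):
        out[t] = x[pos]
        if rng.random() < 1.0 / mean_block:
            pos = rng.integers(T)                 # jump: start a new block
        else:
            pos = (pos + 1) % T                   # continue the current block
    return out


# Usage: resampling the series 0, 1, ..., 19 makes the block structure visible.
rng = np.random.default_rng(0)
print(block_bootstrap(np.arange(20), 5, rng=rng))
```

Keeping consecutive observations together inside each block is what preserves the short-range dependence that uniform resampling would destroy.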
Seasonal time series are a special class of time series that appear, for example, in environmental risk management and in multi-version software testing. Such series are typically modelled by
$$x_t = \mu_t + u_t, \qquad \mu_t = \mu_{t-d}, \quad t > d, \qquad (3)$$
where $d$ is the period (day, week, month, etc.) of a deterministic but unknown function $\mu$, and $\{u_t,\ t > 0\}$ is a stationary process with zero mean. In general, if $\mu$ is not constant, the seasonal model is not stationary, so a "seasonal block bootstrap" (denoted SBB in what follows) is needed. Below we recall the approach of Politis (2001), which behaved well for the time series obtained when monitoring pollution from cement plants.
The SBB algorithm assumes that $T = nd$ for some integer $n$; let $b$ $(< n)$ be a given positive integer (the number of periods per block) and $k = \lfloor n/b \rfloor$. The algorithm works along the following steps:
1 Let $i_0, i_1, \ldots, i_{k-1}$ be drawn independent and identically distributed, uniform on the set $\{1, 2, \ldots, n-b+1\}$;
2 Build the bootstrap pseudo-series $X_1^*, X_2^*, \ldots, X_l^*$, where $l = kbd$, by
$$X^*_{mbd + j} := X_{(i_m - 1)d + j} \qquad (4)$$
for $m = 0, 1, \ldots, k-1$ and $j = 1, 2, \ldots, bd$.
The seasonal components $\mu_i$, $i = 1, 2, \ldots, d$, and the overall mean $\bar\mu = d^{-1}\sum_{i=1}^{d}\mu_i$ are estimated by averages of the "sampled" series:
$$\hat\mu_i = n^{-1}\sum_{j=0}^{n-1} X_{i+jd}, \qquad \hat{\bar\mu} = d^{-1}\sum_{i=1}^{d}\hat\mu_i. \qquad (5)$$
The usefulness of the SBB method lies in obtaining interval estimates for $\mu_i$ and $\bar\mu$, by successfully approximating the distributions of $\hat\mu_i$ and $\hat{\bar\mu}$ by those of their bootstrap versions, computed from the bootstrap pseudo-series $X_1^*, X_2^*, \ldots, X_l^*$ as
$$\hat\mu_i^* = (kb)^{-1}\sum_{j=0}^{kb-1} X^*_{i+jd}, \qquad \hat{\bar\mu}^* = d^{-1}\sum_{i=1}^{d}\hat\mu_i^*. \qquad (6)$$
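A minimal sketch of the SBB steps and of the estimators (5) and (6). Indices follow the 1-based notation of (4) and are converted to 0-based slices; the percentile intervals in `sbb_intervals` are an assumed way of turning the bootstrap replicates into the interval estimates mentioned above, and all function names are hypothetical.

```python
import numpy as np


def seasonal_block_bootstrap(x, d, b, rng=None):
    """SBB pseudo-series of eq. (4) for a series with T = n*d observations.

    Blocks hold b whole periods (b*d values); the block indices i_m are
    uniform on {1, ..., n-b+1}, so every block starts at the beginning of a
    period and the seasonal phase is preserved.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n, rem = divmod(x.size, d)
    if rem or not 0 < b < n:
        raise ValueError("need T = n*d and 0 < b < n")
    k = n // b                                   # number of blocks
    i = rng.integers(1, n - b + 2, size=k)       # i_0, ..., i_{k-1} (1-based)
    # X*_{mbd+j} := X_{(i_m - 1)d + j}, j = 1..bd, written with 0-based slices.
    return np.concatenate([x[(im - 1) * d:(im - 1) * d + b * d] for im in i])


def seasonal_means(x, d):
    """Eq. (5), or eq. (6) for a pseudo-series: period-wise means and their average."""
    mu = np.asarray(x, dtype=float).reshape(-1, d).mean(axis=0)
    return mu, mu.mean()


def sbb_intervals(x, d, b, B=1000, level=0.95, seed=0):
    """Assumption: percentile intervals for mu_1, ..., mu_d from B SBB replicates."""
    rng = np.random.default_rng(seed)
    reps = np.array([seasonal_means(seasonal_block_bootstrap(x, d, b, rng), d)[0]
                     for _ in range(B)])
    alpha = (1 - level) / 2
    return np.quantile(reps, [alpha, 1 - alpha], axis=0)


# Usage: a period-4 pattern observed over 6 periods, with small noise.
pattern = np.array([1.0, 2.0, 3.0, 4.0])
noisy = np.tile(pattern, 6) + np.random.default_rng(1).normal(scale=0.1, size=24)
print(sbb_intervals(noisy, d=4, b=2, B=200))
```

Because each block starts at the beginning of a period, a purely periodic series is reproduced exactly by the resampling, which is a quick check of the indexing in (4).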
This method is used in place of the "residual" block bootstrap, which is obtained by block-resampling the residuals $Y_t := X_t - \hat\mu_t$: the resulting pseudo-series $Y_1^*, Y_2^*, \ldots, Y_l^*$ is used to generate the bootstrap series
$X_t^* := \hat\mu_t + Y_t^*$, $t = 1, 2, \ldots, l$.
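For comparison, a sketch of the residual variant: seasonal means from (5), moving-block resampling of the residuals, and reconstruction $X_t^* = \hat\mu_t + Y_t^*$. The block length `block_len`, the use of moving blocks and a pseudo-series length equal to $T$ are assumptions of the example; the function name is hypothetical.

```python
import numpy as np


def residual_block_bootstrap(x, d, block_len, rng=None):
    """Moving-block resampling of Y_t = X_t - mu_hat_t, then X*_t = mu_hat_t + Y*_t.

    Assumption: the pseudo-series has the same length T as the data.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    T = x.size
    if T % d:
        raise ValueError("the series must contain a whole number of periods")
    mu_hat = x.reshape(-1, d).mean(axis=0)       # seasonal means, eq. (5)
    mu_t = np.tile(mu_hat, T // d)               # mu_hat_t repeats with period d
    y = x - mu_t                                 # residuals Y_t
    starts = rng.integers(0, T - block_len + 1, size=-(-T // block_len))
    y_star = np.concatenate([y[s:s + block_len] for s in starts])[:T]
    return mu_t + y_star


# Usage: period-3 series with noise, resampled with blocks of length 4.
base = np.tile([1.0, 2.0, 3.0], 5)
noisy = base + np.random.default_rng(2).normal(size=15)
print(residual_block_bootstrap(noisy, d=3, block_len=4, rng=np.random.default_rng(3)))
```

Here the residual blocks need not start at a period boundary, since the seasonal component is added back by position.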
Politis (2001) showed that overlapping plays an important role in bootstrap efficiency: "the maximum overlap leads to maximum efficiency". For choosing the block size $l$ in finite samples, a data-based procedure adapted from the method of Berkowitz & Kilian (2000) can be used, which aims to maximize the average accuracy.
Given the stationary series $\{x_t;\ t = 1, 2, \ldots, T\}$, the bootstrap can be used to select the block size giving maximum accuracy in estimating a statistic of interest, along the following steps:
1 Approximate the given time series by a parametric ARMA(p, q) or AR(p) model, with the order selected by the AICC.
2 Generate $B$ $(> 512)$ Monte Carlo trials of length $T$ from the fitted model.
3 For each Monte Carlo trial, generate overlapping block bootstrap data $\{X_t^*\}$ for different block sizes $k$.
4 Compute the statistic of interest from each bootstrap series $\{X_t^*(k)\}$.
5 Select the block size $k^*$ which, on average across the Monte Carlo trials, produces the most accurate test statistic, point estimate or confidence interval.
6 Use the block size $k^*$ to apply the block bootstrap or the SBB method to the original data $\{x_t;\ t = 1, 2, \ldots, T\}$.
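The steps above can be sketched for a simple case. The sketch assumes an AR(p) model whose order has already been chosen (for example by the AICC), least-squares estimation, the standard error of the sample mean as the statistic of interest, and the mean squared error against the Monte Carlo reference value as the accuracy measure; none of these choices is prescribed by the text. The default `trials=600` respects $B > 512$, the block sizes $k$ are passed as `sizes`, and the function names are hypothetical.

```python
import numpy as np


def mbb_se_of_mean(x, block_len, B, rng):
    """Moving-block bootstrap estimate of the standard error of the sample mean."""
    T = x.size
    n_blocks = -(-T // block_len)                # ceil(T / block_len)
    starts = rng.integers(0, T - block_len + 1, size=(B, n_blocks))
    # Expand each start into block_len consecutive indices, concatenate, trim to T.
    idx = (starts[:, :, None] + np.arange(block_len)).reshape(B, -1)[:, :T]
    return x[idx].mean(axis=1).std(ddof=1)


def select_block_size(x, p, sizes, trials=600, B=200, burn_in=100, seed=0):
    """Steps 1-5 with an AR(p) model (order p assumed chosen beforehand) and
    the standard error of the mean as the statistic of interest."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float) - np.mean(x)
    T = x.size
    # Step 1 (assumption: least squares as a stand-in for ML estimation).
    X = np.column_stack([x[p - j:T - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    resid = x[p:] - X @ coef
    # Step 2: Monte Carlo trials from the fitted model, driven by resampled residuals.
    u = rng.choice(resid - resid.mean(), size=(trials, T + burn_in))
    sims = np.zeros_like(u)
    for t in range(T + burn_in):
        for j in range(1, min(p, t) + 1):
            sims[:, t] += coef[j - 1] * sims[:, t - j]
        sims[:, t] += u[:, t]
    sims = sims[:, burn_in:]
    # Spread of the trial means serves as the reference ("true") SE under the model.
    true_se = sims.mean(axis=1).std(ddof=1)
    # Steps 3-5: mean squared error of the bootstrap SE for each block size.
    mse = {k: float(np.mean([(mbb_se_of_mean(s, k, B, rng) - true_se) ** 2
                             for s in sims]))
           for k in sizes}
    return min(mse, key=mse.get), mse


# Usage on a simulated AR(1) series (small settings to keep the run short).
rng = np.random.default_rng(5)
e = rng.normal(size=150)
series = np.zeros(150)
for t in range(1, 150):
    series[t] = 0.7 * series[t - 1] + e[t]
best, mse = select_block_size(series, p=1, sizes=[1, 5, 10], trials=100, B=100)
```

For a strongly autocorrelated series, block length 1 (the iid bootstrap) tends to underestimate the standard error of the mean, so a longer block is usually selected.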
Other methods for bootstrapping time series are available; see, for example, Berkowitz & Kilian (2000), Hardle et al. (2003) and Politis (2003). The approaches selected above proved suitable, in terms of computing effort and accuracy, for our investigation of time series for risk management in finance, environment and software reliability.
2.3. Risk management
There are many definitions of the term "risk", all including two important characteristics, namely uncertainty (an event may or may not occur) and loss (the event has undesired effects): risk is the possibility of suffering losses caused by an event that may occur.
Generally speaking, risk management is a systematic process for identifying, analysing and controlling risks.
Multi-criteria decision aid, soft computing and statistical analysis are important approaches when the goal is to "decide with minimum risk". Recently, time-series risk models have been proposed, mainly for the insurance business (Wan et al. 2005, Zhang et al. 2007). Other researchers have also shown that time series analysis and forecasting play an important role in risk management. These advances can be complemented by the bootstrap methodology in order to support a preventive approach to risk.
3. CASE STUDIES
3.1. Bootstrapping time series applied for software risk management
According to Kontio (1997), "software development is often plagued with unanticipated problems which cause projects to miss deadlines, exceed budgets, or deliver less than satisfactory products". Even if these problems cannot be eliminated completely, some of them can be controlled well by taking appropriate preventive action.
In practice, software development organizations are exposed to a plethora of risk factors, among them: the quality of human resources, unrealistic schedules and budgets, mismatches between the requirements and the developed item, continuous alteration of requirements, problems generated by outsourcing, overestimation of infrastructure capability, etc. Software organizations may be able to avoid a large number of such problems if they use systematic risk management procedures and techniques early in their projects. Any such methodology has to monitor these factors; applying a measurement methodology such as the one provided by Fenton & Pfleeger (1996) yields multivariate time series.
One approach to analysing time series for software reliability is based on soft computing techniques, as shown by Albeanu & Popentiu-Vladicescu (2005). However, during this investigation we found that classical time series analysis methods, combined with bootstrap resampling, provide valuable information for Software Risk Management (SRM) even when the sample is not large.
Among the software metrics used for risk management, some can be considered critical (called SRM-critical) and are analysed with time series methodologies. Other metrics are analysed by graph methods, as in the Riskit methodology (Kontio & Basili 1996, Kontio 1997).
The SRM-critical metrics are: a) the difference between actual expenses and the initially declared project cost; b) the difference between actual expenses and the values predicted by the COCOMO approach; c) the ratio between real and planned project progress (as given by the Gantt chart, see Figure 1); d) the faults received per week (critical faults per month); and e) the successful debugging actions per week.
Figure 1. Waterfall model and the time series of critical bugs per month
Other metrics, such as internal complexity, code readability or portability, are not critical unless they are stated in the requirements agreement.
For a software project developed according to the waterfall model, with a modular structure in which every module except the first depends at least on the previous module, we observe a seasonal time series of critical bugs.
Applying the SBB approach, we obtained the trend curve shown in Figure 2. This analysis was carried out before each milestone (milestones 1 to 4 for the project under discussion).
[Figure 2 plot: "Seasonal Block Bootstrap time series"; legend: Series 1 to Series 4 and a 2-point linear filter (moving average) of Series 4]
Figure 2. Seasonal Block Bootstrap (the last three time series, and the trend curve obtained by an ARMA model for the last generated series).
This type of analysis provided important information not only for the staff involved in preventive risk management but also for the project manager, who was able to improve the structure of the working teams (three main partners) and to reschedule the financial resources before each milestone.
3.2. Bootstrapping time series for inflation forecasting
Time series analysis has been widely used in economics and finance since it was discovered that "univariate ARIMA models often have far better forecasting and explanatory power than extremely complicated multivariate macroeconomic models", as Golub & Tilman (2000) note. These models have also performed well in software reliability prediction (Popentiu-Vladicescu et al. 2001).
The bootstrap has also proved to be an important approach for analysing interest rates in financial risk management, as shown by Dette & Weissbach (2006).
In our study, the Consumer Price Index (CPI), which measures inflation, was analysed by bootstrapping the corresponding time series in order to forecast the inflation rate. The standard bootstrap approach was applied to the series of inflation rates (computed from the CPI) from 1992 to 2007. The initial time series is shown in Figure 3. The last five bootstrap time series from a set of 200 Monte Carlo trials, and the trend curve modelled by an ARMA(0,2) model, are shown in Figure 4.
Figure 3. Inflation rate 1992-2007 (according to the Romanian Institute of Statistics: https://statistici.insse.ro/ipc/?lang=en)
Figure 4. The trend curves estimated from the initial time series and from the bootstrap time series of the inflation rate
Figure 5. Pollution monitoring for increasing the health state in a cement plant region
When the CPI databases containing monthly records were considered, some seasonal behaviour was identified. At the global level, a moving average model was more suitable.
The fitted models were checked using the AICC given by (2). Other time series were used to investigate the behaviour of the bootstrap in providing confidence bands for dynamic financial analysis, as in Albeanu et al. (2007).
Further considerations on the accuracy of time series, interest rate and survey forecasts of inflation can be found in Hafer & Hein (1984).
3.3. Bootstrapping pollution time series
Analysis of air pollution is important not only from a meteorological point of view but, mainly, for health (Gouveia & Fletcher 2000). This is the main reason why industrial pollution has to be monitored, to keep pollution levels within the limits set by international regulations.
Environmental regulations for cement plants are becoming increasingly strict, and cement manufacturers have to review their anti-pollution measures constantly. As presented in Madsen et al. (2004), the best way to fight pollution is to use computer-aided decision support software that not only captures measurements for analysis but also behaves intelligently, providing information about the optimal configuration of the cement plant modules so as to maintain a given level of production under the constraints of pollution regulations. This is similar to the Columbus approach of Solojentsev (2005).
For the time series previously analysed with classical methods (Figure 5), we now use the bootstrap methodology to estimate accuracy (Figure 6).
Figure 6. Bootstrapping time-series of air pollution by dust at 2400 m
We found that the AICC method performs better than the final prediction error approach used in the initial software implementation.
4. CONCLUSIONS
Starting from the idea that time series analysis is an important approach for predicting the behaviour of processes subject to risk management, this paper shows that the bootstrap methodology is useful, provided the researcher or manager chooses the appropriate type of resampling.
The paper emphasizes the utility of bootstrap resampling in different fields of practice through three particular applications: software risk management, financial risk management and environmental risk management.
ACKNOWLEDGEMENTS
The present investigation was carried out mainly under the UNITWIN programme of UNESCO. The authors thank their institutions for their support.
REFERENCES
Albeanu, G., Madsen, H., Ghica, M., Thyregod, P. & Popentiu-Vladicescu, Fl. 2007. On using bootstrap methods for understanding empirical loss data and dynamic financial analysis. In P. Goos (ed.), ENBIS 7 Conference, Dortmund, September 24-26, 2007, CD-ROM.
Albeanu, G. & Popentiu-Vladicescu, F. 2005. On Using the Fuzzy Nearest Neighbour Method for Time Series Forecasting in Software Reliability. Proceedings of SIG 2005, pp. 206-209, SIER Publishing House.
Alonso, A.M., Peña, D. & Romo, J. 2004. Introducing model uncertainty in time series bootstrap. Statistica Sinica 14: 155-174.
Aven, T. 2003. Foundations of Risk Analysis - A Knowledge and Decision Oriented Perspective. New York: Wiley.
Berkowitz, J. & Kilian, L. 2000. Recent developments in bootstrapping time series. Econometric Reviews 19(1): 1-48.
Boaz, P. 1994. Digital processing of random signals: theory and methods. Englewood Cliffs: Prentice Hall, Inc.
Burtschy, B., Boros, D.N., Popentiu, F., Albeanu, G. & Nicola, V. 1997. Improving Software Reliability Forecasting. Microelectronics and Reliability 37(6): 901-907.
Bühlmann, P. 2002. Bootstraps for time series. Statistical Science 17: 52-72.
Cochrane, J.H. 2005. Time series for macroeconomics and finance. University of Chicago: http://faculty.chicagogsb.edu/john.cochrane/research/Papers/time series book.pdf (available: February 2008).
Dette, H. & Weissbach, R. 2006. A bootstrap test for the comparison of nonlinear time series - with application to interest rate modelling. Technical Report 30, University of Dortmund: http://www.ruhr-uni-bochum.de/imperia/md/content/mathematik3/publications/nonpara0206rafael6.pdf (available: February 2008).
Efron, B. 1979. Bootstrap methods: another look at the jackknife. Annals of Statistics 7(1): 1-26.
Efron, B. & Tibshirani, R. 1986. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science 1: 54-77.
Efron, B. & Tibshirani, R. 1993. An introduction to the bootstrap. New York: Chapman and Hall.
Entrop, O., Memmel, Ch., Wilkens, M. & Zeisler, A. 2007. Analyzing the interest rate risk of banks using time series of accounting-based data: evidence from Germany. SSRN: http://ssrn.com/abstract=982070 (available: February 2008).
Fenton, N.E. & Pfleeger, S.L. 1996. Software metrics: a rigorous and practical approach. London: PWS Publishing Company.
Golub, B.W. & Tilman, L.M. 2000. Risk Management: Approaches for Fixed Income Markets. New York: Wiley.
Gouveia, N. & Fletcher, T. 2000. Time series analysis of air pollution and mortality: effects by cause, age and socio-economic status. J Epidemiol Community Health 54: 750-755.
Hafer, R.W. & Hein, S.E. 1984. On the Accuracy of Time Series, Interest Rate and Survey Forecasts of Inflation. Working Paper 1984-022A: http://research.stlouisfed.org/wp/1984/1984-022.pdf (available: February 2008).
Hamilton, J.D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Hardle, W., Horowitz, J.L. & Kreiss, J.-P. 2003. Bootstrap methods for time series. International Statistical Review 71: 435-459.
Higuera, R.P. & Haimes, Y.Y. 1996. Software risk management. Technical Report CMU/SEI-96-TR-012: http://www.sei.cmu.edu/pub/documents/96.reports/pdf/tr012.96.pdf (available: February 2008).
Kontio, J. 1997. The Riskit Method for Software Risk Management, version 1.00. UMIACS-TR-97-38: http://www.sbl.tkk.fi/jkontio/riskittr.pdf (available: February 2008).
Kontio, J. & Basili, V.R. 1996. Risk Knowledge Capture in the Riskit Method. Proceedings of the 21st Software Engineering Workshop, NASA, Greenbelt, Maryland.
Madsen, H. 2007. Time series analysis. Chapman & Hall/CRC.
Madsen, H., Thyregod, P., Popentiu-Vladicescu, F., Albeanu, G. & Serbanescu, L. 2004. A Decision Support System for Pollution Control in Cement Plants. In C. Spitzer, U. Schmocker & V.N. Dang (eds.), Proceedings of PSAM 07 - ESREL'04, June 14-18, 2004, Probabilistic Safety Assessment and Management 3: 1784-1789. Berlin: Springer Verlag.
Politis, D.N. 2001. Resampling time series with seasonal components. In E.J. Wegman, A. Braverman, A. Goodman & P. Smyth (eds.), Frontiers in Data Mining and Bioinformatics; Proc. 33rd Symp. Interface, California, June 13-17, pp. 619-621. Fairfax Station: Interface Foundation of North America.
Politis, D.N. 2003. The impact of bootstrap methods on time series analysis. Statistical Science 18(2): 219-230.
Popentiu-Vladicescu, Fl., Burtschy, B. & Albeanu, G. 2001. Time series methods for modelling software quality. In E. Zio, M. Demichela & N. Piccinin (eds.), Proceedings of the European Conference on Safety and Reliability, pp. 9-15.
Solojentsev, E.D. 2005. Scenario logic and probabilistic management of risk in business and engineering. Boston: Springer.
Todinov, M.T. 2006. An aggregated risk measure based on the cumulative distribution of the potential loss. In C. Guedes Soares & E. Zio (eds.), Safety and Reliability for Managing Risk, Proceedings of ESREL 2006: 1233-1240. London: Taylor & Francis Group.
Wan, L.M., Yuen, K.C. & Li, W.K. 2005. Ultimate ruin probability for a time-series risk model with dependent classes of insurance business. Journal of Actuarial Practice 12: 193-214.
Zhang, Z., Yuen, K.C. & Li, W.K. 2007. A time-series risk model with constant interest for dependent classes of business. Insurance: Mathematics and Economics 41(1): 32-40.