DISTRIBUTED COMPUTING ENVIRONMENT FOR RELIABILITY-ORIENTED DESIGN
O.V. Abramov, Y.V. Katueva and D.A. Nazarov
Institute for Automation and Control Processes, Far Eastern Branch of the Russian Academy of Sciences, Vladivostok, Russia
e-mail: abramov@iacp.dvo.ru
This work was funded by Grant 09-I-n2-03 (Basic Research Program of the Presidium of RAS No. 2).
ABSTRACT
A theoretical approach and applied techniques for designing analog electronic devices and systems with due account of random variations in system parameters and of reliability specifications are considered. The paper discusses the problem of choosing nominal values of parameters of electronic devices and systems for which the system survival probability, or the performance assurance probability over a predetermined time period, is maximized. Several algorithms for locating and modelling the region of acceptability and for discrete optimization using parallel and distributed processing are discussed. For the numerical solution of the parametric design problem, a distributed computer-aided reliability-oriented design system is proposed.
1 INTRODUCTION
One of the basic problems in the development and use of Computer-Aided Design (CAD) systems is the high computational cost of simulation, multivariate analysis and optimization. Solving these tasks constitutes the basis of system design.
System design that takes into account the stochastic nature of parameter deviations and reliability requirements is one of the most computationally intensive of these tasks. Here, the simulation of stochastic processes of parameter deviations, statistical simulation and optimization are added to the already demanding simulation of dynamic and often nonlinear systems. The optimization, moreover, is performed with respect to stochastic criteria.
Despite the continuous development of CAD tools for electronic circuit design, examples of their successful use, particularly for optimal design with reliability criteria, are virtually non-existent. In recent years, however, a radical way to improve the efficiency of solving computationally expensive problems has been developed successfully: the technology of parallel and distributed computing. The creation of CAD systems using parallel computing technology is therefore both interesting and promising.
This work is an attempt to outline the tasks which arise during the development of parallel (distributed) CAD systems for electronic circuits and the ways to solve them.
The subject area considered is the optimal parametric synthesis of analog electronic circuits with respect to random processes of parameter variations and reliability requirements.
2 PARAMETRIC SYNTHESIS PROBLEM
Suppose that we have a system which depends on a set of n parameters x = (x1, ..., xn). We will say that the system is acceptable if Y(x) satisfies conditions (1):
a < Y(x) < b, (1)
where Y, a and b are m-vectors of system responses (output parameters) and their specifications; e.g. Y1(x) is average power, Y2(x) is delay, Y3(x) is gain.
Inequalities (1) define a region Dx in the space of input parameters:
Dx = {x ∈ Rn | a < Y(x) < b}. (2)
Dx is called the tolerance margin domain (region of acceptability) of the system; it is a region inside the input parameter space.
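For illustration, a minimal Python sketch of this acceptability check, assuming a hypothetical two-parameter example (a resistive voltage divider with x = (R1, R2), responses Y1 = voltage transfer ratio and Y2 = power drawn from a 5 V source, and arbitrarily chosen bounds a and b), might look as follows:

import numpy as np

def responses(x):
    """Hypothetical system responses Y(x) for a resistive voltage divider x = (R1, R2)."""
    r1, r2 = x
    gain = r2 / (r1 + r2)        # Y1: voltage transfer ratio
    power = 25.0 / (r1 + r2)     # Y2: power drawn from a 5 V source, 5**2 / (R1 + R2)
    return np.array([gain, power])

# Specification vectors a and b (illustrative values only)
a = np.array([0.45, 0.0])
b = np.array([0.55, 0.05])

def is_acceptable(x):
    """Membership test for the acceptability region Dx = {x | a < Y(x) < b}."""
    y = responses(x)
    return bool(np.all(a < y) and np.all(y < b))

print(is_acceptable(np.array([1000.0, 1000.0])))   # True for this nominal point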
The parameters of engineering systems are subject to random variations (aging, wear, temperature variations), and these variations may be considered as stochastic processes:
X(t) = {X1(t), ..., Xn(t)}.
In general the parametric optimization (optimal parametric synthesis) problem can be stated as follows (Abramov 1992).
Given the characteristics of the random processes X(t) of system parameter variations, a region of admissible deviations and a service time T, find a deterministic vector of parameter ratings (nominals) xr = (x1r, ..., xnr) that maximizes the reliability
Pr(xr, T) = P{ [X1(t, x1r), ..., Xn(t, xnr)] ∈ Dx for all t ∈ [0, T] } → max. (3)
Any optimization technique requires, first, a method for calculating the objective function and, second, an extremum search method that finds a solution at minimal cost.
3 OBJECTIVE FUNCTION ESTIMATION
The practical algorithm of the stochastic criterion calculation is based on the conventional Monte Carlo method and on the method of "critical sections" (Abramov 1992, Abramov 2006).
First, a random vector of parameters is generated (this vector represents a random manufacturing realization of the device), and then the degradation of the internal parameters is simulated using a degradation model. For example, the parameter variations can be approximated as follows:
X(t) = Σ_{k=0}^{m} xk uk(t),
where xk are random variables and {uk(t)}, k = 0, ..., m, are continuous deterministic functions of time.
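A minimal Python sketch of this degradation model, assuming m = 1 with u0(t) = 1 and u1(t) = t (a random initial manufacturing deviation plus a random linear drift) and normally distributed coefficients with purely illustrative numerical values, could look like this:

import numpy as np

rng = np.random.default_rng(42)

def sample_parameter_process(x_nom, init_std, drift_std, l_sections, t_end):
    """Sample one realization X(t) = x0*u0(t) + x1*u1(t), u0(t) = 1, u1(t) = t,
    on a grid of time sections ("critical sections") 0 = t0 < ... < tl = t_end."""
    t = np.linspace(0.0, t_end, l_sections + 1)
    x0 = rng.normal(x_nom, init_std)   # random manufacturing realization of the parameter
    x1 = rng.normal(0.0, drift_std)    # random drift rate
    return t, x0 + x1 * t              # parameter value at every time section

t, x_t = sample_parameter_process(x_nom=1000.0, init_std=10.0, drift_std=0.005,
                                  l_sections=10, t_end=8760.0)
print(x_t[0], x_t[-1])   # parameter value at t = 0 and at the end of the service time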
The Monte Carlo method approximates Pr(xr, T) by the ratio of the number of acceptable realizations (those falling into the region Dx), Na, to the total number of trials, N:
Pr ≈ Na / N.
Unfortunately, the region Dx is often unknown; it is defined only implicitly through the system equations and response functions. If the region Dx is not known, the Monte Carlo evaluation of the probability Pr(xr, T) at a particular nominal vector xr requires N system analyses for that trial vector. Typically, hundreds of trials are required to obtain a reasonable estimate of Pr(xr, T).
Optimization requires the evaluation of the probability Pr(xr, T) for many different nominal parameter vectors xr. Since the objective function is computed numerically, only non-gradient optimization methods can be used, and these methods demand substantial computing power. A particularly effective way to decrease the total design time in the simulation and statistical optimization phase is to use modern supercomputing technologies and distributed parallel processing techniques. The easiest implementation of this idea is the use of distributed processing technologies, where computational tasks are distributed over a set of networked workstations. Realization means mapping the whole computation scheme onto the parallel architecture of the computer, taking into account the topology of interprocessor communications and ensuring correct interaction of the set of processes executed in parallel (Foster 1995).
The use of parallel computations within the Monte Carlo method is the easiest way to reduce the computational cost of parametric synthesis, since the idea of parallelism (repetition of the same typical process with different data) is inherent in the structure of the method.
It is intuitively clear that the use of s separate processors, with the independent trials distributed between them, reduces the computational cost of statistical modelling by a factor of s, since the expense of the final summation and averaging of results is practically negligible. The final estimate is obtained from the formula
Pr = (Σ_{i=1}^{s} nig) / N,
where nig is the number of "good" realizations on processor i and N is the required total number of trials.
4 SEQUENTIAL ALGORITHM OF OBJECTIVE FUNCTION CALCULATION
The yield estimation, based on the Monte Carlo and "critical sections" methods, is performed as follows.
Algorithm 1. Let an initial vector of nominal parameter values xnom be given.
1. Proceeding from the given distribution laws of the parameters x1, ..., xn, generate a realization of the random parameter vector x(k).
2. For this realization of the parameter values calculate the output parameters:
yj = Fj(x(k)), j = 1, ..., m.
This stage is the most cumbersome, since the calculation of the output parameters often requires solving systems of differential (and not always linear) equations.
3. Check the serviceability conditions
y ∈ Dy,
where Dy = {y | a < y < b} is the known region of allowable values of the output parameters y.
The check in step 3 allows classifying the realization x(k) as "good" (the system remains operable) or "bad".
This operation concludes the inner cycle, and control returns to step 1: the next realization x(k+1) is generated and steps 2 and 3 are repeated.
The total number of iterations N is determined by the required accuracy of the probability estimate:
P = ng / N,
where ng is the number of "good" realizations out of the total number N of trials.
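A compact Python sketch of Algorithm 1, assuming the same hypothetical voltage-divider serviceability check and an independent normal scatter of the parameters around the nominal vector (all names and numerical values are illustrative), might be:

import numpy as np

rng = np.random.default_rng(1)

def acceptable(x):
    """Illustrative serviceability check y in Dy for a voltage divider x = (R1, R2)."""
    gain = x[1] / (x[0] + x[1])
    return 0.45 < gain < 0.55

def yield_estimate(x_nom, rel_sigma=0.01, n_trials=10_000):
    """Algorithm 1: Monte Carlo estimate P = ng / N around the nominal vector x_nom."""
    n_good = 0
    for _ in range(n_trials):
        x = rng.normal(x_nom, rel_sigma * np.abs(x_nom))   # step 1: random realization
        if acceptable(x):                                   # steps 2-3: analysis and check
            n_good += 1
    return n_good / n_trials

print(yield_estimate(np.array([1000.0, 1000.0])))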
To calculate the probability of non-failure operation P(T) over a given time period, the above procedure (steps 1-3) is repeated a number of times determined by the number of time sections, and takes the following form.
Algorithm 2. Let the following data be known:
- the distribution laws of the parameters x1, ..., xn;
- the model of parameter change in time, specifying the number of time sections l.
1. Set k = 1 and take the initial vector of nominal parameter values xnom.
2. Proceeding from the given distribution laws of the parameters x1, ..., xn, generate a realization of the random parameter vector x0(k) at the time moment t = 0.
3. For this realization of the parameter values calculate the output parameters
yj = Fj(x(k)), j = 1, ..., m.
4. Check the serviceability conditions
y ∈ Dy,
where Dy is the known region of allowable values of the output parameters y.
5. If the conditions of step 4 are satisfied on the given time section, then, proceeding from the models of parameter change, form the realizations of the random parameter vector xi(k) for the following time sections, i = 1, ..., l, and carry out steps 3 and 4 of this algorithm for each of them. If the acceptability conditions hold on all time sections, the realization is classified as "good" and the counter is increased: ng = ng + 1.
If on some time section the conditions of step 4 are not satisfied, the whole realization is classified as "bad".
6. If k < N, set k = k + 1 and return to step 2 to generate the next realization of the random parameter vector.
7. Obtain the final estimate
P = ng / N.
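A Python sketch of Algorithm 2 under the same illustrative assumptions (voltage-divider serviceability check, linear drift model, arbitrary numerical values) could be:

import numpy as np

rng = np.random.default_rng(7)

def acceptable(x):
    """Illustrative serviceability check for a voltage divider x = (R1, R2)."""
    gain = x[1] / (x[0] + x[1])
    return 0.45 < gain < 0.55

def reliability_estimate(x_nom, t_end=8760.0, l_sections=10,
                         init_sigma=10.0, drift_sigma=0.005, n_trials=5_000):
    """Algorithm 2: a realization is counted as "good" only if it stays acceptable
    at every time section ("critical section") of the service interval [0, T]."""
    t_grid = np.linspace(0.0, t_end, l_sections + 1)
    n_good = 0
    for _ in range(n_trials):
        x0 = rng.normal(x_nom, init_sigma)                 # realization at t = 0
        drift = rng.normal(0.0, drift_sigma, x_nom.shape)  # random drift rates
        if all(acceptable(x0 + drift * t) for t in t_grid):
            n_good += 1
    return n_good / n_trials

print(reliability_estimate(np.array([1000.0, 1000.0])))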
5 DISTRIBUTED PARALLEL MONTE-CARLO ALGORITHM
The main processor (master):
1. Exchanges seeds with the subordinate processes (initializes their random number generators).
2. Assigns to each processor its number of Monte Carlo trials ni, where ni is the sample size for processor i, so that
Σ_{i=1}^{s} ni = N.
3. Carries out statistical trials using Algorithm 2; as a result, its own number of "good" trials is obtained.
4. Receives from the subordinate processors the results of their Monte Carlo calculations nig, i = 1, ..., s.
5. Forms the final estimate
Pr = (Σ_{i=1}^{s} nig) / N.
The subordinate processors (slaves):
1. Receive from the main processor a seed for the random number generator.
2. Receive the number of statistical trials each of them has to carry out.
3. Carry out the statistical trials using Algorithm 2.
4. Send the number of "good" trials nig to the main processor.
With the distributed parallel Monte Carlo method, both the message-passing time and the idle time of the processors are reduced to a minimum.
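The following Python sketch illustrates this master/slave scheme, using the multiprocessing module as a stand-in for the message-passing layer; the acceptability test, the parameter scatter and all numerical values are illustrative assumptions rather than part of the CARD implementation:

import numpy as np
from multiprocessing import Pool

def acceptable(x):
    """Illustrative serviceability check for a voltage divider x = (R1, R2)."""
    gain = x[1] / (x[0] + x[1])
    return 0.45 < gain < 0.55

def slave(task):
    """Slave: receives a seed and a trial count, returns its number of "good" trials nig."""
    seed, n_i, x_nom = task
    rng = np.random.default_rng(seed)
    good = 0
    for _ in range(n_i):
        x = rng.normal(x_nom, 0.01 * np.abs(x_nom))
        if acceptable(x):
            good += 1
    return good

if __name__ == "__main__":
    x_nom = np.array([1000.0, 1000.0])
    N, s = 100_000, 4                           # total number of trials, number of processors
    n_i = [N // s] * s                          # master assigns ni trials to each slave
    tasks = [(seed, n, x_nom) for seed, n in enumerate(n_i)]
    with Pool(processes=s) as pool:             # master distributes seeds and work
        counts = pool.map(slave, tasks)
    print(sum(counts) / N)                      # final estimate Pr = sum(nig) / N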
6 DISTRIBUTED ALGORITHMS FOR DISCRETE OPTIMIZATION
Evaluation of extr Pr(xr, T) requires global optimization. The simplest global optimization method is scanning (full enumeration); however, this method is computationally inefficient. An effective way to decrease the optimization time is data decomposition.
The extremum search region is divided into non-overlapping subregions. These subregions are distributed between separate computation processes, which perform the extremum search. After the calculations, the results are passed to the main process, which composes the final result.
Since the nominal values of the schematic components xr used in engineering systems should belong to a predefined set of values required by various standards and technical recommendations, it is often preferable to search for the optimal vector within the discrete set of values that conforms to the standards and lies in the acceptability region Dx.
The information on the variation of the internal parameter values can be presented as limits on their values, i.e.
xi,min ≤ xi ≤ xi,max, i = 1, ..., n.
The region of the internal parameter space defined by these relations is an n-dimensional orthogonal parallelepiped called the box of tolerances (tolerance region) Bd:
Bd = {x ∈ Rn | xi,min ≤ xi ≤ xi,max, i = 1, ..., n}.
Using the algorithm described in (Abramov, Katueva & Nazarov 2006), the circumscribed box Bo ⊂ Bd is determined as
Bo = {x ∈ Rn | ai0 ≤ xi ≤ bi0, i = 1, ..., n},
where
ai0 = min {xi | x ∈ Dx}, bi0 = max {xi | x ∈ Dx}.
This algorithm is based on the Monte Carlo method and can be performed in parallel with a linear speedup.
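As an illustration only (a simplified stand-in for the algorithm of Abramov, Katueva & Nazarov 2006), the circumscribed box can be estimated by sampling the tolerance box uniformly, keeping the points that fall into Dx, and taking coordinate-wise minima and maxima; the acceptability test and the tolerance limits below are hypothetical:

import numpy as np

rng = np.random.default_rng(3)

def in_Dx(x):
    """Illustrative acceptability test (voltage-divider gain specification)."""
    gain = x[1] / (x[0] + x[1])
    return 0.45 < gain < 0.55

# Tolerance box Bd given by per-coordinate limits (illustrative values)
x_min = np.array([800.0, 800.0])
x_max = np.array([1200.0, 1200.0])

def circumscribed_box(n_samples=20_000):
    """Monte Carlo estimate of Bo: coordinate-wise min/max over accepted samples."""
    samples = rng.uniform(x_min, x_max, size=(n_samples, x_min.size))
    accepted = np.array([x for x in samples if in_Dx(x)])
    return accepted.min(axis=0), accepted.max(axis=0)   # estimates of ai0 and bi0

a0, b0 = circumscribed_box()
print(a0, b0)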
The circumscribed box makes it possible to narrow the extremum search region (Abramov, Katueva & Nazarov 2006); its bounds never exceed those of the tolerance region.
Figure 1 schematically illustrates the tolerance region Bd, the circumscribed parallelepiped Bo and the acceptability region Dx in the case of a two-dimensional space of internal parameters.
Figure 1. The approach to tolerance region discretization (tolerance region Bd, circumscribed box Bo, acceptability region Dx)
Standard nominal values of the input parameters form a grid inside the circumscribed box; Do denotes the discrete set of grid nodes.
Let the vector of internal parameter values xr ∈ Do be given. Then at each point of the discrete set
Dr = {xr | xr ∈ Dx ∩ Do}
we need to find the estimate Pr(xr, T) defined by (3). The optimal nominal vector xr can be found as the solution of the following task:
xr,opt = arg max_{xr ∈ Dr} Pr(xr, T). (4)
In the simplest case the solution can be found by complete enumeration of the set Dr, with the probability estimated for each of its elements. The construction of the set Dr can be implemented as a preliminary procedure that stores the element values in a database.
The optimum search process can be performed in parallel mode.
The set Dr is distributed between separate processes. Each process searches for a solution of task (4) on its own subset (local optimization) and then passes its result to the main process. The main (master) process composes the final result (global optimization). The average speedup of the distributed discrete optimization is close to linear.
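A Python sketch of this distributed enumeration is given below; the small set of standard nominal values, the surrogate reliability function and the number of processes are illustrative assumptions, the true objective being the Monte Carlo estimate of Pr(xr, T):

import numpy as np
from itertools import product
from multiprocessing import Pool

def reliability(x_nom):
    """Stand-in for the estimate Pr(xr, T); here an analytic surrogate of a gain specification."""
    gain = x_nom[1] / (x_nom[0] + x_nom[1])
    return float(np.exp(-((gain - 0.5) / 0.02) ** 2))

def local_best(nodes):
    """Local optimization: complete enumeration of the assigned subset of Dr."""
    return max(nodes, key=reliability)

if __name__ == "__main__":
    nominals = [910.0, 1000.0, 1100.0, 1200.0]                 # a small set of standard nominal values
    D_r = [np.array(p) for p in product(nominals, nominals)]   # grid of standard nominals
    s = 4
    chunks = [D_r[i::s] for i in range(s)]                     # distribute grid nodes over s processes
    with Pool(processes=s) as pool:
        candidates = pool.map(local_best, chunks)              # each process returns its local optimum
    x_opt = max(candidates, key=reliability)                   # master composes the global optimum
    print(x_opt, reliability(x_opt))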
7 COMPUTER-AIDED RELIABILITY-ORIENTED DISTRIBUTED DESIGN SYSTEM
All algorithms described above were included in the computer-aided reliability-directed distributed design system (CARD). The CARD system was developed for the parametric synthesis of analog electronic devices with respect to reliability requirements.
The CARD system includes:
1. the simulation module (it facilitates the use of a variety of simulation programs for electronic circuit design);
2. the module for multivariate (deterministic and statistical) analysis;
3. the module for objective function (reliability and/or manufacturing yield) calculation;
4. the optimization module.
The system is organized as a group of computers connected by a network. Such a system allows using all advantages of client-server technology. Note, however, that the tasks of the server and of the client stations in this system differ from those in the usual client-server architecture. Let us consider this in detail.
The first task is to connect the clients to the server. In this way we not only increase the available computing resources but also obtain a set of prospective tasks for the analysis and decomposition of electronic circuits. In this case the connection scheme in the LAN looks as shown in Figure 2.
Figure 2. Connection to the server
Any circuit represents a set of elements and a set of connections between these elements (node points). The decomposition of a circuit can be implemented by nodes or by elements. In both cases the circuit is conveniently represented as an undirected graph whose nodes correspond to circuit elements and whose edges correspond to connections between the elements. Another way to decompose a circuit is to split an element; the need to implement transient element algorithms considerably complicates this task. In the first case it is only necessary to transfer the current-voltage characteristic of one circuit part to another part of the circuit and to obtain this characteristic back; these data are then analysed in both parts.
However, because the transmitted data are digitized, a small computational inaccuracy arises, along with insignificant delays in signal stabilization and, possibly, unexpected signal attenuation in cyclic circuits. It is therefore expedient to examine an extended model of network interactions.
A LAN is a set of clients connected by a switching device. Connections for data transfer can be established between any computers of the network, so the network is represented by a complete graph. Making the server switch all data blocks is not reasonable for a large number of clients. In this case, to increase the data transmission rate in the network, connections should be established directly between the clients; this halves the volume of transmitted data. Thus, the network graph must be constructed and the following connection rules organized (a minimal pairing sketch follows the list):
• establishing a single connection between two separate computers when necessary;
• classifying clients into the groups "connected" and "expecting" (distributing the client-server roles among the clients).
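A minimal sketch of such a pairing rule is given below; the function name and the client addresses are purely illustrative:

def pair_clients(clients):
    """Split clients into "connected" pairs that exchange data directly;
    an odd client remains in the "expecting" group until a partner appears."""
    connected = [(clients[i], clients[i + 1]) for i in range(0, len(clients) - 1, 2)]
    expecting = [clients[-1]] if len(clients) % 2 else []
    return connected, expecting

# Example: five clients, as in Figure 3 (illustrative addresses)
pairs, waiting = pair_clients(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"])
print(pairs)     # [('10.0.0.1', '10.0.0.2'), ('10.0.0.3', '10.0.0.4')]
print(waiting)   # ['10.0.0.5']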
An example of splitting and connection for 5 clients is shown in Figure 3.
Figure 3. Fragment of network CAD system architecture
The following tasks are executed by the server:
1. generation of the initial circuit according to certain requirements, or provision of convenient manual input;
2. gathering information about the clients: their IP addresses and performance data;
3. splitting the circuit into parts with special algorithms for simulation;
4. transferring the data to the client stations and starting the design process;
5. receiving the finally optimized circuit parts and assembling them into the unified circuit.
The following tasks are executed by the clients:
1. receiving the information from the server and sending a signal of readiness to begin designing;
2. simulation on the basis of the chosen algorithm;
3. transferring the results to the server.
Sockets are used for network communications. A socket is an end point of network communication. Every socket has a type and a process associated with it. Sockets exist inside communication domains; a domain is an abstraction that implies a concrete addressing structure and a set of protocols and defines the socket types available inside the domain. TCP/IP, which belongs to the transfer protocols with guaranteed delivery, is used as the socket exchange protocol. It is most convenient in this case to use stream sockets. With stream sockets, a pipe in stream form is created between two applications. The streams can be input or output, normal or formatted, with or without buffering. Note that stream sockets transfer data only between two applications, since they assume a dedicated channel between these applications. However, it is sometimes necessary to provide interaction of several client applications with one server, or of several client applications with several server applications. In this case, separate tasks and separate channels for each client application are created in the server application.
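A minimal Python sketch of such a stream-socket exchange (the address, the port and the message format are illustrative, not those of the CARD system) is:

import socket
import threading
import time

HOST, PORT = "127.0.0.1", 50007   # illustrative server address

def server():
    """Server: accepts one client over a TCP stream socket and exchanges short messages."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen()
        conn, _ = srv.accept()
        with conn:
            print("server received:", conn.recv(1024).decode())  # e.g. "READY"
            conn.sendall(b"TASK 1")                              # hand out a work item

def client():
    """Client: reports readiness and receives its task over the same stream socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"READY")
        print("client received:", cli.recv(1024).decode())

t = threading.Thread(target=server)
t.start()
time.sleep(0.2)   # give the server time to start listening
client()
t.join()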
The CARD system uses a modification of the widely used PSPICE circuit simulation program, which allows simulating a large class of analog devices in the DC, frequency and time domains. CARD also provides features for nominal design, design centering, tolerance assignment, etc. Mathematical models of semiconductor devices are used in many similar programs, and circuit netlists in the SPICE format are produced by the majority of applications (Micro-Cap, Dr. Spice, OrCAD, P-CAD, ACCEL EDA, Viewlogic, COMPASS, Design Architect, etc.). These and subsequent versions use the same algorithms as SPICE and the same input data format. PSPICE allows simulating and supporting the development of circuits containing both analog and digital components without manufacturing real circuits. The user can study circuit responses to input stimuli, circuit behaviour at various frequencies, noise and other characteristics. PSPICE allows the user to create a "computer model of the circuit" for testing and debugging the developed circuit before its manufacturing begins. Circuit tests show that in all cases PSPICE runs 1.3-30 times faster than other similar programs. The CARD system has been tested on a number of complex designs involving filters, amplifiers and control systems.
8 CONCLUSION
We have attempted to describe some of our work in progress on facilitating the reliability analysis and optimization phase by means of a distributed CAD system. On the negative side, reliability optimization requires many stochastic function evaluations, which can be expensive in terms of circuit simulation and optimization cost. These expenses can be reduced by implementing efficient parallel algorithms and distributed processing technologies. A new computer-aided reliability-oriented distributed design (CARD) system was described. This CARD system has had some initial success towards making reliability optimization applicable.
REFERENCES
1 Abramov O. 1992. Reliability-Directed Parametric Synthesis of Stochastic Systems. Moscow: Nauka.
2 Abramov O. 2006. Reliability-Directed Computer-Aided Design System. Reliability: Theory & Application. San Diego-Moscow. no. 1: pp. 35-40.
3 Foster I. 1995. Designing and Building Parallel Programs. London: Addison-Wesley.
4 Mascagni M. and Srinivasan A. 2000. Algorithm 806: SPRNG: A Scalable Library for Pseudorandom Number Generation. ACM Transactions on Mathematical Software vol. 26, no. 3: pp. 436-461.
5 Abramov O., Katueva Y. and Nazarov D. 2006. The Definition of Acceptability Region for Parametric Synthesis Problem. Proceedings of the 6th Asian Control Conference (ASCC'2006): pp. 780-786.