VLADIMIR GUREVICH, Israel
A NEW CRITERION NEEDED TO EVALUATE RELIABILITY OF DIGITAL
PROTECTIVE RELAYS
Engineering offers a wide range of criteria and parameters for evaluating reliability, yet only one of them has been chosen for Digital Protective Relays (DPR): Mean (operating) Time Between Failures (MTBF). It has gained universal currency and is quoted in technical manuals, information sheets and tender documentation as the key indicator of DPR reliability. But is the choice of this particular criterion justified? This article seeks an answer to that question.
Keywords: digital protective relays, reliability, mean time between failures, MTBF, gamma-percentage operating time to failure
INTRODUCTION
Reliability is defined as the property of an object to maintain over time, within given limits, the values of all parameters that characterize its ability to perform the required functions in predetermined modes of operation and conditions of use, maintenance, repair, storage and transportation. As this definition shows, reliability is a multidimensional property which, depending on the purpose of an object and the environment in which it is placed, may include failure-free operation, durability, maintainability and storability, or some combination of these.
One of the key reliability indicators is “Mean (operating) Time Between Failures” (MTBF), defined as the total operating time (the sum of the operating periods) of a restorable item divided by the number of failures observed within this time. That is to say, it is a reliability indicator of a repairable device or engineering system that characterizes the average time (in hours) of device operation between failures (repairs).
PROBLEMS WITH USING MTBF TO EVALUATE RELIABILITY OF DPR
Manufacturers’ technical manuals generally claim an MTBF of 50 to 90 years for DPR. Does this mean that the time between two DPR failures really is 50 to 90 years? Whatever the formal definition of the term, common sense suggests that in real life, as opposed to virtual reality, this cannot be so. So much the worse, as they say, for common sense.
There are many variations of MTBF, for example “Mean Time Between Unit Replacement” (MTBUR), defined as the arithmetic mean time to failure (replacement) of a replaceable unit.
Given the modular design of DPRs and the non-serviceability of the multilayer printed circuit boards (PCB) populated with Surface Mount Devices (SMD) on which state-of-the-art DPRs are built, “replaceable units” can only mean complete modules (PCBs), and DPR repair (restoration) can generally be carried out only by replacing a module (PCB). In this case there is no practical difference between the MTBF and MTBUR indicators, and consumers will continue to stare in bewilderment at amusing many-digit numbers corresponding to 50 to 90 years, wondering what they mean and how they relate to the real DPR service life of 15 to 18 years.
Where do such mind-blowing MTBF numbers come from? Naturally, they can only be obtained theoretically, by calculation. In brief, the calculation runs as follows. Suppose 1,000 units were tested for one year and 10 of them failed during the test. Then MTBF = 1 year × (1,000 units / 10 units) = 100 years or, in round figures, 900,000 hours. It is this many-digit number that the consumer will see in the DPR technical manual or information sheet.
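The arithmetic of this example can be checked with a few lines of Python. This is only a minimal sketch of the constant-failure-rate calculation described above; the function name and the 8,760 hours-per-year conversion are assumptions made for illustration, not part of any manufacturer's procedure.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days x 24 h, leap years ignored


def mtbf_years(units_on_test, test_years, failures):
    """MTBF in years from a fleet test, assuming a constant failure rate."""
    if failures <= 0:
        raise ValueError("at least one observed failure is required")
    # Total accumulated operating time divided by the number of failures.
    return units_on_test * test_years / failures


# The example from the text: 1,000 units tested for one year, 10 failures.
mtbf = mtbf_years(units_on_test=1000, test_years=1, failures=10)
mtbf_hours = mtbf * HOURS_PER_YEAR

# Under the same assumption, the share of units failing each year is 1/MTBF.
annual_failure_share = 1 / mtbf

print(f"MTBF = {mtbf:.0f} years = {mtbf_hours:,.0f} h")
print(f"Share of units failing per year = {annual_failure_share:.1%}")
```

Read this way, an MTBF of 100 years says no more than that about 1 % of the units in service fail each year while the failure rate stays constant. It says nothing about how long an individual unit will last.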
But why, then, don’t DPRs last this long if the calculations suggest they should? There may be dozens of reasons. First, testing for one year (or even several) does not give consistent failure data, because the failure rate varies substantially over time and assuming a constant failure rate (as in the example above) does not give reliable results. In actual practice, the failure rate is constant in only one region of the service life, and its variation over time is described by the Weibull-Gnedenko function:
$$\lambda(t) = \frac{\beta}{\theta}\left(\frac{t}{\theta}\right)^{\beta-1}$$
Fig. 1. Failure rate - time relationship: 1 - running-in period (early failures); 2 - normal operating period (random failures, λ = const); 3 - deterioration period (wear-out failures); t - unit in-service time, θ - scale parameter, β - shape parameter. In the running-in period β < 1, in the normal operating period β = 1, and in the deterioration period β > 1.
In the case of a variable failure rate (i.e., when λ(t) ≠ const), the above example of MTBF calculation does not apply, and MTBF has to be calculated using other, far more complicated formulae.
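To illustrate how the shape parameter changes the picture, the sketch below evaluates the failure rate in the standard two-parameter Weibull form, λ(t) = (β/θ)(t/θ)^(β-1), together with the corresponding mean time to failure θ·Γ(1 + 1/β). The scale parameter of 20 years and the three shape values are hypothetical numbers chosen only for illustration; each region of the curve would in practice need its own fitted parameters.

```python
import math


def weibull_failure_rate(t, theta, beta):
    """Failure rate lambda(t) = (beta/theta) * (t/theta)**(beta - 1), for t > 0."""
    return (beta / theta) * (t / theta) ** (beta - 1)


def weibull_mean_life(theta, beta):
    """Mean time to failure of a Weibull-distributed life: theta * Gamma(1 + 1/beta)."""
    return theta * math.gamma(1.0 + 1.0 / beta)


THETA = 20.0  # hypothetical scale parameter, years

# beta < 1: early failures, beta = 1: constant rate, beta > 1: wear-out.
for beta in (0.5, 1.0, 3.0):
    rates = [weibull_failure_rate(t, THETA, beta) for t in (1.0, 10.0, 20.0)]
    formatted = ", ".join(f"{r:.4f}" for r in rates)
    print(f"beta={beta}: lambda at 1, 10, 20 years = {formatted} per year; "
          f"mean life = {weibull_mean_life(THETA, beta):.1f} years")
```

Only for β = 1 is the failure rate the same at every age, and only then does the simple "unit-years divided by failures" calculation give the mean life.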
Second, in actual practice many manufacturers, instead of pilot testing large quantities of their units under field operating conditions (which is both costly and time consuming), calculate MTBF theoretically, based merely on failure rate data for the basic electronic components contained in the DPR and on the number of such components. The calculation runs as follows. Suppose a device comprises 10 components with a failure rate of 10⁻⁷ h⁻¹ each. Then the failure rate of the device as a whole is 10 × 10⁻⁷ h⁻¹ = 10⁻⁶ h⁻¹, and the time to failure amounts to 10⁶ h = 1 million h. This is where many uncertainties arise that no calculation can foresee. High-quality electronic components supplied by a renowned and trustworthy manufacturer may indeed operate safely in equipment for decades and have rather low failure rates. But this holds only under the particular operating conditions for which these parts are intended. It is for these conditions that failure rates are quoted in component reference sources, and it is these failure rates that DPR manufacturers assume in their calculations. But what is the real state of affairs?
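The component-count calculation above can be sketched as follows. The series model (the device fails when any component fails) and the constant, independent failure rates are the assumptions of the calculation itself. The stress multiplier applied to one component in the second half is a purely hypothetical figure, added only to show how sensitive the result is to operating conditions that differ from the reference ones.

```python
import math


def series_failure_rate(component_rates_per_hour):
    """Failure rate of a device that fails when any one component fails
    (series model, constant and independent component failure rates)."""
    return sum(component_rates_per_hour)


# The example from the text: ten components at 1e-7 failures per hour each.
catalogue_rates = [1e-7] * 10
catalogue_mttf_hours = 1.0 / series_failure_rate(catalogue_rates)

# Hypothetical: one component (say, a capacitor under ripple current) works
# outside its reference conditions, so its rate is 50 times the catalogue value.
HYPOTHETICAL_STRESS_FACTOR = 50
stressed_rates = [1e-7] * 9 + [1e-7 * HYPOTHETICAL_STRESS_FACTOR]
stressed_mttf_hours = 1.0 / series_failure_rate(stressed_rates)

print(f"Catalogue conditions: {catalogue_mttf_hours:,.0f} h")
print(f"One stressed part:    {stressed_mttf_hours:,.0f} h")
assert math.isclose(catalogue_mttf_hours, 1e6, rel_tol=1e-9)
```

Even this correction keeps the constant-rate assumption. Wear-out mechanisms of the kind described in the examples that follow are not captured by it at all.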
Example 1. Electrolytic capacitors in DPR switched-mode power supplies. Even high-quality general-purpose industrial-grade electrolytic capacitors produced by well-known Japanese manufacturers fail fairly soon under the high-frequency currents flowing through them in switched-mode power supplies, see Fig. 2. Leaking electrolyte also causes substantial damage to many other circuit components, and even to conductors and via interconnections in the printed circuit board.
Power supply failure for this reason occurs after some 12 to 15 years of operation in DPRs of various types produced by different manufacturers. The problem is caused by DPR manufacturers choosing the wrong types of electrolytic capacitor and by the absence of measures to protect electrolytic capacitors from high-frequency currents in DPR circuits, which heat the electrolyte and increase its chemical activity. Has this problem with electrolytic capacitors been taken into account when calculating MTBF? Obviously not!
Fig. 2. Faulty DPR switched-mode power supplies of different types with damaged electrolytic capacitors. The left-hand picture shows stains on the printed circuit board from electrolyte leakage.
Example 2. Disk ceramic capacitors encased in a molded plastic shell, see Fig. 3. In DPRs operating in subtropical climates with high air humidity, these capacitors often cause DPR failure: when the capacitor sealing is imperfect, the applied voltage in a humid environment drives silver ions to migrate from one surface of the ceramic disk to the other, forming a conductive path between the capacitor plates. As a result, ceramic capacitors, generally known as highly reliable components with a long service life, cause multiple DPR failures after some 15 years in operation. Has this problem been taken into account when calculating MTBF? Obviously not!
Fig. 3. Part of a DPR logic input module of type REL316 with damaged capacitors C and failed optocouplers Opt.
Example 3. Transistor optocouplers, which abound in the input module circuits of any DPR, see Fig. 3. A specific feature of optocouplers is the gradual decrease of the Current Transfer Ratio (CTR) caused by degradation (loss of transparency) of the optical plastic that couples the light-emitting and light-detecting components of the optocoupler. Consequently, if the operating point of the optocouplers inside a DPR has been chosen on the initial section of the characteristic (to limit the power dissipated by the logic input circuits), then after 13 to 16 years of DPR operation, failures of the logic inputs become epidemic. Has this problem been taken into account when calculating MTBF? The answer is equally clear.
Example 4. Technical manuals for such essential components of any microprocessor unit as EEPROM (Electrically Erasable Programmable Read-Only Memory) claim an inherent data retention of more than 100 years, see Fig. 4. Yet in actual practice these devices have started to lose the data recorded in them after as little as 15 years of operation inside a DPR. Has this effect been taken into account when calculating MTBF?
An article by employees of a DPR manufacturer [1] claims that their relays have an MTBF of 74 years and that every single failure was detected in service by the embedded DPR self-diagnostics system. We take leave to doubt such claims, since no embedded DPR self-diagnostics system is able to detect capacitor electrolyte leakage, degradation of optocoupler transfer ratios, increased self-discharge of flash memory components, or problems with the control element known as the watchdog. As a result, we have a burst of DPR failures after 15 to 18 years of operation, while manufacturers claim an MTBF of 50 to 90 years.
[Figure 4: EEPROM components, marked “1996-2011”, with a datasheet extract: “Xicor E2PROMs are designed and tested for applications requiring extended endurance. Inherent data retention is greater than 100 years.”]
Fig. 4. EEPROM components manufactured in 1996 that failed after 15 years of operation, shown against extracts from technical manuals that guarantee retention of the recorded data for 100 years.
Interestingly, problems of this kind have never occurred with electromechanical protective relays, which have served faithfully (and in fact are still in service) for many decades.
Examples include the inverse time relays of type RI, manufactured about one hundred years ago by Allmänna Svenska Elektriska Aktiebolaget - ASEA (in English: General Swedish Electrical Limited Company) in the Swedish city of Västerås. In the upper left corner of the relay you can see a swastika bearing the letters A, S, E, A, which was the logo (trademark) of the ASEA company, see Fig. 5. It was placed on these relays until 1933, when the symbol was adopted by the German Nazis.
Fig. 5. Electromechanical relay RI (ASEA), manufactured about a hundred years ago, which has retained its operability to this day.
Quite a few such relays were in service in the power sector of the former Soviet Union (with the swastika carefully defaced), and they are familiar to the older generation of protection engineers. Until recently these relays could still be found in operation, including in Russia, and they were replaced not because they could no longer fulfill their functions but because it had simply become embarrassing to keep using hundred-year-old relays.
Over the past few years, the professional community has developed an awareness of the fact that DPRs are less reliable than electromechanical relays. The solution of this problem is generally thought to be DPR redundancy.
Fig. 6. M-3430 multifunction DPR. The numbers shown in white circles designate standard relay protection functions under ANSI classification.
The problem becomes ever more relevant as the number of functions performed by a single DPR terminal grows. In multifunction relays, which “put all the eggs in one basket” (see Fig. 6), a failure or malfunction of just one of these “eggs” may disconnect a generating unit and thus cause great damage. For this reason, where multifunction DPRs protect critical objects, the manufacturers themselves advise [1] using double DPR sets (notwithstanding the MTBF values of many decades that they claim!), see Fig. 7. The MTBF of such a double set was calculated in [2] to be ... 500 years!
This raises further questions. First, what purpose do such absolutely fantastic MTBF values, which have nothing to do with reality, serve, and what are they worth? Second, massive accidents in power grids may be caused both by failure to disconnect sections running in emergency mode and by false tripping of healthy grid sections (generating units, loaded lines), which shifts the load to other generating units and loaded lines (this scenario unfolded in one of the largest accidents in the USA). This means that a DPR has two faulty states rather than one: failure to operate and false operation. Using two identical sets, main and standby, is therefore not all that simple in practice, since it is unclear how the DPR output contacts actuating the circuit breakers should be connected: in a logical AND circuit or in a logical OR circuit?
Either connection option reduces the probability of one DPR faulty state while correspondingly increasing the probability of the other. That is, two identical DPR sets are evidently not enough to enhance the reliability of relay protection for critical objects, and it is wise to use three sets with majority voting of the output signals on the “two out of three” principle.
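The trade-off between the AND and OR connections, and the effect of “two out of three” voting, can be illustrated numerically. The sketch below assumes that the DPR sets fail independently of one another and that each set has the same probability of failure to operate and of false operation. The value of 0.01 for both probabilities is hypothetical and serves only for comparison.

```python
def either_of_two(p):
    """Probability that at least one of two independent sets shows the fault."""
    return 1.0 - (1.0 - p) ** 2


def both_of_two(p):
    """Probability that both of two independent sets show the fault."""
    return p ** 2


def at_least_two_of_three(p):
    """Probability that two or more of three independent sets show the fault."""
    return 3.0 * p ** 2 - 2.0 * p ** 3


def scheme_fault_probabilities(p_miss, p_false):
    """Return {scheme: (P(failure to trip), P(false trip))}."""
    return {
        "single": (p_miss, p_false),
        # OR of trip contacts: a trip is missed only if both sets miss it,
        # but a false operation of either set opens the breaker.
        "2 x OR": (both_of_two(p_miss), either_of_two(p_false)),
        # AND of trip contacts: both sets must operate falsely to trip,
        # but a miss by either set blocks a needed trip.
        "2 x AND": (either_of_two(p_miss), both_of_two(p_false)),
        # Majority voting: two sets must agree in either direction.
        "2 of 3": (at_least_two_of_three(p_miss), at_least_two_of_three(p_false)),
    }


P_MISS = 0.01   # hypothetical probability of failure to operate, per set
P_FALSE = 0.01  # hypothetical probability of false operation, per set

for name, (miss, false_trip) in scheme_fault_probabilities(P_MISS, P_FALSE).items():
    print(f"{name:8s} failure to trip = {miss:.6f}   false trip = {false_trip:.6f}")
```

Under these assumptions each two-set scheme improves one figure at the expense of the other, while the three-set majority scheme improves both relative to a single set.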
Fig. 7. Double (redundant) generation unit protection set for more reliable protection
One more problem with the application of MTBF may arise in the near future. The market entry of versatile functional modules [3, 4], sold and purchased as standalone products from which DPRs are assembled (as is the case with desktop PCs today), moves these individual PCB modules from the category of “replaceable component parts” to the category of “standalone non-restorable items”, which are highly diverse and have different reliability values. Obviously, in this case reliability values will have to be calculated on a per-module basis and, moreover, MTBF cannot be applied to these modules, since they are non-restorable items.
One more doubt about applying MTBF to DPRs is that the damage caused by even a single failure may be very high indeed, so a long time span between the first and the second failure (a high MTBF) is of little use.
A NEW CRITERION FOR DPR RELIABILITY EVALUATION
Considering that the MTBF indicator has thoroughly discredited itself, both by huge values that have nothing to do with reality and give no actual information on DPR reliability and by its obvious limitations, the use of MTBF for measuring DPR reliability should be dropped.
A new DPR reliability indicator is recommended [5]: Gamma-Percentage Operating Time to Failure (operating life), i.e., the time during which an item will not fail with a specified probability γ, expressed as a percentage. For example, a 95-percent operating time to failure of at least 5 years means that during 5 years of operation the failed devices shall make up no more than 5 % of all devices in service. This value shall be specified both for the DPR as a unit and for the separate PCB functional modules of which it is comprised. With such an intuitive and straightforward indicator, the consumer could track the number of failed DPRs (or of the separate modules from which they are built) over a particular period and make claims against the manufacturer if many more DPRs failed within that period than the manufacturer guaranteed. Such an indicator would also make it much easier for the consumer to navigate the future market of versatile modules [4] and choose the most cost-effective alternative.
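A short sketch shows how such an indicator could be computed and checked. It assumes a Weibull-distributed life with scale parameter θ and shape parameter β, for which the probability of no failure up to time t is exp(-(t/θ)^β), so that the gamma-percent life is θ·(-ln(γ/100))^(1/β). The function names and fleet numbers are hypothetical and serve only for illustration.

```python
import math


def gamma_percent_life(theta, beta, gamma_percent):
    """Time by which (100 - gamma_percent) % of units are expected to fail,
    for a Weibull-distributed life: theta * (-ln(gamma/100))**(1/beta)."""
    return theta * (-math.log(gamma_percent / 100.0)) ** (1.0 / beta)


def meets_guarantee(units_in_service, failed_units, gamma_percent):
    """True if the share of units that failed within the guaranteed period
    does not exceed the allowed (100 - gamma_percent) %."""
    return failed_units * 100 <= (100 - gamma_percent) * units_in_service


# Constant failure rate (beta = 1) with a mean life of 100 years.
t95 = gamma_percent_life(theta=100.0, beta=1.0, gamma_percent=95)
print(f"95-percent operating time to failure: {t95:.2f} years")

# Hypothetical fleet of 200 relays observed over the guaranteed period.
print(meets_guarantee(units_in_service=200, failed_units=10, gamma_percent=95))
print(meets_guarantee(units_in_service=200, failed_units=11, gamma_percent=95))
```

Under the constant-rate assumption, an MTBF of 100 years corresponds to a 95-percent operating time to failure of only about 5 years, which shows how differently the two indicators read for the same device. The second function is the kind of check a consumer could run against field statistics.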
In addition, manufacturers shall be required to specify, in both technical and tender documentation, the average service life of individual modules and to include guidelines on how often these modules should be preventively replaced to maintain high reliability of relay protection. Such periods may amount to, for example, 8 to 10 years for power supplies, 12 years for logic input modules, 15 years for central processor units, 17 years for analog input modules, etc. These data should be known to manufacturers who follow good practice and keep a close watch on product failure and damage statistics. Who bears the costs of such preventive module replacement shall be decided by agreement between manufacturer and consumer. For example, a manufacturer might guarantee a one-off preventive module replacement (possibly partial, for example covering power supplies only), while any further replacements would be carried out at the consumer’s expense. Large-scale preventive maintenance has already been carried out on the author’s recommendation in a power company operating many DPRs of the REL/REC/RET type, series 316, manufactured by ABB, although it was limited to the electrolytic capacitors in healthy DPR power supplies, which were built using technology that still allows such replacement. The question of starting preventive replacement of capacitors in the power supplies of Siemens DPRs after 10 years in operation is now pending.
SUMMARY
Applying the suggested criterion for measuring DPR reliability, together with the additional reliability data discussed above, will make it possible to change the nature of the relationship between DPR consumers and manufacturers and to enhance the reliability of relay protection. Practical implementation depends on the consumer, who should specify appropriate DPR reliability requirements in tender documentation along with the basic technical requirements [6], since changes to regulatory documents cannot be expected any time soon.
REFERENCES
1. MOZINA C. J., YALLA M. V. V. S. Design, Manufacturing and Application of Multifunction Digital Relays for Generator Protection. - Canadian Electrical Association, Montreal, 1996.
2. WARD S. Improving reliability for power system protection. - Relay Protection and Substation Automation of Modern Power Systems (RFL Electronics Inc., USA), Cheboksary, 9-13 September 2007.
3. GUREVICH V. I. Relay Protection: Thinking about Future. - “Electrical Networks and Systems”, 2011, No. 1, p. 73-80.
4. GUREVICH V. I. The New Concept of Digital Protective Relays Design. - “Components and Technologies”, 2010, No. 6, p. 12-15.
5. GUREVICH V. I. Problems with Evaluations of the Reliability of Relay Protection. - “Electrichestvo”, 20-1, No. 2, p. 28-31.
6. GUREVICH V. I. Problems for Standardization of the Digital Protective Relays. - “Components and Technologies”, 2012, No. 1, p. 6-9.
Received 20.12.20 H