# Prognostics Implementation Methods for Electronics

Jie Gu, University of Maryland Nikhil Vichare, PhD, University of Maryland Terry Tracy, Raytheon Company Michael Pecht, PhD, University of Maryland

Key Words: electronics, failure mode, failure mechanism, prognostics, reliability

#### SUMMARY & CONCLUSIONS

Prognostics is a method that enables monitoring the state of reliability of a product in real time, and therefore can be used to provide advance warning of a failure, to minimize unscheduled maintenance, to provide condition-based maintenance, and to help in product design and development.

This paper identifies six levels of prognostics implementation for electronics, from on-chip packaging to complete systems of systems. An approach is then presented for selecting the implementation levels to cost-effectively optimize coverage. The process of selecting the prognostic approach and its implementation at various levels in electronics is enabled using failure modes, mechanisms, and effects analysis.

#### 1 INTRODUCTION

The purpose of prognostics is to identify potential failures in advance and to provide the information necessary for risk mitigation and management. For example, this information can be used to minimize unscheduled maintenance or to extend maintenance cycles by the use of condition-based maintenance management.

Prognostics has been used for many applications, including aerospace vehicles, civil infrastructures, nuclear facilities, and mining machinery [1,3]. The need for prognostics of electronic systems is growing since most control functions are being conducted by electronics.

Several studies have proposed techniques for applying prognostics to electronic products and systems, including field effect transistors, power converters [9], printed circuit boards (PCBs) and interconnects [2], global positioning systems, and enterprise servers [17,18]. These efforts have predominantly focused on prognostics implementation for custom-specific configurations.

At this time, no literature addresses the question of how to begin the prognostics implementation process for a new or legacy system, and at what level(s) the implementation makes sense. This question directly influences both the implementation costs and the costs incurred due to failures that could have been prevented by prognostics implementation. This paper addresses this question, to enable asset managers to provide maximum failure/fault coverage using minimum resources (sensors, implementation costs).

#### 2 PROGNOSTIC APPROACH FOR ELECTRONICS

There are three broad categories of prognostic implementation: (1) using expendable prognostic cells, such as "canaries" and fuses, that fail earlier than the host product to provide advance warning of failure; (2) monitoring and reasoning of parameters, such as shifts in performance parameters and progression of defects, that are precursors to impending failure; and (3) modeling stress and damage in electronics utilizing exposure conditions (e.g., usage, temperature, vibration, radiation) coupled with physics-of-failure (PoF) models to compute accumulated damage and assess remaining life [3].

Prognostic cells, such as fuses and canary devices, are mounted on or incorporated into the product to provide advance warning of failure for specific wear-out failure mechanisms. The time to failure of these prognostic cells can be precalibrated with respect to the time to failure of the actual product. Because of their location, these cells experience substantially the same dependencies as the actual product. Stresses that contribute to degradation of the circuit include voltage, current, temperature, humidity, and radiation. Since the operational stresses are the same, the damage mechanism is expected to be the same for both the prognostic cell and the actual circuit. However, the prognostic cell is designed to fail faster due to increased stress on the cell structure by means of scaling. For example, scaling can be achieved by a controlled increase of the current density inside the cell. With the same amount of current passing through both circuits, a higher current density is achieved if the cross-sectional area of the current-carrying paths in the cell is decreased.
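The scaling argument can be illustrated numerically. The sketch below assumes that electromigration is the targeted mechanism and that Black's equation, a commonly used empirical electromigration model that this paper does not name, is an adequate description. The exponent, activation energy, current, and cross-sections are hypothetical values chosen only for illustration.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def current_density(current_a, area_cm2):
    """Current density J = I / A in A/cm^2."""
    return current_a / area_cm2

def black_mttf(j, temp_k, n=2.0, ea_ev=0.7, prefactor=1.0):
    """Relative median time to failure from Black's equation.
    n, ea_ev, and prefactor are illustrative assumptions."""
    return prefactor * j ** (-n) * math.exp(ea_ev / (BOLTZMANN_EV * temp_k))

current = 1e-3        # same current through host trace and canary (A)
host_area = 1.0e-8    # hypothetical host cross-section (cm^2)
canary_area = 0.5e-8  # canary cross-section, halved by design (cm^2)
temp = 358.0          # same operating temperature for both (K)

j_host = current_density(current, host_area)
j_canary = current_density(current, canary_area)
# Same temperature, so the exponential term cancels in the ratio.
acceleration = black_mttf(j_host, temp) / black_mttf(j_canary, temp)
print(f"J host = {j_host:.2e} A/cm^2, J canary = {j_canary:.2e} A/cm^2")
print(f"Canary fails about {acceleration:.1f}x sooner under this model")
```

Under these assumptions, halving the cross-section doubles the current density, and the ratio of the two modeled lifetimes is the pre-calibration factor between canary and host.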

A failure precursor is an event that signifies impending failure. A precursor indication is usually a change in a measurable variable that can be associated with subsequent failure. For example, a shift in the output voltage of a power supply might suggest impending failure due to a degrading feedback regulator and opto-isolator circuitry. Failures can then be predicted by using a causal relationship between the measured variable and the subsequent failure.
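As a minimal sketch of precursor monitoring, the snippet below flags a shift in a power supply's output voltage by comparing a moving average against a band learned from early, healthy samples. The detection rule (mean plus or minus three standard deviations), the window sizes, and the synthetic voltage trace are all assumptions made for illustration; they are not taken from the studies cited here.

```python
from statistics import mean, stdev

def detect_precursor(samples, baseline_count=20, window=5, k=3.0):
    """Return the index at which the moving average of `samples` first
    leaves the baseline band (mean +/- k*std), or None if it never does."""
    baseline = samples[:baseline_count]
    mu, sigma = mean(baseline), stdev(baseline)
    for end in range(baseline_count + window, len(samples) + 1):
        avg = mean(samples[end - window:end])
        if abs(avg - mu) > k * sigma:
            return end - 1
    return None

# Synthetic 5 V supply: stable with small ripple, then a slow downward drift.
healthy = [5.0 + 0.01 * ((i % 4) - 1.5) for i in range(40)]
drifting = [healthy[-1] - 0.005 * i for i in range(1, 41)]
voltage = healthy + drifting

alarm_index = detect_precursor(voltage)
print("first alarm at sample", alarm_index)
```

The alarm fires only after the drift has moved the averaged output outside the healthy band, which is the sense in which the measured variable serves as advance warning.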

The life-cycle loads of a product can be generated from manufacturing, shipment, storage, handling, and operating and non-operating conditions. The life-cycle loads (thermal, mechanical, chemical, electrical, and so on), either individually or in various combinations, may lead to performance or physical degradation of the product and reduce its service life [3]. In the stress-damage prognostics approach, the extent and rate of product degradation depend upon the magnitude and duration of exposure to loads (usage rate, frequency, and severity). In this approach, the life-cycle loads are monitored in-situ and used in conjunction with PoF-based damage models to assess the degradation due to cumulative load exposures.
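The accumulation step can be sketched as follows. This example assumes linear damage accumulation (the Palmgren-Miner rule) together with a hypothetical power-law life model for thermal cycling; the paper does not specify a particular damage rule here, and the coefficient, exponent, and load history are illustrative only.

```python
def cycles_to_failure(delta_t, coeff=1.0e7, exponent=2.0):
    """Hypothetical power-law life model: N_f = coeff * delta_t ** -exponent."""
    return coeff * delta_t ** (-exponent)

def accumulated_damage(load_history):
    """Palmgren-Miner sum over (temperature range in K, cycle count) pairs."""
    return sum(n / cycles_to_failure(dt) for dt, n in load_history)

# Monitored load history binned by cycle amplitude (illustrative numbers).
history = [(20.0, 5000), (40.0, 1000), (80.0, 100)]

damage = accumulated_damage(history)
remaining_fraction = max(0.0, 1.0 - damage)
print(f"damage = {damage:.3f}, remaining life fraction = {remaining_fraction:.3f}")
```

A damage sum of 1.0 corresponds to predicted failure under this rule, so the remaining fraction gives a simple estimate of the life still available.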

The application of the stress-damage approach was illustrated in two case studies [2,11]. The test vehicle consisted of an electronic component-board assembly placed under the hood of an automobile and subjected to normal driving conditions in the Washington, DC, area. The test board incorporated surface-mount leadless components soldered onto a circuit board. Solder joint fatigue was identified as the dominant failure mechanism. Temperature and vibrations were measured in-situ on the board in the application environment. Using the monitored environmental data, physics-of-failure models were used to estimate consumed life. This approach accurately predicted remaining life.

#### 3 LEVELS FOR PROGNOSTICS IMPLEMENTATION IN ELECTRONICS

To a semiconductor manufacturer, the device is the final product. To the manufacturer of the power supply, the semiconductor device is just one component among many; the power supply is the product. To the computer company, the power supply is just a component or sub-system inside its product. For an avionics company, the computer is a piece of equipment within a system, which may in turn be part of a system of systems.

For the discussion here, six prognostics levels for electronics have been defined [4]. Level 0 includes the chip and on-chip sites, such as circuits and metallization. Level 1 includes parts and components, as well as the wirebonds, lead-frames, and encapsulants comprising the component. This level includes integrated circuits and discrete components such as resistors, capacitors, and inductors. Level 2 includes the circuit board and interconnects (leads, solder balls, etc.) connecting the components to the circuit card. This level also includes sites on the circuit board such as pads, plated-through holes, vias, and traces. Level 3 includes the enclosure, chassis, drawer, and connections for circuit cards. This level includes products or sub-systems, such as hard drives, video cards, and power supplies. Level 4 includes the electronic product, such as a notebook computer or a single line-replaceable unit (LRU), and its connections. Level 5 includes electronic systems and external connections between different systems (e.g., the connection from a computer to a printer, or from an LRU to a cockpit display). A system of systems is included in this level.
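For bookkeeping in an analysis tool, the six levels can be represented as an ordered enumeration. The sketch below is one hypothetical encoding; the member names and the `SITE_LEVEL` lookup are illustrative and not part of the paper, although the example sites are taken from the level definitions above.

```python
from enum import IntEnum

class PrognosticsLevel(IntEnum):
    CHIP = 0       # chip and on-chip sites
    COMPONENT = 1  # parts and components
    BOARD = 2      # circuit board and interconnects
    ENCLOSURE = 3  # enclosure, chassis, drawer, sub-systems
    PRODUCT = 4    # electronic product or single LRU
    SYSTEM = 5     # electronic systems and systems of systems

# Example sites and products mapped to the level at which the text places them.
SITE_LEVEL = {
    "metallization": PrognosticsLevel.CHIP,
    "wirebond": PrognosticsLevel.COMPONENT,
    "via": PrognosticsLevel.BOARD,
    "power supply": PrognosticsLevel.ENCLOSURE,
    "notebook computer": PrognosticsLevel.PRODUCT,
}

for site, level in sorted(SITE_LEVEL.items(), key=lambda kv: kv[1]):
    print(f"level {int(level)} ({level.name}): {site}")
```

Using an integer-backed enumeration keeps the levels comparable, so "the next lower level" is simply the level value minus one.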

Given a level, if a potential failure cannot be tracked down to the next lower level, the analysis is confined to the given level and higher-level faults and failures. For example, the output current, as a failure precursor of a power supply failure (level 3), can be monitored when a change in output current cannot be traced to transistor degradation inside the power supply (level 0).

To assess whether a failure can be tracked (or traced) to a level, the failure modes and associated failure mechanisms must be known. The failure mode is the effect by which a failure is observed [5]. Failure mechanisms are the physical, chemical, thermodynamic, or other processes that result in the failure. The failure mechanisms at the various levels of electronics are listed in Table 1.

Various studies have been conducted to implement prognostics for electronic products. Table 2 includes the potential failures, prognostics levels, prognostics approaches, and monitored parameters for monitoring electronic products.

An example of prognostics implementation at multiple levels involved extended life assessment of circuit cards inside the space shuttle solid rocket booster (SRB) [13]. The vibration profile and history were recorded on the SRB from the pre-launch stage to splashdown. These data were used in conjunction with physics-of-failure (PoF) based models to assess the damage caused due to vibration and shock loads. Using the entire life-cycle loading profile of the SRBs, the remaining life times of the boards, components, and structures on the circuit cards were predicted. It was determined that an electrical failure was not expected within another forty missions. The prognostic analysis for this project was conducted at prognostics levels 1, 2, and 3. Interestingly, the aluminum bracket used to mount the circuit had lost significant remaining life in the shock loading, and this was at prognostics level 3. This study pointed out that prognostic analysis at just one level can be insufficient, and that we need to consider how all possible levels precipitate failures under life-cycle conditions.

#### 4 COMPARISON OF DIFFERENT PROGNOSTICS APPROACHES AT DIFFERENT LEVELS FOR ELECTRONICS

Prognostic cells in the form of canaries have been most popular at the lower prognostics levels (levels 0 and 2). This may be because it is easier to manufacture a pre-calibrated cell at these levels than at the higher levels of the electronic system.

Failure precursor approaches have been explored for all prognostic levels. In some cases, the product was treated as a "black box"; that is, it was assumed that it is not necessary to know exactly what happens inside the product. An empirical relationship or algorithm is then applied, based on the data collected from output signals, to identify the failure precursor. The advantage of this approach is that it is not necessary to know the failure mechanisms and failure models. The disadvantage is that, before implementing prognostics, a significant amount of data must be collected to train the algorithm and/or to derive the empirical precursor prediction formula. The precursor approach is only as good as the training: if failure modes are not uncovered during training, they cannot be appropriately incorporated into the approach. For example, if operational configurations or environmental parameters change, then the algorithm and empirical formula need to be trained again.

| Prognostics level | Failure site | Failure mode | Failure mechanism |
|---|---|---|---|
| Level 0 (chip and on-chip sites) | Die metallization | Short circuit | Electromigration |
| | Gate-oxide | Breakdown | Electrical overstress (EOS), electrostatic discharge (ESD) |
| | | Short circuit | Time-dependent dielectric breakdown (TDDB) |
| | | Change of leakage current | Hot carrier |
| | Die | Crack | Crack initiation and propagation |
| | Transistor | Short circuit | Contact migration |
| Level 1 (parts and components that cannot be disassembled and reassembled with the expectation that the item would still work) | Between die and molding compound | Delamination | Crack initiation and propagation |
| | Bond | Bond lift, open circuit | Mechanical overstress |
| | | Open circuit | Corrosion |
| | Encapsulant interface | Delamination | Corrosion |
| | Capacitors | Short circuit | Dielectric breakdown |
| Level 2 (circuit board and interconnects connecting the components to the circuit card) | Solder joint | Open circuit | Thermal fatigue, vibration fatigue |
| | Plated-through hole | Open circuit | Fatigue |
| | Via | Short circuit | Electromigration |
| | Printed circuit board | Metallization shorts | Conductive-filament formation |
| | | Loss of polymer strength | Glass transition |
| | Lead pad | Open circuit | Corrosion |
| | Trace | Open circuit | Corrosion |
| Level 3 (enclosure, chassis, drawer and connections for circuit cards) | Connection | Open circuit | Mechanical wearout, corrosion |
| Level 4 (entire electronic system) | Connection | Open circuit | Mechanical wearout, corrosion |
| Level 5 (multi-electronic systems and external connections between different systems) | Connection | Open circuit | Mechanical wearout, corrosion |

*Table 1 – Potential Failure in Electronic Products*

Monitoring the environmental and operational loading to assess remaining life has been implemented at levels 2 and 3. This approach requires knowledge of failure modes, mechanisms, and associated PoF models to identify the parameters required for assessing dominant failure mechanisms. This approach is easier to implement on components and interconnects than on entire systems. However, by monitoring the real life-cycle loading of the product, this approach provides a use/load history that can be used later if models become available. These data can also be used for design and qualification of future products.

Prognostics level 5 (between electronic systems) has not been covered in previous research. However, this level is important for systems of systems.

#### 5 A PROCEDURE TO IMPLEMENT PROGNOSTICS FOR ELECTRONICS

The prognostics implementation procedure for electronics is summarized in Figure 1.

First, it is necessary to analyze the electronic product using design-for-reliability tools, such as Failure Modes, Mechanisms, and Effects Analysis (FMMEA) [21]. The risk priority number (RPN) of the failure modes and mechanisms can then be calculated.

For failure modes analysis, the RPN is the product of severity, occurrence, and detection. Severity describes the seriousness of the effect of failure for the customer. Occurrence describes how frequently the failure mode is projected to occur as a result of a specific cause. For manufacturers, detection is the ability to detect problems or possible causes of defects, including external failures, before they reach customers. For customers, detection is the ability to spot the initiation of a failure before it results in a malfunction in use. Typically, these are rated on a scale from the level of highest impact on reliability to lowest [21].

For failure mechanisms analysis, the RPN includes only severity and occurrence, since the failure mechanisms are not detectable [21]. A prioritized assessment of the failure modes and mechanisms, and of the environmental conditions that affect them, needs to be established to ensure that the appropriate data are collected and utilized for prognostics.
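The two RPN definitions can be written out directly. The sketch below assumes integer ratings on a 1 to 10 scale where a higher number means higher risk (for detection, harder to detect); that scale, the class names, and the example ratings are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # Failure modes: severity x occurrence x detection.
        return self.severity * self.occurrence * self.detection

@dataclass
class FailureMechanism:
    name: str
    severity: int
    occurrence: int

    @property
    def rpn(self) -> int:
        # Failure mechanisms: detection is omitted.
        return self.severity * self.occurrence

modes = [
    FailureMode("PCB metallization short", severity=9, occurrence=2, detection=7),
    FailureMode("solder joint open circuit", severity=8, occurrence=5, detection=6),
]
mechanisms = [
    FailureMechanism("conductive-filament formation", severity=9, occurrence=2),
    FailureMechanism("thermal fatigue", severity=8, occurrence=5),
]

# Rank modes and mechanisms separately, since their RPN ranges differ.
ranked_modes = sorted(modes, key=lambda m: m.rpn, reverse=True)
ranked_mechanisms = sorted(mechanisms, key=lambda m: m.rpn, reverse=True)
for item in ranked_modes + ranked_mechanisms:
    print(f"{item.name}: RPN = {item.rpn}")
```

Because the mechanism RPN has one factor fewer, its values are not directly comparable with mode RPNs, which is why the two lists are ranked independently.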

Based on the failure modes and effects analysis from the FMMEA, the possibility of monitoring a failure precursor is determined. Here, precursors refer to changes in the operational or performance parameters. Once this is determined, engineering efforts can be assigned to develop sensors for monitoring the precursor variables and to develop models/algorithms that analyze the data to predict failures. This effort could be implemented without knowledge of the failure mechanisms, but with the disadvantages listed earlier.

| Monitoring product | Potential failure mode / failure mechanism | Prognostics level | Prognostics approach | Data monitored / analyzed |
|---|---|---|---|---|
| Circuit on the semiconductor [6] | TDDB, electromigration | Level 0 | Fuse and canary device | Current density |
| | Die wearout | Level 0 | Fuse and canary device | Temperature |
| Circuit board [7] | Interconnect thermal or vibration fatigue | Level 2 | Fuse and canary device | Temperature and acceleration |
| Power supply [8,9,10] | Transistor thermal runaway | Level 0 | Failure precursor | Transistor temperature |
| | Decrease in transistor power efficiency; decrease in isolator current transfer ratio | Level 1 | Failure precursor | Power efficiency of transistor and current transfer ratio of isolator |
| | Solder joint thermal fatigue failure | Level 2 | Monitoring environmental and usage loading | Temperature profile |
| | Decrease in power efficiency and output voltage of power supply | Level 3 | Failure precursor | Power efficiency and output voltage of power supply |
| PCB under hood in the automobile [2,11] | Solder joint thermal or vibration fatigue | Level 2 | Monitoring environmental and usage loading | Temperature, acceleration |
| End effector electronics unit inside the space shuttle robot arm [12] | Solder joint thermal or vibration fatigue | Level 2 | Monitoring environmental and usage loading | Temperature, acceleration |
| Circuit cards inside a rocket booster [13] | Thermal fatigue and vibration fatigue of electronic parts | Level 2 | Monitoring environmental and usage loading | Temperature, acceleration |
| | Vibration fatigue of connection | Level 3 | Monitoring environmental and usage loading | Acceleration |
| Notebook and desktop [14,15] | N/A | Level 0 | Monitoring environmental and usage loading | Temperature near the CPU |
| | N/A | Level 2 | Monitoring environmental and usage loading | Temperature of the motherboard |
| | N/A | Level 3 | Monitoring environmental and usage loading | Temperature of the hard disk |
| Hard disk drive [16] | Poor writing or reading | Level 3 | Failure precursor | Flying height of the head, error counts, variations in spin time, temperature, data transfer rate |
| Global positioning system (GPS) [17] | Change in MOSFET parameters, including gain, series resistance, and threshold voltage | Level 3 | Failure precursor | Principal feature value (NMEA 0183 protocol) |
| Sun Microsystems computer server [18] | Voltage intermittent shift | Level 4 | Failure precursor | Current, voltage, temperature, bit error rate |
| Refrigerator [19,20] | N/A | Level 4 | Monitoring environmental and usage loading | Total run time, compressor run time, door opening time, compressor cycles, defrost cycles, power on/off cycles |
| Game console [19,20] | N/A | Level 4 | Monitoring environmental and usage loading | Ambient temperature, heat sink temperature, humidity, voltage spikes, rotation speed of CD, product orientation |

*Table 2 – Applications of Prognostics for Electronics*

The failure mechanisms and models identified by FMMEA can be used for selecting a prognostic cell (fuse or canary device) or the stress and damage accumulation approach. In both cases, knowledge of life-cycle loads, obtained in the FMMEA process, helps to identify the dominant failure mechanisms that can be precipitated under the given loads. With the prognostic cell approach, the geometries or material properties of the cell can be scaled to accelerate the failure under use conditions, based on potential failure mechanisms. The time-to-failure (TTF) of the cell is then calibrated with the TTF of the actual product for the particular failure mechanism(s).

With the damage accumulation approach, environmental and usage load profiles are captured using sensors. Sensor data is then converted into a format that can be used in PoF models. The availability of sensors and appropriate monitoring locations can be a limiting factor in implementation of this method, although similar limitations occur with the other prognostic approaches. Fortunately, electronic parts often have embedded sensors for monitoring temperature, current, voltage, and so on. The prognostic implementation process should investigate the availability of internal sensors and the option of interrogating these sensors to collect data.
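One way to convert raw sensor data into a form a PoF model can use is to reduce a temperature trace to its reversals and the ranges between them. The sketch below is a deliberately simplified reduction and is an assumption of this example, not the paper's method; a standard cycle-counting scheme such as rainflow counting would normally be preferred. The temperature samples are invented.

```python
def turning_points(signal):
    """Keep only local peaks and valleys, plus the two end points.
    Flat plateaus (equal consecutive samples) are not handled."""
    points = [signal[0]]
    for prev, cur, nxt in zip(signal, signal[1:], signal[2:]):
        if (cur - prev) * (nxt - cur) < 0:  # slope changes sign at `cur`
            points.append(cur)
    points.append(signal[-1])
    return points

def half_cycle_ranges(signal):
    """Absolute range between each pair of successive reversals."""
    pts = turning_points(signal)
    return [abs(b - a) for a, b in zip(pts, pts[1:])]

# Hypothetical temperature samples from an embedded sensor (degrees C).
temps = [25, 40, 60, 55, 30, 35, 70, 65, 25]

reversals = turning_points(temps)
ranges = half_cycle_ranges(temps)
print("reversals:", reversals)
print("half-cycle ranges:", ranges)
```

The resulting list of ranges is the kind of compact load description that can be binned and passed to a damage model, instead of storing the full time series.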

One purpose of prognostics is to reduce costs, including costs of logistics and maintenance. However, the implementation of prognostics will also add cost. The implementation costs will include the cost of sensors, telemetry, data processing, power, additional size and weight, etc. The exact costs depend on the prognostics approach and level of implementation.

To optimize costs, the prognostic methods should be applied to the critical levels using the FMMEA approach. In other words, it may not be necessary to monitor all failure modes and mechanisms. The RPN should be used to determine the seriousness of the effect of failure for the customer, how frequently the failure mode is projected to occur as a result of a specific cause, and the ability to detect defects, including external failures, before the product / system is fielded.
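The trade-off between coverage and cost can be sketched as a simple ranking problem. The example below assumes each candidate has an RPN and a relative implementation cost, and uses a greedy rule (highest RPN first, subject to a budget); the rule, the candidate names, RPN values, and costs are all hypothetical and only illustrate the idea of not monitoring everything.

```python
def select_monitoring(candidates, budget):
    """Greedy choice: highest RPN first, skipping items that exceed the budget."""
    chosen, spent = [], 0.0
    for name, rpn, cost in sorted(candidates, key=lambda c: c[1], reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

# (failure mechanism and level, RPN, relative implementation cost)
candidates = [
    ("solder joint fatigue (level 2)", 72, 4.0),
    ("connector corrosion (level 3)", 40, 3.0),
    ("gate-oxide TDDB (level 0)", 56, 6.0),
    ("trace corrosion (level 2)", 18, 1.0),
]

chosen, spent = select_monitoring(candidates, budget=8.0)
print("monitor:", chosen)
print("cost used:", spent)
```

A greedy ranking is not guaranteed to be optimal; it is used here only because it is easy to inspect, and a real study would weigh coverage against cost with whatever criteria the asset manager adopts.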

#### **CONCLUSIONS**

A systematic failure modes, mechanisms, and effects analysis is necessary for selecting the prognostic approach and its implementation level in electronics. Prognostics can be implemented, and may be necessary, on all six levels of electronics, from on-chip packaging to complete systems. This prognostics implementation procedure enables maximum fault/failure coverage of the electronics with minimal resources.



Figure 1. Prognostics Implementation Procedure for Electronics

## REFERENCES

- 1. Adams, Douglas E.; "Structural Health Monitoring Methods for Flight Safety," Journal of Failure Analysis and Prevention, v4, n5, October, 2004, pp.9–14.
- 2. Ramakrishnan, A. and Pecht, M.; "A Life Consumption Monitoring Methodology for Electronic Systems," IEEE Transactions on Components and Packaging Technologies, Vol. 26, No. 3, Sept. 2003, pp. 625–634.
- 3. Vichare, N. and Pecht, M.; "Prognostics and Health Management of Electronics," IEEE Transactions on Components and Packaging Technologies, Vol. 29, No. 1, March 2006. pp. 222–229.
- 4. Pecht, M.; Handbook of Electronic Package Design, Marcel Dekker, Inc., New York, NY, 1991.
- 5. Pecht, M.; Product Reliability, Maintainability, and Supportability Handbook, CRC Press, New York, 1995.
- 6. Mishra, S.; Pecht, M.; and Goodman, D.; "In-situ Sensors for Product Reliability Monitoring," Proceedings of the SPIE Conference, 2002, Vol. 4755, pp. 10–19.
- 7. Anderson, N. and Wilcoxon, R.; "Framework for Prognostics of Electronic Systems," Proceedings of the COTS Conference, Seattle, WA, Aug. 3–5, 2004.
- 8. Orsagh, R.; Brown, D.; Roemer, M.; Dabney, T.; and Hess, A.; "Prognostic Health Management for Avionics System Power Supplies," Proceedings of the IEEE Aerospace Conference, Big Sky, MT, March 2005.
- 9. Goodman, D.; Vermeire, B.; Spuhler, P.; and Venkatramani, H.; "Practical Application of PHM/Prognostics to COTS Power Converters," IEEE Aerospace Conference, Big Sky, MT, March 2005.
- 10. Nasser, L. and Curtin, M.; "Electronics Reliability Prognosis through Material Modeling and Simulation," IEEE Aerospace Conference, Big Sky, MT, 2006.
- 11. Mishra, S.; Pecht, M.; Smith, T.; McNee, I.; and Harris, R.; "Remaining Life Prediction of Electronic Products Using Life Consumption Monitoring Approach," Proceedings of the European Microelectronics Packaging and Interconnection Symposium, Cracow, June 16–18, 2002. pp. 136–142.
- 12. Shetty, V.; Das, D.; Pecht, M.; Hiemstra, D.; and Martin, S.; "Remaining Life Assessment of Shuttle Remote Manipulator System End Effector," Proceedings of the 22nd Space Simulation Conference, Ellicott City, MD, October 21–23, 2002.
- 13. Mathew, S.; Das, D.; Osterman, M.; Pecht, M.G., and Ferebee, R.; "Prognostic Assessment of Aluminum Support Structure on a Printed Circuit Board," Accepted for publication in ASME Journal of Electronic Packaging.
- 14. Searls, D.; Dishongh, T., and Dujari, P.; "A Strategy for Enabling Data Driven Product Decisions through a Comprehensive Understanding of the Usage Environment," Proceedings of IPACK'01 Conference, Kauai, Hawaii, USA, July 8–13, 2001. pp. 1279–1284.
- 15. Vichare, N.; Rodgers, P.; Eveloy, V.; and Pecht, M.; "In-Situ Temperature Measurement of a Notebook Computer: A Case Study in Health and Usage Monitoring of Electronics," IEEE Transactions on Device and Materials Reliability, Vol. 4, No. 4, December 2004, pp. 658–663.
- 16. Hughes, G.F.; Murray, J.F.; Kreutz-Delgado, K.; and Elkan, C.; "Improved Disk-drive Failure Warnings," IEEE Transactions on Reliability, Vol. 51, Issue 3, September 2002, pp. 350–357.
- 17. Brown, D.; Kalgren, P.; Byington, C.; and Orsagh, R.; "Electronic Prognostics: A Case Study Using Global Positioning System (GPS)," Proceedings of the IEEE Autotestcon, September 2005, pp. 832–838.
- 18. Gross, K.; "Continuous System Telemetry Harness," Sun Labs Open House, 2004, research.sun.com/sunlabsday/docs.2004/talks/1.03 Gross.pdf, viewed in August 2005
- 19. Bodenhoefer, K.; "Environmental Life Cycle Information Management and Acquisition: First Experiences and Results from Field Trials," Proceedings of Electronics Goes Green 2004+, Berlin, September 5–8, 2004, pp. 541–546.
- 20. ELIMA Report, "D-19 Final Report on ELIMA Prospects and Wider Potential for Exploitation," April 30, 2005, www.ELIMA.org, viewed in December 2005.
- 21. Das, D.; Azarian, M.; and Pecht, M.; "Failure Modes, Mechanisms, and Effects Analysis (FMMEA) for Automotive Electronics," 11th Annual AEC Workshop, Indianapolis, IN, May 9–11, 2006.

#### **BIOGRAPHIES**

Jie Gu CALCE Electronic Products and Systems 1103 Engineering Lab Building University of Maryland, College Park, MD 20742 USA

e-mail: jiegu@calce.umd.edu

Nikhil Vichare, PhD CALCE Electronic Products and Systems 1103 Engineering Lab Building University of Maryland, College Park, MD 20742 USA

e-mail: nikhilv@calce.umd.edu

Michael Pecht, PhD CALCE Electronic Products and Systems 1103 Engineering Lab Building University of Maryland, College Park, MD 20742 USA

e-mail: pecht@calce.umd.edu


Terry Tracy P.O. Box 11337 Building MO2, Mail Station T15 Tucson, Arizona 8734–1337 USA

e-mail: tatracy@raytheon.com