Introduction
Traditional handbook-based reliability prediction methods for electronic products include Mil-Hdbk-217, Telcordia SR-332 (formerly Bellcore), PRISM, FIDES, CNET/RDF (European), and the Chinese GJB-299. These methods rely on analysis of failure data collected from the field and assume that the components of a system have inherent constant failure rates that are derived from the collected data. They further assume that these constant failure rates can be tailored by independent “modifiers” to account for various quality, operating, and environmental conditions, even though most failure mechanisms are wear-out mechanisms whose failure rates are not constant. Furthermore, none of these handbook prediction methods identify failure modes or mechanisms, nor do they involve any uncertainty analysis. Hence, they offer limited insight into practical reliability issues.
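The contrast between a constant failure rate and a wear-out failure rate can be made concrete with a minimal sketch. It assumes the generic handbook form of a base rate multiplied by independent modifiers, and uses a Weibull hazard function as one common way to represent wear-out; the function names and all numerical values are hypothetical and are not taken from any handbook.

```python
def handbook_failure_rate(base_rate, modifiers):
    """Handbook-style estimate: a constant base failure rate scaled by
    independent multiplicative modifiers (quality, environment, ...)."""
    rate = base_rate
    for factor in modifiers:
        rate *= factor
    return rate


def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale)**(shape - 1).
    shape == 1 gives a constant rate; shape > 1 gives a rate that rises
    with age, which is the signature of wear-out."""
    return (shape / scale) * (t / scale) ** (shape - 1)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print("handbook rate:", handbook_failure_rate(1e-6, [2.0, 1.5]))
    for hours in (1000.0, 5000.0, 10000.0):
        print(hours, weibull_hazard(hours, shape=3.0, scale=20000.0))
```

With a shape parameter above 1, the hazard rate grows with operating time, so no single constant value can describe it over the product's life.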
There are numerous well-documented concerns with the handbook prediction approaches, all showing mathematical and physical modeling fallacies. The overwhelming consensus is that these methods should never be used because they are inaccurate for predicting actual field failures and they provide highly misleading predictions, which can result in poor design and logistics decisions [1-2]. IEEE Standard 1413.1, “IEEE Guide for Selecting and Using Reliability Predictions Based on IEEE 1413” [3], also found that the handbook-based reliability prediction methods do not provide adequate useful information to users. In the face of overwhelming technical and business evidence against their use, there are nevertheless some misguided practitioners who continue to seek solace in the familiarity of these tools and assume that upgrading them with new data will make them better and more useful (see footnote 1). At the same time, many companies have stopped using these methods, and the U.S. military has abandoned these approaches.
A practical alternative way of looking at product reliability and life cycle conditions is called prognostics and health management (PHM). PHM is the process of monitoring the health of a product and predicting its remaining useful life by assessing the extent of deviation or degradation of the product from its expected state of health and its expected usage conditions [4]. The benefits of PHM include: (1) providing advance warning of failures; (2) minimizing unscheduled maintenance, extending maintenance cycles, and maintaining effectiveness through timely repair actions; (3) reducing the life cycle cost of equipment by decreasing inspection costs, downtime, and inventory; and (4) improving qualification and assisting in the design and logistical support of fielded and future systems. In 2003, the U.S. Department of Defense stated that having a prognostics capability would be a requirement for any U.S. military system [5].
The book Prognostics and Health Management of Electronics [6] provides an overview of the concepts of PHM and the techniques being developed to enable prognostics for electronic products and systems. The general PHM methodology is shown in Figure 1. The first step involves failure modes, mechanisms, and effects analysis (FMMEA), which draws on design data, failure modes, failure mechanisms, failure models, the life cycle profile, and any available maintenance records. The next step involves risk assessment to rank the risk priority, which includes estimating the detection, severity, and occurrence of each failure. The results of the virtual (reliability) life assessment can then be given. Based on this information, the monitoring parameters relevant to key failure mechanisms are selected, and existing sensor data, bus monitor data, and built-in test results can be used to identify health status (e.g., abnormal conditions) and parameters. Physics of failure (PoF) models, data trending for precursors, and fusion approaches, which combine both the data-trending and PoF methodologies, can then be used to predict reliability.
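The risk-ranking step can be sketched as follows. This is an illustration only: it assumes the conventional risk priority number (the product of severity, occurrence, and detection ratings, each on a 1 to 10 scale), which the text does not spell out, and the class name, mechanism names, and ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMechanism:
    name: str
    severity: int    # 1 (negligible) to 10 (catastrophic); assumed scale
    occurrence: int  # 1 (remote) to 10 (frequent); assumed scale
    detection: int   # 1 (easily detected) to 10 (undetectable); assumed scale

    @property
    def rpn(self):
        """Risk priority number: product of the three ratings."""
        return self.severity * self.occurrence * self.detection


def rank_by_risk(mechanisms):
    """Return the mechanisms ordered from highest to lowest RPN."""
    return sorted(mechanisms, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    # Hypothetical mechanisms and ratings, for illustration only.
    candidates = [
        FailureMechanism("connector fretting", 5, 5, 3),
        FailureMechanism("solder joint fatigue", 8, 6, 4),
        FailureMechanism("electromigration", 9, 2, 5),
    ]
    for m in rank_by_risk(candidates):
        print(m.name, m.rpn)
```

The highest-ranked mechanisms are the natural candidates when choosing which parameters to monitor.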
PHM is currently implemented at many different levels, including the component level, the circuit board level, and the system level. At present, there are several organizations implementing PHM, in a wide spectrum of applications, from military and aerospace applications to computer and automotive applications, and even home applications. There are also many more organizations that wish to take advantage of developments in PHM. Examples of PHM implementation in different fields are given below, and references can be found in the book Prognostics and Health Management of Electronics [6].
Military
Tuchband et al. (2007) [7] presented the use of prognostics for military line replaceable units (LRUs) based on their life cycle loads. The study was part of an effort funded by the U.S. Office of the Secretary of Defense to develop an interactive supply chain system for the U.S. military. The objective was to integrate prognostics, wireless communication, and databases through a Web portal to enable cost-effective maintenance and replacement of electronics. This study showed that prognostics-based maintenance scheduling could be implemented into military electronic systems. The approach involves an integration of embedded sensors on the line replaceable units, wireless communication for data transmission, a data simplification tool, a PoF-based damage estimation algorithm, and a method for uploading this information to the Internet. The use of prognostics for electronic military systems enables failure avoidance, high availability, and the reduction of life cycle costs.
Aerospace
Shetty et al. (2002) [8] applied the PHM methodology for conducting a prognostic remaining-life assessment of the end effector electronics unit (EEEU) inside the robotic arm of the space shuttle remote manipulator system. A life-cycle loading profile for thermal and vibration loads was developed for the EEEU boards. A damage assessment was conducted using PoF-based mechanical and thermo-mechanical damage models. A prognostic estimate using a combination of damage models, inspections, and accelerated testing showed that there was little degradation in the electronics and that the EEEU could be expected to last another twenty years.
Mathew et al. (2007) [9] applied the PHM methodology in conducting a prognostic remaining-life assessment of circuit cards inside a space shuttle solid rocket booster (SRB). Vibration time history recorded on the SRB from the pre-launch stage to splashdown was used in conjunction with physics-based models to assess the damage caused by vibration and shock loads. Using the entire life cycle loading profile of the SRBs, the remaining life of the components and structures on the circuit cards was predicted. It was determined that an electrical failure was not expected within another forty missions.
Automotive: Underhood Electronics
In the studies of Mishra et al. (2002) [10] and Ramakrishnan et al. (2003) [11], the test vehicle was a circuit board assembly placed under the hood of an automobile and subjected to normal driving conditions in the Washington, DC area. The test board incorporated eight surface-mount leadless components soldered onto an FR-4 substrate using eutectic tin-lead solder. Solder joint fatigue was identified as the dominant failure mechanism. Damage accumulated through solder joint fatigue was updated periodically using temperature and vibration data collected in situ. The solder joint life predicted by the PHM algorithm was found to be within 8% of the actual experimental life.
Electronic Systems: Computer Server
Systems for early fault detection and failure prediction are being developed using variables, such as current, voltage, and temperature, that are continuously monitored at various locations inside the system. Sun Microsystems (2003) [12] refers to this approach as a continuous system telemetry harness. Along with sensor information, soft performance parameters such as loads, throughputs, queue lengths, and bit error rates are tracked. Characterization is conducted by monitoring the signals (of different variables) to train a multivariate state estimation technique model. Once the model is established using this data, it is used to predict the signal of a particular variable based on learned correlations among all variables. Based on the expected variability in the value of a particular variable during application, a sequential probability ratio test (SPRT) is constructed. During actual monitoring, the SPRT is used to detect deviations of the actual signal from the expected signal based on distributions (and not on a single threshold value). The monitored data is analyzed to (1) provide alarms based on leading indicators of failure and (2) enable use of monitored signals for fault diagnosis, root cause analysis of no-fault-founds (NFF), and analysis of faults due to software aging.
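The SPRT step can be illustrated with a minimal sketch. It assumes the residuals (actual signal minus the model-predicted signal) are Gaussian with a known standard deviation, and it tests for a hypothetical upward shift in their mean using Wald's decision thresholds. The function name, the shift size, and the error rates alpha and beta are illustrative assumptions and are not taken from [12].

```python
import math


def sprt_mean_shift(residuals, sigma, shift, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test on a residual stream.

    H0: residuals ~ N(0, sigma^2)      (signal matches the model)
    H1: residuals ~ N(shift, sigma^2)  (signal has drifted by `shift`)
    Returns (decision, index of the sample at which it was reached).
    """
    upper = math.log((1.0 - beta) / alpha)   # crossing accepts H1
    lower = math.log(beta / (1.0 - alpha))   # crossing accepts H0
    llr = 0.0                                # cumulative log-likelihood ratio
    for i, r in enumerate(residuals):
        # Log-likelihood ratio increment for two Gaussians with equal variance.
        llr += (shift / sigma ** 2) * (r - shift / 2.0)
        if llr >= upper:
            return "degraded", i
        if llr <= lower:
            return "healthy", i
    return "undecided", None


if __name__ == "__main__":
    # Synthetic residuals, for illustration only.
    print(sprt_mean_shift([0.1, -0.2, 0.0, 0.1] * 10, sigma=1.0, shift=1.0))
    print(sprt_mean_shift([0.9, 1.2, 1.0, 1.1] * 10, sigma=1.0, shift=1.0))
```

Because the decision depends on the accumulated likelihood ratio rather than on any single reading, an isolated outlier does not trigger an alarm, whereas a sustained small drift eventually does.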
Electronic Systems: Notebook Computers
Vichare et al. (2004) [13] conducted in-situ health monitoring of notebook computers. The authors monitored and statistically analyzed the temperatures inside a notebook computer, including those experienced during usage, storage, and transportation. After the data was collected, it was used to estimate the distributions of the load parameters. The usage history was used for damage accumulation assessment and remaining life prediction.
Electronic Systems: GPS System
Brown et al. (2005) [14] demonstrated that the remaining useful life of a commercial global positioning system (GPS) can be predicted by using a data precursor to failure approach. The failure modes for the GPS system included precision failure due to an increase in position error and solution failure due to increased outage probability. These failure progressions were monitored in situ by recording system-level features reported using the National Marine Electronics Association Protocol 0183. The GPS system was characterized to collect the principal feature values for a range of operating conditions. The approach was validated by conducting accelerated thermal cycling of the GPS system with the offset of the principal feature value measured in-situ. Based on experimental results, parametric models were developed to correlate the offset in the principal feature value with solution failure. During the experiment the built-in-test (BIT) provided no indication of an impending solution failure.
Electronic Systems: Power Supply
Simons et al. (2006) [15] performed a PoF-based prognostics assessment of the failure of a gull-wing lead power supply chip on a DC/DC voltage converter printed circuit board assembly. First, three-dimensional finite element analyses (FEA) were performed to determine the strains in the solder joints due to thermal or mechanical cycling of the component. The strains may have been due to lead bending resulting from the thermal mismatch of the board and chip, or from a local thermal mismatch between the lead and the solder as well as between the board and the solder. Then the strains were used to set boundary conditions for an explicit model that could simulate the initiation and growth of cracks in the microstructure of the solder joint. Finally, based on the growth rate of the cracks in the solder joint, estimates were made of the cycles to failure for the electronic component.
Nasser et al. (2006) [16] also applied PHM methodology to predict failure of a power supply. They subdivided the power supply into component elements based on specific material characteristics. They predicted that degradation in any individual or combination of component elements could be extrapolated into an overall reliability prediction for the entire power supply system. Their PHM technique consisted of four steps: (1) acquiring the temperature profile using sensors; (2) conducting finite element analysis to perform stress analysis; (3) conducting fatigue prediction for each solder joint; (4) predicting the probability of failure of the power supply system.
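The final step, rolling per-joint results up to a system-level probability, can be sketched under a simplifying assumption that is not stated in [16]: the power supply is treated as a series system in which the failure of any one solder joint fails the supply, and joint failures are treated as independent. The function name and the probabilities are hypothetical.

```python
import math


def system_failure_probability(joint_failure_probs):
    """Probability that at least one element fails, assuming a series
    system with statistically independent elements (an assumption made
    for this sketch only)."""
    survival = math.prod(1.0 - p for p in joint_failure_probs)
    return 1.0 - survival


if __name__ == "__main__":
    # Hypothetical per-joint failure probabilities at some point in life.
    print(system_failure_probability([0.01, 0.02, 0.005]))
```

In practice, joints that share the same thermal history are correlated, so the independence assumption would need to be checked before relying on such a roll-up.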
Electronic Systems: Home Appliances
The European Union funded a project from September 2001 through February 2005 called the Environmental Life Cycle Information Management and Acquisition (ELIMA) for consumer products, which aimed to develop better ways of managing the life cycles of products [17]. The objective of this work was to provide a basic model for predicting the remaining lifetime of parts removed from products based on dynamic data collected by the ELIMA system. The ELIMA technology included sensors and memory built into a product to record dynamic data such as operation time, temperature, and power consumption. This was added to static data about materials and manufacturing. As a case study, the member companies monitored the application conditions of a game console and a household refrigerator. The work concluded that, for remaining lifetime prediction, it was usually essential to consider the environments associated with all life intervals of the equipment. These included not only the operational and maintenance environments, but also the pre-operational environment, when stresses imposed on the parts during manufacturing, assembly, inspection, testing, shipping, and installation might have a significant impact on the eventual reliability of the equipment.
Electronic Components: Circuit Board Components
Gu et al. (2007) [18] developed a methodology for monitoring, recording, and analyzing the life cycle vibration loads for remaining-life prognostics of electronics. A printed circuit board (PCB) with electronic components was mounted on a vibration shaker, which generated random vibration loading. The responses of the PCB to vibration loading in terms of bending curvature were monitored using strain gauges in situ. The interconnect strain values were then calculated from the measured PCB response and used in a vibration failure fatigue model for damage assessment. Damage estimates were accumulated using Miner’s rule and then used to predict the life consumed and remaining life. Uncertainty analysis was also performed, which included measurement uncertainty, parameter uncertainty, model uncertainty, failure criteria uncertainty, and future usage uncertainty. Sensitivity analysis was used to identify the dominant input variables that influenced prediction results. Then uncertainty propagation was conducted to perform reliability assessment with confidence levels. The methodology was shown to be effective for the remaining-life prognostics of a printed circuit board.
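The damage accumulation step can be sketched with Miner's rule, which sums the ratio of cycles applied to cycles to failure over all load levels. This is a minimal illustration: the function names and numbers are hypothetical, and the remaining-life estimate assumes that future usage accumulates damage at the same average rate as past usage, which is one of the uncertainties the study addresses.

```python
def miner_damage(load_history):
    """Miner's rule: D = sum(n_i / N_i), where n_i is the number of cycles
    applied at load level i and N_i is the cycles to failure at that level.
    Failure is predicted when D reaches 1."""
    return sum(n / N for n, N in load_history)


def remaining_life(damage, elapsed_time):
    """Remaining life in the same units as elapsed_time, assuming damage
    keeps accumulating at the average rate observed so far."""
    if damage <= 0.0:
        raise ValueError("no damage accumulated; remaining life is undefined")
    return elapsed_time * (1.0 - damage) / damage


if __name__ == "__main__":
    # Hypothetical (cycles applied, cycles to failure) pairs per strain level.
    history = [(100, 1000), (50, 500)]
    d = miner_damage(history)
    print("life consumed:", d)
    print("remaining life (hours):", remaining_life(d, elapsed_time=10.0))
```

In the example, 20% of life is consumed in 10 hours, which implies roughly 40 more hours at the same usage rate.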
Electronic Components: Battery
Rufus et al. (2008) [19] presented prototype battery health monitoring algorithms (support vector machine, dynamic neural network, confidence prediction neural network, and usage pattern analysis). The health of batteries is important in back-up environments such as telecommunications, uninterruptible power supply (UPS), and other storage applications. The various algorithms were used and tested on the battery data (voltage, current, temperature, etc.) collected from several lithium ion battery cells. The battery data was collected under different operating conditions (storage and charge/discharge cycling at room temperature and 50°C). The results showed that the battery health monitoring algorithms were helpful for determining the health status of a lithium ion cell, allowing for estimation of the probability of battery failure with time.
Electronic Components: Insulated Gate Bipolar Transistors
Insulated Gate Bipolar Transistors (IGBTs) are used in applications such as the switching of automobile and train traction motors, high voltage power supplies, and in aerospace applications such as switch mode power supplies to regulate DC voltage. The failure of these switches can reduce the efficiency of the system or lead to system failure. Patil et al. (2008) [20] developed a prognostics methodology to predict and avert IGBT failures by identifying failure precursor parameters and monitoring them. In this study, IGBTs aged by thermal/electrical stresses were evaluated in comparison with new components to determine the electrical parameters that change with stressing. Three potential precursor parameter candidates, threshold voltage, transconductance, and collector-emitter (ON) voltage, were evaluated by comparing aged and new IGBTs under temperatures ranging from 25 to 200°C. The trends in the three electrical parameters with temperature were correlated to device degradation. Then these precursors were monitored in situ, and precursor trending data were input into PoF models to allow for anomaly detection and prediction of remaining life of these devices.
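The idea of trending a precursor toward a failure threshold can be sketched as follows. This illustration substitutes a simple least-squares linear trend for the PoF models used in the study; the function names, the measurements, and the failure threshold are hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_threshold(times, values, threshold):
    """Extrapolate the fitted trend to the failure threshold and return the
    time remaining after the last measurement, or None when the fitted
    trend does not reach the threshold in the future."""
    slope, intercept = fit_line(times, values)
    if slope == 0.0:
        return None
    crossing = (threshold - intercept) / slope
    remaining = crossing - times[-1]
    return remaining if remaining > 0.0 else None


if __name__ == "__main__":
    # Hypothetical precursor readings (e.g., an ON-state voltage in volts)
    # taken at equally spaced inspection times (hours).
    hours = [0.0, 100.0, 200.0, 300.0]
    volts = [1.00, 1.10, 1.20, 1.30]
    print(time_to_threshold(hours, volts, threshold=1.50))
```

A linear trend is only a placeholder; real precursors often degrade nonlinearly, which is why physics-based models are coupled with the trending data.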
Electronic Components: Capacitor
Gu et al. (2008) [21] presented a prognostics approach that detects the performance degradation of multilayer ceramic capacitors (MLCCs) under temperature-humidity-bias conditions and then predicts remaining useful life. In the tests, three performance parameters (capacitance, dissipation factor, and insulation resistance) were monitored in situ. A prognostics approach was developed to detect and predict failures using a multi-parameter regression, residual, detection, and prediction analysis on four types of MLCCs. It was found that the training process for the prognostics approach depended only on the capacitor type and not on the test conditions (such as different DC bias levels). Of the 96 capacitors tested, eight failed; all eight failures were detected with no missed alarms, and five of the eight yielded advance warning of failure.
Electronic Components: Metal-oxide Semiconductor Field-effect Transistor
Goodman et al. (2006) [22] used a prognostic cell to monitor the time-dependent dielectric breakdown of the metal-oxide semiconductor field-effect transistor on integrated circuits. The tests were conducted under accelerated conditions. Acceleration of the breakdown of an oxide was achieved by applying a voltage higher than the supply voltage to increase the electric field across the oxide. When the prognostics cell failed, a certain fraction of the circuit lifetime was used up. The fraction of consumed circuit life was dependent on the amount of over-voltage applied and was estimated from the known distribution of failure times. Thus the prognostics cell operated autonomously and was able to give advance warning of impending failure of integrated circuits.
Conclusions
Traditional reliability predictions based on handbook methods are inaccurate and misleading. PHM is more suitable for reliability prediction and remaining life assessment, since it considers actual operational and environmental loading conditions. Currently, research is being conducted to build physics-based damage models for electronics, obtain the life cycle data of products, and assess the uncertainty in remaining useful life prediction in order to make PHM more realistic. Research is also being conducted on advanced sensor technologies, communication technologies, decision-making methods, and return on investment methods. In addition, from the applications and examples listed above, it is clear that PHM can be incorporated into various electronics products and can benefit many facets of daily life. In the future, due to the increasing amount of electronics in the world and the competitive drive toward more reliable products, PHM will be looked upon as a cost-effective solution for predicting the reliability of all electronic products and systems.
Footnote 1: Attempts to “correct” handbook-based reliability prediction do not stand up to scientific scrutiny. For example, the Reliability Information Analysis Center published 217Plus to address the shortcomings of Mil-Hdbk-217. In particular, the 217Plus handbook assigns constant failure rates for solder joint failure and temperature cycling as two independent values. This is erroneous science, since the failure of a solder joint and other failures caused by temperature cycling do not occur at a constant rate, nor are these two types of failures independent.
References
- Pecht, M., “Why the Traditional Prediction Models Do Not Work - Is There an Alternative?” Electronics Cooling, Vol. 2, No. 1, 1996, pp. 10-12.
- Pecht, M., and Nash, F., “Predicting the Reliability of Electronic Equipment,” Proceedings of the IEEE, Vol. 82, No. 7, 1994, pp. 992-1004.
- IEEE Standard 1413.1, “IEEE Guide for Selecting and Using Reliability Predictions Based on IEEE 1413,” 2002.
- Vichare, N., and Pecht, M., “Prognostics and Health Management of Electronics,” IEEE Transactions on Components and Packaging Technologies, Vol. 29, No. 1, 2006, pp. 222-229.
- DoD 5000.2 Policy Document. Defense Acquisition Guidebook, Chapter 5.3 - Performance Based Logistics, 2004.
- Pecht, M., “Prognostics and Health Management of Electronics,” Wiley-Interscience, New York, NY, 2008.
- Tuchband, B., and Pecht, M., “The Use of Prognostics in Military Electronic Systems,” Proceedings of the 32nd GOMACTech Conference, Lake Buena Vista, FL, 2007, pp. 157-160.
- Shetty, V., Das, D., Pecht, M., Hiemstra, D., and Martin, S., “Remaining Life Assessment of Shuttle Remote Manipulator System End Effector,” Proceedings of the 22nd Space Simulation Conference, Ellicott City, MD, 2002.
- Mathew, S., Das, D., Osterman, M., Pecht, M., Ferebee, R., and Clayton, J., “Virtual Remaining Life Assessment of Electronic Hardware Subjected to Shock and Random Vibration Life Cycle Loads,” Journal of the IEST, Vol. 50, No. 1, 2007, pp. 86-97.
- Mishra, S., Pecht, M., Smith, T., McNee, I., and Harris, R., “Remaining Life Prediction of Electronic Products Using Life Consumption Monitoring Approach,” Proceedings of the European Microelectronics Packaging and Interconnection Symposium, Cracow, 2002, pp. 136-142.
- Ramakrishnan, A., and Pecht, M., “Life Consumption Monitoring Methodology for Electronic Systems,” IEEE Transactions on Components and Packaging Technologies, Vol. 26, No. 3, 2003, pp. 625-634.
- Whisnant, K., Gross, K., and Lingurovska, N., “Proactive Fault Monitoring in Enterprise Servers,” 2005 IEEE International Multi-conference in Computer Science and Computer Engineering, Las Vegas, NV, 2005.
- Vichare, N., Rodgers, P., Eveloy, V., and Pecht, M., “In-Situ Temperature Measurement of a Notebook Computer – A Case Study in Health and Usage Monitoring of Electronics,” IEEE Transactions on Device and Materials Reliability, Vol. 4, No. 4, 2004, pp. 658-663.
- Brown, D., Kalgren, P., Byington, C., and Orsagh, R., “Electronic Prognostics - A Case Study Using Global Positioning System (GPS),” IEEE Autotestcon, 2005.
- Simons, J., and Shockey, D., “Prognostics Modeling of Solder Joints in Electronic Components,” IEEE Aerospace Conference, 2006.
- Nasser, L., and Curtin, M., “Electronics Reliability Prognosis through Material Modeling and Simulation,” IEEE Aerospace Conference, 2006.
- Bodenhoefer, K., “Environmental Life Cycle Information Management and Acquisition - First Experiences and Results from Field Trials,” Proceedings of Electronics Goes Green 2004+, Berlin, 2004, pp. 541-546.
- Gu, J., Barker, D., and Pecht, M., “Prognostics Implementation of Electronics under Vibration Loading,” Microelectronics Reliability, Vol. 47, No. 12, 2007, pp. 1849-1856.
- Rufus, F., Lee, S., and Thakker, A., “Health Monitoring Algorithms for Space Application Batteries,” Proceedings of the 1st International Conference on Prognostics and Health Management, Denver, CO, 2008.
- Patil, N., Das, D., Goebel, K., and Pecht, M., “Failure Precursors for Insulated Gate Bipolar Transistors,” Proceedings of the 1st International Conference on Prognostics and Health Management, Denver, CO, 2008.
- Gu, J., Azarian, M., and Pecht, M., “Failure Prognostics of Multilayer Ceramic Capacitor in Temperature-Humidity-Bias Conditions,” Proceedings of the 1st International Conference on Prognostics and Health Management, Denver, CO, 2008.
- Goodman, D., Vermeire, B., Ralston, J., and Graves, R., “A Board-Level Prognostic Monitor for MOSFET TDDB,” IEEE Aerospace Conference, 2006.