IEEE Standard 1332-1998 [IEEE 1998] requires that "The [equipment] supplier shall determine the customer's requirements and product needs," and that "The [equipment] supplier, working with the customer, shall include the activities necessary to ensure that the customer's requirements and product needs are fully understood and defined, so that a comprehensive design specification can be [developed]." Among the design specifications, the thermal environment is an input parameter for a variety of design, manufacturing, and test functions.
The thermal environment influences parts selection, reliability assessments, manufacturing processes, and qualification procedures, and it indirectly influences system architecture, maintenance plans, warranties, and life-cycle costs. Traditionally, reducing the operating temperature was taken as the primary means of improving reliability and performance. The aircraft industry's willingness to accept heavy, complex avionics cooling systems shows how much importance has been attached to temperature reduction. However, studies of the various thermally influenced failure mechanisms suggest that steady-state temperature itself may not be as important as the spatial and temporal gradients of temperature [Lall, Pecht, and Hakim 1997]. Environmental specifications should therefore include thermal parameters such as temperature cycling, ramp rates, and thermal gradients.
Thermal analysis of electronic products and systems during design is an iterative process, and its limitations must be understood if it is to be used effectively as a design tool. Most important is understanding how sensitive the actual performance and reliability of the parts and systems are to the predicted temperatures. Any attempt to assess the effects of temperature variation without considering the relevant failure mechanism or performance metric is likely to give erroneous results.
Temperature-Reliability Considerations
Reliability, defined as the ability of a device to fulfill its intended function, is often expressed in terms of years of useful life. Reliability-related failures render a device non-operational through damage caused by a failure mechanism, which is generally activated by external and internal stresses. Failure mechanisms determine device reliability; most often, a few mechanisms dominate and cause failure before the others. Pecht and Ramappan [1992] found that most electronic hardware failures over the preceding decade were not component failures but were attributable to interconnects and connectors, system design, excessive environments, and improper user handling.
Steady-state temperature, temperature cycles, temperature gradients, and time-dependent temperature changes all have the potential to affect the reliability of modern avionics. However, because of the widespread use of reliability prediction methods such as MIL-HDBK-217, steady-state temperature has often been treated as the only stress parameter affecting reliability. At the core of such methods are Arrhenius-based models, formulated to predict the influence of steady-state temperature on electronic device reliability. In this case, the mean time to failure, MTF (hours), at a given steady-state temperature T is represented as:

$$\mathrm{MTF} = \mathrm{MTF}_{\mathrm{ref}} \exp\left[\frac{E_{\mathrm{dev}}}{k_B}\left(\frac{1}{T} - \frac{1}{T_{\mathrm{ref}}}\right)\right]$$

where MTF_ref is the mean time to failure at a specified reference temperature T_ref, k_B is Boltzmann's constant (8.617 × 10⁻⁵ eV/K), E_dev is the device activation energy (eV), and both temperatures are absolute (K).
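The Arrhenius model, MTF = MTF_ref·exp[(E_dev/k_B)(1/T − 1/T_ref)], is easy to evaluate directly. The minimal Python sketch below computes MTF at a new temperature and the corresponding acceleration factor. The function names and the inputs (MTF_ref = 100,000 h at 55 °C, E_dev = 0.7 eV) are hypothetical, chosen only for illustration; with them, raising the temperature to 85 °C cuts the predicted MTF to roughly one-eighth.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann's constant, eV/K


def arrhenius_mtf(mtf_ref_h, t_c, t_ref_c, e_dev_ev):
    """MTF (hours) at t_c, given MTF_ref at t_ref_c and activation energy e_dev_ev.

    MTF = MTF_ref * exp[(E_dev / k_B) * (1/T - 1/T_ref)], with T in kelvin.
    """
    t_k = t_c + 273.15
    t_ref_k = t_ref_c + 273.15
    return mtf_ref_h * math.exp((e_dev_ev / K_B_EV) * (1.0 / t_k - 1.0 / t_ref_k))


def acceleration_factor(t_use_c, t_stress_c, e_dev_ev):
    """MTF(use) / MTF(stress): how many times faster failures occur at the stress temperature."""
    return arrhenius_mtf(1.0, t_use_c, t_stress_c, e_dev_ev)


# Hypothetical inputs for illustration only.
mtf_85 = arrhenius_mtf(100_000.0, 85.0, 55.0, 0.7)
af = acceleration_factor(55.0, 85.0, 0.7)
print(f"MTF at 85 degC: {mtf_85:,.0f} h; acceleration factor 55->85 degC: {af:.1f}")
```

This sketch encodes exactly the assumption questioned in the following paragraphs: a single activation energy that is valid over the whole temperature range.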
Figure 1 illustrates one problem with this relationship: the lack of correlation between observed failure rate and junction temperature suggests that Arrhenius-based reliability prediction models are inadequate. In addition, the effects of temperature on electronic devices are often assessed by accelerated tests carried out at extremely high temperatures. For example, electromigration tests are often conducted at temperatures above 250 °C and at current densities ten times those in actual operation. The test results are then extrapolated to operating conditions to obtain a value for the thermal acceleration of device failures.
Figure 1: Scatter diagram showing the lack of correlation between observed failure rate and junction temperature of different bipolar integrated circuits [Hallberg 1994].
Implicit in this accelerated test strategy are two assumptions: that the failure mechanisms active at high temperatures are also active in the equipment's operating range, and that the Arrhenius relationship holds over the whole temperature range. Problems arise when the failure mechanisms precipitated at accelerated stress levels are not activated in the equipment's operating range. (A NIST study noted that "there is ample evidence that a straight forward application of the Arrhenius equation, with activation energies determined from high temperature accelerated stress testing, is not strictly valid for predicting real device lifetime" [Kopanschi et al. 1991].)
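The extrapolation step can be made concrete. The text does not name a model for electromigration, but lifetimes are commonly described with Black's equation, MTF = A·J^(−n)·exp(E_a/(k_B·T)). The minimal sketch below uses it to extrapolate from a test at 250 °C and ten times the operating current density to a hypothetical 85 °C use condition. The current-density exponent n = 2 and the trial activation energies are assumptions for illustration. The prefactor A cancels in the ratio, so only n and E_a matter, and an uncertainty of 0.4 eV in E_a changes the extrapolated life by roughly a factor of sixty.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann's constant, eV/K


def black_extrapolation_factor(t_use_c, t_test_c, j_ratio, e_a_ev, n=2.0):
    """MTF(use) / MTF(test) from Black's equation, MTF = A * J**-n * exp(Ea / (kB*T)).

    j_ratio is J_test / J_use; the prefactor A cancels in the ratio.
    """
    t_use_k = t_use_c + 273.15
    t_test_k = t_test_c + 273.15
    current_term = j_ratio ** n
    thermal_term = math.exp((e_a_ev / K_B_EV) * (1.0 / t_use_k - 1.0 / t_test_k))
    return current_term * thermal_term


# Hypothetical scenario: test at 250 degC and 10x current density, use at 85 degC.
for e_a in (0.5, 0.7, 0.9):  # assumed trial activation energies, eV
    factor = black_extrapolation_factor(85.0, 250.0, 10.0, e_a)
    print(f"Ea = {e_a:.1f} eV -> extrapolated life = {factor:,.0f} x test life")
```

The spread across plausible activation energies shows why a single high-temperature test cannot by itself establish the operating lifetime when the mechanism or its activation energy may differ in the operating range.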
Sources of Uncertainties in Thermal Analysis
For the thermal engineer, it is crucial to know the purpose of the thermal simulations and the level of accuracy they require, as this has a major impact on the modeling and calculation requirements. Many thermal engineers believe that the only purpose of thermal design is to lower temperature. Statements such as "The reliability of a silicon chip is decreased by about ten percent for every 2 °C rise in temperature" [Yeh 1995] are often repeated as axioms, although there is considerable evidence to the contrary [Lall, Pecht, and Hakim 1997]. Nevertheless, for thermal engineers who subscribe to this view, the goal of thermal analysis is to determine temperature with a high degree of certainty. The achievable accuracy depends on the significant sources of uncertainty.
Uncertainties arise from possible variations in design, manufacturing, the selection of thermal modeling parameters, and the assumptions associated with thermal modeling. These variations must be identified, characterized, and controlled. Another type of variation is fundamental, arising from incomplete knowledge of the physics of heat transfer and fluid flow and from the inherent inadequacies of numerical solution schemes. Variations also occur in the environment and operating conditions of a system; these, however, are inherent in the system's operation and must be characterized rather than eliminated. The following sections identify and describe the sources of uncertainty in thermal analysis, with examples from avionics equipment development.
The Environmental Profile
The temperature profile that avionics equipment undergoes depends on factors such as the take-off and landing locations, the number of flights per day, and the flight duration profile. In addition, one cannot assume a single idealized ambient environment for all equipment, even on the same airplane: temperature variations of about 30 °C are often seen for EE bay equipment, compared with about 15 °C for instrument panel equipment [Cluff 1996]. The boundary conditions used in the heat transfer analysis should reflect these environmental conditions.
For avionics applications, the Institute for Interconnecting and Packaging Electronic Circuits (IPC) suggests limiting values of several thermal parameters that should be maintained in commercial aircraft; Table 1 lists these worst-case operating environments. However, the actual thermal profiles experienced by avionics systems do not follow these limits. As Figure 2 shows, the actual temperature extremes fall well within the specified range, whereas the temperature swing per cycle and the number of cycles are much higher than the limits set by IPC.
Table 1: IPC recommended worst-case thermal conditions for commercial aircraft

| Temperature Cycle Environment | IPC 1992 Commercial Aircraft |
|---|---|
| Minimum Temperature | -55 °C |
| Maximum Temperature | 95 °C |
| ΔT of Cycle | 20 °C + Power Dissipation |
| Dwell Time at Extremes | 12 hours |
| Cycles per Year | 365 |
| Temperature Ramp Rate | < 20 °C/min |
| Required Life | 20 years |
Figure 2: Actual min./max. variations in temperature profiles for avionics equipment in service do not correspond to IPC recommended worst case conditions.
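The consequences of exceeding the IPC cycle count and swing can be gauged with a simple damage comparison. The minimal sketch below assumes a Coffin-Manson-type inverse power law for solder-joint fatigue life, N_f ∝ ΔT^(−n), with an assumed exponent n = 2, and linear damage accumulation (Miner's rule); neither model is prescribed by the text. It compares the Table 1 worst case (365 cycles per year for 20 years, ignoring the power-dissipation part of ΔT) with a hypothetical EE bay profile of four flights per day and a 30 °C swing per flight, the EE bay variation cited from [Cluff 1996].

```python
def cycles_over_life(cycles_per_year, years):
    """Total thermal cycles over the required life."""
    return cycles_per_year * years


def relative_fatigue_damage(n_cycles, delta_t_c, exponent=2.0):
    """Damage index proportional to N * dT**exponent.

    Assumes an inverse power-law (Coffin-Manson-type) fatigue life,
    N_f ~ dT**-exponent, and linear damage accumulation (Miner's rule).
    Only ratios of this index are meaningful.
    """
    return n_cycles * delta_t_c ** exponent


# IPC worst case from Table 1 (power-dissipation term of dT ignored here).
ipc_cycles = cycles_over_life(365, 20)
ipc_damage = relative_fatigue_damage(ipc_cycles, 20.0)

# Hypothetical EE bay profile: 4 flights/day, ~30 degC swing per flight.
actual_cycles = cycles_over_life(4 * 365, 20)
actual_damage = relative_fatigue_damage(actual_cycles, 30.0)

print(ipc_cycles, actual_cycles, actual_damage / ipc_damage)
```

It prints `7300 29200 9.0`: four times as many cycles and a 1.5-times larger swing give, under these assumptions, about nine times the fatigue damage of the IPC worst case.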
Similar trends are present in other prevailing standards for commercial [ARINC 1993] and military [DOD 1986] avionics; Table 2 shows some of the thermal parameters these standards prescribe. The environmental test standard DO-160C [DO-160C 1989] also specifies operating temperatures and altitudes for airborne equipment. Given the size, cost, and reliability concerns associated with cooling equipment, these limits should be reviewed against device functionality and physics-based models of the temperature-reliability relationship.
Table 2: Avionics thermal standards

| Parameter | ARINC 600 | DOD-STD-1788 |
|---|---|---|
| Cooling air flow rate (30 °C) | 136 kg/h per kW | 58 kg/h per kW |
| Emergency operation without cooling air | 90 min | 5 min |
| Maximum external case temperature | 60 °C (overall), 65 °C (local) | 76 °C |
| Maximum weight | 20 kg | 27 kg |
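The specific cooling air flow rates in Table 2 translate directly into a bulk air temperature rise through the energy balance ΔT = P/(ṁ·c_p). The minimal sketch below assumes a constant specific heat of air of 1005 J/(kg·K). Because both standards specify flow per kilowatt, the rise does not depend on the equipment's dissipation. Under that assumption, the ARINC 600 rate gives a rise of about 26 K and the DOD-STD-1788 rate about 62 K. The 250 W box is hypothetical.

```python
CP_AIR = 1005.0  # J/(kg*K); assumed constant specific heat of air


def air_temperature_rise_k(flow_kg_per_h_per_kw, cp=CP_AIR):
    """Bulk air temperature rise for a specific flow rate given per kW of dissipation.

    Energy balance: dT = P / (m_dot * cp). With m_dot proportional to P,
    the dissipated power cancels, so the calculation is done for 1 kW.
    """
    m_dot_kg_s = flow_kg_per_h_per_kw / 3600.0  # kg/s per kW
    return 1000.0 / (m_dot_kg_s * cp)


def required_flow_kg_per_h(power_w, flow_kg_per_h_per_kw):
    """Cooling air mass flow a box of the given dissipation would receive under a standard."""
    return flow_kg_per_h_per_kw * power_w / 1000.0


for name, rate in (("ARINC 600", 136.0), ("DOD-STD-1788", 58.0)):
    print(f"{name}: dT = {air_temperature_rise_k(rate):.1f} K; "
          f"flow for a hypothetical 250 W box = {required_flow_kg_per_h(250.0, rate):.1f} kg/h")
```

The difference in air temperature rise between the two standards is itself larger than many of the uncertainties discussed below, which is why the governing standard should be settled early.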
Operating Conditions & Functional Profiles
In avionics, heat is generated primarily by active components, so the thermal profiles inside the equipment depend on the power they dissipate. Using the rated power of the components is not sufficient, because it is often significantly higher than the actual operating value. Furthermore, the power dissipated by CMOS devices must be evaluated at their actual operating frequency.
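For CMOS devices, the frequency dependence comes from switching power, which is commonly approximated as P ≈ α·C·V_dd²·f plus a static term, where α is the switching activity and C is the effective switched capacitance. The text does not give this approximation. In the minimal sketch below, the parameter values (α = 0.15, C = 2 nF, V_dd = 3.3 V, and a part rated at 100 MHz but clocked at 25 MHz) are hypothetical. Because switching power scales linearly with frequency, using the rated figure in this example would overstate the dissipation fourfold.

```python
def cmos_power_w(freq_hz, c_eff_f, vdd_v, activity, p_static_w=0.0):
    """Approximate CMOS dissipation: switching power alpha*C*Vdd^2*f plus static power."""
    return activity * c_eff_f * vdd_v ** 2 * freq_hz + p_static_w


# Hypothetical part: rated at 100 MHz, operated in this system at 25 MHz.
p_rated = cmos_power_w(100e6, 2e-9, 3.3, 0.15)
p_actual = cmos_power_w(25e6, 2e-9, 3.3, 0.15)
print(f"rated-frequency power {p_rated:.3f} W, operating power {p_actual:.3f} W")
```

A nonzero static term would reduce the ratio below four, so both terms should be estimated for the actual operating mode.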
Analysis Methodology and Assumptions
Several assumptions must be made about the system for thermal analysis. Examples include the modes of heat transfer active at different locations in the system, the contact resistances between solid surfaces, and the heat transfer coefficients at surfaces where the mode of heat transfer is not completely known. Even under the best of circumstances, these assumptions will introduce some error. These errors are in addition to those arising from incomplete knowledge of the physics of heat and mass transfer and from the numerical imprecision of any computer model.
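The effect of an uncertain contact resistance can be bounded with a one-dimensional series resistance network, T_j = T_a + P·ΣR. In the minimal sketch below, all values are hypothetical: a 5 W part at 55 °C ambient with 2 K/W junction-to-case and 8 K/W sink-to-ambient resistances, and an assumed case-to-sink contact resistance anywhere between 0.2 and 1.0 K/W. That spread alone moves the predicted junction temperature by 4 K before any other source of error is considered.

```python
def junction_temp_c(t_ambient_c, power_w, resistances_k_per_w):
    """Steady-state junction temperature for a 1-D series thermal-resistance path."""
    return t_ambient_c + power_w * sum(resistances_k_per_w)


R_JC = 2.0  # junction-to-case, K/W (hypothetical)
R_SA = 8.0  # sink-to-ambient, K/W (hypothetical)
for r_contact in (0.2, 0.6, 1.0):  # assumed spread in case-to-sink contact resistance, K/W
    tj = junction_temp_c(55.0, 5.0, [R_JC, r_contact, R_SA])
    print(f"R_contact = {r_contact:.1f} K/W -> Tj = {tj:.1f} degC")
```

Real heat paths are rarely a single series chain, but this bound is a quick check of whether a given assumption deserves closer characterization.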
Boundary Conditions
Difficulty in predicting the actual boundary conditions adds error to thermal analysis results. Thermal analysis methods based on fluid dynamics require well-defined boundary conditions for fluid flows and temperatures. Lasance and Joshi [1998] discuss problems associated with the common boundary conditions and with the methods used to determine them.
In system-level heat transfer models, the heat transfer coefficients assigned at the boundaries can introduce significant errors. In natural convection, for example, the Nusselt number correlations differ for vertical and horizontal walls (and, for horizontal walls, between surfaces facing up and down), so using a single constant heat transfer coefficient for all walls of a rectangular enclosure is not appropriate. One must either use the appropriate correlation for each type of wall or verify that simplifying assumptions, such as a constant heat transfer coefficient, do not push the uncertainty of the thermal solution beyond the acceptable level.
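The wall-to-wall difference can be estimated with commonly quoted laminar textbook correlations for isothermal surfaces in air: Nu = 0.59·Ra^(1/4) for a vertical wall (length scale: height), and Nu = 0.54·Ra^(1/4) and Nu = 0.27·Ra^(1/4) for the upper and lower faces of a heated horizontal plate (length scale: area divided by perimeter). This minimal sketch assumes the correlations' validity ranges, the approximate air properties, and the enclosure dimensions (a 0.3 m tall side wall and a 0.3 m × 0.2 m top and bottom at 15 K above ambient). For this geometry the three coefficients span roughly a factor of two.

```python
G = 9.81  # gravitational acceleration, m/s^2

# Approximate properties of air near 310 K (assumed; use tabulated values at the film temperature).
K_AIR = 0.0268      # thermal conductivity, W/(m*K)
NU_AIR = 1.69e-5    # kinematic viscosity, m^2/s
ALPHA_AIR = 2.4e-5  # thermal diffusivity, m^2/s

# Laminar correlation coefficients, Nu = C * Ra**0.25, for a surface hotter than the air.
COEFF = {"vertical": 0.59, "top": 0.54, "bottom": 0.27}


def rayleigh(delta_t_k, length_m, t_film_k):
    """Rayleigh number for natural convection in air."""
    beta = 1.0 / t_film_k  # ideal-gas volumetric expansion coefficient, 1/K
    return G * beta * delta_t_k * length_m ** 3 / (NU_AIR * ALPHA_AIR)


def h_natural(surface, delta_t_k, length_m, t_film_k=310.0):
    """Natural-convection heat transfer coefficient, W/(m^2*K).

    length_m is the wall height for "vertical" and area/perimeter for "top"/"bottom".
    """
    nusselt = COEFF[surface] * rayleigh(delta_t_k, length_m, t_film_k) ** 0.25
    return nusselt * K_AIR / length_m


plate_length = (0.3 * 0.2) / (2 * (0.3 + 0.2))  # A/P for the 0.3 m x 0.2 m top and bottom
for surface, length in (("vertical", 0.3), ("top", plate_length), ("bottom", plate_length)):
    print(f"{surface:8s} h = {h_natural(surface, 15.0, length):.2f} W/(m^2 K)")
```

Before applying these correlations in practice, check that each computed Rayleigh number lies inside the correlation's stated range.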
Thermal Resistance of Electronic Parts
Thermal resistance is a common way of characterizing electronic parts. However, the thermal resistance value provided by the part manufacturer is valid only for the test and analysis conditions under which the data were obtained. The European DELPHI/SEED projects [Rosten et al. 1997, Lasance et al. 1997] propose an alternative approach, called "Boundary Condition Independent Compact Models," which overcomes this major drawback. The method is currently under discussion within the JEDEC thermal standardization committee as a replacement for the θJA and θJC thermal metrics. However, implementing the alternative approach depends on the cooperation of part manufacturers and will take time to gain acceptance. For the present, when thermal resistance data from part manufacturers are used to characterize a system, the data should be validated whenever accuracy is the objective of the analysis.
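The practical consequence of this condition dependence can be shown with the two familiar estimates, T_j = T_a + P·θJA and T_j = T_c + P·θJC. In this minimal sketch the values are hypothetical: a 2 W part with data-sheet values θJA = 30 K/W and θJC = 5 K/W, installed where the local air is at 60 °C and the case temperature is measured at 95 °C. The two estimates disagree by 15 K because the data-sheet θJA describes the manufacturer's test environment, not this installation. The θJC estimate carries its own assumption that the heat leaves mainly through the characterized case surface.

```python
def tj_from_theta_ja(t_ambient_c, power_w, theta_ja_k_per_w):
    """Junction temperature from junction-to-ambient thermal resistance."""
    return t_ambient_c + power_w * theta_ja_k_per_w


def tj_from_theta_jc(t_case_c, power_w, theta_jc_k_per_w):
    """Junction temperature from junction-to-case resistance and a measured case temperature."""
    return t_case_c + power_w * theta_jc_k_per_w


def effective_theta_ja(t_junction_c, t_ambient_c, power_w):
    """Junction-to-ambient resistance actually realized in the installation."""
    return (t_junction_c - t_ambient_c) / power_w


# Hypothetical data-sheet values and in-system measurements.
tj_ja = tj_from_theta_ja(60.0, 2.0, 30.0)
tj_jc = tj_from_theta_jc(95.0, 2.0, 5.0)
print(tj_ja, tj_jc, effective_theta_ja(tj_jc, 60.0, 2.0))
```

Under these assumptions it prints `120.0 105.0 22.5`. The installation realizes about 22.5 K/W rather than the data-sheet 30 K/W.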
Uncertainties in Thermophysical Properties
In many cases, thermal analysts do not know the exact materials of the system's components; the materials used in a commercial off-the-shelf part, for example, are not listed in the part data sheets. In area array packages such as ball grid arrays (BGAs), where the solder balls provide a major thermal path from the die to the circuit board, uncertainty about the solder composition may alter the results of the thermal analysis. The amount of variation this introduces depends on the heat flow pattern in the particular system.
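A crude parallel-conduction estimate for the ball array, R = h/(k·A·N), bounds how much the solder composition matters. This minimal sketch idealizes each joint as a cylinder. The array size (256 joints, 0.5 mm standoff, 0.6 mm diameter), the two candidate conductivities (50 and 35 W/(m·K)), and the 3 W conducted through the array are assumptions for illustration. They are not properties of any specific alloy or package. The resistance scales inversely with k, so the temperature drop across the array changes in proportion. Whether that change matters depends on what fraction of the heat actually takes this path.

```python
import math


def ball_array_resistance(k_w_per_mk, n_joints, height_m, diameter_m):
    """Conduction resistance (K/W) of identical solder joints acting in parallel.

    Each joint is idealized as a cylinder; pads, spreading, and interface
    resistances are ignored.
    """
    area_m2 = math.pi * (diameter_m / 2.0) ** 2
    return height_m / (k_w_per_mk * area_m2 * n_joints)


HEAT_THROUGH_ARRAY_W = 3.0  # hypothetical heat conducted through the balls
for label, k in (("candidate A", 50.0), ("candidate B", 35.0)):  # assumed conductivities
    r = ball_array_resistance(k, 256, 0.5e-3, 0.6e-3)
    print(f"{label}: R = {r:.3f} K/W, dT = {r * HEAT_THROUGH_ARRAY_W:.2f} K")
```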
Many thermophysical properties of materials are functions of temperature [Touloukian 1967]. Manufacturing processes can also affect thermophysical properties. Contact resistance depends on the pressure applied during manufacturing, and the properties of composites, plastics, and similar materials depend on the processing methods used.
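In one dimension, the Kirchhoff transformation handles a temperature-dependent conductivity exactly. For a linear model k(T) = k_0·[1 + b·(T − T_0)], integrating Fourier's law across a slab gives k_0·[(T_1 − T_2) + (b/2)·((T_1 − T_0)² − (T_2 − T_0)²)] = q''·L, a quadratic in the hot-face temperature T_1. The linear model and the example values in this minimal sketch are hypothetical: a 1 mm layer with k_0 = 1 W/(m·K) at 25 °C, b = −0.002 1/K, 2 W/cm² of heat flux, and an 85 °C cold face. With these values, the temperature drop is about 23 K, compared with 20 K when the room-temperature conductivity is used.

```python
import math


def hot_face_temp_c(t_cold_c, heat_flux_w_m2, thickness_m, k0, b, t0_c=25.0):
    """Hot-face temperature of a 1-D slab with k(T) = k0 * (1 + b * (T - t0_c)).

    Solves k0*[(T1 - T2) + b/2*((T1 - t0)^2 - (T2 - t0)^2)] = q'' * L for T1.
    """
    if b == 0.0:
        return t_cold_c + heat_flux_w_m2 * thickness_m / k0  # constant conductivity
    u2 = t_cold_c - t0_c
    c = u2 + 0.5 * b * u2 ** 2 + heat_flux_w_m2 * thickness_m / k0
    disc = 1.0 + 2.0 * b * c
    if disc < 0.0:
        raise ValueError("linear k(T) model reaches zero conductivity; no solution")
    # Root of (b/2)*u1^2 + u1 - c = 0 that reduces to u1 = c as b -> 0.
    u1 = (-1.0 + math.sqrt(disc)) / b
    return u1 + t0_c


t_const = hot_face_temp_c(85.0, 2e4, 1e-3, 1.0, 0.0)
t_var = hot_face_temp_c(85.0, 2e4, 1e-3, 1.0, -0.002)
print(f"drop with constant k: {t_const - 85.0:.1f} K; with k(T): {t_var - 85.0:.1f} K")
```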
How Uncertainties in Thermal Analysis Affect the Design Process
As part of the thermal analysis, the design team should study all of the sources of error listed above. Both the individual and the cumulative effects of these sources must be considered. The study should establish how much uncertainty each factor can contribute, how the factors can "stack up," and how, and for what purpose, the results will be used.
If the predicted changes in temperatures or temperature gradients at the parts were comparable to or larger than their respective allowable uncertainties, it would imply that a better characterization of the responsible parameters (e.g., the material properties, thermal resistance data) is required. For cases where these parametric uncertainties have only a small impact on the part level temperatures and temperature gradients, the overall uncertainty will be dominated by the inaccuracies in the thermal simulations (e.g., selection of grid size, convergence criteria). The accuracy of the thermal simulations for all cases must be such that results stay within the allowable temperature and thermal gradient uncertainties.
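An uncertainty budget is one way to organize this study. The minimal sketch below combines hypothetical ± contributions to a part's predicted temperature in two common ways. The first is a worst-case linear sum. The second is a root-sum-square (RSS) total, which is appropriate only if the sources are independent and random. The sketch also ranks each source's share of the RSS variance, which indicates where better characterization would pay off. All names and numbers, including the 5 K allowable uncertainty, are assumptions for illustration.

```python
import math


def stack_up(contributions_k):
    """Return (worst-case, root-sum-square) totals for independent +/- uncertainties in K."""
    worst = sum(abs(u) for u in contributions_k.values())
    rss = math.sqrt(sum(u ** 2 for u in contributions_k.values()))
    return worst, rss


def variance_shares(contributions_k):
    """Fraction of the RSS variance attributable to each source."""
    total = sum(u ** 2 for u in contributions_k.values())
    return {name: u ** 2 / total for name, u in contributions_k.items()}


# Hypothetical +/- contributions to a part's predicted temperature, in K.
budget = {
    "boundary heat transfer coefficient": 4.0,
    "part thermal resistance data": 3.0,
    "material properties": 2.0,
    "dissipated power": 3.0,
    "numerical (grid, convergence)": 1.0,
}
ALLOWABLE_K = 5.0  # hypothetical allowable temperature uncertainty

worst, rss = stack_up(budget)
print(f"worst case +/-{worst:.1f} K, RSS +/-{rss:.1f} K, allowable +/-{ALLOWABLE_K:.1f} K")
for name, share in sorted(variance_shares(budget).items(), key=lambda kv: -kv[1]):
    print(f"{share:5.1%}  {name}")
```

With these inputs, both totals (13 K worst case, about 6.2 K RSS) exceed the allowable ±5 K. The boundary heat transfer coefficient accounts for about 41 percent of the variance, so by the reasoning above it is the first parameter to characterize better.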
Recommendations
A timely, cost-effective, and productive approach to thermal management requires design engineers, heat transfer specialists, and reliability engineers to work together as a team. An appropriate thermal environment can be maintained for an avionics system only if the effects of that environment are understood. The physics-of-failure methodology is an alternative to the Arrhenius relation and MIL-HDBK-217. It has been used successfully in Japan, Taiwan, Singapore, and Malaysia [Kelly et al. 1995] and in the U.S. by the CADMP Alliance (now known as the Electronic Components Alliance) [Evans et al. 1995]. Key points in implementing the methodology include the following:
- Identify the environment in which the equipment will operate. In avionics applications, the customer typically specifies the operating environment in terms of absolute physical parameters, such as temperature ranges, or cites the relevant chapter of a handbook or specification. While this may be a useful starting point for the designer, it rarely identifies the actual range of environments the equipment will experience where it is installed. It may be better, and from the customer's point of view more contractually sound, to state where and how the equipment will be used; the supplier then determines the "actual" environment. As a point of interest, manufacturers of consumer products such as automobiles have never had the benefit of a detailed environmental specification from their customers (the public), yet have been able to determine the environment effectively for themselves.
- The avionics design team should aim to allow the widest possible limits for thermal parameters (including temperature, thermal gradient, and number of thermal cycles) without compromising functionality, reliability, or overall safety. This objective parallels current practice in manufacturing, where the trend is to allow the widest possible tolerances on dimensions and geometry to reduce manufacturing cost. A similar approach to specifying thermal limits would not only reduce the cost of thermal analysis and design but also help lower the overall cost of the system.
- Design the system to account for temperature-related performance degradation. Steady-state temperature and temperature gradients can influence electrical functional parameters, including propagation delays and noise margins. The design team should identify the critical performance parameters and select or design a system architecture whose temperatures and gradients keep those parameters within their effective range.
- Based on the limits of the temperature-dependent performance parameters of the devices used in the system, the design team must decide how these parameters will be maintained. A "trade-off" study should be conducted to determine whether parameter control is best achieved through device design or by setting thermal limits, taking into account the potential cost of setting thermal limits.
- Learn how systems fail under various degrading influences. This involves assessing the potential failure mechanisms and determining the role of stresses, including steady-state temperature, temperature cycling, temperature gradients, and time-dependent temperature changes, on the failure mechanisms.
- Control manufacturing and assembly processes to reduce the variability that causes performance and reliability degradation. In particular, any manufacturing and assembly parameters that affect the contact resistances and internal thermal resistances must be understood and controlled.
References
1. ARINC 600, 1993, Air Transport Avionics Equipment Interface, Aeronautical Radio, Inc., Annapolis, Maryland.
2. Cluff, K., 1996, “Characterizing the Humidity and Thermal Environments of Commercial Avionics for Accelerated Test Tailoring,” PhD Dissertation, University of Maryland, College Park.
3. DO-160C, 1989, Environmental Conditions and Test Procedures for Airborne Equipment, Radio Technical Commission for Aeronautics, Washington D.C.
4. DOD-STD-1788, 1986, Avionics Interface Design Standard, U. S. Department of Defense, Washington, D.C.
5. Evans, J., Lall, P., and Bauernschub, R.A., 1995, “Framework for Reliability Modeling of Electronics,” Reliability and Maintainability Symposium, pp. 144-151.
6. Hallberg, O., 1994, “Hardware Reliability Assurance and Field Experience in a Telecom Environment,” Quality and Reliability Engineering International, Vol. 10, pp. 195-200.
7. Lall, P., Pecht, M., and Hakim, E., 1997, Influence of Temperature on Microelectronics and System Reliability – A Physics of Failure Approach, CRC Press, Boca Raton, FL.
8. IEEE 1332-1998, 1998, “Standard Reliability Program for the Development and Production of Electronic Systems and Equipment.”
9. Kelly, M. J., Boulton, W. R., Kukowski, J. A., Meieran, E. S., Pecht, M., Peeples, J. W. and Tummala, R. R., 1995, JTEC Panel Report on Electronic Manufacturing and Packaging in Japan, Loyola College, Maryland.
10. Kopanschi, J. K., Blackburn, D. L., Harman, G. G., and Berning, D. W., 1991, "Assessment of Reliability Concerns for Wide-temperature Operation of Semiconductor Devices and Circuits," First High-Temperature Electronics Conference.
11. Lasance, C. J. M., and Joshi, Y., 1998, "Thermal Analysis of Natural Convection Electronic Systems: Status and Challenges," Advances in Thermal Modeling of Electronic Components and Systems, Vol. 4, A. Bar-Cohen and A. D. Kraus, Eds., ASME Press Series, Chapter 1, pp. 1-177.
12. Lasance, C. J. M., 1995, “The Need for a Change in Thermal Design Philosophy,” Electronics Cooling, Vol. 1, No. 2, pp. 24-26.
13. Lasance, C. J. M., Rosten, H. I., and Parry, J. D., 1997, "The World of Thermal Characterization According to DELPHI-Part II: Experimental and Numerical Methods," IEEE Transactions on Components, Packaging, and Manufacturing Technology, Part A, Vol. 20, No. 4, pp. 392-397.
14. Pecht, M., and Ramappan, V., 1992, “Are Components Still a Major Problem? A Review of Electronic Systems and Device Field Failure Returns,” IEEE Transactions on Components, Hybrids, and Manufacturing Technology, Vol. 15, No. 6, pp. 1060-1064.
15. Rosten, H. I., Lasance, C. J. M., and Parry, J. D., 1997, "The World of Thermal Characterization According to DELPHI-Part I: Background to DELPHI," IEEE Transactions on Components, Packaging, and Manufacturing Technology, Part A, Vol. 20, No. 4, pp. 384-391.
16. Touloukian, Y. S. (Editor), 1967, Thermophysical Properties of High Temperature Solid Materials, Thermophysical Properties Research Center, Purdue University, Macmillan Company, New York.
17. Yeh, L. T., 1995, "Review of Heat Transfer Technologies in Electronic Equipment," Transactions of the ASME, Journal of Electronic Packaging, Vol. 117, No. 4, pp. 333-339, December.