Thermal management in high-performance electronics has become a leading challenge for design and reliability engineers. Harsh temperatures can be caused either by harsh operational conditions (high power dissipation) or by harsh environmental conditions. Both require adequate thermal management to keep the product temperature within acceptable limits. High temperatures, large cyclic temperature excursions, strong temperature gradients, and rapid temperature transients can cause multi-physics degradation in electronic/photonic systems, leading to progressive aging, degraded system performance, and eventual failure. Here, 'multi-physics' refers to thermal, mechanical, electrical, and chemical aging/degradation mechanisms, and 'systems' includes components, substrates, and interconnects.
Modern electronic systems consist of complex multiscale (nanoscale-to-macroscale) features. A typical integrated circuit (IC) component may utilize heterogeneous integration (HI) technology (Figure 1a). A typical HI architecture contains a diverse set of co-packaged dies that perform multiple functions, such as digital ICs (of diverse nodes) for computing and memory, analog and RF ICs, silicon-photonics ICs, wide-bandgap (WBG) power ICs, and MEMS sensor ICs. As shown in Figure 1b, these ICs may be arranged in 2.5D or 3D stacked configurations on (or embedded in) silicon, glass, or organic interposers and substrates, and interconnected with bridges, microbump and C4 solder joints, copper-bump hybrid joints, and wire bonds. The back-end-of-line (BEOL) and redistribution layers (RDL) of individual chips/substrates typically contain many ultra-dense layers of metal traces and through-thickness vias. The substrate/interposer usually also contains numerous surface-mounted or embedded passive components. 3D die stacking often creates novel heat-removal challenges, which may require advanced solutions such as microchannel multi-phase cooling. Such advanced HI packages typically use a wide variety of highly engineered material systems (and corresponding interfaces): doped semiconductors, complex metal alloys, polymer dielectrics and adhesives, and ceramic dielectrics. Such complex components can be termed system-in-package (SiP). At the next packaging level, the SiP substrate can be surface-mounted with larger area-array solder joints onto printed wiring assemblies (PWAs) that may also contain other technologies, such as electromechanical components (transformers, relays, switches), multi-layer circuitry, and through-thickness via interconnects.
The Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland has been a premier research organization in this subject area for the past 35 years, working with industry and government partners around the world on physics-based and physics-informed AI-based studies of degradation mechanisms in advanced electronic systems, to enable more dependable (robust, reliable, safe, and secure) technologies. CALCE research includes:
• predictive multi-physics modeling to enable co-design for dependability (c-DfD) and process optimization not just for yield but also for long-term dependability (PfD)
• model-assisted quantitative accelerated stress testing to enable qualification for dependability (QfD)
• model-based and data-based prognostics and continuous real-time health management throughout the life-cycle to enable sustainment for dependability (SfD)
• managing a trustworthy supply chain throughout the life cycle for dependability (MfD)
In particular, CALCE has conducted extensive research in temperature-driven degradation mechanisms in electronic systems. To appreciate the full scope of the influence that temperature can have on such a complex system, it is instructive to first list out the various multi-physics degradation and damage mechanisms that electronic systems can experience (e.g. excessive mechanical forces or deformations, excessive temperature or temperature cycles, excessive electrical potential gradient or current density, excessive concentration of harsh chemical contaminants and moisture, etc.).
Figure 2 shows a sample list (expressed as a chart, for convenience). The multiphysics degradation/failure mechanisms are broadly grouped as ‘overstress’ and ‘wearout’ mechanisms. Overstress mechanisms are damage mechanisms caused by a sudden exposure to an extreme level of any of the multiphysics loading types listed above, one that takes the system (and its constituent materials) beyond its performance limits. In contrast, wearout mechanisms refer to the gradual, progressive accumulation of incremental damage associated with sustained exposure to moderate levels of these same loading types.
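The overstress/wearout distinction can be made concrete as two different failure criteria. The sketch below is illustrative only: it assumes a single scalar load per event, a fixed strength for the overstress check, and the Palmgren-Miner linear damage rule (a common wearout bookkeeping choice, not one named in the text) together with a hypothetical power-law life curve. `LoadEvent`, `overstress_fails`, `miner_damage`, and `hypothetical_sn_curve` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoadEvent:
    magnitude: float  # load level (arbitrary but consistent units)
    count: int        # number of times this load level is applied


def overstress_fails(events: List[LoadEvent], strength: float) -> bool:
    """Overstress view: failure if any single load exceeds the strength."""
    return any(e.magnitude > strength for e in events)


def miner_damage(events: List[LoadEvent],
                 cycles_to_failure: Callable[[float], float]) -> float:
    """Wearout view: Palmgren-Miner sum of n_i / N_i.

    Failure is predicted when the returned damage reaches 1.0.
    """
    return sum(e.count / cycles_to_failure(e.magnitude) for e in events)


def hypothetical_sn_curve(magnitude: float) -> float:
    """Hypothetical power-law life curve N = C * S**-b with C = 1e12, b = 4."""
    return 1e12 * magnitude ** -4


if __name__ == "__main__":
    moderate = [LoadEvent(magnitude=50.0, count=10_000)]
    spike = [LoadEvent(magnitude=150.0, count=1)]
    print(overstress_fails(moderate, strength=100.0))  # False
    print(overstress_fails(spike, strength=100.0))     # True
    print(miner_damage(moderate, hypothetical_sn_curve))
```

Ten thousand moderate events never trip the overstress check, yet they consume a measurable fraction of life under the wearout criterion, while a single spike fails the part outright.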
Figure 2 provides a convenient framework to discuss the many multiphysics effects that temperature can have on the performance of electronic systems over their lifetime:
Effects of Temperature on Mechanical Degradation Mechanisms: Temperature is known to reduce the stiffness, strength, and creep resistance of many materials (mostly polymers and metals) and interfaces. Since electronics use many different polymer-based dielectrics and attachment materials, as well as low-melting conductors (such as solders), excessive temperature can reduce their strength and creep resistance, exposing them to the risk of sudden fracture or delamination, or of gradual creep rupture/cavitation, under thermal and/or mechanical stresses. Creep cavitation is known to lead to problems such as stress-driven diffusive voiding (SDDV) in electronic interconnects. The mechanical stresses can result from temperature changes (either heating or cooling) combined with thermal expansion mismatches between the dissimilar materials used in electronics. These stresses can:
(i) cause warpage and complex deformation of chips and packages, and distortion of photonic waveguides
(ii) affect electronic bandgap energy and dislocation mobility of semiconductor devices
(iii) generate cracking/delamination at interfaces or in bulk packaging materials (especially overstress cracking in brittle materials, e.g. in semiconductor die materials, in Extremely Low K (ELK) BEOL structures, or in ceramic dielectrics used in resistors and capacitors, or in glass substrates used in advanced packages, or in intermetallic layers seen in bonded metallic structures such as in soldered interconnects, die attach layers, wire-bonds, etc.)
(iv) cause fatigue failures in ductile materials if the temperature excursions are cyclically repeated (due to power cycling or cyclic environmental conditions).
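A first-order way to quantify the thermal-expansion-mismatch stresses listed above is the standard elastic result for a thin film bonded to a much thicker substrate: the film is forced to follow the substrate's in-plane strain, giving an equal-biaxial stress σ = E/(1 − ν)·(α_s − α_f)·ΔT. The sketch below assumes a stress-free reference temperature, purely elastic behavior (no plasticity or creep), and no edge effects; the function names and the approximate Cu-on-Si property values are illustrative assumptions, not data from this paper.

```python
def mismatch_strain(alpha_film_ppm: float, alpha_sub_ppm: float,
                    delta_t: float) -> float:
    """In-plane mismatch strain imposed on a thin film by a thick substrate.

    Positive = tensile. Heating (delta_t > 0) a film whose CTE exceeds the
    substrate's puts the film in compression.
    """
    return (alpha_sub_ppm - alpha_film_ppm) * 1e-6 * delta_t


def biaxial_film_stress(e_film_pa: float, nu_film: float, strain: float) -> float:
    """Elastic equal-biaxial stress: sigma = E / (1 - nu) * strain."""
    return e_film_pa / (1.0 - nu_film) * strain


if __name__ == "__main__":
    # Illustrative, approximate values (assumed): Cu film (17 ppm/K,
    # 120 GPa, nu = 0.34) on Si (2.6 ppm/K), cooled by 100 K.
    eps = mismatch_strain(17.0, 2.6, -100.0)
    sigma = biaxial_film_stress(120e9, 0.34, eps)
    print(f"strain = {eps:.2e}, stress = {sigma / 1e6:.0f} MPa")
```

Stresses of a few hundred MPa from a 100 K excursion are already large enough, in this elastic picture, to matter for the cracking, delamination, and fatigue modes in items (iii) and (iv).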
While overstress cracking is mostly seen in brittle materials/interfaces, fatigue cracking can occur even in ductile materials/interfaces. Harsh cyclic temperature excursions, due to operational power cycling and environmental temperature swings, generate thermo-mechanical stresses at chip-package-PWB interfaces because of thermal expansion mismatches (e.g., between the die and the substrate/interposer, between the package and the PWB, and between vias and the PWB). CALCE has studied fatigue damage due to these cyclic thermo-mechanical stresses at packaging interfaces (such as die interfaces with the package molding compound, underfill, or TIM, or with the surrounding substrate/interposer materials in the case of embedded dies) and in interconnection features (such as die attach, RDLs in the package/substrate/interposer, ELK layers in the BEOL structures of the die, vias, wire bonds, solder interconnects, and conductive adhesives). Examples of such failure modes reported in the literature are shown in Figure 3.
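Fatigue life under repeated thermo-mechanical cycling is often estimated with the Coffin-Manson low-cycle fatigue relation, Δε_p/2 = ε_f'·(2N_f)^c, which links the cyclic plastic strain range to the number of cycles to failure. The sketch below simply inverts that relation; the default coefficients (ε_f' = 0.325, c = −0.5) are illustrative placeholders rather than calibrated values for any specific material, and `coffin_manson_cycles` is a hypothetical helper name.

```python
def coffin_manson_cycles(plastic_strain_range: float,
                         fatigue_ductility_coeff: float = 0.325,
                         fatigue_ductility_exp: float = -0.5) -> float:
    """Cycles to failure from the Coffin-Manson relation.

    delta_eps_p / 2 = eps_f' * (2 * N_f) ** c   =>
    N_f = 0.5 * (delta_eps_p / (2 * eps_f')) ** (1 / c)
    Default coefficients are illustrative placeholders, not calibrated data.
    """
    if plastic_strain_range <= 0:
        raise ValueError("plastic strain range must be positive")
    ratio = plastic_strain_range / (2.0 * fatigue_ductility_coeff)
    return 0.5 * ratio ** (1.0 / fatigue_ductility_exp)


if __name__ == "__main__":
    for strain_range in (0.005, 0.01, 0.02, 0.04):
        print(f"{strain_range:.3f} -> {coffin_manson_cycles(strain_range):,.0f} cycles")
```

With c = −0.5, doubling the cyclic strain range cuts the predicted life by a factor of four, which is why reducing the severity of temperature excursions (not only the peak) matters for ductile interconnects.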
In extreme temperature spikes, materials may even undergo phase transitions; for example, polymers may exceed their glass transition or softening temperatures, and polarized ceramics can experience de-poling. Temperature can also exacerbate fretting wear in separable connectors caused by thermo-mechanical or vibration-induced micromotion. Temperature can also increase the risk of whisker growth on metallic surfaces (such as tin-plated surfaces) by increasing the thermo-mechanical stresses that assist whisker formation and growth. [See, for example, Table 3 of Chapter 24 (Reliability) in the Heterogeneous Integration Roadmap, eps.ieee.org/images/files/HIR_2021/ch24_rel.pdf].
Effects of Temperature on Electrical Degradation Mechanisms: Because electrical power dissipation raises temperature through self-heating effects (SHE), harsh temperatures can arise from both environmental and operational conditions. This combined temperature can cause electrical failures if there is a sudden runaway electrical condition, such as electrical overstress (EOS) in conductors due to extreme current density. Sudden temperature spikes can also increase the risk of breakdown in dielectrics and transistor oxide layers under extreme potential gradients. Sustained exposure to high temperature can accelerate a whole host of electrical degradation mechanisms, such as: hot carrier injection (HCI) and bias temperature instability (NBTI/PBTI) in transistor devices; time-dependent dielectric breakdown (TDDB) in device oxide layers; slow charge trapping and contact spiking in conductor and semiconductor structures; electromigration and thermo-migration in metallic structures within the device, in the BEOL and RDL conductor layers, or in external conductors and interconnects in substrates and interposers; and loss of surface insulation resistance (SIR) due to electrochemical migration mechanisms such as conductive anodic filament (CAF) growth and cathodic dendritic growth.
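To illustrate how ambient and self-heating contributions combine and then feed a temperature-accelerated mechanism, the sketch below uses a single lumped junction-to-ambient thermal resistance (T_j = T_amb + P·θ_ja) together with Black's equation for electromigration, MTTF = A·J^(−n)·exp(E_a/(kT)). The lumped θ_ja, the exponent n = 2, and E_a = 0.9 eV are placeholder assumptions, and the helper names are hypothetical; the other mechanisms listed above (HCI, BTI, TDDB, etc.) follow different models not shown here.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def junction_temp_c(ambient_c: float, power_w: float,
                    theta_ja_k_per_w: float) -> float:
    """Steady-state junction temperature: T_j = T_amb + P * theta_ja."""
    return ambient_c + power_w * theta_ja_k_per_w


def black_mttf_ratio(j1: float, t1_c: float, j2: float, t2_c: float,
                     n: float = 2.0, ea_ev: float = 0.9) -> float:
    """MTTF(condition 2) / MTTF(condition 1) from Black's equation.

    MTTF = A * J**-n * exp(Ea / (k * T)); the prefactor A cancels.
    n and ea_ev are placeholder values, not fitted data.
    """
    t1_k = t1_c + 273.15
    t2_k = t2_c + 273.15
    current_term = (j2 / j1) ** (-n)
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t2_k - 1.0 / t1_k))
    return current_term * thermal_term


if __name__ == "__main__":
    # Same power and (assumed) thermal resistance, two ambient conditions.
    tj_benign = junction_temp_c(25.0, 2.0, 20.0)
    tj_harsh = junction_temp_c(85.0, 2.0, 20.0)
    ratio = black_mttf_ratio(1.0, tj_benign, 1.0, tj_harsh)
    print(f"Tj: {tj_benign:.0f} C vs {tj_harsh:.0f} C, relative MTTF = {ratio:.3f}")
```

Because the self-heating rise adds directly to the ambient temperature, the same electrical operating point can lose a large fraction of its electromigration life when the environment is hotter.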
Effects of Temperature on Chemical Degradation Mechanisms: Temperature increases the energy state and mobility of defects in materials and is therefore a well-known accelerator of diffusion, other defect migration mechanisms, and chemical reactions. As a result, temperature increases the risk of corrosion of metallic conductor features in the presence of harsh ionic contaminants (either from the environment or from residual process-chemical impurities), growth of brittle intermetallic compounds at the interfaces of bonded metallic structures (with a concurrent risk of Kirkendall voiding in metal joints, e.g., at the interface between a solder joint and its copper pad), and aging of polymers due to de-polymerization and side-chain reactions.
Temperature has relatively little effect on degradation mechanisms caused by high-energy particle radiation, so these mechanisms and modes are not discussed here.
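The thermally activated nature of these mechanisms is commonly captured with an Arrhenius rate, k = k₀·exp(−Q/(RT)). For diffusion-controlled intermetallic growth, a frequently used form is parabolic growth, x = x₀ + √(D(T)·t). The sketch below combines the two; every numeric parameter (initial thickness, D₀, Q) is a hypothetical placeholder, and the function names are illustrative, so only the trends (strong temperature sensitivity, square-root time dependence) should be read from it.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def arrhenius_rate(prefactor: float, activation_j_per_mol: float,
                   temp_c: float) -> float:
    """Thermally activated rate: k = k0 * exp(-Q / (R * T))."""
    temp_k = temp_c + 273.15
    return prefactor * math.exp(-activation_j_per_mol / (GAS_CONSTANT * temp_k))


def imc_thickness_um(time_h: float, temp_c: float, x0_um: float = 1.0,
                     d0_um2_per_h: float = 1.0e6,
                     q_j_per_mol: float = 60e3) -> float:
    """Diffusion-controlled (parabolic) layer growth: x = x0 + sqrt(D(T) * t).

    All default parameters are hypothetical placeholders for illustration.
    """
    diffusivity = arrhenius_rate(d0_um2_per_h, q_j_per_mol, temp_c)
    return x0_um + math.sqrt(diffusivity * time_h)


if __name__ == "__main__":
    for temp in (100.0, 125.0, 150.0):
        print(f"{temp:.0f} C, 1000 h: {imc_thickness_um(1000.0, temp):.2f} um")
```

Under these assumptions, a modest increase in temperature multiplies the growth rate constant, while growth slows with time because thickness scales with √t.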
This discussion has highlighted the complex and interdependent set of reliability risks that temperature can pose in complex electronic/photonic systems throughout their life cycle. Cooling solutions face growing challenges due to ever-increasing package complexity, miniaturization, and power density. Nevertheless, temperature must be managed judiciously for all the reasons discussed in this paper. Our ability to keep pace with system-level Moore’s law (SysMoore) depends on academic and industry research groups working together effectively on the dual challenges of developing more effective cooling solutions and more temperature-resistant material systems for electronics applications. When designing cooling solutions, researchers and engineers need to keep in mind that the goal is not just to lower peak temperatures, but also to co-optimize the severity of temperature cycles, temperature gradients, and temperature transients.
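Co-optimizing more than the peak implies extracting several metrics from a temperature history. The sketch below, under stated assumptions, computes the peak, the steepest transient rate, and a rainflow count of cycle ranges (three-point method in the style of ASTM E1049) from a single uniformly sampled sensor trace. Spatial gradients would need temperatures at multiple locations and are not covered; `turning_points`, `rainflow_cycles`, and `profile_metrics` are hypothetical helper names.

```python
from collections import deque


def turning_points(series):
    """Reduce a history to its reversals, keeping the first and last samples."""
    points = [series[0]]
    for x in series[1:]:
        if x == points[-1]:
            continue  # skip flat segments
        if len(points) >= 2 and (points[-1] - points[-2]) * (x - points[-1]) > 0:
            points[-1] = x  # still moving in the same direction
        else:
            points.append(x)
    return points


def rainflow_cycles(series):
    """Three-point rainflow counting (in the style of ASTM E1049).

    Returns {range: count}; closed cycles count 1.0, residual half cycles 0.5.
    """
    counts = {}
    stack = deque()

    def record(rng, n):
        counts[rng] = counts.get(rng, 0.0) + n

    for x in turning_points(series):
        stack.append(x)
        while len(stack) >= 3:
            newest = abs(stack[-1] - stack[-2])
            previous = abs(stack[-2] - stack[-3])
            if newest < previous:
                break
            if len(stack) == 3:
                record(previous, 0.5)  # range contains the start point: half cycle
                stack.popleft()
            else:
                record(previous, 1.0)  # closed full cycle
                last = stack.pop()
                stack.pop()
                stack.pop()
                stack.append(last)
    while len(stack) > 1:  # leftover residue counts as half cycles
        record(abs(stack[1] - stack[0]), 0.5)
        stack.popleft()
    return counts


def profile_metrics(temps_c, dt_s):
    """Peak temperature, steepest transient rate, and rainflow cycle ranges."""
    rates = [abs(b - a) / dt_s for a, b in zip(temps_c, temps_c[1:])]
    return {
        "peak_c": max(temps_c),
        "max_rate_c_per_s": max(rates),
        "cycles": rainflow_cycles(temps_c),
    }


if __name__ == "__main__":
    # Hypothetical power-cycling trace sampled every 10 s.
    trace = [25, 60, 85, 70, 90, 40, 25, 55, 30]
    print(profile_metrics(trace, dt_s=10.0))
```

Rainflow counting is used here because it pairs nested excursions into closed cycles, so a small ripple on top of a large swing is not mistaken for a large cycle; the resulting range histogram can then be fed into a fatigue damage model.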
The role of temperature in any one degradation mechanism can be quite complex. As an example, consider a study of solder joint fatigue in flip-chip assemblies due to accelerated temperature cycling tests during design verification testing (DVT) [https://doi.org/10.1115/1.2793846]. Flip-chip dies are routinely mounted on organic substrates/interposers, creating a large CTE mismatch (2.5-3.5 ppm/°C for Si vs. 15-20 ppm/°C for the in-plane expansion of typical fabric-reinforced organic substrate materials) [Figure 4]. Consequently, the corner solder joint in a 10 mm × 10 mm flip-chip component can experience as much as 4% shear strain for every 10 °C change in temperature, resulting in fatigue failures within a few hundred temperature cycles in accelerated stress testing. A common solution is therefore to add an underfill, typically a filled polymer carefully tailored to the flip-chip assembly for the appropriate mechanical, thermomechanical, electrical, and moisture absorption characteristics. Underfills can improve fatigue durability by 1-2 orders of magnitude, but the addition of these new materials and processes further complicates the assembly process.
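A common first-order estimate of corner-joint shear strain in a non-underfilled flip chip is γ ≈ |Δα|·ΔT·L_DNP/h, where L_DNP is the distance from the die's neutral point (its center) to the joint and h is the joint height. The sketch below assumes rigid, unbent die and substrate, no underfill, and a joint height h that is not given in the text (0.05 mm here, chosen only for illustration). Because γ scales as 1/h and ignores warpage, its output should not be read as reproducing the strain quoted above. The function names are hypothetical.

```python
import math


def corner_dnp_mm(die_x_mm: float, die_y_mm: float) -> float:
    """Distance from the die's neutral point (center) to a corner joint."""
    return math.hypot(die_x_mm / 2.0, die_y_mm / 2.0)


def dnp_shear_strain(cte_die_ppm: float, cte_sub_ppm: float, delta_t_c: float,
                     dnp_mm: float, joint_height_mm: float) -> float:
    """First-order shear strain: gamma = |d_alpha| * dT * L_DNP / h.

    Assumes rigid, unbent die and substrate and no underfill.
    """
    mismatch = abs(cte_sub_ppm - cte_die_ppm) * 1e-6 * abs(delta_t_c)
    return mismatch * dnp_mm / joint_height_mm


if __name__ == "__main__":
    dnp = corner_dnp_mm(10.0, 10.0)  # 10 mm x 10 mm die
    # Si ~3 ppm/C, organic substrate ~17 ppm/C; h = 0.05 mm is an assumption.
    gamma = dnp_shear_strain(3.0, 17.0, 10.0, dnp, joint_height_mm=0.05)
    print(f"DNP = {dnp:.2f} mm, shear strain per 10 C = {gamma:.2%}")
```

The estimate makes the design levers explicit: strain grows with die size and CTE mismatch and shrinks with taller joints, while an underfill works by coupling die and substrate so the joints no longer carry the full mismatch displacement.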
When it comes to fatigue damage, a decrease in temperature can be as damaging as an increase in temperature. In the case of solder, however, increasing temperature not only causes thermal expansion mismatch but also reduces the creep resistance of the solder, thus increasing the strain severity in the solder. High temperature also causes several additional degradation modes (Figure 5), such as:
(i) aging of solder microstructure (‘ripening’), further decreasing the creep resistance of the solder material
(ii) growth of intermetallic layers at the solder pads (with concurrent risk of Kirkendall voiding in the pad), further embrittling the connection between the solder and the UBM/ pad
(iii) electromigration and thermo-migration in the solder joint, further weakening the joint
(iv) recrystallization of solder grains due to temperature cycling, further decreasing the creep resistance and increasing the risk of intergranular fatigue fractures.
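One widely used way to relate accelerated temperature-cycling results to field conditions is the Norris-Landzberg acceleration factor. Its ΔT term depends only on the cycle range (so heating and cooling excursions of equal size count equally), while a separate exponential term in the maximum temperature reflects additional high-temperature effects such as those listed above. The sketch below assumes the defaults n = 1.9, m = 1/3, and E_a/k = 1414 K, values commonly quoted for eutectic SnPb; treat them as placeholders for other alloys. `norris_landzberg_af` is a hypothetical helper name.

```python
import math


def norris_landzberg_af(dt_test_c: float, dt_field_c: float,
                        f_test_per_day: float, f_field_per_day: float,
                        tmax_test_c: float, tmax_field_c: float,
                        n: float = 1.9, m: float = 1.0 / 3.0,
                        ea_over_k: float = 1414.0) -> float:
    """Acceleration factor N_field / N_test in the Norris-Landzberg form.

    AF = (dT_test / dT_field)**n * (f_field / f_test)**m
         * exp(Ea/k * (1/Tmax_field - 1/Tmax_test))
    Defaults are values commonly quoted for eutectic SnPb (placeholders).
    """
    range_term = (dt_test_c / dt_field_c) ** n
    freq_term = (f_field_per_day / f_test_per_day) ** m
    tmax_field_k = tmax_field_c + 273.15
    tmax_test_k = tmax_test_c + 273.15
    temp_term = math.exp(ea_over_k * (1.0 / tmax_field_k - 1.0 / tmax_test_k))
    return range_term * freq_term * temp_term


if __name__ == "__main__":
    # Hypothetical test (-40/125 C, 24 cycles/day) vs field (0/60 C, 1 cycle/day).
    af = norris_landzberg_af(165.0, 60.0, 24.0, 1.0, 125.0, 60.0)
    print(f"acceleration factor = {af:.1f}")
```

Separating the range, frequency, and peak-temperature terms makes it visible that two test profiles with the same ΔT can still differ in severity if their maximum temperatures differ.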
Managing these multiple risks requires the right combination of solder alloy material, under-bump metallization (UBM) system and plating system on the substrate pad.
In contrast, a corresponding drop in temperature causes the same expansion mismatch but increases the solder’s creep resistance, thus reducing the amount of shear deformation in the solder joint. However, this increase in creep resistance may also have competing negative effects on the rest of the assembly, because of a corresponding increase in the stresses in copper pads/traces, RDLs, ELK structures, and microvias.
Ongoing research focuses on balancing these competing risks through careful system-level optimization, which makes design for reliability (DfR) an essential part of system co-design.
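The hot/cold asymmetry described above can be illustrated with a steady-state creep law of the Garofalo (hyperbolic-sine) form, ε̇ = A·sinh(ασ)^n·exp(−Q/(RT)), which is one widely used form in solder constitutive modeling. All material constants in the sketch below are hypothetical placeholders, not fitted data for any alloy, and `garofalo_creep_rate` is an illustrative name. Qualitatively, at the same stress the solder creeps far faster when hot, so it accommodates mismatch by deforming itself; when cold, it creeps far less, and more of the mismatch must be carried elastically by stresses in the joint and its surrounding structures.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def garofalo_creep_rate(stress_mpa: float, temp_c: float,
                        a: float = 1.0e5, alpha_per_mpa: float = 0.05,
                        n: float = 3.0, q_j_per_mol: float = 60e3) -> float:
    """Steady-state creep rate (1/s): A * sinh(alpha*sigma)**n * exp(-Q/(R*T)).

    All material constants are hypothetical placeholders, not fitted data.
    """
    temp_k = temp_c + 273.15
    stress_term = math.sinh(alpha_per_mpa * stress_mpa) ** n
    thermal_term = math.exp(-q_j_per_mol / (GAS_CONSTANT * temp_k))
    return a * stress_term * thermal_term


if __name__ == "__main__":
    for temp in (-40.0, 25.0, 125.0):
        rate = garofalo_creep_rate(20.0, temp)
        print(f"{temp:6.0f} C: creep rate at 20 MPa = {rate:.3e} 1/s")
```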