Disk drive reliability and thermal management
Hard disk drives are electromechanical systems which store information. They are the most complex electronic sub-assembly within a computer. If a microprocessor can qualify as the “heart” of a computer system, a disk drive can qualify as the “brain”.
Drives store and retrieve data in milliseconds; their performance is measured by access time and transfer rate. The design of a drive is extremely complex. It requires the design and integration of mechanical and electrical parts, a continuous exchange of information in the form of electrical signals between the electronics and the mechanical sub-assemblies, and the control of both linear and rotational motion.
The mechanical portion of the drive consists of a motor bearing assembly, a spindle, and an actuator and E-block (the head stack assembly). The disk stack is mounted on the motor bearing spindle. The spindle motor of an operational drive may rotate continuously at speeds as high as 10,000 rpm. During operation, the flying height of each head is maintained at a few microinches above the disk. The drive performs several functions in different operating modes, such as random read, random write, sequential read, sequential write, idle, and seek.
Typical temperature-related failure modes and mechanisms in a disk drive assembly can be eliminated with proper thermal design. Excessive heat in the motor bearing assembly can cause bearing failure, and excessive friction may break down additives in the lubricant and lead to catastrophic failure. High air temperature inside the drive cavity can cause failures of the head stack assembly, the pre-amp IC, and the flex circuit assembly. High temperature gradients are known to cause thermal asperity, servo, and read-write problems, which appear as intermittent errors, soft errors, or hard failures. Other heat-related failures are servo off-track problems such as bumps, write fault errors, and track mis-registration. High temperature may cause the electrical characteristics of the drive to deteriorate significantly and degrade signals. At high and low temperature extremes, drive access and seek times slow substantially, causing "Drive Not Ready" (DNR) and "Command Time Out" errors, non-recoverable data errors, or cold start failures. Excessive heat from ICs such as the motor driver, ASICs, the read-write channel IC, or a resistor pack may raise the overall operating temperature of the drive and so adversely affect its reliability and performance. Typical failure modes due to heat and to coefficient of thermal expansion mismatch between the substrate and the IC package are pad delamination, solder joint cracks, die cracks, and wire bond and die attach failures.
The thermal management requirements for a good design are: power budget specifications, a power de-rating factor, measurements of the thermal resistances θja (junction-to-ambient) and θjc (junction-to-case), junction temperature measurement and control, and an optimum board layout.
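The relationship between these quantities can be illustrated with the standard steady-state thermal resistance model, in which junction temperature equals a reference temperature plus dissipated power times thermal resistance. The following is a minimal sketch only: the function names, the way the de-rating factor is applied to the power budget, and all numeric values are illustrative assumptions, not figures from any drive or device specification.

```python
def junction_from_ambient(ambient_c, power_w, theta_ja):
    """Estimate junction temperature (deg C): Tj = Ta + P * theta_ja."""
    return ambient_c + power_w * theta_ja


def junction_from_case(case_c, power_w, theta_jc):
    """Estimate junction temperature (deg C): Tj = Tc + P * theta_jc."""
    return case_c + power_w * theta_jc


def power_budget(tj_max_c, ambient_c, theta_ja, derating=1.0):
    """Largest dissipation (W) that keeps Tj at or below tj_max_c.

    The de-rating factor (0 < derating <= 1) scales the theoretical
    limit down to leave design margin; applying it to power in this
    way is an assumption made for this sketch.
    """
    return derating * (tj_max_c - ambient_c) / theta_ja


# Hypothetical motor driver IC inside a warm drive enclosure.
AMBIENT_C = 55.0   # assumed local air temperature, deg C
THETA_JA = 40.0    # assumed junction-to-ambient resistance, deg C/W
TJ_MAX_C = 125.0   # assumed maximum rated junction temperature, deg C

tj = junction_from_ambient(AMBIENT_C, power_w=1.5, theta_ja=THETA_JA)
budget = power_budget(TJ_MAX_C, AMBIENT_C, THETA_JA, derating=0.8)
print(f"Estimated Tj: {tj:.1f} C")
print(f"De-rated power budget: {budget:.2f} W")
```

With these assumed values, a device dissipating 1.5 W sits slightly above the de-rated budget, which is the kind of margin violation a thermal review is meant to catch before layout is finalized.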
Form factor restrictions require drives to stay within the industry-specified dimensions for height, length, and width for each class of drive. Thermal design and thermal management become extremely difficult, since the use of heat sinks, cooling fans, or any other cooling method may violate the form factor requirements set by the electronics industry. Ingenuity and thermal design experience are key to keeping a drive's overall operating temperature within limits, and the junction and case temperatures of each device within its specification.
Thermal management and thermal design margins are critical factors and must not be ignored if high standards of performance and reliability in the field are desired for disk drive products.