The traditional pathway for developing thermal management technologies for electronics has been analysis based on heat transfer theory, augmented by computational tools such as finite difference or finite element methods, CFD tools, or thermal management system simulation tools, sometimes in tandem with prototype system fabrication and testing. Recently available high-speed processors, faster memory, and machine learning strategies now offer ways to enhance these traditional tools for thermal control component and system development. These trends are driving an evolution of thermal management technology development toward physics-inspired approaches that combine machine learning with heat transfer theory.
To illustrate this point, the development of a heat pipe system for a server machine is considered. The example system of interest, shown schematically in Figure 1, has two evaporators and a single condenser.
This is a simplified example of a real heat pipe system, which could have multiple evaporators served by a single condenser heat rejection heat exchanger. Note that in this example system, the operating conditions correspond to specified values of the condenser inlet water temperature Tcfi and flow rate ṁc, and the heat rejection rates from the chips into the evaporators q̇a and q̇b. If the governing heat transfer relations and conservation equations indicated in Figure 1 are combined, the performance parameters, the temperatures of computer chips a and b (Tchip,a and Tchip,b), can be computed if the conductances ((UA)e,a, (UA)e,b, (UA)c) are known.
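As a concrete illustration, a minimal code sketch of such a physical model is given below. It assumes a single uniform heat pipe vapor temperature, an effectiveness treatment of the condenser, and a constant coolant specific heat; the actual relations indicated in Figure 1 may differ in detail.

import math

CP_WATER = 4180.0  # J/(kg K), assumed constant specific heat of the condenser water

def chip_temperatures(T_cfi, m_dot_c, q_a, q_b, UA_ea, UA_eb, UA_c):
    """Return (Tchip_a, Tchip_b) for given operating conditions and conductances."""
    q_total = q_a + q_b                    # total heat carried to the condenser, W
    C_c = m_dot_c * CP_WATER               # condenser coolant capacity rate, W/K
    eff = 1.0 - math.exp(-UA_c / C_c)      # effectiveness for a constant-temperature (vapor) hot side
    T_hp = T_cfi + q_total / (eff * C_c)   # vapor temperature from q_total = eff * C_c * (T_hp - T_cfi)
    T_chip_a = T_hp + q_a / UA_ea          # evaporator a: q_a = UA_ea * (T_chip_a - T_hp)
    T_chip_b = T_hp + q_b / UA_eb          # evaporator b: q_b = UA_eb * (T_chip_b - T_hp)
    return T_chip_a, T_chip_b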
Modeling performance of this system can be accomplished with physical modeling alone if the conductances can be predicted with submodels, or determined from experiments. These conductances
are typically the result of combined conduction and convection effects, and the geometry of the components may require multidimensional analysis. Multiphysics modeling tools (ANSYS, COMSOL, etc.) could be used to determine the conductances, or separate experiments on the device components could be used to determine conductance values. These approaches are well known and extensively used in traditional thermal control system development.
The availability of machine learning approaches opens the door to at least two other approaches: (1) using physical modeling with a genetic algorithm and (2) modeling using a neural network data-based model. The features and pros and cons of each of these are discussed below.
A Physics-Inspired Model and Genetic Algorithm
In this model it is presumed that there is a data set in which each point is a list of variables that includes the operating conditions variables Xi and the resulting performance parameter values Yj for those conditions: [X1, X2, X3, … Xm, Y1, Y2, Y3, … Yn].
Here the recommended practice of normalizing the data is applied: each variable is normalized by its median value in the data set before it is input to the model, and the model is trained to predict normalized chip temperatures, each equal to the actual chip temperature divided by its median value in the data set. Once the model is trained in this way, performance is predicted by normalizing the operating conditions with the corresponding median values used in training and letting the model predict the normalized chip temperatures. The physical chip temperatures are then determined by multiplying each by its corresponding median temperature from training.
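A minimal sketch of this normalization step is shown below (the array layout and variable names are illustrative assumptions):

import numpy as np

def normalize_by_median(data):
    """data: array of shape [n_points, m + n] with rows [X1 ... Xm, Y1 ... Yn]."""
    medians = np.median(data, axis=0)   # per-variable medians over the data set
    return data / medians, medians      # keep the medians for later de-normalization

# After training, model outputs are converted back to physical chip temperatures by
# multiplying each normalized prediction by the corresponding median from training.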
The model framework discussed above dictates that once a set of conductances is specified, the model can predict the performance parameters for a given set of input parameters. This perspective is the basis for using a genetic algorithm: we want to find the set of conductances ((UA)e,a, (UA)e,b, (UA)c) that best fits the data set. The genetic algorithm accomplishes this in the manner described below.
Genetic Algorithm Structure
An initial population (ensemble) of solutions, here sets of ((UA)e,a, (UA)e,b, (UA)c) values, is established. Each individual is a candidate solution to the problem, analogous to a biological organism in a population of a species, and is characterized by a set of parameters (variables) known as genes. The genes are joined into a string to form a chromosome (solution). Once the population is established, the following steps are iteratively executed:
(i) Fitness function determination
To begin each iteration, the algorithm computes a fitness score for each individual. Here that is done by randomly pairing the individual with one point of the database, computing the output Yi variables from that point's input variables, and evaluating the RMS fractional error between the individual's predicted Yi values and the values in that data point. The probability that an individual will be selected for reproduction is based on its fitness score: the lower the RMS fractional error, the higher the survival probability.
(ii) Selection of fittest to survive each generation
Individuals with a fitness score below a threshold are eliminated from the population and replaced with offspring.
(iii) Offspring production with genes chosen from surviving parents, plus mutation
New offspring are formed with genes chosen from the surviving parents, and some of those genes are subjected to random mutation with a low probability.
Steps (i)-(iii) are repeated for successive generations until the RMS fractional error (fitness function) for the population reaches a sufficiently low value. A minimal code sketch of this loop appears below.
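The sketch below illustrates one possible implementation of steps (i)-(iii). Individuals are (UA_ea, UA_eb, UA_c) triples, and fitness is the RMS fractional error against a randomly paired data point. The population size, gene ranges, mutation probability, and selection fraction are illustrative assumptions, not the values used in the example calculation.

import random

def rms_fractional_error(pred, actual):
    return (sum(((p - a) / a) ** 2 for p, a in zip(pred, actual)) / len(actual)) ** 0.5

def evolve(data, model, pop_size=100, generations=200, mutation_prob=0.05):
    # data: list of (inputs, outputs) points; model(inputs, genes) -> predicted outputs
    population = [[random.uniform(1.0, 100.0) for _ in range(3)] for _ in range(pop_size)]
    best = population[0]
    for _ in range(generations):
        # (i) fitness: pair each individual with a randomly chosen data point
        scored = []
        for genes in population:
            x, y = random.choice(data)
            scored.append((rms_fractional_error(model(x, genes), y), genes))
        scored.sort(key=lambda s: s[0])
        best = scored[0][1]
        # (ii) selection: keep the fitter half of the population as survivors
        survivors = [genes for _, genes in scored[: pop_size // 2]]
        # (iii) offspring: genes chosen from two surviving parents, with occasional mutation
        offspring = []
        while len(survivors) + len(offspring) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            child = [random.choice(pair) for pair in zip(p1, p2)]
            if random.random() < mutation_prob:
                child[random.randrange(3)] *= random.uniform(0.8, 1.2)
            offspring.append(child)
        population = survivors + offspring
    return best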
The details of how the above steps are handled may vary somewhat among genetic algorithm applications, but the elements generally conform to the features described above. In the example calculation summarized here, a data set of normalized performance data points was used, each having the form [X1, X2, X3, … Xm, Y1, Y2, Y3, … Yn], where the Xi are the operating condition variables and the Yj are the resulting performance parameter values.
A genetic algorithm with the features described above was used to determine the set of conductances ((UA)e,a, (UA)e,b, (UA)c) that minimized the mean RMS fractional error, over the population of gene sets, between the model predictions and randomly selected data points. Figure 2a shows the convergence of the mean gene values as successive generations are analyzed.
The resulting best-fit constants were used in the model equations to predict the two chip temperatures, and a comparison of the predictions with the chip temperatures in the data set for the same operating conditions is shown in Figure 2b. The model with the fitted constants agrees with the data to a mean absolute fractional error of about 0.075.
The results in Figure 2b are for the following mean (generation-averaged) best-fit values determined by the genetic algorithm: (UA)e,a = 65.32 W/K, (UA)e,b = 13.44 W/K, (UA)c = 8.66 W/K. These values resulted in a fit with a mean fractional error of 0.075 (~7.5%).
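For reference, the mean absolute fractional error quoted here is assumed to have its usual definition (the text does not state it explicitly):

\[
\mathrm{MAFE} \;=\; \frac{1}{N}\sum_{k=1}^{N}\frac{\left|T_{\mathrm{chip,pred},k}-T_{\mathrm{chip,data},k}\right|}{T_{\mathrm{chip,data},k}}
\]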
Performance Prediction with a Conventional Neural Network Data-Based Model
A conventional neural network can be trained to predict the trends in output performance parameters for prescribed input parameters using performance data like that used in the genetic algorithm model described above. In the simple example system of interest here, the input parameters are the operating conditions: the specified condenser inlet water temperature Tcfi and the heat rejection rates from the chips into the evaporators q̇a and q̇b. The output performance parameters are the chip operating temperatures Tchip,a and Tchip,b. The neural network model is shown schematically in Figure 3.
Note that the neural network has three inputs, an output layer, and two layers between. The presence of these “hidden” layers is sufficient to categorize this as a deep learning model. Each neuron in the model multiplies each input it receives by an adjustable weight, sums the results, and adds an adjustable bias constant. The sum is passed to a specified activation function, here chosen to be the rectified linear unit (ReLU) function [1]. Although the network structure for this example is simple, the model has 210 adjustable parameters, which allows it to capture fairly complex, nonlinear, multivariate behavior in the data set. The code to set up and train the model was written using Keras [2] and other standard Python tools. The model was trained with the same data used in the genetic algorithm model described above. The trained model was used to predict the two chip temperatures, and a comparison of the resulting predictions with the chip temperatures in the data set for the same operating conditions is shown in Figure 4. The neural network model fits the data to a mean absolute fractional error of 0.018 (~1.8%).
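A minimal sketch of such a network is shown below. The hidden-layer widths of 8 and 16 are assumptions (the article does not give them), chosen because they yield the 210 trainable parameters cited above.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(3,)),              # normalized Tcfi, q_a, q_b
    layers.Dense(8, activation="relu"),   # first hidden layer
    layers.Dense(16, activation="relu"),  # second hidden layer
    layers.Dense(2),                      # normalized Tchip,a and Tchip,b
])
model.compile(optimizer="adam", loss="mse")
# model.fit(X_norm, Y_norm, epochs=500, verbose=0)  # X_norm, Y_norm from the normalization step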
Once the model is trained, its learned knowledge of the parametric performance trends in the data is stored in its 210 neuron parameter values. It can then be used to predict performance for an arbitrarily chosen set of operating conditions. This is illustrated in Figure 5. For a heat pipe condenser coolant inlet temperature of 10°C, this figure shows the model-predicted variations of the two chip temperatures as functions of the heat generation rates in the two chips A and B. Note that for each pair of heat dissipation rates experienced by the system, the corresponding points on these surfaces predict the operating temperature of each chip.
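A sketch of how surfaces like those in Figure 5 can be generated is shown below: the two chip powers are swept at a fixed condenser inlet water temperature of 10°C and the trained model is evaluated on the grid. The power ranges shown and the x_medians / y_medians arrays (input and output medians from training) are illustrative assumptions.

import numpy as np

q_a_grid, q_b_grid = np.meshgrid(np.linspace(10.0, 100.0, 25), np.linspace(10.0, 100.0, 25))
X = np.column_stack([
    np.full(q_a_grid.size, 10.0),  # Tcfi = 10 C for every grid point
    q_a_grid.ravel(),
    q_b_grid.ravel(),
]) / x_medians                     # normalize with the medians used in training
T_chip_surfaces = model.predict(X) * y_medians  # back to physical chip temperatures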
Closure: Pros and Cons
The results of the example models presented here demonstrate that either of the two machine learning methodologies described could be useful for predicting performance of the heat pipe system shown in Figure 1. Both are data-based models, so both would require fabrication and testing of a prototype heat pipe system to create the database used to train the model. Some other pros and cons of each are summarized below:
Physics-Inspired Genetic Algorithm
Once trained, the genetic algorithm physics-inspired model is embodied in a set of mathematical equations with known coefficients that can be used to predict the heat pipe system performance. This model has the further advantage that it back-infers parameters that might be difficult to predict from theory, difficult to measure separately, and/or dependent on physical properties that are not accurately known. Knowledge of the inferred conductances may be very valuable in that it may indicate that one or more conductances differ substantially from the values expected from heat transfer theory, implying that a change in the design, material, or manufacturing process is needed to achieve the desired level of performance. Also, if operating conditions and performance are monitored during field operation, this heat pipe model can be retrained to reflect shifts in performance due to condenser water-side fouling or shifts in contact resistance at the chip-evaporator interface. This can facilitate model-based control of the condenser cooling water flow rate and inlet temperature for appropriate chip cooling. A drawback of this type of model is that it assumes the conductance parameters are fixed constants for all operating conditions of interest, which may not be fully accurate. Adjustments to the model to account for parameter variation with operating conditions could be made.
Artificial Neural Network Model
Once trained, the neural network model can be used to predict performance for the heat pipe system, but to do so, the neural network model code must be used with the best-fit values of neuron weights and biases determined in the training. This model can fit trends in the data with a higher degree of adaptability than the specific model relations used in the genetic algorithm model. However, this type of model does not provide an explicit mathematical relation for the performance as a function of operating condition. Consequently, to define the parametric trends for the system, the model must be used to generate predictions over the parameter space of interest and the results must be analyzed to assess the trends.
While only two types of machine learning tools have been explored here, the example results and observations reflect the potential usefulness and advantages of such tools for enhancing the development of better-performing systems and/or facilitating adaptive control of heat pipe-based thermal control systems for electronics applications.
References
[1] Keras documentation, "Layer activation functions," https://keras.io/api/layers/activations/
[2] Keras documentation, "Developer guides," https://keras.io/guides/