Server manufacturers and data center managers are increasingly concerned with the energy efficiency and cooling of the new generation of data center servers. The largest data centers house more than 100 000 servers, and some consume more than 50 MW [1]. This electrical energy is converted directly into heat, which is then simply “wasted” as it is dissipated into the atmosphere.
A recent solution to this “energy crisis” adopted by data center thermal designers is to enclose the air-cooled servers in racks fitted with air-to-water cooling coils for heat removal, in an attempt to maximize cooling performance and reduce the overall thermal resistance between the chip and the external environment. Another solution relies on outside cold air and/or water for cooling (i.e., free cooling [2]). This approach depends strongly on external environmental conditions and requires additional components such as filters, ducts, fans and dampers. It also requires highly specialized controllers and continuous maintenance, and it is susceptible to errors [3].
A long-term solution is to move to on-chip two-phase cooling [4]. Besides providing very high cooling performance at the chip level without a large-footprint heat spreader, it eliminates poorly performing air as a coolant altogether [5, 6] and makes the waste heat convenient to reuse. This is because such a green cooling technology allows higher evaporating and condensing temperatures: the dielectric refrigerant can evaporate at the chip at temperatures up to 60 °C while still keeping the chip comfortably below 85 °C.
Single-phase (water) on-chip cooling technologies have been implemented in new supercomputers, reducing power consumption by up to 45% compared with air cooling [7]. On-chip cooling has also significantly increased computing performance, both in terms of computing throughput (lower chip temperature, lower gate leakage current, lower voltage and higher frequency) and in terms of computing throughput per unit of electrical energy used. The appeal of two-phase on-chip cooling is therefore to improve computing performance even further: thanks to the latent heat of the coolant, it removes much higher heat fluxes with smaller coolant flow rates than single-phase cooling [8]. It also achieves better temperature uniformity across the chips.
In the present work (a condensed version of the paper presented in [4]), two such two-phase cooling cycles using micro-evaporation technology were evaluated experimentally, with particular attention to energy consumption, overall energy efficiency and, in particular, controllability. Each cycle comprised a tube-in-tube counter-flow condenser (heat rejection), two parallel micro-evaporator (ME)/pseudo-chip packages (mimicking the cooling of the chips of blade servers) and a stepper motor valve (SMV) at the inlet of the MEs for flow control. The two alternative drivers tested were a mini vapor compressor (VC cycle) and a gear pump (LP cycle). In addition, the VC cycle included two internal heat exchangers to guarantee subcooling at the inlet of the MEs and superheating at the inlet of the mini-compressor.
Figure 1 shows the multi-purpose test bench built to evaluate these cooling systems experimentally under typical blade server operating conditions: transient and steady-state, balanced and unbalanced heat loads on the system’s two pseudo CPUs. More details about the different cooling loops can be found in [9]. Since little experience is available with flow control of two-phase cooling systems for servers, demonstrating such control with simple controllers was the major objective here.
FLOW CONTROL
The operational goal here is to keep the chip temperature below a pre-established level by controlling the inlet conditions of the micro-evaporator cold plate (pressure, inlet subcooling and mass flow rate). Furthermore, the ME’s outlet vapor quality must be kept below the critical vapor quality, which is associated with the critical heat flux (premature dryout in the channels). Notably, the coolant flow rate is modulated to hold the exit vapor quality at a target value and thus match the heat load fluctuations during the chips’ operation. This greatly reduces the driver’s energy consumption during normal operation, keeps it low during standby, and allows the driver to be switched off when the server is not in operation for even greater energy savings.
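For a given heat load, a steady-state energy balance on the ME fixes the flow rate needed to reach the target exit quality, which is why modulating the flow can track the heat load. The sketch below is a minimal illustration, not the authors’ controller. It assumes negligible heat losses and pressure drop across the ME, approximate HFC134a properties near an assumed 30 °C saturation temperature (latent heat about 173 kJ/kg, liquid specific heat about 1.45 kJ/(kg K)) and an assumed 3 K inlet subcooling; none of these values come from the paper.

```python
def required_mass_flow(q_w: float, x_target: float, h_lv: float, dh_sub: float = 0.0) -> float:
    """Mass flow rate [kg/s] that brings a heat load q_w [W] to exit quality x_target.

    Energy balance on the micro-evaporator (no heat losses, no pressure drop):
    q = m_dot * (dh_sub + x_out * h_lv), where dh_sub [J/kg] is the enthalpy
    needed to heat the subcooled inlet liquid up to saturation.
    """
    if not 0.0 < x_target < 1.0:
        raise ValueError("target quality must lie strictly between 0 and 1")
    return q_w / (dh_sub + x_target * h_lv)


def exit_quality(q_w: float, m_dot: float, h_lv: float, dh_sub: float = 0.0) -> float:
    """Inverse of required_mass_flow: exit vapor quality for a given flow rate."""
    return (q_w / m_dot - dh_sub) / h_lv


# Assumed, approximate HFC134a properties near 30 degC saturation (not from the paper).
H_LV = 173.1e3          # latent heat of vaporization [J/kg]
DH_SUB = 1.45e3 * 3.0   # liquid cp ~1.45 kJ/(kg K) times an assumed 3 K subcooling [J/kg]

for q in (60.0, 120.0, 165.0):  # total heat load on both MEs [W]
    m = required_mass_flow(q, x_target=0.22, h_lv=H_LV, dh_sub=DH_SUB)
    print(f"Q = {q:5.1f} W -> m_dot = {m * 1e3:.2f} g/s")
```

For a fixed target quality and inlet state, the required flow scales linearly with the heat load, so a lightly loaded server needs only a fraction of the full-load flow.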
The condensing pressure must also be controlled, since it sets the saturation temperature of the coolant. If the aim is to recover the heat released by the coolant in the condenser for heating buildings and residences, for district heating, for preheating boiler feedwater, etc. (here represented by a thermal bus), this can be done either with a compressor (VC cycle), which rejects the waste heat at a higher temperature at the cost of higher energy consumption, or with a pump (LP cycle), which rejects the heat at the ME’s exit saturation temperature. Neither option requires a refrigeration chiller. Alternatively, with a pump as the driver, the ME’s exit saturation temperature can be modulated to follow the outside air temperature so that the heat is dissipated into the ambient air via a compact air-cooled heat exchanger (viz. Figure 2).
For the experimental evaluation, specific controllers were first designed and tested [9]. The controlled variables were the ME’s outlet vapor quality, the condensing pressure (LP cycle) and the approach temperature in the condenser (VC cycle). The actuators were a variable-stroke oil-free vapor compressor, a variable-speed condenser water pump and an electronically controlled stepper motor valve (oversized so that it modulates the refrigerant mass flow rate with negligible pressure drop).
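The authors’ controllers are described in [9]; the following is not their design. It is a hypothetical SISO loop sketched to show the idea: a discrete PI controller adjusts a normalized SMV opening (0 to 1) to hold the exit vapor quality at 22%, using an invented first-order valve/flow model (full-open flow `k_valve`, time constant `tau`), an assumed HFC134a latent heat and a saturated-liquid inlet. A step in heat load from 120 W to 165 W at t = 10 s acts as the disturbance.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete PI controller with clamped output and conditional-integration anti-windup."""
    kp: float
    ki: float
    dt: float
    u_min: float = 0.0
    u_max: float = 1.0
    integral: float = 0.0

    def update(self, error: float) -> float:
        candidate = self.integral + error * self.dt
        u_trial = self.kp * error + self.ki * candidate
        # Accumulate the integral only while the output stays within its limits.
        if self.u_min <= u_trial <= self.u_max:
            self.integral = candidate
        u = self.kp * error + self.ki * self.integral
        return min(max(u, self.u_min), self.u_max)


def simulate(x_set=0.22, t_end=30.0, t_step=10.0, q_before=120.0, q_after=165.0):
    """Exit-quality tracking with a hypothetical valve/flow model; returns (times, qualities, openings)."""
    h_lv = 173.1e3   # assumed HFC134a latent heat near 30 degC [J/kg]
    k_valve = 0.01   # hypothetical mass flow at full valve opening [kg/s]
    tau = 0.5        # hypothetical flow-response time constant [s]
    dt = 0.01        # controller sample time [s]
    pi = PIController(kp=0.5, ki=2.0, dt=dt)
    u = 0.3
    pi.integral = u / pi.ki  # bumpless start from the initial valve opening
    m_dot = k_valve * u
    times, qualities, openings = [], [], []
    for i in range(int(round(t_end / dt))):
        t = i * dt
        q = q_before if t < t_step else q_after
        # Saturated-liquid inlet: x = Q / (m_dot * h_lv), capped at 1.
        x = min(q / (max(m_dot, 1e-9) * h_lv), 1.0)
        # Quality above set point -> open the valve further to raise the flow.
        u = pi.update(x - x_set)
        # First-order lag between valve opening and delivered mass flow.
        m_dot += dt / tau * (k_valve * u - m_dot)
        times.append(t)
        qualities.append(x)
        openings.append(u)
    return times, qualities, openings


_, x_hist, u_hist = simulate()
print(f"final quality {x_hist[-1]:.3f}, final valve opening {u_hist[-1]:.3f}")
```

The controller never needs the chip temperature: holding the exit quality at its set point is enough to keep the MEs away from dryout, which mirrors the strategy described in the text.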
Two MEs in parallel (typical of blade server boards) were mounted on two pseudo chips emulating actual ones; each pseudo chip consisted of 35 heaters and temperature sensors (each 2.5 mm by 2.5 mm) made from a Delphi thermal test die. The ME’s copper microchannel geometry consisted of 53 parallel channels 1.7 mm high and 0.17 mm wide, separated by 0.17 mm thick fins. The effective footprint of each ME is 12 mm long (from inlet to outlet) by 18 mm wide. In the present work only uniform heat fluxes were considered, and HFC134a (a common dielectric refrigerant) was used as the working fluid.
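A quick consistency check of these dimensions, assuming the 53 channels are separated by 52 fins with no extra side walls counted: the channel array spans 17.85 mm, which fits within the stated 18 mm footprint width. The hydraulic diameter is computed here with the standard rectangular-channel formula; it is a derived value, not one quoted by the authors.

```python
# Micro-evaporator channel geometry quoted in the text [mm].
N_CHANNELS = 53
CHANNEL_HEIGHT = 1.7
CHANNEL_WIDTH = 0.17
FIN_THICKNESS = 0.17
FOOTPRINT_WIDTH = 18.0

# 53 channels separated by 52 fins (assumption: side walls not included).
array_width = N_CHANNELS * CHANNEL_WIDTH + (N_CHANNELS - 1) * FIN_THICKNESS
# Hydraulic diameter of a rectangular channel: D_h = 4A/P = 2wh / (w + h).
hydraulic_diameter = 2 * CHANNEL_WIDTH * CHANNEL_HEIGHT / (CHANNEL_WIDTH + CHANNEL_HEIGHT)
aspect_ratio = CHANNEL_HEIGHT / CHANNEL_WIDTH
# Total cross-section open to flow [mm^2].
flow_area = N_CHANNELS * CHANNEL_WIDTH * CHANNEL_HEIGHT

print(f"channel array width: {array_width:.2f} mm (footprint width {FOOTPRINT_WIDTH} mm)")
print(f"hydraulic diameter: {hydraulic_diameter:.3f} mm, aspect ratio {aspect_ratio:.0f}")
print(f"total flow area: {flow_area:.2f} mm^2")
```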
Finally, in the present experimental campaign, a single SMV modulated the flow to both MEs, and the outlet vapor quality used for control was that measured after the two ME outlet flows had mixed. The condenser used water as the secondary fluid, circulated by a variable-speed gear pump.
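Controlling the mixed outlet quality can hide a higher quality in the more heavily loaded ME. The sketch below back-calculates the exit quality of each branch from the mixed value, assuming saturated-liquid inlets, no heat losses and adiabatic mixing. The equal flow split is an assumption, and the 45/55 split in the second call is purely hypothetical: the text reports that the flow was unbalanced but does not give the actual split.

```python
def branch_qualities(q_branches, x_mixed, flow_fractions=None):
    """Exit quality of each parallel ME behind one mixed-quality measurement.

    With saturated-liquid inlets, no heat losses and adiabatic mixing,
    q_i = m_i * x_i * h_lv for each branch and x_mixed = sum(q_i) / (m_total * h_lv).
    Eliminating m_total and h_lv gives x_i = x_mixed * q_i / (f_i * sum(q)),
    where f_i = m_i / m_total is the branch's share of the total flow.
    """
    n = len(q_branches)
    if flow_fractions is None:
        flow_fractions = [1.0 / n] * n
    if abs(sum(flow_fractions) - 1.0) > 1e-9:
        raise ValueError("flow fractions must sum to 1")
    q_total = sum(q_branches)
    return [x_mixed * q / (f * q_total) for q, f in zip(q_branches, flow_fractions)]


# 90 W on ME1 and 30 W on ME2 with a 22% mixed-quality set point.
print(branch_qualities([90.0, 30.0], 0.22))                # assumed equal split
print(branch_qualities([90.0, 30.0], 0.22, [0.45, 0.55]))  # hypothetical maldistribution
```

Even with an equal split, ME1 operates at a higher exit quality than the mixed value, and any shift of flow toward the lightly loaded branch widens the gap.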
EXPERIMENTAL RESULTS
Experiments were performed for set-point tracking (for each controller developed), disturbance rejection and non-uniform heat loads (the last two with the developed controllers integrated using dual SISO, SISO and SIMO strategies); a short description is given below. More details on the development of the controllers can be found in [9]. The results presented are for the LP cycle, but the authors note that similar results were obtained with the VC cycle.
A. Flow distribution for unbalanced heat loads
The experimental results showed that different heat loads on the parallel MEs produce an unbalanced flow, which results in a higher temperature on the pseudo chip with the higher heat load. Temperatures of 75 °C versus 60 °C were obtained for a 60 W difference in heat load (90 W on ME1 versus 30 W on ME2, emulating the maximum and idle clock speeds of real microprocessors, respectively). Nevertheless, these temperatures remained below the typical CPU operating limit of 85 °C, and the temperature difference decreased when the outlet vapor quality set point was lowered from 22% to 15% (viz. Figures 3 and 4). In total, eight combinations of heat loads and three outlet vapor qualities were evaluated.
Regarding controllability, the cooling systems responded quickly and effectively, holding the condensing pressure or the secondary fluid temperature (more details in [9]) and the outlet vapor quality at their set points under both steady-state and transient heat loads, and thereby indirectly controlling the chip temperature.
B. Heat load disturbance rejection tests
The heat loads on ME1 and ME2 were varied between 90 W and 75 W and between 75 W and 60 W, respectively, with a periodic disturbance time of 1.4 s (emulating fast, periodic changes in the pseudo microprocessors’ clock speed). Figure 5 shows the input power disturbance on the pseudo chips and its effect on the average temperature of each chip. The maximum temperature variation is only 1.5 °C, which is acceptable compared with the temperature gradient along the chip for on-chip single-phase water cooling (about 2-3 K for a uniform heat flux without any heat load disturbance [10]).
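Part of the reason a fast periodic load produces a small temperature swing is thermal capacitance. The illustration below is not a model of the test bench: it treats a pseudo chip as a single lumped capacitance C connected through a resistance R to coolant at a constant saturation temperature, and drives it with ME1’s 90 W / 75 W square-wave load. R = 0.15 K/W, C = 5 J/K and reading the 1.4 s disturbance time as the interval between load switches are all hypothetical assumptions. In periodic steady state the peak-to-peak swing of such a first-order system is R·ΔQ·tanh(t_h/2τ), with τ = RC and t_h the time spent at each load level, which is smaller than the quasi-static value R·ΔQ.

```python
import math


def temperature_swing(r_th, c_th, q_high, q_low, half_period, t_sat=30.0, dt=1e-3, n_periods=20):
    """Peak-to-peak temperature swing of a lumped chip under a square-wave heat load.

    Hypothetical lumped model: c_th * dT/dt = Q(t) - (T - t_sat) / r_th, with the
    coolant saturation temperature t_sat held constant. Integrated with explicit
    Euler steps of dt seconds; the swing is measured over the last period only,
    once the start-up transient has decayed.
    """
    steps_half = int(round(half_period / dt))
    temp = t_sat + r_th * 0.5 * (q_high + q_low)  # start at the mean steady state
    history = []
    for k in range(2 * steps_half * n_periods):
        q = q_high if (k // steps_half) % 2 == 0 else q_low
        temp += dt * (q - (temp - t_sat) / r_th) / c_th
        history.append(temp)
    last_period = history[-2 * steps_half:]
    return max(last_period) - min(last_period)


R_TH = 0.15        # hypothetical chip-to-coolant thermal resistance [K/W]
C_TH = 5.0         # hypothetical lumped heat capacity of chip and cold plate [J/K]
HALF_PERIOD = 1.4  # assumed time spent at each load level [s]

swing = temperature_swing(R_TH, C_TH, q_high=90.0, q_low=75.0, half_period=HALF_PERIOD)
quasi_static = R_TH * (90.0 - 75.0)
analytic = quasi_static * math.tanh(HALF_PERIOD / (2 * R_TH * C_TH))
print(f"simulated swing {swing:.2f} K, analytic {analytic:.2f} K, quasi-static {quasi_static:.2f} K")
```

The numbers are not fitted to Figure 5; the sketch only shows the trend that faster load changes or a larger heat capacity reduce the temperature swing.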
Figure 6 shows the controller’s reaction to the disturbance. The SMV controller kept the exit vapor quality within ±5% of the set point. More importantly, the controller was effective: it responded quickly to the induced disturbance and no instability was observed.
Finally, the control strategies adopted (SISO, dual SISO and SIMO) were simple yet effective in controlling the specific variables while keeping the pseudo chips within a safe operating range. Moreover, this was achieved without any temperature signal from the chip, which is very convenient given the limited bandwidth available on actual CPUs.
C. Energy comparison
To compare the performance of the liquid pumping and vapor compression cooling systems, both evaluated and analyzed experimentally as described above, a steady-state condition was selected from the flow distribution tests.
Table 1 shows the results for the power consumption of the drivers, the two systems’ input and output energies associated with components and piping, and the thermodynamic conditions in the condenser for the main and secondary working fluids. The experimental condition selected for the comparison was that the input powers on pseudo chips 1 and 2 were 90 W (41.7 W/cm²) and 75 W (34.7 W/cm²), respectively.
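The quoted heat fluxes follow from dividing each input power by the ME footprint of 12 mm × 18 mm = 2.16 cm². The short check below reproduces them, assuming the flux is referenced to that footprint rather than to the die area.

```python
# Micro-evaporator footprint quoted in the text: 12 mm x 18 mm.
FOOTPRINT_CM2 = 1.2 * 1.8  # [cm^2]


def base_heat_flux(power_w: float, area_cm2: float = FOOTPRINT_CM2) -> float:
    """Average heat flux [W/cm^2] over the ME footprint (uniform-flux assumption)."""
    return power_w / area_cm2


for chip, power in (("pseudo chip 1", 90.0), ("pseudo chip 2", 75.0)):
    print(f"{chip}: {power:.0f} W -> {base_heat_flux(power):.1f} W/cm^2")
```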
The driver input power was about 6 times higher for the VC system, which is naturally associated with the energy expended to raise the pressure from the MEs to the condenser. Compared with a hypothetical air cooling system with a COP of 1.22 (i.e., cooling accounting for 45% of the total energy consumption of an air-cooled system [11, 12]), the energy consumption would be 134.8 W for air cooling versus 17.4 W for the LP cycle, and 237.8 W versus 102.1 W for the VC cycle. This represents reductions in energy consumption of 87% and 57%, respectively. The difference between the two air-cooling figures arises from the input power of the post heater (which emulates the heat load of the servers’ auxiliary electronics, i.e., memories, DC/DC converters, etc.), which was only considered for the VC cycle (viz. Table 1). A pump or compressor optimized for this application would consume much less than the 17.4 W and 102.1 W measured here, probably less than half.
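The two numbers quoted for the air cooling baseline are linked: if the cooling plant draws Q/COP to remove a heat load Q, its share of the total energy is (Q/COP)/(Q + Q/COP) = 1/(1 + COP), which is 45% for COP = 1.22. The sketch below checks that relation and the quoted savings using only values from the text. The “implied heat load” is back-calculated here and is not a figure reported by the authors; for the LP case it comes out close to the 165 W applied to the two pseudo chips.

```python
COP_AIR = 1.22  # COP of the hypothetical air cooling system, as quoted in the text

# If cooling draws Q/COP to remove a heat load Q, its share of the total energy
# (IT load plus cooling) is (Q/COP) / (Q + Q/COP) = 1 / (1 + COP).
cooling_share = 1.0 / (1.0 + COP_AIR)


def reduction(p_air: float, p_two_phase: float) -> float:
    """Fractional energy saving of a two-phase driver relative to air cooling."""
    return 1.0 - p_two_phase / p_air


# Air-cooling equivalent and measured driver power from the text [W].
cases = {"LP": (134.8, 17.4), "VC": (237.8, 102.1)}

print(f"cooling share of total energy at COP {COP_AIR}: {cooling_share:.0%}")
for name, (p_air, p_driver) in cases.items():
    implied_load = p_air * COP_AIR  # heat load consistent with p_air = Q / COP
    print(f"{name}: implied heat load {implied_load:.1f} W, "
          f"saving {reduction(p_air, p_driver):.0%}")
print(f"VC/LP driver power ratio: {cases['VC'][1] / cases['LP'][1]:.1f}")
```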
It can also be seen that heat losses account for 50.6% and 62.5% of the energy leaving the VC and LP systems, respectively. The overall performance of the system could therefore be improved, mainly by reducing the driver and piping losses and thereby increasing the energy recovered in the condenser. Note that the test bench is a “plug-and-play” unit designed for versatile testing of components and flow control, not an optimized compact system.
The secondary fluid temperature at the condenser outlet was much higher with the VC system, owing to its higher condensing temperature. This gives the waste heat available in the condenser a higher economic value. In Europe in particular, many cities have district heating networks (even the small city of Lausanne), which are potential consumers of this waste heat.
CONCLUSIONS
The present study has demonstrated that simple control schemes are sufficient to manage two-phase on-chip cooling systems for servers, that the cooling is very effective and responds rapidly to step changes in heat dissipation rates, and that this technology consumes little energy compared with air cooling.
BIOGRAPHIES
Dr. Jackson Braz Marcinichen is a postdoctoral researcher at the Laboratory of Heat and Mass Transfer at EPFL (Lausanne, Switzerland) and has more than 20 years of experience in HVAC&R systems. He received his BE in Mechanical Engineering from the Federal University of Santa Catarina, Brazil, in 1996, and his Ph.D. in Mechanical Engineering from the same university in 2006. He has authored more than 30 scientific and technical papers in indexed journals and international peer-reviewed conferences, as well as book chapters and US patents. He has designed and evaluated several experimental facilities for characterizing the thermo-hydrodynamics and control of cooling systems (calorimeters, wind tunnel, hybrid systems, etc.). He is currently developing novel hybrid cooling systems (passive and active) for cooling high heat flux electronic components with on-chip cooling.
Prof. John Richard Thome has been at the Swiss Federal Institute of Technology Lausanne (EPFL) since 1998, where he is director of the Laboratory of Heat and Mass Transfer (LTCM) and director of the Doctoral Program in Energy (EDEY). He received his Ph.D. in mechanical engineering from Oxford University in 1978 and worked for five years as an assistant/associate professor at Michigan State University in the US. He then worked full-time as a consulting engineer with his own firm for 15 years, from 1984 through 1998. Since joining EPFL he has published more than 170 journal papers and four books. His current main areas of research are two-phase flow and heat transfer in microchannels, two-phase flow control for electronics cooling using new hybrid cooling cycles (with either speed control of oil-free pumps and compressors or passive systems such as thermosyphons), and energy recovery systems.
ACKNOWLEDGEMENTS
Wolverine Tube Inc. (Huntsville, AL) provided MicroCool cold plates to our specification while Embraco (Joinville, Brazil) provided the linear oil-free mini-compressor.
REFERENCES
1. Marcinichen, J.B., Olivier, J.A., Lamaison, N., and Thome, J.R., Advances in Electronics Cooling. International Journal of Heat Transfer Engineering, 2013. Vol. 34(5-6): pp. 434-446.
2. Pawlish, M. and Varde, A.S. Free Cooling: A Paradigm Shift in Data Centers. in 5th International Conference on Information and Automation for Sustainability (ICIAFs). 2010.
3. Mulay, V., Humidity Excursions in Facebook Prineville Data Center, in Electronics Cooling. December 2012.
4. Marcinichen, J.B. and Thome, J.R. Two-Phase Flow Control of On-Chip Two-Phase Cooling Systems of Servers. in The 29th Annual Thermal Measurement, Modeling & Management Symposium SEMI-THERM 29. 2012. San Jose, CA, USA.
5. Samadiani, E., Joshi, Y., and Mistree, F., The Thermal Design of a Next Generation Data Center: A Conceptual Exposition. Journal of Electronic Packaging, 2008. Vol. 130: pp. 041104-1 – 041104-8.
6. Patel, C.D. A Vision of Energy Aware Computing from Chips to Data Centers. in The International Symposium on Micro-Mechanical Engineering – ISMME2003-K15. December 1-3, 2003. Tsuchiura and Tsukuba, Japan.
7. Campbell, L. and Ellsworth Jr, M.J., Back to the Future with a Liquid Cooled Supercomputer, in Electronics Cooling. August 2009.
8. Marcinichen, J.B., Olivier, J.A., and Thome, J.R., Reasons to Use Two-phase Refrigerant Cooling, in Electronics Cooling. March 2011. pp. 22-27.
9. Marcinichen, J.B., Olivier, J.A., Oliveira, V., and Thome, J.R., A Review of On-Chip Micro-Evaporation: Experimental Evaluation of Liquid Pumping and Vapor Compression Driven Cooling Systems and Control. International Journal of Applied Energy, 2011. Vol. 92: pp. 147-161.
10. Brunschwiler, T., Meijer, G.I., Paredes, S., Escher, W., and Michel, B. Direct Waste Heat Utilization from Liquid-Cooled Supercomputers. in 14th Int. Heat Transfer Conference. 2010. Aug. 8-13, Washington, DC, USA.
11. Joshi, Y. and Kumar, P., eds. Energy Efficient Thermal Management of Data Centers. 1st ed. 2012, Springer New York Dordrecht Heidelberg London.
12. Rasmussen, N., Electrical Efficiency Measurement for Data Centers – WP 154 revision 2. 2010, American Power Conversion by Schneider Electric. pp. 1-19.