The following transcript and video were originally presented at Thermal LIVE by Rehan Khalid. For access and information on all Thermal LIVE events, both past and present, please visit https://thermal.live/.
Today we’re going to be looking at the work I did for my master’s, titled Transient and Steady State Thermodynamic Modeling of Modular Data Centers. We’ll start by introducing modular data centers, what they are and why use them, then the choice of the modular data center we have modeled and the modeling tool used to do that. Then we’ll get into the nuts and bolts and look at steady state thermodynamic modeling of these MDCs, or modular data centers, then their transient modeling, and finally we’ll round up with the conclusions: what are the key takeaways from all of this?
So what’s the need for modular data centers, and what is this problem all about? As of today, there are roughly one and a half billion active Facebook users, and about four and a half billion Facebook likes generated daily. Similarly, there are about 300 million Twitter users, and they’re tweeting about 500 million times every day. That’s a lot of tweets. And there are about three and a half billion searches on Google every day, which translates into roughly 40,000 searches every second. And 90% of all of this data has been created in just the past few years, which is really remarkable. All of this has been made possible by the millions of data centers worldwide.
Now, what are the biggest issues with these data centers? One is their cost. Next is the time of deployment. Next is their high energy consumption, leading to high PUE, or power usage effectiveness, values. And lastly, their environmental impact: the carbon emissions from all the energy they’re consuming, energy that is produced in power plants, which are giving off these emissions. Just one statistic: in 2013 alone, US data centers consumed roughly 91 billion kilowatt-hours of energy, and this number continues to double every seven years, so imagine where we’ll be by, say, the year 2050. So this is a modern-day problem that needs to be addressed. What’s a possible solution going forward? A possible solution is to take the modular approach, by using these modular data centers. And why would we want to use modular data centers? Because they cost less, a few million dollars versus the several million dollars of their regular brick-and-mortar counterparts.
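A minimal back-of-the-envelope sketch of that doubling trend, in Python: the 91 billion kWh figure and the seven-year doubling period come from the talk, while the projection itself is only an extrapolation under the assumption that the trend holds, not a reported result.

```python
# Back-of-the-envelope extrapolation of US data center energy use,
# assuming the 2013 figure of 91 billion kWh doubles every 7 years.
BASE_YEAR, BASE_KWH = 2013, 91e9
DOUBLING_PERIOD_YEARS = 7

def projected_kwh(year: int) -> float:
    """Project annual consumption for a given year under pure doubling."""
    return BASE_KWH * 2 ** ((year - BASE_YEAR) / DOUBLING_PERIOD_YEARS)

for year in (2020, 2030, 2050):
    print(f"{year}: ~{projected_kwh(year) / 1e9:.0f} billion kWh")
```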
Similarly, their build time is on the order of a few months, as compared to a couple of years for regular data centers. And most importantly, from a technical point of view, their PUE values are lower and much more controlled; you have much more control over them. And lastly, they’re scalable, so you can add modular data centers and actually increase your data center capacity. So all in all, it boils down to cost, capital expenditures, and taxes: with a regular data center you’re building now and using later on, versus a modular data center, where you get a turnkey product from the factory, install it onsite in a couple of weeks, and you’re good to go.
So in today’s talk, we’re going to be looking at a type of modular data center known as a CDC, or container data center, and the one specifically we’re going to be looking at is called the all-in-one type. As you can see from these pictures, the all-in-one type is where you have the power and networking equipment, the IT equipment, and the cooling equipment all housed in one pod, in one data center. These are a special class of fully pre-configured data centers. Like I said, they’re turnkey products, so you can purchase them, have them delivered and installed onsite, and they’re good to go.
The cooling techniques used in these modular data centers, or MDCs for short, are similar to what we use in regular data centers. So we see cooling techniques such as conventional DX, direct expansion, cooling, and then passive cooling techniques such as evaporative cooling and free air cooling. These are techniques used in regular data centers as well as in these modular data centers.
Now, what are the objectives of today’s talk? We’re going to be looking at how to model these modular data centers using a tool called EnergyPlus, specifically looking at hot and cold aisle modeling as well as CRAC modeling. We’re going to be analyzing the cooling load requirements, so the requirements of the HVAC system, using different cooling techniques, such as DX cooling and other passive cooling techniques. And lastly, we’re going to be looking at a transient server model implemented in EnergyPlus to study dynamic load shifting, or fluctuating server loads.
So with that in mind, let’s look at the data center that’s been used for this study. We chose to go with Huawei’s IDS1000A, an all-in-one data center, and you can see a picture of it here. Essentially, these are built out of regular shipping containers, and they tend to come in two sizes, either 40 feet long or 20 feet long. This here is the 40-foot-long container. It’s made out of corrugated steel on the outside, and when they convert it into a data center at the factory, they add insulation on the inside and add all the equipment and the bells and whistles. So for this particular data center, you have five sets of racks, and you have three hot aisles and two cold aisles.
On the right you can see the networking and power equipment, and on the left are located the cooling units’ outdoor units. They’re actually located outside the physical container, and this rounds out your entire data center: an entire data center in this one box. The modeling tool we’ve chosen for this study is EnergyPlus, which is open-source software managed by NREL, the National Renewable Energy Laboratory. It acts as the primary simulation engine for all our simulations, and in terms of technical capabilities, it’s better than some other commercially available software out there, such as DOE-2.2. Another piece of software we used, to model the geometry of the building you saw on the previous slide, is SketchUp. SketchUp is free software provided by Google, and it’s used by architects and engineers to create 3D models of buildings. You can create a model of your data center in SketchUp and then import it into EnergyPlus.
And with that, let’s look at the locations chosen for data center activity. As you can see on this map of the US, Golden, Colorado and Chicago, Illinois are chosen as the primary areas of data center activity. These are chosen due to their favorable climates, such as low outdoor air temperatures and humidity, which allow using passive cooling techniques such as evaporative cooling and free air cooling. Phoenix, Arizona and Tampa, Florida, way down south, are chosen as control locations in order to compare the results against the two previous locations. And on the left, you can see the variation of outdoor air temperature for all four sites. Clearly, Phoenix and Tampa have much higher outdoor temperatures as compared to Golden and Chicago.
Moving on, let’s look at some steady state modeling and how we’ve modeled this data center in EnergyPlus. The racks and servers are modeled as plug loads. Bear in mind that the version of EnergyPlus I used back then wasn’t specifically tailored towards data centers, and you had to make a lot of modeling adjustments to be able to model all of this. So the racks and servers don’t have specific models and are modeled as plug loads. The CRAC units are modeled as DX air conditioners, so they’re modeled as comfort control rather than precision control. The air-cooled condensers are located outside the thermal zones, like we saw on the diagram on slide number 10, and hence they don’t take any part in the actual simulation. And finally, the hot and cold aisles are modeled as hot and cold zones respectively. They’re modeled as two separate zones, and it’s the density difference between these two zones that drives inter-zone air transfer, and that way you can effectively model hot and cold aisles.
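As a quick illustration of how a plug-load heat gain ties into the cooling requirement, here is a minimal Python sketch of the sensible heat balance Q = ṁ·cp·ΔT; the 50 kW rack load and the 15 °C aisle-to-aisle temperature split below are hypothetical numbers, not values from the study.

```python
# Sensible heat balance: supply air needed to carry a given IT (plug) load
# across a hot-aisle/cold-aisle temperature split. Numbers are illustrative.
CP_AIR = 1006.0   # specific heat of air, J/(kg*K)
RHO_AIR = 1.2     # approximate air density, kg/m^3

def required_airflow(it_load_w: float, delta_t_k: float) -> tuple:
    """Return (mass flow in kg/s, volume flow in m^3/s) to remove it_load_w."""
    m_dot = it_load_w / (CP_AIR * delta_t_k)   # from Q = m_dot * cp * dT
    return m_dot, m_dot / RHO_AIR

m_dot, v_dot = required_airflow(it_load_w=50_000.0, delta_t_k=15.0)
print(f"~{m_dot:.1f} kg/s (~{v_dot:.1f} m^3/s) of supply air for a 50 kW load")
```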
And here on slide number 12 we can actually look at a thermal model of the data center that we have modeled. As you can see, on the extreme left and right you have the neutral zones, shown in green; here you have your power and networking equipment as well as your outside air units. Towards the middle, the pink zones show your hot aisles, whereas the light blue zones show your cold aisles. You can also see the CRAC units and the air flowing through them: the blue arrows show cold air coming out of the CRAC units and into the servers, and the red arrows show hot air going out of the back of the servers and back into the CRAC units. This way you can set up the thermal model for your data center and simulate it.
For reference purposes, we’ve chosen to use ASHRAE Class A3 for comparison of our results, as you can see on this psychrometric chart; Class A3 is given by this boundary. It’s an ASHRAE class specifically tailored for datacom equipment such as servers and workstations, and the parameters it provides are a dry bulb temperature of anywhere between five and forty degrees Celsius, a wet bulb temperature of two to twenty-eight degrees Celsius, and a relative humidity of anywhere between eight and eighty-five percent. So that’s quite a generous range in which to keep your equipment while also minimizing your energy consumption.
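As a small sketch, here is a Python helper that checks an operating point against the ranges quoted above; it only encodes the limits as stated in the talk, not the full ASHRAE A3 psychrometric envelope.

```python
# Check a server inlet condition against the quoted ASHRAE Class A3 ranges:
# 5-40 C dry bulb, 2-28 C wet bulb, 8-85% relative humidity.
def within_a3(dry_bulb_c: float, wet_bulb_c: float, rel_humidity_pct: float) -> bool:
    """Return True if the point lies inside the quoted A3 limits."""
    return (5.0 <= dry_bulb_c <= 40.0
            and 2.0 <= wet_bulb_c <= 28.0
            and 8.0 <= rel_humidity_pct <= 85.0)

print(within_a3(32.0, 20.0, 45.0))   # True: comfortably inside the envelope
print(within_a3(43.0, 25.0, 45.0))   # False: dry bulb too high
```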
Now, to round off our steady state modeling, let’s look at some DX cooling results. On the left, you can see the HVAC power consumption for the DX cooling case for all four locations, and as expected, Chicago and Golden have the lowest energy consumption, whereas Phoenix and Tampa have the highest HVAC power consumption. Similarly, this leads to lower PUE values for Chicago and Golden versus Tampa and Phoenix.
So to summarize the steady state modeling, here we can see a comparison of all three cooling schemes: DX cooling, which is the base case for comparison, and the passive cooling techniques, direct evaporative cooling and free air cooling. Right from the start, we can clearly see that if we compare the HVAC power consumption of all three cooling cases, direct evaporative cooling leads to the lowest energy consumption, followed by free air cooling. You can save up to 38% in HVAC power by using the direct evaporative cooling scheme, and up to 36% by using free air cooling, as compared to the base DX case. And this will eventually lead to a lower PUE for your data center.
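To make the PUE comparison concrete, here is a brief Python sketch that applies the quoted 38% and 36% HVAC savings to a hypothetical IT load and DX baseline; the absolute wattages are made up for illustration, so only the relative comparison is meaningful.

```python
# PUE = total facility power / IT power. Apply the quoted HVAC savings
# (38% evaporative, 36% free air) to a hypothetical DX cooling baseline.
IT_POWER_W = 100_000.0   # hypothetical IT load
DX_HVAC_W = 40_000.0     # hypothetical HVAC power for the DX base case
OTHER_W = 5_000.0        # hypothetical lighting, distribution losses, etc.

def pue(hvac_w: float) -> float:
    return (IT_POWER_W + hvac_w + OTHER_W) / IT_POWER_W

cases = {
    "DX (base)": DX_HVAC_W,
    "Direct evaporative (-38%)": DX_HVAC_W * (1 - 0.38),
    "Free air (-36%)": DX_HVAC_W * (1 - 0.36),
}
for name, hvac_w in cases.items():
    print(f"{name:>26}: PUE ~ {pue(hvac_w):.2f}")
```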
Now we get into the transient modeling part, and here what we’re looking to do is essentially incorporate a transient scheme for the server and exit air temperatures into EnergyPlus, taking into account the server thermal mass. This is based on the capacitance-effectiveness model for a server developed at Syracuse University, where the server is modeled as a single-stream heat exchanger with heat transfer effectiveness epsilon. Since EnergyPlus doesn’t have this capability inherently built in, what we chose to do is couple EnergyPlus with MATLAB using a tool called MLE+, MATLAB EnergyPlus. This allows us to co-simulate the entire system using a master-slave scheme in which EnergyPlus acts as the master, the primary simulation engine.
EnergyPlus exchanges data with MATLAB, where you actually have the Syracuse server model. EnergyPlus sends in parameters like the server inlet air temperature, the air mass flow rate, the server power dissipation, and the temperature rise across the server fan. MATLAB processes all of this data at the EnergyPlus timestep and sends back data, namely the server exit air temperature and the server temperature itself. And having these data, you can go on to make your calculations and do your analysis.
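A schematic of that per-timestep, master-slave exchange might look like the Python sketch below; the stub functions stand in for the EnergyPlus/MLE+/MATLAB interface, and the numbers inside them are illustrative only, not values or API calls from the study.

```python
# Schematic co-simulation loop: the "master" supplies boundary conditions,
# the external server model advances one step and hands results back.
def read_from_energyplus():
    """Stub: boundary conditions the master would send each timestep."""
    inlet_air_temp_c, air_mass_flow_kg_s = 20.0, 0.05
    server_power_w, fan_temp_rise_k = 500.0, 1.0
    return inlet_air_temp_c, air_mass_flow_kg_s, server_power_w, fan_temp_rise_k

def transient_server_model(t_server, t_in, m_dot, q_server, fan_rise, dt_s):
    """Stub: lumped-capacitance, effectiveness-based server update."""
    cp, eff, capacitance = 1006.0, 0.8, 10_000.0             # illustrative values
    q_to_air = eff * m_dot * cp * (t_server - t_in)          # heat picked up by air
    t_server += dt_s * (q_server - q_to_air) / capacitance   # explicit Euler step
    t_exit = t_in + fan_rise + eff * (t_server - t_in)       # exit air temperature
    return t_server, t_exit

def send_to_energyplus(exit_air_temp_c, server_temp_c):
    """Stub: values handed back to the master before its next step."""
    print(f"exit air {exit_air_temp_c:.1f} C, server {server_temp_c:.1f} C")

server_temp_c, dt_s = 25.0, 60.0
for _ in range(3):   # three illustrative timesteps
    t_in, m_dot, q_srv, fan_rise = read_from_energyplus()
    server_temp_c, t_exit = transient_server_model(
        server_temp_c, t_in, m_dot, q_srv, fan_rise, dt_s)
    send_to_energyplus(t_exit, server_temp_c)
```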
So what we found is that if we look at a five-hour simulation with a single rise and fall in server load, we save up to about 660 watts of power if the HVAC system instantly follows the server load, so when demand goes up and when demand goes down, you instantly adjust your HVAC cooling. Versus that, you save only up to about 24 watts of power if you ramp cooling up and down in accordance with the server temperature. What this translates into is annual energy savings of roughly 1,156 kilowatt-hours in the first case, with instant ramping up and down, and only 42 kilowatt-hours in the second case, with slow ramping up and slow ramping down. At 8 cents per kilowatt-hour, this translates into monetary savings of just $92 in the first case and just $3 in the second case. So what we can essentially gauge from this is that there is no specific need to take the server thermal mass into account, and no need to ramp cooling up or down according to the server temperature profile.
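As a quick check of the cost arithmetic above, assuming the quoted 8 cents per kilowatt-hour:

```python
# Annual cost savings at the quoted electricity price of $0.08/kWh.
PRICE_PER_KWH = 0.08
for label, kwh in (("instant ramping", 1156), ("following server temperature", 42)):
    print(f"{label}: {kwh} kWh/yr -> ${kwh * PRICE_PER_KWH:.2f}/yr")
```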
So to summarize, what we’ve looked at today is what modular data centers are and their advantages over regular data centers. Then we looked at a Huawei data center and how to model it using EnergyPlus. Thirdly, we looked at EnergyPlus steady-state models for a container data center using different cooling schemes: mechanical cooling using direct expansion cooling, and passive cooling using direct evaporative cooling and free air cooling. And lastly, we incorporated a transient server model into EnergyPlus to study the effect of varying HVAC system power according to the server temperature profile.
The key takeaways we conclude from this are that in steady state, you get significant HVAC power savings if you use passive cooling techniques as compared to mechanical DX cooling. And in the transient case, the key takeaway is that there is no need to follow the server temperature profile: you can instantly ramp cooling up and down, and it will not lead to a significant amount of extra power being used. And if you do choose to ramp up and down as per the server temperature profile, there are no significant energy or power savings to be had, and hence the server thermal mass can be neglected. Lastly, I’d like to acknowledge first and foremost the National Science Foundation for partly funding this study, and further my advisors, Dr. Joshi from the Woodruff School at Georgia Tech and Dr. Wemhoff from the Department of Mechanical Engineering at Villanova University. And I thank you all for lending a listening ear, and I’m open to any questions.
Graham Killshaw:
Hi, Rehan. Thanks for that presentation. We’re going to see if our attendees have any questions for this presentation; now is the time to place them, please. Simply use the Q&A button at the bottom of your screens. I do have one question that’s come in, Rehan, that I’m going to ask here. You mentioned that EnergyPlus is open source with a wide developer and helper community, but not intended for data center modeling. Can you elaborate on that, please, on the inherent issues with using EnergyPlus for modeling data centers, in case others in the audience might be interested in using that tool?
Rehan Khalid:
Sure. So if we go back and look at slide number 13, basically what we’re seeing here is how we’ve modeled the data center using EnergyPlus. Since EnergyPlus at that time didn’t have these capabilities inherently built in, the racks and servers are modeled, like I said, as plug loads. But in order to incorporate different characteristics or features, such as the exit air temperature and the inlet air temperature, you actually need to come up with your own models. In my case, since for the transient case I needed the exit air temperature and the server temperature, I chose to co-simulate using MLE+, which is a very handy tool to use, especially if you’re using EnergyPlus.
Similarly, for the hot and cold aisles, you can model them as essentially hot and cold zones, but in order to separate them, the way I did it was to use an infrared-transparent wall, an IRT wall, which physically separates the two zones. It models a physical divide without actually hindering the simulation: it provides no mass and no volume, but it still allows for thermal exchange between the two zones. So you really have to dive into these things in order to be able to model them effectively.
Graham Killshaw:
Okay. We have another question here from the audience about EnergyPlus. I think you’ve kind of touched on this, but maybe you could touch on it again, please, Rehan. This question comes from [inaudible 00:21:46]. He asks: EnergyPlus, what type of simulation is it, and what are the limitations of modular data centers in the IT industry?
Rehan Khalid:
So EnergyPlus is basically a successive, iterative simulation tool. It uses the [inaudible 00:22:12] algorithm, if you’re aware of it, for iterating between your HVAC loads, your thermal loads, and any other loads that you may have. It sequentially and simultaneously iterates over all of your loads until convergence is achieved, based on the tolerance that you’ve specified for your simulation.
And as far as modular data centers go, the biggest thing as far as I’m concerned, and what draws the industry to them, is the fact that they’re scalable. So you can actually set up pods as per your needs. At one point in time, if you want to provide a certain amount of server capacity, you can have just that many servers; it’s not like a regular data center where you have so many servers sitting idle. You just have as many servers as you need, and once your demand grows, you can scale it up: you can add further pods and connect them, so you can transfer power, as well as HVAC cooling, between those pods, and that adds extra data center capability to your firm. And you can just scale it up this way as much as you need. So I think it’s very promising as far as IT firms are concerned.
Graham Killshaw:
Okay. Thanks, Rehan, for touching on that again; those questions came from Sreshale, I hope I pronounced your name correctly. We will have all of the Q&A posted on the Thermal LIVE website shortly after the event. I think we’ve got time for one last question here, Rehan. In the transient modeling section, can you briefly elaborate on the transient server model used in this study, please?
Rehan Khalid:
Yes. So I actually have a slide on that in the appendix, and it should be available on your screen now. As you can see here, you have a mathematical model for the server. As I alluded to earlier in the presentation, we chose to model the server as a single-stream heat exchanger. So you basically have the server as a hot body of electronics, so it’s basically a heat source, and you have cooling air flowing through the server; mathematically, that’s essentially what’s happening.
It’s a black box: one source of heat, with air flowing through it at a certain mass flow rate and a certain specific heat. The air goes in at a certain temperature and comes out at a higher temperature, having absorbed the heat. You basically do an energy balance for the server and an energy balance for the air stream, and then you couple them up. You equate them, saying that the heat lost by the server is equal to the heat gained by the air, neglecting the thermal storage of the air, and by equating them you can come up with the two constants you see at the bottom of the screen, K and tau. Tau is basically the server time constant, and K basically represents the server thermal capacitance. Using these, you can model your server for transient loads.
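One consistent way to write down that balance is the single-stream, effectiveness-based formulation sketched below; the exact grouping into the constants K and tau on the slide may differ from this illustration. With heat transfer effectiveness $\varepsilon$, air mass flow rate $\dot m$, and specific heat $c_p$, the air-side balance gives the exit air temperature

$$\dot m c_p\,(T_{out} - T_{in}) = \varepsilon\,\dot m c_p\,(T_s - T_{in}) \;\Longrightarrow\; T_{out} = T_{in} + \varepsilon\,(T_s - T_{in}),$$

and equating the heat lost by the server (lumped capacitance $C_s$, power dissipation $Q$) to the heat gained by the air, neglecting the air's thermal storage, gives

$$C_s\,\frac{dT_s}{dt} = Q - \varepsilon\,\dot m c_p\,(T_s - T_{in}),$$

a first-order system whose time constant is $\tau = C_s/(\varepsilon\,\dot m c_p)$, with the capacitance-related constant (K in the talk) scaling with $C_s$.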
Graham Killshaw:
Okay. Thanks.
Rehan Khalid:
You’re welcome.
Graham Killshaw:
That’s all we have time for in this presentation. Thank you, everybody, for logging on today and listening; I hope you enjoyed the presentation. A big thanks to Rehan Khalid from Villanova University, a very informative and very interesting presentation. Attendees, you’ll receive an email within 24 hours with a link to the on-demand version of this presentation so that you can watch it again. You can continue to submit questions if you wish, and we’ll post all of the answers on the Thermal LIVE website. Don’t forget, there are several more presentations coming up this afternoon on heat sink design, PCB design, and thermoelectric device design, so if you wish to watch those, jump over to the Thermal LIVE website and register quickly; they’re going on this afternoon. I hope you enjoy the rest of Thermal LIVE 2017. Thank you and goodbye for now.
Rehan Khalid:
Thank you. Bye bye.