Veerendra Mulay
Facebook Inc.
Facebook’s data center in Prineville, OR, has been one of the most energy efficient data center facilities in the world since it became operational [1]. Innovative features of the electrical distribution system include DC backup and high voltage (480 VAC) distribution, which have eliminated the need for a centralized UPS and for 480V to 208V transformation. The built-in penthouse houses the chiller-less air conditioning system that uses 100% airside economization and evaporative cooling to maintain the operating environment. These features have enabled a significant reduction in the energy consumption of the data center, which is reflected in the Power Usage Effectiveness (PUE) of the facility. The PUE is defined as the ratio of the total energy consumed by the data center to the total energy consumed by the IT equipment. The PUE of the Prineville data center is 1.07 at full load, which was verified during commissioning.
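As a quick illustration of the PUE definition above, the following minimal sketch computes the ratio from two energy readings. The function name and the sample figures (1070 kWh total, 1000 kWh IT) are hypothetical and chosen only to reproduce a PUE of 1.07; they are not measurements from the facility.

```python
def pue(total_facility_energy_kwh: float, it_energy_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy divided by IT energy."""
    if it_energy_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_facility_energy_kwh / it_energy_kwh


# Hypothetical readings over the same time interval (not measured data).
example_pue = pue(total_facility_energy_kwh=1070.0, it_energy_kwh=1000.0)

# Energy spent on cooling, power distribution losses, lighting, etc.,
# expressed as a fraction of the IT energy.
overhead_fraction = example_pue - 1.0

print(f"PUE = {example_pue:.2f}, overhead = {overhead_fraction:.0%} of IT energy")
```

Read this way, a PUE of 1.07 means that for every unit of energy delivered to IT equipment, roughly 0.07 units go to everything else in the facility.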
The data center design:
The data center is a three-story building. The first floor holds the data hall and office space, along with the receiving yard and storage area. The second floor houses a large plenum for hot return air. The third floor is a built-up mechanical penthouse that holds the air handling equipment line-ups. These line-ups are divided into the intake corridor, the filter room, the Evaporative cooling/Humidification (EC/H) room, the fan-wall room, the supply corridor, and finally, the exhaust corridor. The airflow path is shown in figure 1.
Figure 1: Side view of the data center indicating Airflow path [2]
The outside air enters the intake corridor. It is then introduced into the filter room, which acts as a mixing chamber. There are motorized dampers for outside air (top) as well as return air (bottom). Depending on the temperature and humidity of the outside air, these dampers modulate to vary the proportion of outside air and return air. The mixed air then exits the mixing chamber through a filter wall, which consists of a 2” pleated pre-filter followed by a MERV (minimum efficiency reporting value) 13 filter. The mixed air then enters the EC/H room, where the EC/H system sprays a fine mist into the air stream. This misting humidifies the mixed air or, when required, cools it. Multiple cooling stages are modulated based on the temperature and humidity of the supply air. The sprayed air then passes through a mist eliminator media, which captures any water droplets that have not evaporated, thereby preventing moisture carry-over. The excess water is collected in drain pans and returned to the water loop for further processing and recirculation. After the mist eliminator, the air passes through a fan wall and is delivered into the data hall via supply shafts.
In the data hall, the cabinets are laid out in a hot aisle/cold aisle arrangement. Server fans, aided by a pressure differential, pull an influx of cold air over the motherboards and exhaust the air into contained hot aisles. The return air then rises to the return air plenum. This containment avoids the recirculation of hot air and the bypass of the supply air.
From the return air plenum, the hot return air is introduced into the mixing chamber if the outside air conditions dictate it. The modulating dampers determine the quantity of the return air for mixing and the remainder of the hot return air is rejected to the atmosphere via relief fans in the exhaust corridor. In typical operation, these fans remain idle. During winter, the hot return air is used to partially heat the office space.
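The damper modulation described above can be illustrated with a simple energy balance. The sketch below estimates the outside air fraction needed to reach a target mixed-air dry bulb temperature. It assumes ideal mixing of two streams with equal specific heat and ignores humidity effects. The function name, the 30°C return air temperature and the 5°C outside air temperature are hypothetical; the 18.3°C target is the Region A supply set point given in the sequence of operation.

```python
def outside_air_fraction(outside_c: float, return_c: float, target_c: float) -> float:
    """Fraction of outside air (0..1) so that the mixed air reaches target_c.

    Simplified balance, assuming equal specific heat for both streams:
        target = f * outside + (1 - f) * return
    """
    if return_c <= target_c:
        raise ValueError("return air is assumed to be warmer than the target")
    if outside_c >= target_c:
        # Mixing in hot return air cannot lower the temperature,
        # so the best this model can do is 100% outside air.
        return 1.0
    return (return_c - target_c) / (return_c - outside_c)


# Hypothetical winter condition: 5 C outside air, 30 C return air.
fraction = outside_air_fraction(outside_c=5.0, return_c=30.0, target_c=18.3)
mixed_c = fraction * 5.0 + (1.0 - fraction) * 30.0
print(f"outside air fraction = {fraction:.3f}, mixed air = {mixed_c:.1f} C")
```

In this simplified model, whatever return air is not used for mixing corresponds to the portion rejected through the exhaust corridor.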
The design is based on the 50-year weather data collected at Redmond, OR, which is the closest weather station to Prineville. The maximum dry bulb (DB) temperature recorded in this data was 40.9°C (Fig. 1), whereas the maximum wet bulb (WB) temperature recorded was 21.3°C. The winter extreme condition was recorded as -0.6°C dry bulb temperature at 50% relative humidity (RH). This climate is advantageous for outside air and evaporative cooling; the coincident wet bulb temperatures are low when the dry bulb temperatures tend to be high, allowing free cooling most of the year and efficient use of evaporation when needed [3].
The sequence of operation:
The psychrometric chart shown in figure 2 is used to plot the state of the air from any two known properties, such as dry bulb (DB) temperature, wet bulb (WB) temperature, dew point (DP) temperature, relative humidity (RH) or humidity ratio. There are eight distinct operational regions, as shown in figure 2, which cover all possible outside air conditions. The air handling line-ups respond in each region as follows.
Region A (< 11.1°C WB and < 5.5°C DP): When outside air conditions lie within this region, the target supply air dry bulb temperature is 18.3°C. The outside and return air dampers modulate to mix both airstreams. If required, the EC/H system stages on to provide the necessary humidification for maintaining wet bulb temperature of the supply air at 12.2°C and the dew point temperature at 5.5°C.
Region B (>11.1°C WB and < 5.5°C DP): This region calls for 100% outside air. The return air dampers are completely closed and the outside air dampers are fully open. EC/H stages on to provide the required humidification or cooling. The supply air dry bulb temperature is maintained between 18.3°C and 26.6°C while the dew point temperature is maintained at 5.5°C.
Region C (> 18.3°C DB and > 5.5°C DP and < 26.6°C DB and < 15.0°C DP and < 65% RH): In this region too, the return air dampers are completely closed and the outside air dampers are fully open. 100% outside air is admitted. The EC/H system remains off, since no evaporative cooling or humidification is required. The outside air is delivered into the data hall “as is” (after filtration).
Region D (> 26.6°C DB and > 5.5°C DP and < 18.9°C WB): The economizer is at 100% in this region as well, meaning that outside air is not mixed with return air. EC/H stages on to provide required humidification or cooling. The supply air dry bulb temperature is maintained at 26.6°C while dew point temperature is maintained between 5.5°C and 15°C.
Figure 2: Regions of operation
Region E (> 26.6°C DB and > 5.5°C DP and > 18.9°C WB): Once more, the dampers modulate to bring in 100% outside air. EC/H stages on to provide the required humidification or cooling. The supply air dry bulb temperature is maintained at 26.6°C while dew point temperature is kept above 15°C.
Region F (< 26.6°C DB and > 15.0°C DP and < 21.2°C WB): In this region, the dampers modulate to mix outside air with return air to increase cold aisle temperature as necessary for reducing cold aisle relative humidity to a 65% maximum. The supply air temperature is maintained between 18.3°C and 26.6°C. The dew point temperature is kept above 15°C. The direct evaporation system is bypassed, since no evaporative cooling or humidification is required.
Region G ({> 18.3°C DB and < 15.0°C DP and > 65% RH} or {< 18.3°C DB and > 5.5°C DP and > 65%RH and < 15.0°C DP}): Again, the dampers modulate to mix outside air with return air to increase cold aisle temperature as necessary for reducing cold aisle relative humidity to a 65% maximum. The supply air temperature is maintained above 18.3°C and the dew point temperature is kept below 15°C. The direct evaporation system is bypassed, since no evaporative cooling or humidification is required.
Region H (Unacceptable OA conditions): When outside air cannot be admitted to the data center (for example, because of excessive smoke or dust particulates in the air), the external dampers are shut.
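The region boundaries listed above can be expressed as a small lookup function. The sketch below is only an illustration of the published thresholds, not the actual building control logic. It takes all four air properties as inputs rather than deriving them psychrometrically, assumes a 18.9°C wet bulb boundary between regions D and E, and makes its own choice about which region owns a value that falls exactly on a boundary. The function name and the `outside_air_ok` flag are hypothetical. Conditions that the listed limits do not cover return `None`.

```python
from typing import Optional


def classify_region(db: float, wb: float, dp: float, rh: float,
                    outside_air_ok: bool = True) -> Optional[str]:
    """Map outside air conditions (deg C, % RH) to an operating region A-H."""
    if not outside_air_ok:
        return "H"                      # smoke, dust, etc.: dampers shut
    if dp < 5.5:                        # dry air: humidification needed
        return "A" if wb < 11.1 else "B"
    if db > 26.6:                       # hot air: evaporative cooling
        return "D" if wb < 18.9 else "E"
    if dp > 15.0:                       # humid air, DB at or below 26.6
        return "F" if wb < 21.2 else None
    if rh > 65.0:                       # DP between 5.5 and 15.0, too damp
        return "G"
    if db > 18.3:                       # acceptable as is: EC/H off
        return "C"
    return None                         # not covered by the listed limits


# Illustrative (hypothetical) outside air states: DB, WB, DP, RH.
samples = {
    "cold, dry winter air": (10.0, 6.0, 0.0, 50.0),
    "mild spring day":      (22.0, 15.0, 10.0, 47.0),
    "hot, dry afternoon":   (32.0, 17.0, 8.0, 22.0),
    "cool, damp morning":   (15.0, 13.0, 11.0, 77.0),
}
for label, state in samples.items():
    print(f"{label}: region {classify_region(*state)}")
```

Checking the dew point first keeps the A/B split separate from the temperature-based regions, which mirrors how each region above is bounded by the 5.5°C dew point line.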
Humidity Events:
Although these features have resulted in high efficiency, we have learned some lessons along the way [4]. Rapid changes in the temperature and humidity of the outside air between day and night made it difficult to control the air handler line-ups without them “fighting” each other. For example, if the outside air dampers of one line-up were 70% open, the adjacent line-ups would have their outside air dampers only 20-30% open. This alternating modulation, or fighting, often led to stratification of the air streams.
Another issue was an error in the sequence of operation controls that led to complete closure of the outside air dampers, causing the one-pass airflow system to function like a recirculating system. The problem began to manifest in late June 2011, as outside air conditions started changing rapidly. The economizer demand signal began responding to the changes, at which point the erroneous control sequence drove economizer demand to 0, closing the outside air dampers completely. The data center was therefore recirculating hot exhaust air at high temperature and low humidity. The evaporative cooling system reacted to this condition by staging on to 100% spraying in an attempt to maintain the maximum allowed supply temperature and dew point temperature. This resulted in a cold aisle supply temperature exceeding 28°C and relative humidity over 95%. The Open Compute servers deployed within the data center reacted to these extreme changes: numerous servers rebooted and a few shut down automatically due to power supply unit failure.
Figure 3: Failed components in power supply unit
The high temperature and high humidity supply air caused condensation on the concrete slab floor (because concrete has high thermal mass and was in contact with much cooler supply air for a long time). Similarly, upon investigation of the failed power supply units (figure 3), it was observed that the failure was condensation related.
We began investigating this failure by subjecting the server to rapidly changing temperature and humidity conditions in a controlled test chamber. The relative humidity level was raised to 97% and the temperature was ramped up from 15°C to 30°C (59°F to 86°F) in the span of 10 minutes. Under these conditions, condensation was observed on the non-heated components. The server chassis was dripping wet, as shown in figure 4. The motherboard, however, showed no signs of condensation because it always ran above the dew point temperature.
Figure 4: Condensation on the server chassis
Condensation was also evident on the surfaces of power supply components such as capacitors and inductors, as shown in figure 5.
Figure 5: Condensation on the power supply components
Figure 6 below shows the surfaces of inductors in front of capacitor 1 and the forward vertical surface of capacitor 1. We can see the water droplets formed on the surfaces of these non-heated components.
Figure 6: Inductors and capacitor surfaces
Figure 7 shows the variation in different temperatures monitored during the test interval. These are both targeted and actual values of ambient as well as dew point temperature. The surface temperature of capacitor 1 (CAP1) is also plotted.
Figure 7: Temperature variation
The plot shows that the surface of CAP1 falls below the dew point at about 6 minutes into the temperature ramp. This is exactly the same time the borescope video starts showing a slight change in the reflectivity of the component surfaces. The condensation then continues for another 9 minutes until the surface temperature of CAP1 rises above dew point. During the entire test interval, the PCB in the power supply always ran above the dew point temperature and showed no signs of condensation.
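The condition behind these observations is that water condenses on any surface whose temperature is below the dew point of the surrounding air. The sketch below estimates the dew point with the Magnus approximation, which is an assumption of this example and not necessarily the method used to produce the plotted values. The coefficients 17.62 and 243.12°C are one commonly used parameter set, and the function names and sample surface temperatures are hypothetical.

```python
import math


def dew_point_c(air_temp_c: float, rh_percent: float) -> float:
    """Approximate dew point (deg C) from dry bulb temperature and RH (%)."""
    if not 0.0 < rh_percent <= 100.0:
        raise ValueError("relative humidity must be in (0, 100]")
    a, b = 17.62, 243.12  # Magnus coefficients (b in deg C)
    gamma = math.log(rh_percent / 100.0) + a * air_temp_c / (b + air_temp_c)
    return b * gamma / (a - gamma)


def surface_condenses(surface_temp_c: float, air_temp_c: float,
                      rh_percent: float) -> bool:
    """True if a surface at surface_temp_c is below the air's dew point."""
    return surface_temp_c < dew_point_c(air_temp_c, rh_percent)


# End of the chamber ramp described above: 30 C air at 97% RH.
dp = dew_point_c(30.0, 97.0)
print(f"dew point = {dp:.1f} C")

# A non-heated part still near the 15 C starting temperature (hypothetical).
print(surface_condenses(surface_temp_c=15.0, air_temp_c=30.0, rh_percent=97.0))

# A powered component running warmer than the air (hypothetical).
print(surface_condenses(surface_temp_c=35.0, air_temp_c=30.0, rh_percent=97.0))
```

At 97% relative humidity the dew point sits only about half a degree below the air temperature, so any component whose surface lags the ramp becomes a condensation site, while components that dissipate heat stay above the dew point.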
All these findings suggest that the failures were caused by water droplets being blown onto the PCB of the power supply rather than by condensation occurring on the PCB itself. As shown in figure 8, water droplets were observed on the AC/DC cables and connectors. It is highly likely that these droplets were blown into the power supply units when the facility’s maintenance staff increased the airflow in an effort to mitigate the problem.
Figure 8: Condensation on cables and connectors
The erroneous control sequence was promptly corrected and additional safeguards were added to prevent a recurrence of such an event. Even though the supply air humidity, which was more than 95% at times, was outside the operational range of the power supply units (10-90% RH, non-condensing), conformal coating was applied locally to selected areas of the PCB to protect the surfaces from condensation and to harden the power supply units against such corner cases.
A year later, another event led to condensation in the data hall. During a severe thunderstorm in the last week of June 2012, the outside air conditions changed rapidly, resulting in condensation on the component surfaces. This time, however, the PSUs with conformal coating survived the high humidity without any interruption in operation.
References:
[1] Frankovsky F., “More Effective Computing”, opencompute.org, July 27, 2011; http://opencompute.org/2011/07/27/more-effective-computing/
[2] Personal Communication, Lawrence Berkeley National Laboratory, High-Tech and Industrial Systems Group.
[3] Frachtenberg E., Lee D., Magarelli M., Mulay V. and Park J., “Thermal Design in The Open Compute Datacenter”, ITherm 2012, San Diego CA, May 30 – June 1, 2012
[4] Mulay V., “Learning Lessons at Prineville Data Center”, opencompute.org, November 17, 2011; http://opencompute.org/2011/11/17/learning-lessons-at-the-prineville-data-center/