Ten Steps to Increasing Data Center Efficiency and Availability through Infrastructure Monitoring
A White Paper from the Experts in Business-Critical Continuity
Summary
The first decade of the 21st century was one of rapid growth and change for data centers. For most of the decade, data center managers were forced to react to rapid, continuous changes dictated by the capacity and availability requirements of their organizations, and the density of the equipment being deployed to meet those requirements.
Now, data centers must enter a new stage of maturity marked by a more proactive approach to management to enable increased efficiency, better planning and higher levels of service. Achieving actionable visibility into data center operations requires the ability to collect, consolidate and analyze data across the data center, using advanced devices, sensors and management software.
The ten steps outlined in this paper provide a systematic approach to building the foundation for data center infrastructure management by deploying and leveraging measurement, intelligent controls and centralized monitoring and management. Data centers employing these 10 prescribed point solutions for infrastructure performance monitoring stand to gain an operational, strategic and transformative advantage for their enterprise or business.
1. Sensing temperatures
2. Monitoring power
3. Monitoring rack conditions
4. Detecting fluid leaks
5. Intelligent control of precision cooling
6. Intelligent control of critical power
7. Managing alerts and alarms
8. Monitoring energy efficiency
9. Monitoring batteries
10. Monitoring and managing remotely
Through these 10 steps, data center managers can automatically collect data from infrastructure systems and sensors placed strategically within the physical space and use this data to gain higher availability, increase efficiency, enhance the value of virtualization and consolidation efforts and improve planning.
During the first decade of the 21st century, the data center emerged as a significant corporate asset, playing a vital role in business management and customer service. Throughout this period, the data center underwent an evolution as computing and data storage capacities increased significantly. Consider these trends:
Global server shipments grew from 7.5 million annually in 2005 to 9 million in 2008, according to Gartner (Figure 1)
Average rack density grew from 6 kW in 2006 to 7.4 kW in 2009, according to the Data Center Users’ Group (Figure 2)
Data center power density increased an average of 15 percent annually between 2000 and 2009, according to IDC
Chips in blade servers are now generating 100 watts per square centimeter and growing
Data centers have traditionally been designed with extra headroom to accommodate growth, but during the last decade demand escalated so quickly that added IT capacity consumed available headroom and outpaced supply in terms of floor space and power and cooling capacity. This created conflicts as facility personnel struggled to supply IT’s demand for server capacity.
These problems were further exacerbated by two trends that emerged in the second half of the decade. The first is the increased focus on data center energy consumption. With both the density and quantity of servers rising, data center energy consumption became a significant factor in terms of IT cost management and, in some companies, response to concerns about global warming. Early efforts to reduce data center energy consumption focused on reducing costs around data center cooling, which accounts for approximately 35 percent of data center energy consumption. Subsequent efforts took a more holistic approach that recognized the interdependency of data center systems and shifted the focus to the IT systems that create the need for cooling.
The second trend was the adoption of virtualization technologies. In its annual survey of data center managers, the Data Center Users’ Group saw virtualization adoption rates of 81 percent in 2009. This has created a dynamically changing application environment layered on an essentially static physical environment, increasing data center complexity and introducing new challenges to physical infrastructure management.
In most organizations, data center managers lacked the tools to effectively address these challenges. The network management systems essential to IT personnel in monitoring and managing IT equipment did not address the critical issues of energy consumption, available rack capacity or ambient air temperatures that are essential to proactive data center management. Further, the building management systems used by facility personnel to monitor power and cooling in the data center failed to provide the alarm management capabilities required for critical systems and to account for the interdependencies between systems.
Evolving from a reactive to a proactive approach to infrastructure monitoring requires a new type of management system that provides visibility into the data center physical infrastructure within both the IT and facility domains and across these two domains.
The Emergence of Data Center Infrastructure Management
Data Center Infrastructure Management (DCIM) is a superset of infrastructure monitoring and encompasses the ability to manage the data center physical infrastructure to optimize data center resource utilization, efficiency and availability.
DCIM includes management of the data center infrastructure layer (power, cooling and the physical space), the IT infrastructure layer (compute, storage and communications equipment) and the gap between the two layers.
By enabling management across the gap, data center operators have visibility into the true capacity of their IT and infrastructure systems, allowing them to manage closer to actual capacity, rather than the conservative estimates that leave some percentage of capacity unused as a buffer. Emerson Network Power has identified four successive stages of DCIM progression:
1. Monitor and Access, which provides the ability to quickly react to potential problems in the data center infrastructure and improve management. With monitoring and access, data center personnel have visibility into equipment operating status and receive real-time alerts and alarms to notify them of potential equipment operating problems. Remote access can also speed the response to equipment problems while real-time monitoring data can be used to populate planning tools with actual performance data.
2. Data Capture and Planning, in which data center personnel have the ability to automatically collect data about what assets are in the data center and where they are located, as well as how they are interconnected. This data can be used to address key planning questions, including whether there is enough space, power and cooling to meet future needs and how equipment can be commissioned and decommissioned more efficiently.
3. Analyze and Diagnose, which provides data center personnel with the ability to respond more quickly to changes in the infrastructure and manage more efficiently. Operating data available through monitoring and data capture initiatives can be used to extend the life of the data center, reduce mean-time-to-repair, synchronize infrastructure with virtualization automation and analyze performance against SLAs.
4. Recommend and Automate. The final stage of progression enables data center optimization by providing data center personnel with the visibility and control to optimize performance while maintaining or improving availability. With this level of progression, data center management becomes truly proactive as personnel can anticipate potential failures and automatically shift compute and physical resources to eliminate downtime while increasing resource utilization to optimize efficiency across the data center.
Creating a comprehensive approach to data center infrastructure monitoring not only addresses the first phase of DCIM maturity but enables future phases. The remainder of this paper outlines 10 steps data center managers can take to create an infrastructure monitoring system that will deliver value today and create the foundation for holistic Data Center Infrastructure Management.
The 10 Steps to Effective Infrastructure Monitoring
Although sophisticated data center management tools have emerged in recent years, many facilities still lack the ability to comprehensively monitor their physical infrastructure systems. This is partly due to the disparate systems that make up the data center infrastructure, partly the result of the rapid changes transpiring in the data center, and partly the lack of a clear and simple roadmap for bringing together these disparate systems into a common network. This paper attempts to address this last challenge by outlining a simple and logical 10-step process for moving toward comprehensive data center infrastructure monitoring.
If you can’t measure it, you can’t control it. That’s why the first four steps in this 10-step approach prescribe the deployment of sensors that can collect critical power, cooling and safety data across the data center.
1. Sensing temperatures
One of the most significant consequences of the growth in data center density and complexity is the issue of heat density. As data center density has increased, cooling loads have grown and become more heterogeneous. It is no longer possible to manage temperatures on a facility level because rack densities may vary widely, creating hot spots in one zone while another zone is cooled below the desired temperature.
Installing a network of temperature sensors across the data center helps ensure that all equipment is operating within the ASHRAE recommended temperature range (64.4° F to 80.6° F). By sensing temperatures at multiple locations the airflow and cooling capacity of the precision cooling units can be more precisely controlled, resulting in more efficient operation.
Additionally, the network of sensors can reduce cooling costs by allowing safe operation closer to the upper end of the temperature range—operating, for example, at 75° F instead of 65° F. According to an ASHRAE paper developed by Emerson Network Power, a 10° F increase in server inlet temperatures results in a 30 percent reduction in compressor power draw. Assuming the Computer Room Air Conditioning (CRAC) units supporting the facility are equipped with digital or unloading compressors, this reduction in compressor power draw translates into a 21 percent reduction in cooling energy costs.
The data center cooling system typically measures return air temperatures and, in some cases, supply air temperatures. These measurements should be supplemented with sensors that measure server inlet temperature to enable more precise control of the air temperature at the server. With more cooling systems migrating to the row and rack, these sensors may be connected directly to a particular cooling unit, as is the case with the Liebert CRV row-based system, which can support a mini-network of sensors that measure server inlet temperature for adjacent racks and adjust cooling accordingly.
The best practice is to attach at least one sensor to every rack. Placing a sensor on every other rack is also acceptable when racks are arranged in a hot aisle/cold aisle configuration and loading is uniform across the row. Sensors should be located near the top of the rack, where temperatures are generally highest.
It is also advantageous to locate sensors near the end of the row where they can detect any hot air entering the cold aisle from the hot aisle.
There are advantages to connecting the temperature sensors directly to the cooling system, as with the Liebert CRV, as well as to a central monitoring system. When the sensors and cooling system are working in concert, the cooling system can automatically adapt its operation to eliminate hot spots, respond to heat load changes, detect obstructions and coordinate its operation with other cooling units working in the same zone.
ASHRAE provides more detailed guidelines for sensor location in the paper Thermal Guidelines for Data Processing Environments.
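As a minimal sketch of how a monitoring system might use these sensors, the Python example below sorts server inlet readings against the ASHRAE recommended range cited in this step. The function name, the rack IDs and the sample readings are hypothetical and are not taken from any Liebert product.

```python
# ASHRAE recommended server inlet range cited in this paper (degrees Fahrenheit).
ASHRAE_MIN_F = 64.4
ASHRAE_MAX_F = 80.6


def classify_inlet_temps(readings_f):
    """Group rack inlet readings into too-cold, in-range and too-hot lists.

    readings_f maps a (hypothetical) rack ID to its inlet temperature in F.
    """
    result = {"too_cold": [], "ok": [], "too_hot": []}
    for rack_id, temp_f in sorted(readings_f.items()):
        if temp_f < ASHRAE_MIN_F:
            result["too_cold"].append(rack_id)  # zone is being overcooled
        elif temp_f > ASHRAE_MAX_F:
            result["too_hot"].append(rack_id)   # potential hot spot
        else:
            result["ok"].append(rack_id)
    return result


# Made-up readings from sensors mounted near the top of three racks.
sample = {"A01": 72.5, "A02": 83.1, "A03": 61.0}
print(classify_inlet_temps(sample))
```

Racks reported as too cold are as useful to know about as hot spots, since they indicate zones where cooling energy is being wasted.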
2. Monitoring power usage
With power densities and energy costs both rising, the ability to monitor energy consumption is essential for effective data center management. Where one measures power can have an effect on how efficiency is measured. See the discussion of PUE monitoring in Step 8 for more information on efficiency monitoring.
To gain a comprehensive picture of data center power consumption, power should be monitored at the Uninterruptible Power Supply (UPS), the room Power Distribution Unit (PDU) and within the rack. Measurements taken at the UPS provide a base measure of data center energy consumption that can be used to calculate Power Usage Effectiveness (PUE) and identify energy consumption trends. Monitoring the room PDU prevents overload conditions at the PDU and helps ensure power is distributed evenly across the facility. The best view of IT power consumption comes from the power distribution units inside racks. Rack PDUs now feature integrated monitoring and control capabilities to enable continuous power monitoring. Because rack power consumption varies based on the specific equipment within the rack and its load, each rack should be equipped with a PDU (two for dual bus environments) capable of monitoring power consumption at the rack PDU level, for overload-protected receptacle groups and, where required, at the receptacle level.
These systems can provide PDU-level, branch-level and receptacle-level monitoring of volts, amps, kilowatts (kW) and kilowatt-hours (kWh). This provides the most direct measure of power consumption available to data center management and supports both higher data center efficiency and availability. In addition to more effective power management, rack PDUs are used to support more accurate chargeback of IT services and identify stranded capacity.
Some models also enable individual receptacles to be turned on and off remotely to prevent the addition of new devices that could create an overload condition.
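The branch-level monitoring described in this step could feed a simple overload check such as the Python sketch below. The function name, the branch labels and the 80 percent warning threshold are illustrative assumptions, not values from this paper or from any PDU's firmware.

```python
def branch_load_report(branch_amps, breaker_rating_amps, warn_fraction=0.8):
    """Return {branch: fraction of breaker rating} for heavily loaded branches.

    warn_fraction is an assumed warning threshold; a site would choose
    its own value based on local electrical practice.
    """
    flagged = {}
    for branch, amps in branch_amps.items():
        fraction = amps / breaker_rating_amps
        if fraction >= warn_fraction:
            flagged[branch] = round(fraction, 2)
    return flagged


# Made-up current readings for two receptacle groups on 20 A breakers.
print(branch_load_report({"B1": 12.0, "B2": 17.0}, breaker_rating_amps=20.0))
```

A branch that appears in the report is a candidate for having its unused receptacles switched off remotely, so that no new device can push it into overload.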
3. Monitoring rack conditions
With increasing densities, a single rack can now support the same computing capacity that used to require an entire room. Visibility into conditions in the rack can help prevent many of the most common threats to rack-based equipment, including accidental or malicious tampering, and the presence of water, smoke and excess humidity or temperature.
A rack monitoring unit can be configured to trigger alarms when rack doors are opened (and can even capture video of the event), when water or smoke is detected, or when temperature or humidity thresholds are exceeded. These “eyes inside the rack” can be connected to a central monitoring system where environmental data can be integrated with power data from the rack PDUs, while also providing local notification by activating a beacon light or other alarm if problems are detected. They should always be deployed in high-density racks and racks containing business-critical equipment.
4. Detecting fluid leaks
A single water leak can cost thousands of dollars in equipment damage, and many times more in lost data, customer transactions and enterprise productivity. Leak detection systems use strategically located sensors to detect leaks across the data center and trigger alarms to prevent damage. Sensors should be positioned at every point where fluids are present in the data center, including around water and glycol piping, humidifier supply and drain lines, condensate drains and unit drip pans.
A leak detection system can be operated as a standalone system or connected to the central monitoring system to simplify alarm management. Either way, it is an important part of the sensor network that gives data center managers visibility into operating conditions.
Current generation infrastructure systems are equipped with sophisticated controls that enhance reliability and enable multiple units to work together to improve performance and increase efficiency.
5. Intelligent control of precision cooling
Intelligent controls integrated into room and row air conditioners allow these systems to maintain precise temperature and humidity control as efficiently as possible. They coordinate the operation of multiple cooling units to allow the units to complement rather than compete with each other, as sometimes occurs when intelligent controls are not present.
For example, one unit may get a low humidity reading that could trigger the precision cooling system’s internal humidifier. But before turning on the humidifier, the unit checks the humidity readings of other units and discovers that humidity across the room is at the high end of the acceptable range. Instead of turning on the humidifier, the system continues to monitor humidity to see if levels balance out across the room.
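The humidity example can be expressed as a short decision rule. The Python sketch below is only an illustration of the idea and does not represent the actual Liebert iCOM logic. The humidity limits, the definition of the "high end" of the band as its upper quarter, and the function name are all assumptions.

```python
def should_humidify(local_rh, peer_rh, low_limit=40.0, high_limit=55.0):
    """Decide whether one cooling unit should start its humidifier.

    local_rh -- this unit's relative humidity reading (percent)
    peer_rh  -- readings from the other units in the room (percent)
    The limits are hypothetical; real setpoints are site-specific.
    """
    if local_rh >= low_limit:
        return False  # local reading is within range, nothing to do
    if not peer_rh:
        return True   # no other units to consult, act on the local reading
    peer_avg = sum(peer_rh) / len(peer_rh)
    # Assumed rule: the room is at the "high end" if it sits in the top quarter of the band.
    high_end = high_limit - 0.25 * (high_limit - low_limit)
    # Humidify only if the rest of the room is not already at the high end.
    return peer_avg < high_end


# One unit reads low while the rest of the room reads high: keep monitoring.
print(should_humidify(38.0, [54.0, 53.0, 55.0]))
```

Without the peer check, this unit would add moisture while its neighbors work to remove it, which is the "competing units" condition described above.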
In one large data center’s carefully monitored retrofit application, adding intelligent controls to 32 Liebert Deluxe precision cooling units with integrated Liebert iCOM controls reduced power demand by 200 kW and paid back the investment in 1.2 years.
Integrated control systems on room- and rack-based cooling systems can also be used to enable preventive maintenance programs and speed response to system problems. Data collected by these systems enables predictive analysis of components and proactive management of system maintenance. Event logs, service history logs and spare parts lists all support more
6. Intelligent control of critical power
UPS systems now include digital controls with the intelligence to alter and optimize the performance of the UPS. They automatically calibrate the system and ensure the UPS is working properly. In addition, they ensure that the UPS switches between traditional operation and bypass during overloads, protecting the UPS system and the overall power infrastructure. This minimizes the need to make manual adjustments based on site conditions. Instead of requiring a service technician to manually adjust the analog controls, the UPS system itself monitors the conditions at the site (power factor, load and ambient temperature) and makes adjustments to maintain optimum performance.
These controls also enable more efficient operation through energy optimization and intelligent paralleling features. Energy optimization mode increases UPS efficiency by powering the IT load from the bypass path while providing some power conditioning. An organization may choose to activate energy optimization during periods when utility power quality is thought to be particularly good or when availability requirements are not as high, such as nights or weekends. Energy optimization mode can improve UPS efficiency by as much as five percentage points, but also introduces the possibility of compromising total power protection. This risk can be mitigated when the controls are designed to keep the UPS inverter “hot” while the system is in energy optimization mode, allowing faster response to utility power disturbances.
Intelligent paralleling provides another option for improving UPS efficiency in multi-module systems. Intelligent paralleling manages the load across multiple UPS modules and can automatically deactivate modules that are not required to support the load, while still ensuring that the system is providing adequate redundancy. For example, a four-module N + 1 system sized to support 700 kVA using four 250 kVA UPS modules can support loads below 400 kVA with only three modules. This capability can improve system efficiency by up to six percent without sacrificing protection.
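The arithmetic behind the four-module example can be sketched in a few lines of Python. This is a simplified sizing rule under the assumption that every module has the same rating and that redundancy means one spare module beyond what the load requires. It is not the control algorithm of any particular UPS, and the function name is hypothetical.

```python
import math


def modules_required(load_kva, module_kva, redundant=1):
    """Modules needed to carry load_kva while keeping `redundant` spares online."""
    needed_for_load = max(1, math.ceil(load_kva / module_kva))
    return needed_for_load + redundant


installed = 4
for load in (700, 400):
    active = modules_required(load, module_kva=250)
    print(f"{load} kVA load: keep {active} modules on, {installed - active} can be deactivated")
```

At 700 kVA all four 250 kVA modules stay active (three to carry the load plus one redundant). At 400 kVA two modules carry the load, so three preserve N + 1 and the fourth can be deactivated.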
Centralized Monitoring and Management
Current generation power and cooling systems feature sophisticated displays that provide a wealth of operating data. The Liebert CRV cooling system, for example, can show trending of server inlet temperatures for multiple racks. But in the dynamic, every-second-counts world of the data center, local management of infrastructure systems is typically inadequate to meet high efficiency and availability requirements. That has spurred the use of centralized monitoring systems.
Centralized monitoring systems are available today that operate across the existing IT network or across a dedicated network.
Sites smaller than 2,500 square feet generally choose to use the existing network rather than set up a separate network, while larger facilities will benefit from a dedicated network that provides the ability to integrate with building automation and management systems and manage multiple facilities.
7. Managing alerts and alarms
Minimizing system downtime has been the traditional justification for data center infrastructure monitoring and it continues to be a powerful benefit. The ability to receive immediate notification of a failure, or of an event that could ultimately lead to a failure, through a centralized system allows for a faster, more effective response to system problems.
Equally important, a centralized alarm management system provides a single window into data center operations and can prioritize alarms by criticality, to ensure the most serious incidents receive priority attention. Every alarm needs to be gauged for its impact on operations. For example, it may be acceptable to defer a repair of one precision cooling unit if 30 are working normally, but not if it is one of only two units.
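One way to gauge an alarm's impact, sketched below in Python, is to rank alarms by how much spare capacity remains after the failure. The `Alarm` fields, the `units_needed` figures and the ranking rule are hypothetical. A production alarm manager would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    system: str        # label for the affected group of units
    units_total: int   # units installed in that group
    units_failed: int  # units currently in alarm
    units_needed: int  # assumed minimum number of units needed to carry the load


def spare_units(alarm):
    """Working units left over beyond what the load requires."""
    return alarm.units_total - alarm.units_failed - alarm.units_needed


def prioritize(alarms):
    """Most critical first: the less spare capacity remains, the higher the priority."""
    return sorted(alarms, key=spare_units)


alarms = [
    Alarm("large room cooling", units_total=31, units_failed=1, units_needed=28),
    Alarm("small room cooling", units_total=2, units_failed=1, units_needed=1),
]
for alarm in prioritize(alarms):
    print(alarm.system, "spare units:", spare_units(alarm))
```

The same single-unit failure lands at the top of the queue in the two-unit room and further down in the large room, mirroring the example above.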
Taken a step further, data from the monitoring system can be used to analyze equipment operating trends and develop more effective preventive maintenance programs.
Finally, the visibility into data center infrastructure provided by a centralized system can help prevent problems created by changing operating conditions. For example, the ability to turn off receptacles in a rack that is maxed out on power, but may still have physical space, can prevent a circuit overload. Alternatively, alarms that indicate a rise in server inlet temperatures could dictate the need for an additional row cooling unit before overheating brings down the servers the business depends on.
8. Monitoring energy efficiency
Energy costs consume a large proportion of data center operating costs, but many facilities lack energy monitoring capabilities.
Automating collection and analysis of data from the UPS and PDU monitoring systems can help reduce energy consumption while increasing IT productivity. Energy efficiency monitoring can track total data center consumption, automatically calculate and analyze PUE and optimize the use of alternative energy sources.
Using data from the UPS, the monitoring system can track UPS power output, determine when UPS units are running at peak efficiency, and report Level 1 (basic) PUE. Monitoring at the room or row PDU provides the ability to more efficiently load power supplies, dynamically manage cooling and automatically calculate Level 2 (intermediate) PUE. Panel board monitoring provides visibility into power consumption by non-IT systems, including lighting and generators, to ensure efficient use of those systems. Finally, rack-level monitoring provides the most accurate picture of IT equipment power consumption and can support Level 3 (advanced) PUE reporting. The ability to automate data collection, consolidation and analysis related to efficiency is essential to data center optimization and frees up data center staff to focus on strategic IT issues.
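PUE is the ratio of total facility power to IT equipment power, so the three reporting levels described above differ mainly in where the IT load is measured. The Python sketch below illustrates the calculation. All power figures are made up for the example and the function name is hypothetical.

```python
def pue(total_facility_kw, it_load_kw):
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


total_kw = 1000.0  # made-up total facility power
# Made-up IT load as seen at each measurement point named in this step.
it_load_by_point = {
    "Level 1 (UPS output)": 520.0,
    "Level 2 (room or row PDU)": 500.0,
    "Level 3 (rack PDU)": 480.0,
}
for label, it_kw in it_load_by_point.items():
    print(f"{label}: PUE = {pue(total_kw, it_kw):.2f}")
```

Because distribution losses sit between the UPS and the rack, the measured IT load shrinks as the measurement point moves closer to the equipment. The same facility therefore reports a slightly higher, and more accurate, PUE at higher levels.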
In a study of organizations whose service contracts cover maintenance and remote analysis with alarm monitoring, and so ensure that regular preventive maintenance is performed, the Liebert Services business of Emerson Network Power found that customers using battery monitoring experienced half as many battery failures as customers that didn’t.
The potential for battery monitoring to reduce failures is even greater than that—customers who relied on Liebert Services Ntegrated Monitoring did not experience a single battery failure.
9. Monitoring batteries
To prevent data loss and increase uptime, most data centers require a dedicated battery monitoring system. According to Emerson Network Power’s Liebert Services business, battery failure is the leading cause of UPS system loss of power. Using a predictive battery monitoring method can provide early notification of potential battery failure. The best practice is to implement a monitoring system that connects to and tracks the health of each battery within a string. The most effective battery monitoring systems continuously track all battery parameters, including internal resistance, using a DC test current to ensure measurement accuracy and repeatability.
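A predictive check on internal resistance might look like the Python sketch below, which compares each battery's latest reading with its own baseline. The 25 percent rise threshold, the micro-ohm values and the function name are placeholders chosen for illustration. Actual limits should come from the battery manufacturer or the monitoring vendor.

```python
def flag_weak_batteries(measured_uohm, baseline_uohm, max_rise=0.25):
    """Return IDs of batteries whose internal resistance rose by max_rise or more.

    measured_uohm / baseline_uohm map a battery ID to internal resistance
    in micro-ohms. max_rise is an assumed placeholder threshold.
    """
    flagged = []
    for battery_id, measured in sorted(measured_uohm.items()):
        baseline = baseline_uohm[battery_id]
        rise = (measured - baseline) / baseline
        if rise >= max_rise:
            flagged.append(battery_id)
    return flagged


# Made-up readings for a three-battery string.
baseline = {"S1-B1": 3000, "S1-B2": 3000, "S1-B3": 3000}
latest = {"S1-B1": 3100, "S1-B2": 3900, "S1-B3": 3050}
print(flag_weak_batteries(latest, baseline))
```

Tracking every battery in the string matters because a single degraded unit can compromise the whole string even when the others test normally.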
Supported by a well-defined process for preventive maintenance, and
10. Monitoring and managing remotely
Data center remote monitoring can lift the burden of infrastructure monitoring from internal personnel and place it with an organization with resources devoted to this task, as well as deep infrastructure expertise. In addition to improved resource utilization, a dedicated monitoring organization can respond more quickly to portfolio issues.
For instance, in monitoring data across multiple facilities, they may be alerted to a problem caused by a certain manufacturer’s breaker. Very quickly, the manufacturer can be notified so as to avoid a potential problem occurring across hundreds of sites, many of which contain similar equipment.
An organization such as Liebert Services has engineers on staff who systematically analyze the data returned remotely. For example, remote monitoring tracks the frequency of the input power provided to a UPS. If the UPS is receiving utility power, the input power frequency will be precisely 60 Hz. When the monitoring staff sees the input frequency vary within 58-61 Hz, they immediately recognize that the generator has started and is sourcing power, but potentially at the wrong time and for the wrong reason.
Finally, telemetry-based monitoring enables remote management of systems where authorized, allowing the monitoring partner to control systems remotely. This is particularly valuable when a facility is undergoing changes and updates.
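The input-frequency observation described in this step lends itself to a simple automated check. The Python sketch below flags a likely generator start when several samples drift away from 60 Hz. The tolerance, the minimum number of out-of-tolerance samples and the function name are assumptions for illustration and do not describe how Liebert Services implements its analysis.

```python
def likely_on_generator(freq_samples_hz, nominal_hz=60.0,
                        tolerance_hz=0.1, min_hits=2):
    """Guess whether a UPS input is being fed by a generator.

    A steady reading at nominal_hz suggests utility power. Repeated
    deviations beyond tolerance_hz suggest a generator. Both thresholds
    are assumed values for this sketch.
    """
    hits = sum(1 for f in freq_samples_hz if abs(f - nominal_hz) > tolerance_hz)
    return hits >= min_hits


print(likely_on_generator([60.0, 60.01, 59.99]))  # steady input
print(likely_on_generator([58.4, 60.7, 59.2]))    # wandering input
```

Requiring more than one out-of-tolerance sample is a design choice that keeps a single noisy reading from raising a false alert.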
Evaluating the Benefits of Infrastructure Monitoring
The 10 steps presented in this paper deliver powerful, quantifiable benefits in the key areas of data center availability and efficiency.
By some accounts, data center cooling accounts for 35 percent of data center energy consumption. Monitoring provides multiple opportunities to improve cooling efficiency. From the more precise control of air temperatures at the server inlet, to the improved coordination between cooling systems enabled by intelligent controls, monitoring can reduce cooling energy costs or enable the existing cooling system to support higher capacities.
Power monitoring and control also delivers energy reductions. With in-rack power monitoring, managers can identify equipment that is using energy but not supporting business services, reclaiming or eliminating this stranded capacity. Controls on the power system also create the opportunity to increase UPS system efficiency by up to six percent.
Virtually every monitoring step contributes to data center availability by providing advance warning of potential problems or faster response and recovery from actual events.
From systems that can show exactly what is happening inside a rack at any point in time, to centralized alarm management and battery monitoring, infrastructure monitoring eliminates some of the most common causes of data center downtime. In many cases these systems are relatively simple to implement and, once installed, provide the visibility and control required for data center optimization.
3. Monitoring rack conditions
Availability: Prevent unsafe conditions inside the rack; respond quickly to problems
Efficiency: Avert organizational inefficiency created by server and application downtime
4. Detecting fluid leaks
Availability: Prevent outages from water leaks
Efficiency: Reduce wasted energy caused by leaking equipment
5. Intelligent controls cooling
Availability: Enhanced cooling system maintenance
Efficiency: Cut cooling costs by improving hot spot management; optimize operation of multiple units
6. Intelligent controls power
Availability: Enhanced ability to handle faults
Efficiency: Gain three to five percent efficiency from energy optimization mode and one to six percent savings from intelligent paralleling
7. Managing alerts and alarms
Availability: Faster response to events; more proactive maintenance
Efficiency: Automate operations so personnel can focus on other issues
8. Monitoring energy efficiency
Efficiency: Optimize efficiency based on measurements of operating conditions
9. Monitoring batteries
Availability: Reduce battery failures by half
Efficiency: Reduce operational downtime so enterprise stays productive
10. Monitoring and managing remotely
Availability: Enhanced data analysis and specialization reduces downtime; eliminate battery-related downtime
Efficiency: Create efficient use of human resources allowing personnel to attend to strategic issues
The steps outlined in this paper represent proven strategies for improving data center efficiency and availability and create the foundation for holistic data center infrastructure management.
The next stage in data center management progression is automating and centralizing the management of the physical infrastructure to enable more effective resource utilization without compromising availability. Following the 10 steps outlined in this paper can help enterprises create the foundation for the future of data center management, while delivering value today by improving availability, efficiency and planning.