Platform for Optimising the Design and Operation of Modular Configurable IT Infrastructures and Facilities with Resource-Efficient Cooling
ICT-2011.6.2: ICT systems for energy efficiency
Execution: From 2011-10-01 to 2014-03-31
IT infrastructures are responsible for at least 2% of global energy consumption, which equals the demand of the aviation industry. Furthermore, in many current data centres only about half of the total energy is used by the IT equipment for computing, whilst most of the remainder is required for cooling and air movement. This often results in poor Power Usage Effectiveness (PUE) values and significant CO2 emissions. For this reason, issues related to cooling, heat transfer, and IT infrastructure location are gaining attention and are carefully studied during the planning and operation of data centres. In this context, building data centres from modular building blocks is attracting growing interest, in particular as an alternative to specialised facilities. The modular approach is becoming increasingly popular due to its flexibility of design, lower costs and shorter construction times. It covers a variety of designs: one of the most popular is the data centre housed in a standard shipping container, but the term can also refer to pre-configured units that are joined together to build up large computing facilities of, e.g., hundreds of square metres. However, while specialised facilities are well established, there is a significant need to analyse the energy efficiency of the modular approach so that the two can be compared. In particular, further research into the total energy consumption of both large data centres and smaller facilities is required to determine how efficient the modular approach is. An important aspect of the energy efficiency of modular data centres is the cooling technique. Approaches such as “free air cooling”, where external air rather than electrical chillers is used to cool systems, can help to improve efficiency and achieve PUE ratings close to the ideal of 1.0.
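As a reminder of what the PUE figures above mean, PUE is the total energy drawn by a facility divided by the energy consumed by its IT equipment. The following is a minimal sketch in Python; the function name and the energy figures are hypothetical and chosen only to mirror the two situations described in the text (IT equipment using about half of the total energy, and a facility approaching the ideal of 1.0).

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be below IT equipment energy")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical example figures, not measurements from the project.
# Case 1: IT equipment uses half of the total energy.
conventional = pue(total_facility_kwh=2000.0, it_equipment_kwh=1000.0)
# Case 2: cooling overhead reduced to 10% of the IT energy.
low_overhead = pue(total_facility_kwh=1100.0, it_equipment_kwh=1000.0)

print(f"conventional facility: PUE = {conventional:.2f}")   # 2.00
print(f"low-overhead facility: PUE = {low_overhead:.2f}")   # 1.10
```

A PUE of 2.0 means that every unit of energy spent on computing costs a second unit in overhead, which is why reducing cooling energy has such a direct effect on the ratio.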
The cooling and heat transfer processes are not the only important aspects influencing the energy efficiency of data centres. Actual power usage and the effectiveness of energy saving methods depend heavily on the types of IT applications and on workload properties. To take full advantage of these methods, however, application power usage and performance must be monitored in a fine-grained manner, and the parameters and metrics characterising both applications and resources must be defined precisely.
Consequently, a large number of parameters affect the energy efficiency of IT infrastructures, and all of them should be taken into account during the design and configuration of data centres. Issues such as the types and parameters of applications, workload scheduling and resource management policies, hardware configuration, metrics defining the efficiency of building blocks, and energy reused by facilities connected to IT infrastructures are all crucial to understanding and improving the energy efficiency of data centres as a whole. To study these issues carefully, simulation, visualisation, and decision support tools are needed that support the optimisation of the design and operation of new energy-efficient modular IT infrastructures and facilities. To address these issues, the main goal of CoolEmAll is to provide advanced simulation, visualisation and decision support tools along with blueprints of computing building blocks for modular data centre environments. Once developed, these tools and blueprints will make it possible to minimise the energy consumption, and consequently the CO2 emissions, of the entire IT infrastructure, taking the corresponding facilities into account as well. This will be achieved by:
- the design of diverse types of modular computing building blocks (ComputeBox Blueprints), which will be characterised by well-defined energy efficiency metrics;
- the development of a simulation, visualisation and decision support toolkit (SVD Toolkit) that will enable the analysis and optimisation of IT infrastructures assembled from these building blocks.
Both the computing modules and the toolkit will take into account three aspects that have a major impact on actual energy consumption: the cooling model, application properties and workloads, and workload and resource management policies. To this end, the energy efficiency of computing building blocks will be precisely defined by a set of novel metrics expressing the relations between energy efficiency and the factors listed above. In addition to common static approaches, the CoolEmAll platform will enable studies of dynamic states of IT infrastructures based on changing workloads, management policies, cooling methods, and ambient temperature. CoolEmAll will therefore address the following technical objectives:
- The development of a simulation, visualisation, and decision support toolkit (SVD Toolkit) for analysing and designing modular IT infrastructures and facilities with resource-efficient cooling. This platform will support IT infrastructure designers, decision makers and administrators in planning new infrastructures or improving existing ones. The intended modular approach to building and modelling IT infrastructures and facilities allows for many extensions and a high level of customisation. CoolEmAll will develop a flexible simulation platform integrating models of applications, workload scheduling policies, hardware characteristics, cooling, and air and thermal flows, using computational fluid dynamics (CFD) simulation tools. Because these models are configured through parameter settings, the entire CoolEmAll SVD platform will be flexible: the required settings can be captured and the models simulated for a wide range of applications, workload scheduling policies, IT infrastructures and hardware characteristics. Advanced visualisation tools and user interfaces will allow users to easily analyse various options and optimise the energy efficiency of planned IT infrastructures and facilities.
- The provisioning of blueprints of computing modules and a basic prototype. The basic version of this module will enable tests and real-life experiments that provide realistic behavioural information for the simulation models, capturing thermal and energy efficiency behaviour at node and rack level, and will also enable the verification of these models. This prototype will include fine-grained monitoring capabilities, allowing for a detailed inspection of the entire environment. Based on this evaluation, a refined and optimised prototype will be designed for diverse scenarios including various hardware densities, cooling methods, workloads and requirements.
- The definition and evaluation of thermal- and energy-aware workload scheduling and resource management policies. The proposed policies will include intelligent workload scheduling and resource management (e.g. dynamically switching off nodes, or lowering frequency and voltage to avoid excessive heat generation). The corresponding decisions will be based on fine-grained hardware and application monitoring. The selection and setup of the corresponding hardware will depend on application types, workload requirements, cooling method, and ambient temperatures. To evaluate the CoolEmAll approach within a realistic environment, two major types of workload will be considered: data centre cloud workloads using virtualisation, and HPC applications. The proposed policies will be applied in simulations to study their impact on energy efficiency in diverse configurations and at large scale.
- The analysis and parameterisation of applications and workloads. The CoolEmAll simulations as well as the workload management techniques will take into account specific workload and application characteristics. To this end, CoolEmAll will prepare benchmarks and a classification of applications and workloads. This knowledge about application properties will be used to simulate their impact on thermal issues and energy efficiency and to propose thermal-aware management policies.
- The definition of specific energy efficiency metrics. Metrics expressing trade-offs between energy and performance will be defined precisely. These metrics will go beyond existing ones (e.g. those defined in the Code of Conduct on Data Centers Energy Efficiency). In this respect, CoolEmAll will also take into account metrics defined by other projects, extending them or proposing additional metrics that express classes of efficiency, including the relation between energy efficiency and application characteristics, workload properties, ambient temperatures, required heat re-use efficiency, etc. In particular, the metrics defined and evaluated within the GAMES project will be taken into account.
- Verification of simulation tools and their application for specific scenarios. The verification of the proposed methods and software will be performed by tests in real environments using a basic prototype module, real applications, as well as enhanced monitoring systems based on sensors. CoolEmAll will also perform coupled simulations for several diverse settings including large scale IT infrastructures such as whole data centres. These simulations will use collected traces (e.g. from the GAMES project or partners) to plan, design and operate new and existing IT infrastructures and facilities. In this way, the final blueprints of the computing modules will be evaluated and optimised in specific settings.
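The thermal- and energy-aware policies named in the objectives above (dynamically switching off nodes, lowering frequency to avoid excessive heat generation) can be illustrated with a small Python sketch. This is not the CoolEmAll policy: the `Node` fields, the thresholds and the `apply_policy` function are all hypothetical and serve only to show the kind of decision rule that such a policy could apply to fine-grained monitoring data.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds, chosen for illustration only.
MAX_TEMP_C = 75.0      # temperature above which a node is throttled
MIN_FREQ_MHZ = 1200    # lowest frequency the policy will select
FREQ_STEP_MHZ = 400    # frequency reduction applied per policy run


@dataclass
class Node:
    name: str
    load: float            # utilisation between 0.0 and 1.0
    temperature_c: float   # monitored node temperature
    frequency_mhz: int     # current CPU frequency
    powered_on: bool = True


def apply_policy(nodes: List[Node]) -> List[Tuple[str, str]]:
    """Switch off idle nodes and throttle hot ones; return the actions taken."""
    actions = []
    for node in nodes:
        if not node.powered_on:
            continue
        if node.load == 0.0:
            # Idle node: switch it off to save energy.
            node.powered_on = False
            actions.append((node.name, "switch_off"))
        elif node.temperature_c > MAX_TEMP_C and node.frequency_mhz > MIN_FREQ_MHZ:
            # Hot node: lower the frequency by one step, never below the minimum.
            node.frequency_mhz = max(MIN_FREQ_MHZ, node.frequency_mhz - FREQ_STEP_MHZ)
            actions.append((node.name, "lower_frequency"))
    return actions


# Hypothetical monitoring snapshot of three nodes.
nodes = [
    Node("n1", load=0.0, temperature_c=40.0, frequency_mhz=2400),
    Node("n2", load=0.9, temperature_c=82.0, frequency_mhz=2400),
    Node("n3", load=0.5, temperature_c=60.0, frequency_mhz=2400),
]
actions = apply_policy(nodes)
print(actions)  # [('n1', 'switch_off'), ('n2', 'lower_frequency')]
```

In a simulation, a rule of this kind would be evaluated repeatedly against changing workloads and ambient temperatures, so that its effect on energy consumption can be compared across configurations.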
- Instytut Chemii Bioorganicznej PAN (PSNC), PL
- High Performance Computing Center Stuttgart (HLRS), D
- Universite Paul Sabatier Toulouse III, F
- Christmann Informationstechnik + Medien GmbH & Co KG, D
- The 451 Group Limited, UK
- Fundacio Institut de Recerca de L'Energia de Catalunya, E
- ATOS Origin Sociedad Anonima Espanola, E