Tools for Intelligent System Management of Very Large Computing Systems
The main objective of TIMaCS is to reduce the complexity of administering compute infrastructures by implementing a framework for the intelligent management of even the very large environments expected to emerge in the future.
- Concept and Implementation of a robust and highly scalable monitoring solution for very large computing systems based on existing tools and supplementary implementations ready for production.
- Design and Implementation of a system for partitioning and dynamic user assignment of very large computing systems based on virtualisation concepts. Easy setup or removal of single compute nodes in a heterogeneous or hybrid system will be included.
- On top of that, a management framework will be developed that supports different automation and escalation strategies based on policies: notification of an administrator, semi-automatic to fully automatic counteractions, prognoses, anomaly detection, and their validation under production conditions.
- Tools for error detection and automatic error handling, as well as concepts and realisation of preventive actions to check the infrastructure, e.g. between jobs, and to support regular maintenance.
- Sustainability by defining standards-compliant interfaces and an integrated framework aimed at combining not yet synchronised developments of tools for monitoring and management, cluster virtualisation, policy-based management and knowledge-based data analysis.
The project partners are:
- The High Performance Computing Center Stuttgart (HLRS), University of Stuttgart
- The Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden
- science + computing ag (S+C), Tübingen
- European High Performance Computing Technology Center (NEC), NEC Deutschland GmbH, Stuttgart
- Distributed Systems Group, Philipps-University of Marburg
More information can be found at
TIMaCS started on 1 January 2009 and will run until 31 December 2011.