Tools for Intelligent System Management of Very Large Computing Systems

The project aims to reduce the complexity of manually administering computing systems by realising a framework for the intelligent management of even very large computing systems. The framework is based on technologies for virtualisation, knowledge-based analysis and validation of collected information, and the definition of metrics and policies. In addition to notifying an administrator, it should be able to start predefined actions automatically. Beyond that, data analysis based on previous monitoring data, regression tests and intensive regular checks is intended to enable preventive actions before failures occur. The framework will include open interfaces so that it can easily be bound to relevant existing systems such as accounting or user management systems (user policies, priority, ...).

  • Concept and implementation of a robust, highly scalable and production-ready monitoring solution for very large computing systems, based on existing tools and supplementary implementations.
  • Design and implementation of a system for partitioning and dynamic user assignment of very large computing systems, based on virtualisation concepts. This includes easy setup of individual compute nodes in, or their removal from, a heterogeneous or hybrid system.
  • On top of that, a management framework will be developed that supports different automation and escalation strategies based on policies: notification of an administrator, semi-automatic to fully automatic counteractions, prognoses, anomaly detection and their validation under production conditions.
  • Tools for error detection and automatic error handling, as well as concepts for and realisation of preventive actions that check the infrastructure, for example between jobs, and support regular maintenance.
  • Sustainability through the definition of standards-conformant interfaces and an integrated framework that aims to combine tool developments that have not yet been synchronised in monitoring and management, cluster virtualisation, policy-based management and knowledge-based data analysis.
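
The policy-based escalation strategies named in the goals above (notification of an administrator, semi-automatic and fully automatic counteractions) can be illustrated with a minimal sketch. It is not the project's implementation: the `Policy` and `Escalation` types, the `evaluate` function, the metric names, the thresholds and the action names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Escalation(Enum):
    """Hypothetical escalation levels, mirroring the strategies listed above."""
    NOTIFY = "notify administrator"
    SEMI_AUTOMATIC = "propose counteraction, wait for approval"
    FULLY_AUTOMATIC = "run counteraction without approval"


@dataclass(frozen=True)
class Policy:
    metric: str             # name of the monitored metric (assumed)
    threshold: float        # policy triggers when the value exceeds this
    escalation: Escalation  # how the framework should react
    action: str             # name of a predefined action (assumed)


def evaluate(policies, sample):
    """Return (escalation, action) for each policy whose metric exceeds its threshold.

    Metrics missing from the sample are skipped rather than treated as failures.
    """
    triggered = []
    for policy in policies:
        value = sample.get(policy.metric)
        if value is not None and value > policy.threshold:
            triggered.append((policy.escalation, policy.action))
    return triggered


# Illustrative policies only; real metrics and limits would come from the site.
POLICIES = [
    Policy("node_temperature_celsius", 85.0, Escalation.FULLY_AUTOMATIC, "drain_node"),
    Policy("disk_usage_percent", 90.0, Escalation.SEMI_AUTOMATIC, "clean_scratch"),
    Policy("ecc_errors_per_hour", 10.0, Escalation.NOTIFY, "inspect_memory"),
]

sample = {"node_temperature_celsius": 91.5, "disk_usage_percent": 42.0}
for escalation, action in evaluate(POLICIES, sample):
    print(f"{escalation.name}: {action}")
```

Keeping policies as plain data separates the decision of what to do from the monitoring source, so the same rules could be fed from any monitoring tool that produces metric samples.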

Project Details

Funding Agency: BMBF
Runtime: 01.01.2009 - 31.12.2011