Zhou, N., Georgiou, Y., Pospieszny, M., Zhong, L., Zhou, H., Niethammer, C., Pejak, B., Marko, O., Hoppe, D.: Container Orchestration on HPC Systems through Kubernetes. Journal of Cloud Computing: Advances, Systems and Applications (2021).
Containerisation has demonstrated its efficiency for application deployment in Cloud computing. Containers can encapsulate complex programs with their dependencies in isolated environments, making applications more portable; hence, they are being adopted in High Performance Computing (HPC) clusters. Singularity, initially designed for HPC systems, has become the de facto standard container runtime in this domain. Nevertheless, conventional HPC workload managers lack micro-service support and deeply-integrated container management, as opposed to container orchestrators. We introduce the Torque-Operator, which serves as a bridge between the HPC workload manager TORQUE and the container orchestrator Kubernetes. We propose a hybrid architecture that integrates HPC and Cloud clusters seamlessly, with little interference to HPC systems, in which container orchestration is performed on two levels.
Zhou, N.: Containerization and Orchestration on HPC Systems. Sustained Simulation Performance 2019 and 2020 (2021).
Containerization has demonstrated its efficiency for application deployment in Cloud clusters. HPC systems have started to adopt containers, as containers can encapsulate complex programs with their dependencies in isolated environments, making applications more portable. Nevertheless, conventional HPC workload managers lack micro-service support and deeply-integrated container management, as opposed to container orchestrators. We enable the synergy of Cloud and HPC clusters and propose the preliminary design of a feedback-control scheduler that performs efficient container scheduling while taking advantage of the scheduling policies of both the container orchestrator (Kubernetes) and the HPC workload manager (TORQUE).
Zhou, N., Zhong, L., Hoppe, D., Pejak, B., Marko, O., Cardona, J., Czerkawski, M., Andonovic, I., Michie, C., Tachtatzis, C., Alexakis, E., Mavrepis, P., Kyriazis, D., Pospieszny, M.: CYBELE: A Hybrid Architecture of HPC and Big Data for AI Applications in Agriculture (to appear) (2021).
Containers are widely utilized in the Cloud to encapsulate applications and their dependencies in isolated environments, thus providing compatibility and portability to application execution environments. Artificial Intelligence (AI) applications often incorporate a complex stack of software packages and can benefit immensely from containerization. Big data analytics push the development of AI applications, making them more computation-intensive or data-intensive. There is a growing interest in executing AI applications in HPC clusters, which are conventionally used for large-scale engineering, scientific and financial simulations. This chapter presents a hybrid architecture consisting of a Cloud cluster and an HPC cluster, which has been proposed in the EU-funded research project CYBELE. A login node bridges the two clusters and provides a unified interface for job submission. More specifically, via the login node, long-running service programs are scheduled to be hosted on the Cloud cluster. By contrast, AI applications are scheduled to run on the HPC cluster, where they obtain significant performance improvement. Furthermore, the methods of parallelism and deployment of containerized AI applications on HPC systems are described.
The use of a hybrid scheme combining message-passing programming models for inter-node parallelism with shared-memory programming models for node-level parallelism is widespread. Extensive existing practice with hybrid Message Passing Interface (MPI) plus Open Multi-Processing (OpenMP) programming accounts for its popularity. Nevertheless, considerable programming effort is required to gain performance benefits from MPI+OpenMP code. An emerging hybrid method that combines MPI with the MPI shared-memory model (MPI+MPI) is promising. However, writing an efficient hybrid MPI+MPI program, especially when collective communication operations are involved, is not to be taken for granted. In this paper, we propose a new design method to implement collective communication operations in the hybrid MPI+MPI context. Our method avoids the on-node memory replications (on-node communication overheads) that are required by the semantics of pure MPI. We also offer wrapper primitives that hide all the design details from users, together with guidance on how to structure hybrid MPI+MPI code with these primitives. Further, the on-node synchronization scheme required by our collectives is optimized. Micro-benchmarks show that our collectives are comparable or superior to those in the pure MPI context. We further validate the effectiveness of the hybrid MPI+MPI model (using our wrapper primitives) on three computational kernels, by comparison with the pure MPI and hybrid MPI+OpenMP models.
Zhou, N., Georgiou, Y., Zhong, L., Zhou, H., Pospieszny, M.: Container Orchestration on HPC Systems. 2020 IEEE International Conference on Cloud Computing (CLOUD) (2020).
Containerization has demonstrated its efficiency for application deployment in cloud computing. Containers can encapsulate complex programs with their dependencies in isolated environments and hence are being adopted in HPC clusters. HPC workload managers lack micro-service support and deeply-integrated container management, as opposed to container orchestrators (e.g. Kubernetes). We introduce the Torque-Operator, a plugin that serves as a bridge between HPC workload managers and container orchestrators.
Georgiou, Y., Zhou, N., Zhong, L., Hoppe, D., Pospieszny, M., Papadopoulou, N., Nikas, K., Nikolos, O.L., Kranas, P., Karagiorgou, S., Pascolo, E., Mercier, M., Velho, P.: Converging HPC, Big Data and Cloud technologies for precision agriculture data analytics on supercomputers. 15th Workshop on Virtualization in High-Performance Cloud Computing (VHPC'20) (2020).
The convergence of HPC and Big Data, along with the influence of Cloud, is playing an important role in the democratization of HPC. The increasing computational demands of data analytics have opened new fields of interest for HPC facilities, but also raised new challenges such as interoperability with Cloud and ease of use. Besides typical HPC applications, these infrastructures are now asked to handle more complex workflows combining Machine Learning, Big Data and HPC, which brings challenges at the resource management, scheduling and environment deployment layers. Hence, enhancements are needed to allow multiple frameworks to be deployed under common system management while providing the right abstractions to facilitate adoption. This paper presents the architecture adopted for the parallel and distributed execution management software stack of the EU-funded CYBELE project, which is put in place at production HPC centers to execute hybrid data analytics workflows in the context of precision agriculture and livestock farming applications. The design is based on: Kubernetes as a higher-level orchestrator of Big Data components and hybrid workflows, and as a common interface to submit HPC or Big Data jobs; Slurm or Torque for HPC resource management; and the Singularity containerization platform for the dynamic deployment of the different data analytics frameworks on HPC. The paper showcases precision agriculture workflows executed on this architecture and provides initial performance evaluation results and insights into the whole prototype design.
Zhou, N., Delaval, G., Robu, B., Rutten, É., Méhaut, J.-F.: An autonomic-computing approach on mapping threads to multi-cores for software transactional memory. Concurrency and Computation: Practice and Experience 30 (2018).
A parallel program needs to manage the trade-off between the time spent in synchronisation and in computation. This trade-off is significantly affected by its parallelism degree: a high parallelism degree may decrease computing time while increasing synchronisation cost. Furthermore, thread placement on processor cores may impact program performance, as data access time can vary from one core to another due to intricacies of the underlying memory architecture. Alas, there is no universal rule for deciding thread parallelism and its mapping to cores offline, especially for a program whose behaviour varies at runtime; moreover, offline tuning is less precise. We present our work on dynamic control of thread parallelism and mapping. We address concurrency issues via Software Transactional Memory (STM), which bypasses locks and tackles synchronisation through transactions. Autonomic computing offers designers a framework of methods and techniques for building autonomic systems with well-mastered behaviours. Its key idea is to implement feedback control loops to design safe, efficient, and predictable controllers, which monitor and adjust controlled systems dynamically while keeping overhead low. We implement such feedback control loops to automate the management of threads and reduce program execution time.
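The feedback-control idea described in the abstract can be sketched as follows. This is a minimal illustration of the control loop only: the synthetic `throughput` function stands in for measurements of a real STM runtime, and all names and constants are hypothetical, not the paper's controller.

```python
def throughput(n_threads):
    # Synthetic plant model (illustrative only): useful work grows with
    # thread count but with diminishing returns, while synchronisation
    # cost (aborts, contention) grows quadratically.
    work = n_threads / (1 + 0.05 * n_threads)
    sync_cost = 0.02 * n_threads * n_threads
    return work - sync_cost

def control_parallelism(n_init=1, n_max=32, steps=50):
    """Feedback loop: each control period, probe a neighbouring
    parallelism degree and keep the change only if the measured
    throughput improved; otherwise reverse the search direction."""
    n, best = n_init, throughput(n_init)
    direction = 1
    for _ in range(steps):
        cand = min(max(n + direction, 1), n_max)
        t = throughput(cand)          # "measure" the candidate degree
        if t > best:
            n, best = cand, t         # keep the improvement
        else:
            direction = -direction    # reverse and keep searching
    return n

print(control_parallelism())          # settles at the model's optimum, 11
```

A real controller would sample commit/abort ratios from the STM runtime instead of a closed-form model, and would additionally decide thread-to-core mapping; the hill-climbing loop above only conveys the monitor-decide-act structure.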
Zhou, N., Delaval, G., Robu, B., Rutten, É., Méhaut, J.-F.: Autonomic Parallelism and Thread Mapping Control on Software Transactional Memory. 13th IEEE International Conference on Autonomic Computing (ICAC 2016), Würzburg, Germany, pp. 189-198 (2016).
Zhou, N., Delaval, G., Robu, B., Rutten, É., Méhaut, J.-F.: Control of Autonomic Parallelism Adaptation on Software Transactional Memory. International Conference on High Performance Computing & Simulation (HPCS 2016), Innsbruck, Austria, pp. 180-187 (2016).