Schuchart, J., Gerndt, M., Kjeldsberg, P.G., Lysaght, M., Horak, D., Riha, L., Gocht, A., Sourouri, M., Kumaraswamy, M., Chowdhury, A., Jahre, M., Diethelm, K., Bouizi, O., Mian, U.S., Kruzik, J., Sojka, R., Beseda, M., Kannan, V., Bendifallah, Z., Hackenberg, D., Nagel, W.E.: The READEX formalism for automatic tuning for energy efficiency. Computing. 1-19 (2017).
Energy efficiency is an important aspect of future exascale systems, mainly due to rising energy costs. Although high-performance computing (HPC) applications are compute-centric, they still exhibit varying computational characteristics in different regions of the program, such as compute-, memory-, and I/O-bound code regions. Some of today's clusters already offer mechanisms to adjust the system to the resource requirements of an application, e.g., by controlling the CPU frequency. However, manually tuning for improved energy efficiency is a tedious and painstaking task that is often neglected by application developers. The European Union's Horizon 2020 project READEX (Runtime Exploitation of Application Dynamism for Energy-efficient eXascale computing) aims at developing a tool-aided approach for improved energy efficiency of current and future HPC applications. To reach this goal, the READEX project combines technologies from two ends of the compute spectrum, embedded systems and HPC, constituting a split design-time/runtime methodology. From the HPC domain, the Periscope Tuning Framework (PTF) is extended to perform dynamic auto-tuning of fine-grained application regions using the system scenario methodology, which was originally developed for improving energy efficiency in embedded systems. This paper introduces the concepts of the READEX project, its envisioned implementation, and preliminary results that demonstrate the feasibility of this approach.
Schuchart, J., Hackenberg, D., Schöne, R., Ilsche, T., Nagappan, R., Patterson, M.K.: The Shift from Processor Power Consumption to Performance Variations: Fundamental Implications at Scale. Proceedings of the 1st Workshop on Energy-Aware HPC (Ena-HPC). pp. 197-205 (2016).
Hackenberg, D., Schöne, R., Ilsche, T., Molka, D., Schuchart, J., Geyer, R.: An Energy Efficiency Feature Survey of the Intel Haswell Processor. International Parallel and Distributed Processing Symposium Workshop (IPDPSW). pp. 896-904. IEEE Computer Society (2015).
Schuchart, J., Waurich, V., Flehmig, M., Walther, M., Nagel, W.E., Gubsch, I.: Exploiting Repeated Structures and Vectorization in Modelica. 11th International Modelica Conference. pp. 265-272. Modelica Association Paris, France (2015).
Ilsche, T., Hackenberg, D., Graul, S., Schöne, R., Schuchart, J.: Power Measurements for Compute Nodes: Improving Sampling Rates, Granularity and Accuracy. Sixth International Green and Sustainable Computing Conference (IGSC). 1-8 (2015).
Wang, D., Xu, Y., Thornton, P., King, A., Steed, C., Gu, L., Schuchart, J.: A functional test platform for the Community Land Model. Environmental Modelling & Software. (2014).
Jana, S., Schuchart, J., Chapman, B.: Analysis of Energy and Performance of PGAS-based Data Access Patterns. In: Malony, A.D. and Hammond, J.R. (eds.) Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models. pp. 15:1-15:10. ACM, New York, NY, USA (2014).
Ilsche, T., Schuchart, J., Schöne, R., Hackenberg, D.: Combining instrumentation and sampling for trace-based application performance analysis. Tools for High Performance Computing 2014. pp. 123-136. Springer (2014).
Hackenberg, D., Ilsche, T., Schuchart, J., Schöne, R., Nagel, W.E., Simon, M., Georgiou, Y.: HDEEM: High Definition Energy Efficiency Monitoring. Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing. pp. 1-10. IEEE Press, Piscataway, NJ, USA (2014).
Ilsche, T., Schuchart, J., Cope, J., Kimpe, D., Jones, T., Knüpfer, A., Iskra, K., Ross, R., Nagel, W.E., Poole, S.: Enabling event tracing at leadership-class scale through I/O forwarding middleware. In: Epema, D.H.J., Kielmann, T., and Ripeanu, M. (eds.) Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing. pp. 49-60. ACM (2012).
We present a practical lock-free shared data structure that efficiently implements the operations of a concurrent deque as well as a general doubly linked list. The implementation supports parallelism for disjoint accesses and uses atomic primitives which are available in modern computer systems. Previously known lock-free algorithms for doubly linked lists are either based on non-available atomic synchronization primitives, only implement a subset of the functionality, or are not designed for disjoint accesses. Our algorithm only requires single-word compare-and-swap atomic primitives, supports fully dynamic list sizes, and also allows traversal through deleted nodes, thus avoiding unnecessary operation retries. We have performed an empirical study of our new algorithm on two different multiprocessor platforms. Results of the experiments performed under high contention show that the performance of our implementation scales linearly with an increasing number of processors. Considering deque implementations and systems with low concurrency, the algorithm by Michael shows the best performance. However, as our algorithm is designed for disjoint accesses, it performs significantly better on systems with high concurrency and a non-uniform memory architecture.