ExtractIng - Automated Metadata Extraction for Computational Engineering Applications and High-Performance Computing. Special Interest Group Data Infrastructures, University of Stuttgart, Stuttgart, 5.2.2020 (invited)
Of Hawks and Dark Data. HPSS User Forum 2019, Bloomington, Indiana, USA, 16.10.2019
The Dark Side of Data Management, 18th HLRS/hww Workshop on Scalable Global Parallel File Systems, Stuttgart, Germany, 10.4.2019 (invited)
Vorhandene FDM-Dienste und Tools an der Universität Stuttgart, Universität Stuttgart, 20.3.2019
Forschungsdatenmanagement in den computergestützten Ingenieurwissenschaften am HLRS. Deutsches Klimarechenzentrum, Hamburg. 18.2.2019 (invited)
The Genesis of EngMeta - A Metadata Model for Research Data in Computational Engineering, MTSR 2018, Limassol, Cyprus, 25.10.2018
HLRS Data Management Changes, Exeter, UK, 16.10.2018
How Filesystems can improve Research Data Management for HPC, 17th HLRS/hww Workshop on Scalable Global Parallel File Systems, Stuttgart, Germany, 18.4.2018 (invited)
Anforderungen der Ingenieurwissenschaften an ein institutionelles Forschungsdatenrepositorium der Universität Stuttgart (with Dorothea Iglezakis (UB Stuttgart)), Workshop, 8. DINI/nestor-Workshop "Forschungsdatenrepositorien", Stuttgart, 27.11.2017
Challenges of Research Data Management for High Performance Computing, TPDL2017, Thessaloniki, Greece, 19.9.2017
Bestehende Arbeiten zum FDM am HLRS, Workshop FDM, UB Stuttgart, Stuttgart, 3.7.2017
Challenges of Research Data Management for HPC, Workshop Scientific Practice of Big Data, Stuttgart, Germany, 29.11.2016
HLRS Data Management Evolution, HPSS User Forum 2016, New York City, New York, USA, 1.9.2016
Petabyte-Datenmanagement und Verteilte Langzeit-Archivierung, High Performance Computing Workshop 2015, Leogang, Austria, 01.03.2015 - 04.03.2015 (invited)
HLRS Site Presentation, HPSS User Forum 2014, München, Germany, 28.10.2014
HLRS Site Presentation, HPSS User Forum 2013, Boulder, Colorado, USA, 6.11.2013
Research data, as the truly valuable good in science, must be preserved and subsequently kept findable, accessible and reusable for reasons of proper scientific conduct for a time span of several years. However, managing long-term storage of research data is a burden for institutes and researchers. Because of the sheer size and the required retention time, suitable storage providers are hard to find. Aiming to solve this puzzle, the bwDataArchive project started development of a long-term research data archive that is reliable, cost-effective and able to store multiple petabytes of data. The hardware consists of data storage on magnetic tape, interfaced with disk caches and nodes for data movement and access. On the software side, the High Performance Storage System (HPSS) was chosen for its proven ability to reliably store huge amounts of data. However, the implementation of bwDataArchive is not dependent on HPSS. For authentication, bwDataArchive is integrated into the federated identity management for educational institutions in the State of Baden-Württemberg in Germany. The archive features data protection by means of a dual copy at two distinct locations on different tape technologies, data accessibility via common storage protocols, data retention assurance for more than ten years, data preservation with checksums, and data management capabilities supported by a flexible directory structure that allows sharing and publication. As of September 2019, the bwDataArchive holds over 9 PB and 90 million files and sees a constant increase in usage and users from many communities.
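The checksum-based preservation with a dual copy described above can be pictured with a minimal sketch. This is not the actual bwDataArchive/HPSS implementation; the file paths and the choice of SHA-256 are assumptions for illustration only:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so arbitrarily large archive files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_dual_copy(primary_path, replica_path, recorded_digest):
    """Check both copies (e.g. at two tape sites) against the digest recorded at ingest time."""
    return (sha256_of(primary_path) == recorded_digest
            and sha256_of(replica_path) == recorded_digest)
```

Periodically re-running such a check against the digest stored at ingest time is what lets an archive detect silent corruption of either copy.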
Schembera, B., Iglezakis, D.: EngMeta -- Metadata for Computational Engineering. International Journal of Metadata, Semantics and Ontologies. 14, 26-38 (2020).
Selent, B., Kraus, H., Hansen, N., Schembera, B., Seeland, A., Iglezakis, D.: Management of Research Data in Computational Fluid Dynamics and Thermodynamics. Proceedings of E-Science-Tage 2019: Data to Knowledge. (2020).
Horsch, M.T., Chiacchiera, S., Seaton, M.A., Todorov, I.T., Schembera, B., Konchakova, N., Klein, P.: Reliable and interoperable computational molecular engineering: 1. Pragmatic interoperability and translation of industrial engineering problems into modelling and simulation solutions. Molecular and Mesoscopic Modelling in Chemical Engineering Data Science. (2020).
Many studies in big data focus on the uses of data available to researchers, leaving untreated the data that resides on the servers but of which researchers are unaware. We call this dark data, and in this article, we present and discuss it in the context of high-performance computing (HPC) facilities. To this end, we provide statistics of a major HPC facility in Europe, the High-Performance Computing Center Stuttgart (HLRS). We also propose a new position tailor-made for coping with dark data and general data management. We call it the scientific data officer (SDO) and we distinguish it from other standard positions in HPC facilities such as chief data officers, system administrators, and security officers. In order to understand the role of the SDO in HPC facilities, we discuss two kinds of responsibilities, namely, technical responsibilities and ethical responsibilities. While the former are intended to characterize the position, the latter raise concerns---and propose solutions---regarding the control and authority that the SDO would acquire.
Schembera, B., Iglezakis, D.: The Genesis of EngMeta - A Metadata Model for Research Data in Computational Engineering. In: Garoufallou, E., Sartori, F., Siatri, R., and Zervas, M. (eds.) Metadata and Semantic Research, pp. 127-132. Cham (2019).
In computational engineering, numerical simulations produce huge amounts of data. To keep this research data findable, accessible, interoperable and reusable, a structured description of the data is indispensable. This paper outlines the genesis of EngMeta – a metadata model designed to describe engineering simulation data with a focus on thermodynamics and aerodynamics. The metadata model, developed in close collaboration with engineers, is based on existing standards and adds discipline-specific information as the main contribution. Characteristics of the observed system offer researchers important search criteria. Information on the hardware and software used and the processing steps involved helps to understand and replicate the data. Such metadata are crucial to keeping the data FAIR and bridging the gap to sustainable research data management in computational engineering.
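The three categories the abstract names (characteristics of the observed system, the hardware/software environment, and the processing steps) can be pictured as a minimal record. The field names and values below are illustrative inventions, not the actual EngMeta element names:

```python
# Hypothetical simulation-metadata record; the top-level keys mirror the
# categories described in the abstract, not the real EngMeta schema.
record = {
    "system": {            # characteristics of the observed system (search criteria)
        "phenomenon": "supersonic boundary layer",
        "variables": ["velocity", "temperature"],
    },
    "environment": {       # hardware and software used, for replicability
        "machine": "Cray XC40",
        "software": [{"name": "solverX", "version": "1.0"}],  # illustrative values
    },
    "processing": [        # steps involved in producing the data
        {"step": "simulation", "input": "grid.dat", "output": "flow.h5"},
    ],
}

REQUIRED = {"system", "environment", "processing"}

def is_complete(rec):
    """A record covers the model's categories only if all three are present."""
    return REQUIRED <= rec.keys()
```

A repository could run such a completeness check at ingest time, so that datasets missing one of the categories are flagged before they become hard to interpret.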
Iglezakis, D., Schembera, B.: Anforderungen der Ingenieurwissenschaften an das Forschungsdatenmanagement der Universität Stuttgart - Ergebnisse der Bedarfsanalyse des Projektes DIPL-ING. o-bib. Das offene Bibliotheksjournal. 3 (2018).
Schembera, B., Bönisch, T.: Challenges of Research Data Management for High Performance Computing. International Conference on Theory and Practice of Digital Libraries, pp. 140-151 (2017).
This paper targets the challenges of research data management with a focus on High Performance Computing (HPC) and simulation data. The main challenges are discussed: the Big Data qualities of HPC research data, technical data management, and organizational and administrative issues. Emerging from these challenges, requirements for a feasible HPC research data management are derived and an alternative data life cycle is proposed. The requirement analysis includes recommendations based on a modified OAIS architecture: to meet the HPC requirement of a scalable system, metadata and data must not be stored together. Metadata keys are defined and organizational actions are recommended. Moreover, this paper contributes by introducing the role of a Scientific Data Manager, who is responsible for the institution’s data management and takes stewardship of the data.
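The requirement that metadata and data must not be stored together can be sketched as two separate stores linked only by a key. The store names and the key scheme are illustrative assumptions, not the paper's concrete design:

```python
import uuid

data_store = {}      # stands in for the bulk storage (parallel file system / tape)
metadata_store = {}  # stands in for a separate, lightweight, searchable catalogue

def ingest(payload, metadata):
    """Store data and metadata separately, linked only by a generated key."""
    key = str(uuid.uuid4())
    data_store[key] = payload
    metadata_store[key] = metadata
    return key

def find(**criteria):
    """Search the catalogue alone, without touching the bulk data at all."""
    return [k for k, md in metadata_store.items()
            if all(md.get(f) == v for f, v in criteria.items())]
```

The point of the separation is visible in `find`: queries scale with the size of the small catalogue, not with the petabytes sitting in the bulk store.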
Schembera, B.: Myths of Simulation. The Science and Art of Simulation I, pp. 51-63 (2017).
Certain myths have emerged about computer technology in general, such as the almighty electronic brain that outperforms humans in every discipline or legends about the capability of artificial intelligence. Some of these myths find echoes in the field of computer simulation, like simulation being pure number-crunching on supercomputers. This article reflects on myths about computer simulation and tries to oppose them. At the beginning of the paper, simulation is defined. Then, some central myths about computer simulation are identified from a general computer science perspective. The first central myth is that simulation is a virtual experiment. This view is contradicted by the argument that computer simulation is located between theory and experiment. Furthermore, access to reality is possible indirectly via representation. The second myth is that simulation is said to be exact. This myth can be falsified by examining technical and conceptual limitations of computer technology. Moreover, arguments are presented as to why ideal exactness is neither possible nor necessary. A third myth emerges from the general overstatement of computer technology: everything can be simulated. It will be shown that simulation can only solve problems that can be formalized and calculated—and can only produce results that are within the scope of the models they are based on.
Skvortsov, P., Schembera, B., Dürr, F., Rothermel, K.: Optimized Secure Position Sharing with Non-trusted Servers. arXiv.org, 1-26 (2017).
Today, location-based applications and services such as friend finders and geo-social networks are very popular. However, storing private position information on third-party location servers leads to privacy problems. In our previous work, we proposed a position sharing approach for secure management of positions on non-trusted servers, which distributes position shares of limited precision among servers of several providers. In this paper, we propose two novel contributions to improve the original approach. First, we optimize the placement of shares among servers by taking their trustworthiness into account. Second, we optimize the location update protocols to minimize the number of messages between mobile device and location servers.
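The core idea of limited-precision position shares can be sketched as successive refinements: the first share gives only a coarse position, each further share is a small correction, and only a party holding shares from several servers recovers the full precision. This is a strong simplification of the actual protocol, shown for a single coordinate:

```python
def make_shares(coord, levels=4):
    """Split one coordinate into shares: share 0 is a coarse rounding,
    each later share is the correction to the next precision level."""
    shares, previous = [], 0.0
    for digits in range(levels):
        approx = round(coord, digits)
        shares.append(approx - previous)
        previous = approx
    return shares

def combine(shares):
    """Summing the first k shares yields the position at precision level k."""
    return sum(shares)
```

A compromised server holding only one share thus learns the position at one coarse granularity at best, while the owner, collecting all shares, reconstructs it to full stored precision.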
Running a simulation is guided by the idea of a formal translation: a (frequently physical) object model is to be translated into a mathematical model, and this in turn into an algorithmic model, with the formal structure being preserved. The practice of simulation shows that while this idea is indeed guiding, it cannot be implemented seamlessly. Rather, fundamental problems arise that make a direct translation impossible. The thesis of this contribution is: wherever a break occurs in this translation process, a ruse is devised to make possible, at least in a certain sense, what is not possible directly. Simulation thereby proves to be an exceedingly cunning process. This sharpens our view of the technique and craft of simulation, for the concept of the ruse refers from the outset to the tricks, dodges, feints and astonishing effects of technology.
Scientific and cultural organisations, international collaborations and projects need to preserve and maintain access to large volumes of digital data for several decades. Existing systems supporting these requirements range from simple databases at libraries to complex multi-tier software environments developed by scientific communities. All communities see an increasing volume of data that must be stored efficiently and economically, which today usually means a dynamic combination of storage on magnetic disk and on magnetic tape. The bwDataArchiv project at KIT and HLRS is developing an infrastructure for secure and reliable archival storage that functions as a uniform platform for multiple scientific domains and international projects. Access to the actual storage in the data centre is enabled through an abstracted bit-preservation layer that offers features selected for long-term storage, such as special metadata tags, takes into account the higher latencies of tape or cloud storage, and can be used for infrastructure-as-a-service (IaaS) offerings. At the same time, access to the storage remains backward compatible for existing applications. Several projects serving different communities (e.g. HPC users, libraries, archives) that use the interface are presented, as are the collected requirements and the architecture of the prototype implementation.
Schembera, B.: Platzierungsoptimierung für vertrauliche Verwaltung der verteilten Positionsinformationen. ftp://ftp.informatik.uni-stuttgart.de/pub/library/medoc.ustuttgart_fi/DIP-3102/DIP-3102.pdf (2011).