Many studies in big data focus on the uses of data available to researchers, leaving untreated the data that reside on the servers but of which researchers are unaware. We call this dark data, and in this article we present and discuss it in the context of high-performance computing (HPC) facilities. To this end, we provide statistics from a major HPC facility in Europe, the High-Performance Computing Center Stuttgart (HLRS). We also propose a new position tailor-made for coping with dark data and general data management. We call it the scientific data officer (SDO) and distinguish it from other standard positions in HPC facilities such as chief data officers, system administrators, and security officers. In order to understand the role of the SDO in HPC facilities, we discuss two kinds of responsibilities, namely technical responsibilities and ethical responsibilities. While the former are intended to characterize the position, the latter raise concerns about, and propose solutions for, the control and authority that the SDO would acquire.
Schembera, B., Iglezakis, D.: The Genesis of EngMeta - A Metadata Model for Research Data in Computational Engineering. In: Garoufallou, E., Sartori, F., Siatri, R., and Zervas, M. (eds.) Metadata and Semantic Research, pp. 127-132. Cham (2019).
In computational engineering, numerical simulations produce huge amounts of data. To keep this research data findable, accessible, interoperable, and reusable, a structured description of the data is indispensable. This paper outlines the genesis of EngMeta, a metadata model designed to describe engineering simulation data with a focus on thermodynamics and aerodynamics. The metadata model, developed in close collaboration with engineers, is based on existing standards and adds discipline-specific information as its main contribution. Characteristics of the observed system offer researchers important search criteria. Information on the hardware and software used and on the processing steps involved helps to understand and replicate the data. Such metadata are crucial to keeping the data FAIR and to bridging the gap to sustainable research data management in computational engineering.
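To illustrate the kind of discipline-specific description EngMeta targets, the following is a minimal sketch of a metadata record, written as Python dataclasses. All field and class names here are illustrative placeholders chosen for this sketch; they do not reproduce the actual EngMeta schema.

    # Illustrative sketch of a discipline-specific metadata record in the
    # spirit of EngMeta. Names are hypothetical, not the EngMeta schema.
    from dataclasses import dataclass, field

    @dataclass
    class SystemCharacteristics:
        """Properties of the simulated system -- key search criteria."""
        substance: str              # e.g. the fluid in a thermodynamics study
        temperature_kelvin: float
        pressure_pascal: float

    @dataclass
    class ProcessingStep:
        """One step of the workflow that produced or transformed the data."""
        tool: str                   # software used to generate the data
        version: str
        command_line: str

    @dataclass
    class SimulationMetadata:
        """Combines generic and discipline-specific information."""
        title: str
        creators: list[str]
        system: SystemCharacteristics
        hardware: str               # compute environment, for replicability
        provenance: list[ProcessingStep] = field(default_factory=list)

    record = SimulationMetadata(
        title="Evaporation of a droplet",
        creators=["Doe, J."],
        system=SystemCharacteristics("argon", 87.3, 101325.0),
        hardware="Cray XC40, 256 nodes",
        provenance=[ProcessingStep("md-code", "1.1", "mpirun md config.xml")],
    )

The split mirrors the abstract: system characteristics serve as search criteria, while hardware, software, and processing steps document how the data came about.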
Iglezakis, D., Schembera, B.: Anforderungen der Ingenieurwissenschaften an das Forschungsdatenmanagement der Universität Stuttgart - Ergebnisse der Bedarfsanalyse des Projektes DIPL-ING. o-bib. Das offene Bibliotheksjournal 3 (2018).
Schembera, B., Bönisch, T.: Challenges of Research Data Management for High Performance Computing. In: International Conference on Theory and Practice of Digital Libraries, pp. 140-151 (2017).
This paper targets the challenges of research data management with a focus on High Performance Computing (HPC) and simulation data. The main challenges are discussed: the big data qualities of HPC research data, technical data management, and organizational and administrative issues. From these challenges, requirements for a feasible HPC research data management are derived and an alternative data life cycle is proposed. The requirements analysis includes recommendations based on a modified OAIS architecture: to meet the HPC requirement of a scalable system, metadata and data must not be stored together. Metadata keys are defined and organizational actions are recommended. Moreover, this paper contributes by introducing the role of a Scientific Data Manager, who is responsible for the institution's data management and takes stewardship of the data.
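The requirement that metadata and data must not be stored together can be pictured as follows: a small, searchable index holds only descriptive records and pointers, while the bulk data stay on the parallel file system. This is a minimal sketch under assumed names and paths, not the paper's implementation.

    # Minimal sketch of separating metadata from data for scalability.
    # The index holds small, searchable records; the large data remain on
    # the HPC file system. All names and paths are illustrative assumptions.
    import json
    from pathlib import Path

    INDEX = Path("metadata_index")   # small, fast storage for metadata only

    def register(dataset_id: str, metadata: dict, data_path: str) -> None:
        """Store a metadata record that merely points to the data."""
        record = dict(metadata, data_location=data_path)  # pointer, not a copy
        INDEX.mkdir(exist_ok=True)
        (INDEX / f"{dataset_id}.json").write_text(json.dumps(record, indent=2))

    def find(key: str, value: str) -> list[str]:
        """Search the lightweight index without touching the bulk data."""
        hits = []
        for entry in INDEX.glob("*.json"):
            record = json.loads(entry.read_text())
            if record.get(key) == value:
                hits.append(record["data_location"])
        return hits

    # Usage: the multi-terabyte output never leaves the parallel file system.
    register("run-0042",
             {"project": "turbulence", "code": "solver-x"},
             "/lustre/hpc/project/turbulence/run-0042/")
    print(find("project", "turbulence"))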
Schembera, B.: Myths of Simulation. In: The Science and Art of Simulation I, pp. 51-63 (2017).
Certain myths have emerged about computer technology in general, such as the almighty electronic brain that outperforms humans in every discipline, or legends about the capabilities of artificial intelligence. Some of these myths find echoes in the field of computer simulation, such as simulation being pure number-crunching on supercomputers. This article reflects on myths about computer simulation and tries to counter them. At the beginning of the paper, simulation is defined. Then, some central myths about computer simulation are identified from a general computer science perspective. The first central myth is that simulation is a virtual experiment. This view is countered by the argument that computer simulation is located between theory and experiment; furthermore, access to reality is possible only indirectly, via representation. The second myth is that simulation is exact. This myth can be refuted by examining the technical and conceptual limitations of computer technology; moreover, arguments are presented as to why ideal exactness is neither possible nor necessary. A third myth emerges from the general overstatement of computer technology: everything can be simulated. It is shown that simulation can only solve problems that can be formalized and calculated, and can only produce results that lie within the scope of the models it is based on.
Skvortsov, P., Schembera, B., Dürr, F., Rothermel, K.: Optimized Secure Position Sharing with Non-trusted Servers. arXiv.org, 1-26 (2017).
Today, location-based applications and services such as friend finders and geo-social networks are very popular. However, storing private position information on third-party location servers leads to privacy problems. In our previous work, we proposed a position-sharing approach for the secure management of positions on non-trusted servers, which distributes position shares of limited precision among the servers of several providers. In this paper, we propose two novel contributions that improve the original approach. First, we optimize the placement of shares among servers by taking their trustworthiness into account. Second, we optimize the location update protocols to minimize the number of messages between the mobile device and the location servers.
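The core idea of position sharing can be sketched as follows: no single server learns more than a coarse, grid-snapped position, and only the full set of shares restores the precise coordinates. This is a simplified additive toy variant for illustration, not the authors' actual share-generation algorithm; the grid size is an assumed parameter.

    # Toy sketch of position sharing across non-trusted servers. Each server
    # receives the coarse position plus one random offset share; summing all
    # shares with the coarse position recovers the precise coordinates.
    # Simplified illustration only, not the paper's algorithm.
    import random

    GRID = 0.01   # coarse grid size in degrees (assumed parameter)

    def make_shares(lat: float, lon: float, n_servers: int):
        """Split a precise position into a coarse part and n offset shares."""
        coarse = (round(lat / GRID) * GRID, round(lon / GRID) * GRID)
        res_lat, res_lon = lat - coarse[0], lon - coarse[1]
        shares = [(random.uniform(-GRID, GRID), random.uniform(-GRID, GRID))
                  for _ in range(n_servers - 1)]
        # The last share makes all offsets sum to the true residual.
        shares.append((res_lat - sum(s[0] for s in shares),
                       res_lon - sum(s[1] for s in shares)))
        return coarse, shares   # distribute each share to a different server

    def combine(coarse, shares):
        """Recombine the coarse position with the collected offset shares."""
        return (coarse[0] + sum(s[0] for s in shares),
                coarse[1] + sum(s[1] for s in shares))

    coarse, shares = make_shares(48.7758, 9.1829, n_servers=3)
    print(combine(coarse, shares))   # ~ (48.7758, 9.1829)

The paper's optimizations then concern where such shares are placed (preferring more trustworthy servers) and how updates to them are communicated with as few messages as possible.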
The execution of a simulation is guided by the idea of a formal translation: a model of the object (often a physical one) is to be translated into a mathematical model, and this in turn into an algorithmic model, with the formal structure being preserved. The practice of simulation shows that this idea is indeed guiding, but cannot be implemented seamlessly. Rather, fundamental problems arise that make a direct translation impossible. The thesis of this contribution is: wherever a break occurs in this translation process, a ruse is devised to make possible, at least in a certain way, what is not directly possible. Simulation thereby proves to be an exceedingly cunning process. This sharpens the eye for the technique and craft of simulation, since the notion of the ruse refers from the outset to the tricks, knacks, feints, and astonishing effects of technology.
Scientific and cultural organisations, international collaborations, and projects need to preserve and maintain access to large volumes of digital data for several decades. Existing systems supporting these requirements range from simple databases at libraries to complex multi-tier software environments developed by scientific communities. All communities see an increasing volume of data that must be stored efficiently and economically, which today usually means a dynamically apportioned combination of storage on magnetic disk and on magnetic tape. The bwDataArchiv project at KIT and HLRS is developing an infrastructure for secure and reliable archival storage that functions as a uniform platform for multiple scientific domains and international projects. Access to the actual storage in the data centre is enabled through an abstracted bit-preservation layer that offers features selected for long-term storage, such as special metadata tags, takes into account the higher latencies of tape or cloud storage, and can be used for infrastructure-as-a-service (IaaS) offerings. At the same time, access to the storage remains backward compatible for existing applications. Several projects serving different communities (e.g., HPC users, libraries, archives) that use the interface are presented, as are the collected requirements and the architecture of the prototype implementation.
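An abstracted bit-preservation layer of this kind can be pictured as a narrow interface over high-latency storage that carries metadata tags and makes tape recall explicit. The following is a minimal sketch with assumed class and method names, not the bwDataArchiv API.

    # Sketch of an abstracted bit-preservation layer: a uniform interface
    # that attaches long-term metadata tags and exposes tape latency via an
    # explicit staging step. Names are assumptions for illustration only.
    from enum import Enum, auto

    class Availability(Enum):
        ONLINE = auto()    # on disk, readable immediately
        NEARLINE = auto()  # on tape, must be staged first

    class BitPreservationLayer:
        def __init__(self):
            self._objects = {}   # archive_id -> [payload, tags, availability]

        def put(self, archive_id: str, payload: bytes, tags: dict) -> None:
            """Ingest an object with long-term metadata tags attached."""
            self._objects[archive_id] = [payload, tags, Availability.NEARLINE]

        def stage(self, archive_id: str) -> None:
            """Request recall from tape; a real system completes this
            asynchronously, possibly hours later."""
            self._objects[archive_id][2] = Availability.ONLINE

        def get(self, archive_id: str) -> bytes:
            payload, _tags, avail = self._objects[archive_id]
            if avail is not Availability.ONLINE:
                raise RuntimeError("object is on tape; call stage() first")
            return payload

    archive = BitPreservationLayer()
    archive.put("doi-10.0000/xyz", b"...",
                {"community": "HPC", "retain_years": 10})
    archive.stage("doi-10.0000/xyz")
    print(archive.get("doi-10.0000/xyz"))

Keeping the interface this narrow is what lets it serve HPC users, libraries, and archives alike while remaining backward compatible for existing applications.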
Schembera, B.: Platzierungsoptimierung für vertrauliche Verwaltung der verteilten Positionsinformationen. ftp://ftp.informatik.uni-stuttgart.de/pub/library/medoc.ustuttgart_fi/DIP-3102/DIP-3102.pdf (2011).