HLRS Will Help Build National Data Research Infrastructure for Catalysis Research

09 October 2020

By developing a repository for sharing and accessing research data, and standards for catalysis-related data management, HLRS will create a basis for interdisciplinary computational research in chemistry and chemical engineering.

A consortium including the High-Performance Computing Center Stuttgart (HLRS) has been awarded a grant of more than €10 million from the Deutsche Forschungsgemeinschaft (DFG) to create a National Research Data Infrastructure for Catalysis-Related Sciences (NFDI4Cat). The consortium, led by the nonprofit chemical society DECHEMA (Gesellschaft für Chemische Technik und Biotechnologie e.V.) and involving representatives from 15 additional partner institutions, will develop infrastructure, software, and data management standards to empower the next generation of chemical engineering research. NFDI4Cat is one of nine new consortia that will contribute to the construction of a German National Research Data Infrastructure.

As one of four members of the NFDI4Cat coordination group, HLRS will create and host a data repository for catalysis-related research, including a portal for sharing and accessing data stored at multiple locations. In addition, HLRS will play a major role in an effort to establish standardized metadata and ontologies for catalysis research that will ensure compatibility among different data sets, increasing their usability and amplifying their potential impact for scientific progress.

"We are very pleased that HLRS will be participating in the development of a National Research Data Infrastructure," said HLRS Director Prof. Dr.-Ing. Michael Resch. "Working together with partners in the catalysis research community, this project should offer outstanding opportunities to accelerate research in a field that is not only of great economic importance, but that also holds keys to addressing some of our greatest global challenges."

Transforming catalysis into a computational science

Catalysis and chemical engineering are essential disciplines for producing many materials we use in our daily lives. They also offer the potential to address some of humanity's most pressing problems. Developing new technologies for reducing CO2 emissions, avoiding plastic waste, or producing sustainable fertilizers to meet the nutritional needs of a growing global population, for example, are all areas in which catalysis and chemical engineering have important roles to play.

Catalysis research, like many scientific fields, increasingly relies on computational methods that support a continual dialogue between theory, simulation, and experiment. Drawing on rapidly growing collections of high-throughput experimental data, data science methods can predict relationships between the chemical structures of catalysts and their activities. At the same time, simulation can provide valuable guidance for optimizing the design of chemical reactors and processes. As data accumulate, integrating knowledge from different disciplines across all levels of catalytic reactions, from the physical chemistry of individual molecules all the way up to process engineering, would also yield fundamental insights of great use to researchers across the field.

NFDI4Cat Data Infrastructure

NFDI4Cat is planning a distributed data infrastructure using a shared metadata concept that makes it possible to access data repositories from a central portal. (Image courtesy of NFDI4Cat)

To exploit the full potential of this trend, however, new kinds of computing infrastructure and methods are needed. Although the use of data science in catalysis research has been growing, scientists too often work in relative isolation from one another. As a result, data are typically collected in proprietary formats, are not described with standardized metadata, are not stored where other researchers can access them, and are not linked to related publications and published data sets. The NFDI4Cat project intends to address these problems by creating a shared, comprehensive framework for pooling and managing catalysis research data.

Establishing catalysis metadata standards

Between 2017 and 2019, HLRS contributed significantly to a research project called DIPL-ING, which developed a system for research data management in computational engineering. The result was a metadata model called EngMeta, which HLRS and the University of Stuttgart library now use. In addition, researchers developed a method called ExtractIng that automatically extracts metadata from research data sets and converts it into EngMeta descriptions. This automation spares investigators often tedious and time-consuming work, which will also be important to the success of the NFDI4Cat project.
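The general idea behind automatic metadata extraction can be illustrated with a minimal sketch. It assumes that a simulation code writes a plain-text header of `key = value` lines and maps a few known keys onto fields of a metadata record. The header keys, the target field names, and the `extract_metadata` function are all hypothetical; they do not reproduce the actual EngMeta schema or the ExtractIng implementation.

```python
import re

# Hypothetical mapping from keys in a solver's output header to field
# names in an EngMeta-like metadata record (not the real EngMeta schema).
KEY_MAP = {
    "solver": "software",
    "version": "softwareVersion",
    "nodes": "computeNodes",
    "timestep": "timeStep",
}

# Matches lines such as "Solver = ExampleCFD", ignoring surrounding whitespace.
LINE_PATTERN = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*$")


def extract_metadata(header_text: str) -> dict:
    """Parse `key = value` lines and keep only keys that have a known mapping."""
    record = {}
    for line in header_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # skip comments, blank lines, and anything else
        key, value = match.group(1).lower(), match.group(2)
        if key in KEY_MAP:
            record[KEY_MAP[key]] = value
    return record


if __name__ == "__main__":
    header = """
    # run header written by a hypothetical solver
    Solver = ExampleCFD
    Version = 2.1
    nodes = 64
    timestep = 1e-6
    comment = dropped because it has no mapping
    """
    print(extract_metadata(header))
```

Keys without a mapping are dropped rather than guessed at; a real extractor would also need to handle binary formats and units, which this sketch leaves out.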

In this new project, HLRS will build on EngMeta to create an ontology for managing catalysis research data, that is, a set of categories for organizing the knowledge contained in all relevant datasets. Such an ontology would include general descriptive metadata, technical metadata about the data objects contained in the dataset, process metadata describing the methods and the experimental or computational hardware used to generate the data, and domain-specific metadata related to the field of research in which the data were generated.
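One way to picture how these four groups of metadata fit together is as a nested record type. The following Python sketch is an illustration under our own assumptions: the class and field names are hypothetical and are not taken from EngMeta or from the NFDI4Cat ontology, and a production ontology would more likely be formalized in a standard such as OWL than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class GeneralMetadata:
    """Generic description of the dataset as a whole."""
    title: str
    creators: list[str]
    year: int


@dataclass
class TechnicalMetadata:
    """Properties of the data objects (files) contained in the dataset."""
    file_format: str
    size_bytes: int


@dataclass
class ProcessMetadata:
    """Method and experimental or computational hardware that produced the data."""
    method: str
    hardware: str


@dataclass
class CatalysisDataset:
    """A dataset described by the four metadata groups named in the text."""
    general: GeneralMetadata
    technical: TechnicalMetadata
    process: ProcessMetadata
    # Domain-specific terms differ between subfields, so they are kept open-ended.
    domain: dict[str, str] = field(default_factory=dict)


if __name__ == "__main__":
    # Illustrative values only; not taken from any real NFDI4Cat dataset.
    example = CatalysisDataset(
        general=GeneralMetadata("CO oxidation screening", ["Example Author"], 2020),
        technical=TechnicalMetadata("HDF5", 2_048_000),
        process=ProcessMetadata("high-throughput experiment", "parallel fixed-bed reactor"),
        domain={"catalyst": "Pt/Al2O3", "reaction": "CO oxidation"},
    )
    print(example.domain["catalyst"])
```

Keeping the domain-specific part as an open key-value map reflects that this layer is expected to vary most between catalysis subfields, while the other three groups can be shared across disciplines.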

Researchers in the NFDI4Cat consortium envision integrating the ontology into two complementary software platforms: Piveau, an open source data management ecosystem developed at FOKUS (Fraunhofer Institute for Open Communication Systems), and CaRMen, software developed at the Karlsruhe Institute of Technology for analyzing physical and chemical models against experimental data.

Ultimately, by standardizing metadata frameworks, this project aims to bring catalysis research data in line with the so-called FAIR principles for scientific data management (findability, accessibility, interoperability, and reusability). NFDI4Cat will also implement stringent quality assurance methods, including prompts to follow best practices during data ingestion, to ensure that all data entering the repository are of high quality and correctly labeled.
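Ingestion prompts of the kind described here could work roughly like the following sketch, which checks a submitted metadata dictionary against a list of required fields and returns a prompt for each gap instead of silently rejecting the upload. The required fields, the prompt wording, and the `ingestion_prompts` function are hypothetical; the article does not specify which fields NFDI4Cat will require.

```python
# Hypothetical required fields and the prompt shown when each is missing.
REQUIRED_FIELDS = {
    "title": "Add a descriptive title for the dataset.",
    "creators": "List at least one creator.",
    "method": "State whether the data come from an experiment or a simulation.",
    "license": "Choose a license so others know how the data may be reused.",
}


def ingestion_prompts(metadata: dict) -> list[str]:
    """Return one prompt per required field that is missing or empty.

    An empty result means the submission passes this (simplified) check.
    """
    prompts = []
    for name, prompt in REQUIRED_FIELDS.items():
        if not metadata.get(name):  # catches None, "", and []
            prompts.append(f"{name}: {prompt}")
    return prompts


if __name__ == "__main__":
    submission = {"title": "CO oxidation screening", "creators": [], "method": "simulation"}
    for line in ingestion_prompts(submission):
        print(line)
```

Returning prompts rather than a bare pass/fail keeps the depositor in the loop, which matches the article's emphasis on guiding users toward best practices; a fuller check would also validate values against the ontology's controlled vocabularies.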

HLRS to develop and host NFDI4Cat data repository

In addition to developing the conceptual framework for organizing catalysis research data, HLRS will also play a central role in the development of the infrastructure for hosting and sharing data.

NFDI4Cat will be based on a distributed repository infrastructure. (See figure above.) Data will be stored across a variety of servers, making it possible for institutions that want to share data with the community to participate even if their policies prevent them from storing data on an external server. A middleware layer at the core of the service will connect the various repositories, and a graphical user interface will provide a portal for users to access data across the network.
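The role of the middleware can be sketched as a simple federated search: the portal sends one query to every connected repository, each repository answers from its own holdings, and the middleware merges the results and records where each dataset lives. In this sketch repositories are modeled as in-memory Python functions; in practice they would be remote services, and the names `Repository`, `federated_search`, and `make_local_repo` are hypothetical, not part of the NFDI4Cat design.

```python
from dataclasses import dataclass
from typing import Callable

# A repository answers a keyword query with matching metadata records.
SearchFn = Callable[[str], list[dict]]


@dataclass
class Repository:
    name: str
    search: SearchFn


def federated_search(query: str, repositories: list[Repository]) -> list[dict]:
    """Send the query to every repository and merge the hits.

    The data stay where they are; only matching metadata records are returned,
    each tagged with the repository that holds the data. A repository that
    raises an error is skipped so that one outage does not break the portal.
    """
    results = []
    for repo in repositories:
        try:
            hits = repo.search(query)
        except Exception:
            continue  # tolerate unreachable or failing repositories
        for hit in hits:
            results.append({**hit, "repository": repo.name})
    return results


def make_local_repo(name: str, records: list[dict]) -> Repository:
    """Build an in-memory stand-in for a repository (illustration only)."""
    def search(query: str) -> list[dict]:
        return [r for r in records if query.lower() in r["title"].lower()]
    return Repository(name, search)


if __name__ == "__main__":
    repos = [
        make_local_repo("site-a", [{"title": "CO oxidation on Pt"}]),
        make_local_repo("site-b", [{"title": "Ammonia synthesis"}, {"title": "CO2 hydrogenation"}]),
    ]
    for hit in federated_search("co", repos):
        print(hit)
```

The queries here run one after another for clarity; a real middleware would more likely query repositories concurrently with timeouts and apply access rules per institution.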

To ensure that the catalysis community adopts the repository and its user interface, HLRS, together with FOKUS, will hold meetings with representatives from academia and industry to gather information about their needs. Discussions will explore the state of the art in data management hardware and software, how to make ingesting data and metadata into the system as easy as possible, and legal requirements for protecting intellectual property and controlling data access. Once the project requirements are clarified, HLRS will work with FOKUS to develop the hardware and software backbone of the catalysis data repository.

In addition, HLRS will provide two server systems for the repository, approximately 100 TB of hard disk storage, and up to 1 PB of background storage on tape.

Networking to leverage the power of data

In addition to maintaining close dialogue with the catalysis research community, NFDI4Cat will also engage with other NFDI consortia on issues of shared interest to ensure that its data are useful in other fields. These include consortia developing national research data infrastructures for chemistry (NFDI4Chem), engineering (NFDI4Ing), and photon and neutron experiments (DAPHNE), whose research concerns overlap with those of catalysis. NFDI4Cat will also coordinate with other relevant research institutes, programs, and data resources outside the NFDI network to identify potential synergies.

Developing a more robust infrastructure for sharing and organizing data offers the near-term possibility of transforming catalysis research. By moving toward a more open data structure, scientists could gain new computational power for understanding reaction mechanisms and kinetics, pursue more efficient, rational approaches to catalyst design, and gain new kinds of insights that arise through interdisciplinary research. In this way, NFDI4Cat should itself serve as a catalyst for new kinds of discoveries.

Christopher Williams