International Project Will Create Data Infrastructure for Pandemic Research

ORCHESTRA will develop a data repository constituting a large population cohort based on cohorts from multiple countries. The result will support studies to improve public health and vaccine strategies for tackling COVID-19.

The ongoing coronavirus pandemic has been an incremental learning experience for much of the world. This is true not only with respect to the complex biology and pathology of the SARS-CoV-2 virus, but also concerning the challenge of understanding disease transmission, controlling pandemic spread, and providing the best care for infected individuals.

During the pandemic, medical centers in Europe and around the world have accumulated masses of data describing patient characteristics, treatment plans, and disease progression. Currently, however, this data is largely gathered and saved by different healthcare providers in their own local databases, with few connections between them. Because modern biological and epidemiological studies benefit from the computational analysis of large datasets, bringing this information together would provide investigators with a valuable resource that could be mined for clues to improve the fight against the disease.

The High-Performance Computing Center Stuttgart (HLRS), working as a member of a consortium of 27 centers for public health and high-performance computing from 15 countries in Europe, Africa, South America, and Asia, aims to help address this challenge. As part of a newly funded project called ORCHESTRA (Connecting European Cohorts to Increase Common and Effective Response to SARS-CoV-2 Pandemic), HLRS will help develop a data infrastructure for collecting and analyzing patient data from across Europe and other parts of the world.

"Although HLRS itself doesn't conduct biological or epidemiological research, the kinds of computing resources and expertise that we provide are going to be very useful for ending the COVID pandemic," said HLRS Director Prof. Michael Resch. "We are delighted to be working with such a diverse, multidisciplinary team to support scientists and clinicians in this important and potentially high-impact effort."

The three-year project is being led by Prof. Evelina Tacconelli at the University of Verona, Italy. The project budget of nearly €20 million is funded by the European Union’s Horizon 2020 research and innovation program under the ERAvsCORONA ACTION PLAN, which was developed jointly by Commission services and national authorities to address the pandemic.

HLRS will focus on building the computing infrastructure and data management framework for collecting, storing, integrating, and sharing critical data related to the pandemic.

"For a data engineer, this work will be exciting both as a technical challenge and because it offers an opportunity to contribute useful expertise in the fight against COVID-19," says Dr. Björn Schembera, a researcher in data management who will lead HLRS's participation in ORCHESTRA. "Bringing together cohort analysis and advanced data technology will give epidemiologists new insights into the pandemic."

A resource for COVID-19 research

The large patient cohort that ORCHESTRA will create will support a range of multidisciplinary studies, including genetic, epigenetic, immunological, epidemiological, and other types of approaches. The resources will enable retrospective analyses of risk factors for disease acquisition and progression, as well as prospective follow-up aimed at exploring long-term consequences of the virus. Such knowledge will be valuable for preparing for and managing potential future waves of COVID-19 spread, or other kinds of future pandemics.

The knowledge gleaned from the study of these cohorts will inform strategies for addressing key public health challenges facing Europe. These include protecting fragile populations such as the elderly or individuals with compromising health conditions, reducing risks for frontline health care staff, addressing long-term consequences of COVID-19 on the health and well-being of individuals, analyzing vaccination response, and understanding the impact of environmental factors, socio-economic determinants, lifestyle, and confinement measures on the spread of COVID-19.

In coordination with the European Commission, the ORCHESTRA team will consult with the European Centre for Disease Prevention and Control (ECDC) and the European Medicines Agency (EMA), in particular on making data available in real time that can help shape the continuously evolving public health and vaccine strategies.

Data infrastructure needed to improve cohort analysis

Working together with scientists at the high-performance computing centers CINECA (Italy) and CINES (France), HLRS will contribute to the development of the data infrastructure needed to support ORCHESTRA's goals.

Each of the IT partners involved in the project will establish a national hub that will be responsible for collecting data from national data providers in a standardized manner and for ensuring that the data is protected as required by data protection regulations. A cloud-based ORCHESTRA portal will then offer an online interface for accessing, sharing, and linking together data stored by the various national hubs. The portal will also include tailored artificial intelligence and analytics tools.

HLRS will help to build a federated research architecture for cohort analysis based on three layers: national data providers, national hubs, and the centralized ORCHESTRA portal. The center will oversee the implementation of national hubs among the project partners and act as the national hub for cohort data in Germany. Moreover, HLRS will participate in the design and implementation of the ORCHESTRA portal platform.
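
The three-layer arrangement described above can be pictured with a small Python sketch. It is only an illustration, not ORCHESTRA's actual design: the class names (`DataProvider`, `NationalHub`, `Portal`), the record fields, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataProvider:
    """Layer 1 (hypothetical): a national data provider holding local records."""
    name: str
    records: List[Dict]


@dataclass
class NationalHub:
    """Layer 2 (hypothetical): collects records from one country's providers."""
    country: str
    providers: List[DataProvider] = field(default_factory=list)

    def collect(self) -> List[Dict]:
        # Tag each record with its origin so it can be traced back later.
        return [
            {**record, "country": self.country, "provider": provider.name}
            for provider in self.providers
            for record in provider.records
        ]


@dataclass
class Portal:
    """Layer 3 (hypothetical): central entry point that queries every hub."""
    hubs: List[NationalHub] = field(default_factory=list)

    def query(self, predicate: Callable[[Dict], bool]) -> List[Dict]:
        return [r for hub in self.hubs for r in hub.collect() if predicate(r)]


# Invented sample data, for illustration only.
de = NationalHub("DE", [DataProvider("clinic_a", [{"age": 71}, {"age": 34}])])
it = NationalHub("IT", [DataProvider("clinic_b", [{"age": 68}])])
portal = Portal([de, it])
over_65 = portal.query(lambda r: r["age"] >= 65)
```

In this toy version the portal pulls every record before filtering. A real federated system would more likely send the query to each hub and return only the results that data protection rules allow.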

For this to function well, data will need to be organized in a consistent way across the national hubs, including through the use of standardized metadata. HLRS and its partners will therefore also establish standards for harmonizing the collection and management of data. For scientists, this ability to combine datasets more seamlessly will offer the opportunity to run more robust, more refined analyses.
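
To see why harmonization matters, consider a minimal Python sketch in which two hubs store the same information under different local field names and codes. The hub names, field names, and code tables here are hypothetical and do not come from the project; they only show the kind of mapping a shared metadata standard makes possible.

```python
# Hypothetical mappings from each hub's local field names to a shared schema.
FIELD_MAPS = {
    "hub_a": {"alter": "age_years", "geschlecht": "sex"},
    "hub_b": {"eta": "age_years", "sesso": "sex"},
}

# Hypothetical table translating local sex codes to shared values.
SEX_CODES = {"m": "male", "w": "female", "f": "female"}


def harmonize(record, hub):
    """Rename known fields to the shared schema and normalize coded values."""
    mapping = FIELD_MAPS[hub]
    out = {}
    for local_name, value in record.items():
        if local_name in mapping:  # fields outside the shared schema are dropped
            out[mapping[local_name]] = value
    if "sex" in out:
        out["sex"] = SEX_CODES.get(str(out["sex"]).lower(), "unknown")
    out["source_hub"] = hub  # keep provenance alongside the data
    return out


# Invented sample records, for illustration only.
a = harmonize({"alter": 71, "geschlecht": "W"}, "hub_a")
b = harmonize({"eta": 68, "sesso": "M", "note": "x"}, "hub_b")
```

Once both records share the same keys and value codes, they can be combined in a single analysis without special handling for each hub.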

Christopher Williams

(Parts of this article were adapted from a press release by the University of Verona.)