The task only sounds mundane at first glance: a team drawn from the University of Stuttgart’s departments of literary studies, computational linguistics, and computer science is planning to set up a database for literary studies known as the Science Data Center for Literature (SDC4Lit), which researchers believe will be useful for their future work: “The great thing is that it is an attempt to think about and combine research and infrastructure,” explains Sandra Richter, director of the German Literature Archive (DLA) and professor of modern German literature at the University of Stuttgart. “The main question concerns the data and format in which literature should be recorded to make it usable. And this question is not new. In light of digitalization, it is now possible to answer it in a very specific way.”
Because works of literature are now created on computers and published on digital platforms, the objective of the SDC4Lit is to provide a platform on which they can be researched. The center first has to record and archive the texts in question, which researchers will then be able to evaluate using intelligent digital tools. The results are to be made available to researchers and the public via the SDC4Lit. The state of Baden-Württemberg is funding the project with €1.8 million.
“Ideally, we will store text documents in a fully searchable format. The question is: what data does one record to do this?” explains Richter. This goes far beyond the place and year of publication and the name of the author. When it comes to authors in exile, for example, one might wish to know the history of the text, where it came from, or how much it sold for at auction. Far more such text-related metadata can be archived in digital form than ever fit on library index cards. “The relevant data could be continuously expanded for researchers to work on and generate their own analyses based on the metadata,” Richter explains. For example: “What is known about the provenance of a given text? What about similar texts? Could more general conclusions be drawn from this?” The larger the corpus, the broader such questions could be.
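The kind of extensible, text-related metadata record Richter describes might be sketched roughly as follows. This is an illustrative sketch only; the field names and structure are assumptions for the sake of the example, not the actual SDC4Lit schema.

```python
# Illustrative sketch of a text-related metadata record of the kind
# Richter describes. All field names are hypothetical -- this is not
# the SDC4Lit data model.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TextRecord:
    # Classic index-card data:
    title: str
    author: str
    place_of_publication: str
    year_of_publication: int
    # Data that goes beyond the index card -- history and provenance:
    provenance: list = field(default_factory=list)   # e.g. successive owners
    auction_price_eur: Optional[float] = None        # if sold at auction

record = TextRecord(
    title="Example manuscript",
    author="Unknown",
    place_of_publication="Stuttgart",
    year_of_publication=1938,
    provenance=["Private collection (exile)", "DLA Marbach"],
)
print(record.provenance[-1])  # the most recent holder in the chain
```

Because the record is an ordinary structured object, further fields (sale history, related texts, digitized facsimiles) could be appended over time without discarding what was already captured, which is the "continuously expanded" quality Richter points to.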
The relevant software and storage systems have already been chosen, says the researcher about the status of the SDC4Lit, which was launched in 2019. “We can now start to upload the texts.” The objective is to establish concepts for working with the texts by the end of the project in 2023. “In doing so we are working in completely uncharted waters,” says Prof. Michael Resch, head of the University of Stuttgart’s High Performance Computing Center (HLRS) and the University's Institute for High Performance Computing. This is because the metadata pertaining to literature differs from the technical data with which the HLRS usually works.
A new virtual database for literary studies will enhance the collection at the German Literature Archive. (Photo: Wikimedia Commons)
“When I refer to a flow simulation or something of that nature, it usually involves technical parameters that have been standardized within the research community for many years,” Resch explains. “Pure library research in literature would be similar. But rather than digitizing books, what we want to do is map a creative process that doesn't follow standardized norms.” Resch's team provides the means to store digital data for the next 20, 30, or 40 years, as well as the information management know-how.
To explain the features of this terra incognita, Resch uses the origins of literature as an example: “For example,” he says, “one can view the original manuscripts of Franz Kafka at the DLA. One can see that he crossed out a word and replaced it with another.” In the case of digital literature, one is faced with an open-ended process of change; the author can modify the text every day. “It’s about recording a creative process and deriving something from it that I can use in the following creative process.”
As Richter explains, the fact that the DLA and the HLRS are partners within the SDC4Lit is extremely beneficial: “Both institutions have the capacity to host infrastructures such as these on a long-term basis and make the data available to the research community, which will then be able to use it.” Something similar, she adds, is planned in projects being run by the German National Research Data Infrastructure (NFDI), which aims to make all German humanities data accessible, networked, and permanently usable. It will initially provide funding for these projects, but the plan is for them to continue independently over the long term. The major German humanities institutes are currently applying for this funding as part of the “Text+” consortium and plan to digitize language and text data from various collections, editions, and lexical resources. “I could imagine the SDC4Lit participating in this,” says Richter. The DLA, which maintains one of the most important libraries for German-language literature and literary studies, could eventually upload its holdings to the NFDI project. “If the development of the digital sphere continues at this extremely fast pace and partly replaces analogue resources, we will have to provide whatever we can. Our resources are finite, of course, but our core mission encompasses this very area and therefore everything that is created in the digital space.”
— Daniel Völpel
This article was first published in the March 2021 issue of "forschung leben," the magazine of the University of Stuttgart. Republished with permission.