Student Dives Into Data to Predict Train Delays

11 July 2018

As a participant in the HLRS program "Simulated Worlds," high school student Niklas Knöll used machine learning to fight against a common source of complaints in the Stuttgart area.

Smartphone apps from Deutsche Bahn and Stuttgart's local transit networks can show delays of city trains and buses in real time. But wouldn’t it be nice to know further in advance whether it’s really necessary to break into a 400-meter sprint to catch the train? Or whether there’s still time to stop for a pretzel at the bakery?

HLRS prepares Stuttgart students for the digital world

Niklas Knöll lives in Stuttgart and is very familiar with this problem. The high school student is one of eight scholars in Simulated Worlds, an initiative funded by the Baden-Württemberg Ministry of Science, Research, and the Arts. Over the course of one school year, selected students from the Stuttgart region each receive €1,000 to conduct a research project at the High-Performance Computing Center Stuttgart (HLRS), with support and supervision from HLRS employees. The projects deal with topics related to simulation and modeling, and the overall aim is to train young people and prepare them for the digital world.

With big data becoming increasingly relevant in high-performance computing (HPC), the high school students were encouraged to analyze large datasets. This suggestion led Niklas to the subject of train delays in Stuttgart. “The German transportation company Deutsche Bahn operates a portal for accessing public data,” Niklas explains, “including scheduled and actual arrivals of trains in Stuttgart for almost two months in 2017. We decided to use this database for my research project.” At HLRS he learned to program in Python and to use the Apache Spark framework. Both tools are widely used for big data problems and make it possible to carry out computations even on complex architectures such as supercomputers.

Machine learning algorithms provide accurate predictions

An initial exploratory data analysis, conducted to get a better understanding of the underlying dataset, showed (unsurprisingly) that the S-Bahn was often delayed on workdays but mostly on time on Sundays. Although this might seem obvious, it becomes valuable when embedded in a larger context: it showed that, to make predictions more reliable, a different regression model needs to be applied for each day. On Thursdays, for example, delays often exceed 15 minutes, so statistical outliers have to be taken into account.
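The idea of one regression model per day can be illustrated with a short Python sketch. It is a minimal, hypothetical example and not the model Niklas built with Python and Apache Spark. The record layout (pairs of scheduled and actual departure times), the single feature (scheduled hour of day), the least-squares line fitted for each weekday, and the sample records are all assumptions made for illustration. Outlier handling, which the Thursday data would need, is left out.

```python
from collections import defaultdict
from datetime import datetime


def delay_minutes(scheduled: datetime, actual: datetime) -> float:
    """Delay in minutes; a negative value means an early departure."""
    return (actual - scheduled).total_seconds() / 60.0


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all x values identical: fall back to a constant model
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_per_weekday(records):
    """Group (scheduled, actual) pairs by weekday (0 = Monday ... 6 = Sunday)
    and fit a separate model for each day: delay as a linear function
    of the scheduled departure hour. The layout is hypothetical."""
    groups = defaultdict(lambda: ([], []))
    for scheduled, actual in records:
        hours, delays = groups[scheduled.weekday()]
        hours.append(scheduled.hour + scheduled.minute / 60.0)
        delays.append(delay_minutes(scheduled, actual))
    return {day: fit_line(h, d) for day, (h, d) in groups.items()}


# Made-up sample records, not data from the project.
records = [
    (datetime(2017, 10, 5, 8, 0), datetime(2017, 10, 5, 8, 4)),     # Thursday
    (datetime(2017, 10, 5, 17, 0), datetime(2017, 10, 5, 17, 20)),  # Thursday
    (datetime(2017, 10, 8, 9, 0), datetime(2017, 10, 8, 9, 0)),     # Sunday
]
models = fit_per_weekday(records)
for day, (a, b) in sorted(models.items()):
    print(f"weekday {day}: delay = {a:.2f} + {b:.2f} * hour")
```

Keeping one model per weekday lets a busy Thursday and a quiet Sunday have completely different intercepts and slopes, rather than averaging them into a single line.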

These models were trained with machine learning algorithms to predict train delays in minutes, and the outcomes were then sorted into two classes, “delayed” and “not delayed.” In 8 out of 10 cases, this approach correctly predicted whether a city train would depart at least three minutes late, allowing Niklas to occasionally skip his 400-meter sprint.
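The step from predicted minutes to the two classes can be sketched in Python. This is a hypothetical illustration. It assumes a model has already produced a delay estimate in minutes for each departure. It then applies the three-minute threshold from the article and measures how often the predicted class matches the observed one. The function names and the sample values are invented. The samples were chosen to give 8 correct classifications out of 10, and they are not project data.

```python
# Threshold from the article: a departure counts as "delayed"
# if it is at least three minutes late.
DELAY_THRESHOLD_MIN = 3.0


def to_class(delay_min: float) -> str:
    """Map a delay in minutes to one of the two classes."""
    return "delayed" if delay_min >= DELAY_THRESHOLD_MIN else "not delayed"


def accuracy(predicted_delays, actual_delays) -> float:
    """Share of departures whose predicted class matches the actual class."""
    if len(predicted_delays) != len(actual_delays) or not actual_delays:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        to_class(p) == to_class(a)
        for p, a in zip(predicted_delays, actual_delays)
    )
    return hits / len(actual_delays)


# Made-up example values in minutes, not data from the project.
predicted = [0.5, 4.0, 2.0, 7.5, 1.0, 3.2, 0.0, 5.0, 2.9, 6.0]
actual = [0.0, 5.0, 4.0, 9.0, 1.5, 2.0, 0.0, 6.0, 1.0, 3.5]
print(accuracy(predicted, actual))  # → 0.8
```

A regression model that is off by a few minutes can still place a departure in the right class. This is why a binary "delayed / not delayed" score can look better than minute-level accuracy.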

Students develop and demonstrate strong technical skills

Admittedly, from a statistical point of view, this result is not very satisfying, because the dataset Niklas used was too small for a true big data problem. However, the aim of Simulated Worlds is not to produce scientific breakthroughs but to give the participating scholars advanced technical competencies and problem-solving skills.

Eight high school students successfully participated in the Simulated Worlds program. (Photo: HLRS)

These skills were on display at an awards ceremony at HLRS on July 4, when Niklas and the other participating students presented their projects and results. “In addition to data analysis,” explains Simulated Worlds project coordinator Doris Lindner, “the topics our students dealt with ranged from the simulation of blood flow and the programming of a coffee machine to reflections on the philosophy of technology. At the presentation and Q&A session, everyone was impressed by how thoroughly they had studied their subjects.”

Students from Karlsruhe were also able to demonstrate their technical know-how in Simulated Worlds. Their projects, conducted at the initiative’s partner institution, the Steinbuch Center for Computing (SCC), were presented on June 22 in Karlsruhe.

 — Lena Bühler