Marius Zumwald (Institute for Environmental Decisions, ETH Zurich)
In climate science, observational datasets serve as evidence for scientific claims and are used to calibrate and evaluate models. However, datasets represent only selected aspects of the real world and are subject to uncertainty. Here we present a theoretical framework for understanding the representational uncertainties of datasets. It distinguishes three general sources of uncertainty: (1) uncertainty that arises during the generation of the dataset; (2) uncertainty due to biased samples; and (3) uncertainty that arises from the choice of abstract properties, such as the resolution and the metric. Based on this framework, we identify four types of dataset ensembles (parametric, structural, resampling and property ensembles) as tools to estimate and assess dataset uncertainties. We advocate a more systematic generation of dataset ensembles and discuss how this might be achieved. We focus on gridded datasets based on in-situ measurements, but also discuss the uncertainties of low-cost sensors and the conditions under which data from such sensors might be appropriately used for scientific purposes. We argue that understanding and assessing representational dataset uncertainty more systematically makes uncertainty estimates more reliable, and that doing so would benefit both scientific reasoning and scientific policy advice based on climate datasets.