Hydrol. Earth Syst. Sci., 24, 3157–3188, 2020
https://doi.org/10.5194/hess-24-3157-2020

Research article 19 Jun 2020


# The accuracy of weather radar in heavy rain: a comparative study for Denmark, the Netherlands, Finland and Sweden

Marc Schleiss1, Jonas Olsson2, Peter Berg2, Tero Niemi3,5, Teemu Kokkonen3, Søren Thorndahl4, Rasmus Nielsen4, Jesper Ellerbæk Nielsen4, Denica Bozhinova2, and Seppo Pulkkinen5,6
• 1Dept. of Geoscience and Remote Sensing, Delft University of Technology, Delft, the Netherlands
• 2Hydrology Research Unit, Swedish Meteorological and Hydrological Institute SMHI, Norrkoping, Sweden
• 3Dept. of Built Environment, Aalto University, Espoo, Finland
• 4Dept. of Civil Engineering, Aalborg University, Aalborg, Denmark
• 5Finnish Meteorological Institute FMI, Helsinki, Finland
• 6Faculty of Electrical and Computer Engineering, Colorado State University, Fort Collins, USA

Correspondence: Marc Schleiss (m.a.schleiss@tudelft.nl)

Abstract

1 Introduction

The ability to measure short-duration, high-intensity rainfall rates is of paramount importance in predicting hydrological response. Indeed, several studies have shown that the resolution of the rainfall data directly impacts the shape, timing and peak flow of hydrographs. Previous research has shown that in order to obtain reliable results in small urban catchments, the rainfall data should have a resolution of at least 10 min and 1 km. If the resolution is insufficient compared with what is needed for the runoff simulations, the accuracy of flood predictions is likely to be compromised.

Since radar measurements are inherently prone to errors and knowledge about microphysical processes in clouds and rain is limited, post-processing plays an important role. In addition to using better hardware, many weather services now offer gridded, quantitative rainfall products that combine measurements from different radar systems and have been corrected for various types of biases using rain gauges and other sources of information such as elevation, cloud cover and satellite imagery. During post-processing, many systematic biases due to attenuation, calibration, vertical variability and range effects are mitigated. However, rain gauge data also contain errors and biases, the most important of which is an underestimation of the rainfall intensity due to local wind effects. For regular events, errors usually remain on the order of 5 %–10 %. However, during heavy rain events, wind-induced biases can exceed 30 %. As a result, post-processed radar products might still contain important residual errors. For example, several studies have highlighted major quality issues affecting post-processed quantitative precipitation estimates from NEXRAD, including range-dependent and intensity-dependent biases. Quantifying these residual errors and studying their propagation in hydrological models is crucial for improving the timing and accuracy of flood predictions. For example, one study estimated that the propagation of biased radar measurements in urban drainage models could result in up to 30 %–45 % errors in terms of peak flow magnitude. To limit error propagation, it has been recommended that the bias affecting areal-averaged rainfall intensities should not exceed 10 %.

Over the years, each country has developed its own strategy for mitigating errors and biases in operational radar rainfall estimates. However, since there is no common benchmark and few international studies are available, the merits and weaknesses of each approach remain difficult to quantify objectively. This study sheds new light on current performances by conducting a multinational assessment of radar's ability to capture heavy rain events at scales of 5 min up to 2 h. In total, six different radar products across four European countries (i.e., Denmark, the Netherlands, Finland and Sweden) are considered. Special emphasis is put on analyzing the performance during the 50 most intense events over the last 10–15 years. By comparing different types of radar products (C-band versus X-band, single versus dual polarization) and identifying the main sources of errors and biases across scales, important recommendations about how to improve the accuracy of quantitative precipitation estimates for flash flood prediction and urban pluvial flooding can be drawn. The rest of this paper is organized as follows: Sect. 2.1 explains the methodology used to select events and extract the gauge and radar data. Section 2.2 gives a detailed description of the radar products used for the analysis. Section 2.3 introduces the statistical models used to quantify the bias between gauges and radar. Section 3 presents the results and Sect. 4 summarizes the main conclusions.

2 Data and methods

## 2.1 Event selection and data extraction methods

Event selection was based on rainfall time series from the national networks of automatic rain gauges in Denmark, the Netherlands, Finland and Sweden. Due to data availability and quality, only a small subset of all the existing gauges was used for analysis (i.e., 66 gauges for Denmark, 35 for the Netherlands, 64 for Finland and 10 for Sweden). Table 1 provides an overview of the number of gauges used, their temporal resolutions and the length of the observational records for each country. Note that Denmark has two separate rain gauge networks. The first is operated by the Danish Meteorological Institute DMI and consists of OTT Pluvio2 weighing gauges (Vejen, 2006; Thomsen, 2016). The second belongs to the Water Pollution Committee of the Society of Danish Engineers and consists of RIMCO tipping bucket gauges. For this study, only the RIMCO tipping buckets were used. In the Netherlands, precipitation is measured using the displacement of a float in a reservoir (KNMI, 2000). The 10 min data from 2008 to 2018 used in this study have been validated internally by the Royal Netherlands Meteorological Institute KNMI using a combination of automatic and manual quality control tests. In Finland, weighing gauges of type OTT Pluvio2 are used. Observations are made using a wind protector according to World Meteorological Organization regulations (WMO, 2008). Automatic quality control tests are used to flag suspicious values, which are then double-checked manually by human experts. In Sweden, gauges are vibrating wire load sensors of type GEONOR with an oil film to keep evaporation at very low amounts.

Table 1. Rain gauge datasets used to determine the top 50 rainfall events for each country. The time periods were chosen based on radar data availability.

Based on the available gauge data, the top 50 rain events (in terms of peak intensity) were determined for each country and observation period. For every gauge, a continuous 6 h dry period was used to separate events from each other. This was done separately for each gauge, which means that some events were included multiple times in the dataset given that they were observed by different gauges at different locations. To ensure quality, each identified event was subjected to a visual quality control test by human experts, making sure the rainfall rates recorded by the gauges and the radar (see Sect. 2.2) were plausible and consistent with each other in terms of their temporal structure. Cases for which the gauge or radar data were incomplete, obviously wrong or inconsistent with each other were removed and replaced by new events until the total number of events that passed the quality control tests reached 50 for each country. Overall, about 10 % of the originally identified events had to be removed and replaced by new ones during these quality control steps, most of them because of incomplete or erroneous radar data.
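The event definition above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names `split_events` and `top_events` are hypothetical, time stamps are assumed to mark regularly spaced sampling intervals, and a time step counts as wet whenever its rain rate is strictly positive.

```python
from datetime import datetime, timedelta


def split_events(times, rates, dry_gap=timedelta(hours=6)):
    """Group wet time steps of ONE gauge into events.

    A new event starts when two consecutive wet time stamps are more than
    `dry_gap` apart (assumed reading of the "continuous 6 h dry period").
    """
    events, current, last_wet = [], [], None
    for t, r in zip(times, rates):
        if r <= 0.0:
            continue  # dry time step, nothing to store
        if last_wet is not None and t - last_wet > dry_gap:
            events.append(current)
            current = []
        current.append((t, r))
        last_wet = t
    if current:
        events.append(current)
    return events


def top_events(events, n=50):
    """Return the n events with the highest peak intensity."""
    return sorted(events, key=lambda ev: max(r for _, r in ev), reverse=True)[:n]
```

Because the splitting is applied gauge by gauge, pooling the events of all gauges before calling `top_events` reproduces the situation described above, in which one storm can enter the catalog several times.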

The radar data for each country were extracted according to the following procedure. First, the four radar pixels closest to a given rain gauge were extracted. The four radar rainfall time series were then aggregated in time (i.e., averaged) to match the temporal sampling resolution of the considered rain gauge. Then, for each time step, the value among the four radar pixels that best matched the gauge was kept for comparison. The motivation behind this type of approach is that it can account for small differences in location and timing between radar and gauge observations due to motion, wind and vertical variability. Note that this is a rather conservative and favorable way of comparing gauges with radar that leads to smaller overall discrepancies and more robust results than pixel-by-pixel comparisons. Other less favorable ways of extracting the radar data were also tested (e.g., using inverse distance weighted interpolation or the maximum value among the nearest neighbors). However, these only resulted in higher discrepancies and did not change the main conclusions and were therefore abandoned in subsequent analyses.
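The extraction procedure can be illustrated with the following sketch. It assumes that "best matched" means the smallest absolute difference to the gauge value, which the text does not state explicitly, and that the radar time step divides the gauge time step evenly. The helper names are hypothetical.

```python
def aggregate_mean(series, factor):
    """Average consecutive blocks of `factor` samples (e.g., 5 min -> 10 min)."""
    return [sum(series[i:i + factor]) / factor
            for i in range(0, len(series) - factor + 1, factor)]


def best_match_series(gauge, radar_pixels, factor=1):
    """Keep, for each gauge time step, the pixel value closest to the gauge.

    gauge        : rain rates at the gauge resolution
    radar_pixels : time series of the four nearest pixels at radar resolution
    factor       : number of radar time steps per gauge time step
    """
    aggregated = [aggregate_mean(pixel, factor) for pixel in radar_pixels]
    matched = []
    for k, g in enumerate(gauge):
        candidates = [pixel[k] for pixel in aggregated]
        # Assumed criterion: minimum absolute difference to the gauge value.
        matched.append(min(candidates, key=lambda v: abs(v - g)))
    return matched
```

Since the selection uses the gauge value itself, the resulting radar series is favorable to the radar by construction, which is consistent with the remark above that this approach yields smaller discrepancies than pixel-by-pixel comparisons.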

Figure 1 shows a map with the location of all rain gauges used for the final, quality-controlled rain event catalog for each country. As can be seen in Fig. 2, the final catalog includes a large variety of rain events, ranging from single isolated convective cells to large organized thunderstorms and mesoscale complexes. Additional tables summarizing the starting time, duration, amount and peak rainfall intensity for each event and country are provided in the Appendix (see Tables A1–A5). Because events were selected based on peak intensity, it is not surprising to see that all of them occurred in the warm season between May and September, during which convective activity is at its maximum (see Fig. 3). Similar analyses confirm that the events mostly occurred during the afternoon and late evening hours, in agreement with the diurnal cycle of convective precipitation and rainfall intensity at mid-latitudes.

Figure 1. The four considered study areas in Denmark, the Netherlands, Finland and Sweden with the used rain gauges (black dots) and the location of the C-band radars marked by black crosses. The dashed lines denote circles of 100 km radius around each radar. Due to maintenance and relocations, not all the radars were operating at the same time.

Figure 2. Snapshots of the radar rainfall estimates (in mm h−1) at the time of peak intensity for the 3 most intense events in each country. Each map is a square of size 60 × 60 km2 with the gauge located in the center of the domain.

Figure 3. Distribution of the 50 top events over the month (a) and hour of the day (b).

## 2.2 Radar products

This section gives a brief overview of the different radar products used for the analyses. A short summary of the most important characteristics of each product is provided in Table 2.

Table 2. Radar products used in this study.

### 2.2.1 Radar data for Denmark

The weather radar network of the Danish Meteorological Institute (DMI) operates five 5.625 GHz C-band pulse radars with 1° beam width and 250 kW peak power located in Rømø, Sindal, Stevns, Virring and Bornholm. New dual-polarization radars were installed at all sites between 2008 and 2017. However, for this study, only the single-polarization data from the Stevns radar were used. The latter is located near the coast, at 55.326° N, 12.449° E and 53 m elevation, approximately 40 km south of Copenhagen in an area of relatively flat topography with altitudes ranging from −7 to 125 m above mean sea level. It was purchased in 2002 from Electronic Enterprise Corporation (EEC) and is operated using a combination of EEC and DMI software. The scanning strategy involves collecting reflectivity measurements at nine different elevation angles of 0.5, 0.7, 1.0, 1.5, 2.4, 4.5, 8.5, 13.0 and 15.0° with a range resolution of 500 m and a maximum range of 240 km. The reflectivity measurements Z (dBZ) at these nine elevations are projected to a pseudo-constant altitude plan position indicator (PCAPPI) at 1000 m height to generate a high-resolution gridded product with 10 min temporal resolution and 500 × 500 m2 grid spacing. The temporal resolution of the PCAPPI is then statistically enhanced to 5 min using an advection interpolation scheme. Ground clutter in the PCAPPI is removed by filtering out echoes with Doppler velocity smaller than 1 m s−1. Rainfall-induced attenuation K is estimated as $K=\mathrm{6.9}×{\mathrm{10}}^{-\mathrm{5}}{Z}^{\mathrm{0.67}}$ (dBZ km−1) and attenuation-corrected reflectivity estimates are converted to rainfall rates R based on a fixed Marshall–Palmer Z–R relationship given by $Z=200{R}^{1.6}$.
To take into account calibration errors and variations in raindrop size distributions, a daily mean field bias correction is applied to the high-resolution radar rainfall estimates based on the measurements from a network of 66 RIMCO tipping bucket rain gauges in the region operated by the Water Pollution Committee of the Society of Danish Engineers. Note that the final 500 m, 5 min bias-corrected product used in this study is not operational but has been developed for research purposes by Aalborg University.
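The conversion from reflectivity to rain rate can be made concrete with a short sketch. It only covers the inversion of the fixed Marshall–Palmer relationship; attenuation correction, clutter removal and the daily mean field bias adjustment are deliberately left out. The function names are hypothetical, and the code assumes that Z enters the power law in linear units (mm6 m−3), so reflectivities in dBZ are converted first.

```python
def dbz_to_linear(dbz):
    """Convert reflectivity from dBZ to linear units (mm^6 m^-3)."""
    return 10.0 ** (dbz / 10.0)


def rain_rate_from_dbz(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b for R (mm/h); defaults are the Marshall-Palmer values."""
    return (dbz_to_linear(dbz) / a) ** (1.0 / b)


# 40 dBZ corresponds to roughly 11.5 mm/h under Z = 200 R^1.6.
rate_40dbz = rain_rate_from_dbz(40.0)
```

Passing other coefficients, for example `a=223.0, b=1.53`, gives the corresponding inversion for a different fixed Z–R relationship.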

### 2.2.3 Radar data for Finland

The Finnish radar product is an experimental product from the Finnish Meteorological Institute (FMI) OSAPOL project, which differs from the operational product used by the FMI mainly by making better use of dual polarization. The product is based on the data from the years 2013–2016, during which the old single-polarization radars were being replaced by C-band dual-polarization Doppler radars. The product is therefore based on data from four to eight dual-polarization radars depending on how many were available each year. The beam width is 1°, the range resolution is 500 m and the scanning is done in pulse pair processing (PPP) mode. Doppler filtering is done first in the signal processing stage, and reflectivity measurements are calibrated based on solar signals. Next, non-meteorological targets are removed using statistical clutter maps and fuzzy-logic-based HydroClass classification by Vaisala. The reflectivity Z is attenuation-corrected and the differential phase shift $K_{\mathrm{dp}}$ is estimated using an established method from the literature. For hydrometeors classified as liquid precipitation, two alternative rain rate conversions are used. For heavy rain, i.e., $K_{\mathrm{dp}}>0.3$ and $Z>30$ dBZ, the $R(K_{\mathrm{dp}})$ relation given by $R=21\,K_{\mathrm{dp}}^{0.72}$ is used. For low to moderate intensities, i.e., $K_{\mathrm{dp}}\le 0.3$ or $Z\le 30$ dBZ, and for radar bins where HydroClass indicates non-liquid precipitation, a fixed Z(R) relation given by $Z=223{R}^{1.53}$ is used. Using the estimated rainfall rates at the four lowest elevation angles, a PCAPPI at 500 m height is produced using inverse distance-weighted interpolation with a Gaussian weight function. Finally, a composite VPR correction map is applied to the PCAPPI to generate a 1 × 1 km2 and 5 min resolution product. The OSAPOL product is the only radar product in this study that is not gauge-adjusted.
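The switching rule between the two rain rate conversions can be written as a small function. This is a sketch of the decision logic only, not of the OSAPOL processing chain: the function name and the boolean `is_liquid` flag (standing in for the HydroClass output) are hypothetical, and the differential phase input is assumed to be expressed in the same units as the 0.3 threshold quoted above.

```python
def hybrid_rain_rate(z_dbz, kdp, is_liquid=True):
    """Rain rate (mm/h) from the two-branch rule described in the text.

    Heavy liquid rain (kdp > 0.3 and Z > 30 dBZ): R = 21 * kdp**0.72.
    Everything else, including non-liquid bins: invert Z = 223 * R**1.53.
    """
    if is_liquid and kdp > 0.3 and z_dbz > 30.0:
        return 21.0 * kdp ** 0.72
    z_linear = 10.0 ** (z_dbz / 10.0)  # dBZ -> mm^6 m^-3
    return (z_linear / 223.0) ** (1.0 / 1.53)
```

Both conditions must hold for the first branch, so a bin with a high differential phase value but reflectivity of exactly 30 dBZ still falls back to the Z(R) relation.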

## 2.3 Comparison of radar and gauge measurements

Since radar and gauges measure rainfall at different scales using different measuring principles, one cannot expect a perfect agreement between the two. Gauges are more representative of point rainfall measurements on the ground, while radar provides averages over large-resolution volumes several hundreds of meters above the ground. In addition, each sensor has its own measurement uncertainty and limitations in times of heavy rain. Gauges are known to underestimate intensity by up to 25 %–30 % in heavy rain and windy conditions. On the other hand, radar is known to suffer from signal attenuation, non-uniform beam filling, clutter, hail contamination and overshooting. Missing data in one or both of the sensors further complicate the comparison. Therefore, the main goal here will not be to make a statement about which sensor comes closest to the truth, but to quantify the average discrepancies between the gauge and radar measurements as a function of the event, timescale, intensity and radar product. Such information can be useful to monitor the performance and consistency of operational radar and gauge products or study the propagation of rainfall uncertainties in hydrological models.

### 2.3.1 Bias estimation

Discrepancies between radar and gauge observations are assessed with the help of a multiplicative error model:

$\begin{array}{}\text{(1)}& {R}_{\mathrm{g}}\left(t\right)=\mathit{\beta }\cdot {R}_{\mathrm{r}}\left(t\right)\cdot \mathit{\epsilon }\left(t\right),\end{array}$

where Rg(t) (in mm h−1) denotes the gauge measurements at time t, Rr(t) (in mm h−1) the radar measurements, β (–) the multiplicative bias and ε(t) (–) independent, identically distributed random errors drawn from a log-normal distribution with median 1 and scale parameter σε>0. The multiplicative bias in Eq. (1) can also be expressed in terms of the log ratios of gauge versus radar values:

$\begin{array}{}\text{(2)}& \mathrm{ln}\left(\frac{{R}_{\mathrm{g}}\left(t\right)}{{R}_{\mathrm{r}}\left(t\right)}\right)=\mathrm{ln}\left(\mathit{\beta }\right)+\mathrm{ln}\left(\mathit{\epsilon }\left(t\right)\right),\end{array}$

where ln (ε(t)) is a Gaussian random variable with mean 0 and variance ${\mathit{\sigma }}_{\mathit{\epsilon }}^{\mathrm{2}}$. Equation (2) can be used to detect the presence of conditional bias with intensity by checking whether the expected value of the log ratio $\mathrm{ln}\left(\frac{{R}_{\mathrm{g}}\left(t\right)}{{R}_{\mathrm{r}}\left(t\right)}\right)$ depends on Rg(t) or not. Note that the multiplicative bias model in Eqs. (1) and (2) has been shown to provide a better, physically more plausible representation of the error structure between in situ and remotely sensed rainfall observations than the classical additive bias model used in linear regression (e.g., Tian et al., 2013). It assumes that the discrepancies between radar and gauge measurements are the result of two error contributions: a deterministic component β that accounts for systematic errors in radar and gauge measurements (e.g., due to calibration, wind effects, wrong Z–R relationship) and a random term ε(t) that represents sampling errors and noise in radar and gauge observations. Since gauges are not seen as ground truth in this study, ε(t) is assumed to contain all possible sources of errors in both the gauge and radar observations, including the ones due to differences in sampling volumes. The last point is particularly important as radar sampling volumes can be up to 7 orders of magnitude larger than those of rain gauges. This means that even if both sensors were perfectly calibrated, their measurements would still disagree with each other because rain gauge measurements made at a particular location within a radar pixel are usually not representative of averages over larger areas. A rigorous statistical framework for assessing this representativeness error, based on the spatial autocovariance function and the notion of extension variance, has been proposed in the literature. However, that approach was developed for an additive error model and cannot be directly applied here.
Instead, we propose a comparatively simpler approach in which the differences in sampling volumes are already included in the random errors ε(t). Our approach is based on the assumption that the errors ε(t) have a log-normal distribution with median 1 and scale parameter σε>0, which means that we must have $\mathbb{E}\left[\mathit{\epsilon }\left(t\right)\right]=\mathrm{exp}\left(\frac{{\mathit{\sigma }}_{\mathit{\epsilon }}^{\mathrm{2}}}{\mathrm{2}}\right)\ne \mathrm{1}$. Furthermore, if we assume that Rg(t) and Rr(t) are second-order stationary random processes with fixed means μg and μr and variances ${\mathit{\sigma }}_{\mathrm{g}}^{\mathrm{2}}$ and ${\mathit{\sigma }}_{\mathrm{r}}^{\mathrm{2}}$ and that the random errors ε(t) are identically distributed and independent of Rr(t), then we get the following system of equations:

$\begin{array}{}\text{(3)}& \left\{\begin{array}{ll}\mathbb{E}\left[{R}_{\mathrm{g}}\left(t\right)\right]& =\mathit{\beta }\cdot \mathbb{E}\left[{R}_{\mathrm{r}}\left(t\right)\right]\cdot \mathbb{E}\left[\mathit{\epsilon }\left(t\right)\right]=\mathit{\beta }\cdot {\mathit{\mu }}_{\mathrm{r}}\cdot \mathrm{exp}\left(\frac{{\mathit{\sigma }}_{\mathit{\epsilon }}^{\mathrm{2}}}{\mathrm{2}}\right)\\ \mathrm{Var}\left[{R}_{\mathrm{g}}\left(t\right)\right]& ={\mathit{\beta }}^{\mathrm{2}}\cdot \mathrm{Var}\left[{R}_{\mathrm{r}}\left(t\right)\right]\cdot \mathrm{Var}\left[\mathit{\epsilon }\left(t\right)\right]={\mathit{\beta }}^{\mathrm{2}}\cdot {\mathit{\sigma }}_{\mathrm{r}}^{\mathrm{2}}\cdot \mathrm{exp}\left({\mathit{\sigma }}_{\mathit{\epsilon }}^{\mathrm{2}}\right)\cdot \left(\mathrm{exp}\left({\mathit{\sigma }}_{\mathit{\epsilon }}^{\mathrm{2}}\right)-\mathrm{1}\right)\end{array}\right.\end{array}$

From the first equation we get ${\mathit{\beta }}^{\mathrm{2}}=\frac{{\mathit{\mu }}_{\mathrm{g}}^{\mathrm{2}}}{{\mathit{\mu }}_{\mathrm{r}}^{\mathrm{2}}}\cdot \mathrm{exp}\left(-{\mathit{\sigma }}_{\mathit{\epsilon }}^{\mathrm{2}}\right)$, which can be plugged into the second equation to get an estimate of the scale parameter ${\stackrel{\mathrm{^}}{\mathit{\sigma }}}_{\mathit{\epsilon }}$:

$\begin{array}{}\text{(4)}& {\stackrel{\mathrm{^}}{\mathit{\sigma }}}_{\mathit{\epsilon }}^{\mathrm{2}}=\mathrm{ln}\left(\mathrm{1}+\frac{{\mathit{\sigma }}_{\mathrm{g}}^{\mathrm{2}}{\mathit{\mu }}_{\mathrm{r}}^{\mathrm{2}}}{{\mathit{\sigma }}_{\mathrm{r}}^{\mathrm{2}}{\mathit{\mu }}_{\mathrm{g}}^{\mathrm{2}}}\right)=\mathrm{ln}\left(\mathrm{1}+\frac{{\mathrm{CV}}_{\mathrm{g}}^{\mathrm{2}}}{{\mathrm{CV}}_{\mathrm{r}}^{\mathrm{2}}}\right),\end{array}$

where ${\mathrm{CV}}_{\mathrm{g}|\mathrm{r}}=\frac{{\mathit{\sigma }}_{\mathrm{g}|\mathrm{r}}}{{\mathit{\mu }}_{\mathrm{g}|\mathrm{r}}}$ denotes the coefficient of variation of the gauge and radar values, respectively. Substituting, we get the following estimate for β:

$\begin{array}{}\text{(5)}& \stackrel{\mathrm{^}}{\mathit{\beta }}=\frac{{\mathit{\mu }}_{\mathrm{g}}}{{\mathit{\mu }}_{\mathrm{r}}}\cdot \mathrm{exp}\left(-\frac{{\stackrel{\mathrm{^}}{\mathit{\sigma }}}_{\mathit{\epsilon }}^{\mathrm{2}}}{\mathrm{2}}\right).\end{array}$

The first term $\frac{{\mathit{\mu }}_{\mathrm{g}}}{{\mathit{\mu }}_{\mathrm{r}}}$ in Eq. (5) is known as the G∕R ratio, and it quantifies the apparent bias between radar and gauge measurements. The second term $\mathrm{exp}\left(-\frac{{\stackrel{\mathrm{^}}{\mathit{\sigma }}}_{\mathit{\epsilon }}^{\mathrm{2}}}{\mathrm{2}}\right)$ is a bias-adjustment factor that accounts for the fact that gauge and radar measurements do not have the same mean and variance (e.g., due to differences in sampling volumes and/or different measurement uncertainties). The actual underlying model bias β is obtained by multiplying the two terms together. However, it is important to keep in mind that only the G∕R ratio is directly observable from the data, while β is a theoretical bias that heavily depends on the assumptions that the errors are log-normally distributed with median 1 and independent of the radar observations. To avoid any confusion, the following terminology is adopted.

• The “apparent” bias (i.e., seemingly real or true, but not necessarily so) is the one that we see in the data. It is measured using the G∕R ratio.

• The “actual” bias (i.e., existing in fact; real) is the unknown underlying bias, i.e., the bias that we would measure if radar and gauges had the same sampling volumes. The actual bias is always unknown. The best we can do is approximate it with the help of a statistical model.

Note that σε and β could also be estimated through Eq. (2) by calculating the mean and standard deviation of $\mathrm{ln}\left(\frac{{R}_{\mathrm{g}}\left(t\right)}{{R}_{\mathrm{r}}\left(t\right)}\right)$. However, this approach is not recommended as the ratios for small rainfall rates can be very noisy and numerical errors will arise whenever one of the measurements is zero.
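The moment-based estimators in Eqs. (4) and (5) translate directly into code. The sketch below is illustrative only: the function name is hypothetical, and population variances are used, although with series of equal length the sample and population versions give the same ratio of squared coefficients of variation.

```python
import math
from statistics import mean, pvariance


def estimate_bias(gauge, radar):
    """Return (G/R ratio, sigma_eps^2, beta) from paired rain rate series.

    Implements Eq. (4) for the scale parameter and Eq. (5) for beta.
    Both series need a non-zero mean, and the radar series a non-zero variance.
    """
    mu_g, mu_r = mean(gauge), mean(radar)
    cv2_g = pvariance(gauge) / mu_g ** 2  # squared coefficient of variation
    cv2_r = pvariance(radar) / mu_r ** 2
    sigma2 = math.log(1.0 + cv2_g / cv2_r)       # Eq. (4)
    gr_ratio = mu_g / mu_r                       # apparent bias
    beta = gr_ratio * math.exp(-sigma2 / 2.0)    # Eq. (5), model bias
    return gr_ratio, sigma2, beta
```

Unlike the log-ratio approach of Eq. (2), these estimators only involve means and variances of each series, so time steps with zero rainfall in one of the two sensors do not cause numerical problems.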

For readers not familiar with the interpretation of multiplicative biases, note that it is also possible to express the G∕R ratio and model bias β as an average relative error. In this case, we have

$\begin{array}{}\text{(6)}& \begin{array}{rl}{\mathrm{Err}}_{\mathrm{avg}}& =\mathbb{E}\left[\frac{{R}_{\mathrm{g}}\left(t\right)-{R}_{\mathrm{r}}\left(t\right)}{{R}_{\mathrm{g}}\left(t\right)}\right]=\mathrm{1}-\frac{\mathrm{1}}{\mathit{\beta }}\cdot \mathbb{E}\left[\frac{\mathrm{1}}{\mathit{\epsilon }\left(t\right)}\right]\\ & =\mathrm{1}-\frac{\mathrm{exp}\left(\frac{{\mathit{\sigma }}_{\mathit{\epsilon }}^{\mathrm{2}}}{\mathrm{2}}\right)}{\mathit{\beta }},\end{array}\end{array}$

where we used the fact that $\frac{\mathrm{1}}{\mathit{\epsilon }\left(t\right)}$ is also a log-normal with median 1 and scale parameter σε. However, for simplicity and robustness, we prefer to report the median relative error which is independent of the variance of ε(t):

$\begin{array}{}\text{(7)}& \begin{array}{rl}{\mathrm{Err}}_{\mathrm{med}}& =\mathrm{Med}\left[\frac{{R}_{\mathrm{g}}\left(t\right)-{R}_{\mathrm{r}}\left(t\right)}{{R}_{\mathrm{g}}\left(t\right)}\right]=\mathrm{1}-\frac{\mathrm{1}}{\mathit{\beta }}\cdot \mathrm{Med}\left[\frac{\mathrm{1}}{\mathit{\epsilon }}\right]\\ & =\mathrm{1}-\frac{\mathrm{1}}{\mathit{\beta }}.\end{array}\end{array}$
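As a quick numerical illustration of Eq. (7), the conversion from a multiplicative bias to a median relative error is a one-line function. The function name is hypothetical; the input values are the G∕R ratios reported for the four most intense events in Sect. 3.1.

```python
def median_relative_error(bias):
    """Median relative error 1 - 1/bias, following Eq. (7)."""
    return 1.0 - 1.0 / bias


# G/R ratios of the top events (Denmark, Netherlands, Finland, Sweden).
gr_ratios = [1.66, 1.37, 1.55, 1.68]
errors_percent = [round(100.0 * median_relative_error(b), 1) for b in gr_ratios]
# errors_percent -> [39.8, 27.0, 35.5, 40.5]
```

A bias of 1 gives a relative error of zero, and values below 1 give negative errors, i.e., radar values larger than the gauge values.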

### 2.3.2 Peak intensity bias

Equation (5) provides a convenient way to estimate the average bias between radar and gauge measurements over the course of an event. However, in reality, the bias is likely to fluctuate over time as a function of the spatio-temporal characteristics and intensity of the considered events and their location with respect to the radar(s). Consequently, the G∕R ratio and model bias β might not necessarily be representative of what happens during the most intense parts of an event. To account for this, we also consider the peak rainfall intensity bias (PIB) between radar and gauges. The PIB is defined as

$\begin{array}{}\text{(8)}& {R}_{\mathrm{g}}^{\mathrm{max}}=\mathrm{PIB}\cdot {R}_{\mathrm{r}}^{\mathrm{max}},\end{array}$

where ${R}_{\mathrm{g}}^{\mathrm{max}}$ and ${R}_{\mathrm{r}}^{\mathrm{max}}$ denote the maximum rain rate values recorded by the gauges and radar over the course of an event. The PIB values are computed on an event-by-event basis, by aggregating the radar and gauge data to a fixed temporal resolution (using overlapping time windows) and extracting the maximum rain rate over the event at this scale. Note that this is done independently for the gauge and radar time series, which means that the maximum values may not necessarily correspond to the same time interval. The main reason for this is that it leads to a more reliable and robust estimate of PIB at high spatial and temporal resolutions and reduces the sensitivity to small timing differences between radar and gauge observations due to wind and vertical variability.
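The PIB calculation can be sketched as follows, assuming that both series share the same time step and that aggregation means a moving average over overlapping windows of a fixed number of time steps. The function names are hypothetical.

```python
def rolling_max_mean(rates, window):
    """Largest moving average over `window` consecutive time steps."""
    if window < 1 or window > len(rates):
        raise ValueError("window must be between 1 and len(rates)")
    return max(sum(rates[i:i + window]) / window
               for i in range(len(rates) - window + 1))


def peak_intensity_bias(gauge, radar, window=1):
    """PIB of Eq. (8): ratio of gauge peak to radar peak at a given scale.

    The two maxima are searched independently, so they may come from
    different time intervals within the event.
    """
    return rolling_max_mean(gauge, window) / rolling_max_mean(radar, window)
```

Searching the two maxima independently is what makes the estimate insensitive to small timing offsets: a radar peak that arrives one time step late still enters the ratio.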

### 2.3.3 Other metrics

To complement the bias analysis and provide a more comprehensive overview of the agreement between gauge and radar measurements, we also calculate standard error metrics such as the Spearman rank correlation coefficient (CC), root mean square difference (RMSD) and relative root mean square difference $\mathrm{RRMSD}=\frac{\mathrm{RMSD}}{{\mathit{\mu }}_{\mathrm{g}}}$ between gauge and radar values. All these statistics are calculated on an event-by-event basis at a fixed aggregation timescale.
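These complementary metrics can be computed with the standard library alone. The sketch below is one possible implementation, not the authors' code: the Spearman coefficient is obtained as the Pearson correlation of average ranks (ties share the mean of their positions), and all function names are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(gauge, radar):
    """Spearman rank correlation coefficient (CC)."""
    return pearson(ranks(gauge), ranks(radar))


def rmsd(gauge, radar):
    """Root mean square difference between gauge and radar values."""
    return math.sqrt(sum((g - r) ** 2 for g, r in zip(gauge, radar)) / len(gauge))


def rrmsd(gauge, radar):
    """RMSD normalized by the mean gauge value."""
    return rmsd(gauge, radar) / (sum(gauge) / len(gauge))
```

Because the rank correlation only depends on ordering, a radar series that is strongly biased but rises and falls with the gauge can still reach a CC close to 1, which is why it is reported alongside the bias metrics rather than instead of them.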

Figure 4. Time series of radar and gauge intensities (in mm h−1) for the most intense event in each country.

3 Results

## 3.1 Agreement during the four most intense events

Figure 4 shows the time series of rainfall intensities for the top events in each country (i.e., Denmark, the Netherlands, Finland and Sweden, respectively). Each of these events is highly intense, with peak intensities reaching 204 mm h−1 in Denmark, 180 mm h−1 in the Netherlands, 89.1 mm h−1 in Finland and 91.2 mm h−1 in Sweden. The 2 July 2011 event in Denmark was particularly violent, affecting more than a million people in the greater Copenhagen region and causing an estimated damage of at least EUR 800 million. During the third rainfall peak in Denmark, rain rates remained well above 125 mm h−1 for three consecutive 5 min time steps, resulting in more than 41 mm of rain (i.e., about 1 month's worth of rain for the Copenhagen region). During the same 15 min, the radar only recorded 12.1 mm, which is 3.39 times less than what was measured by the gauge. Note that this does not necessarily imply that the radar estimates are wrong, as rain gauge data can also suffer from large biases in times of heavy rain and are not directly comparable to radar due to the large difference in sampling volumes. Nevertheless, all four depicted events show a strong, systematic pattern of underestimation by radar compared with the gauges. The G∕R ratios, as defined in Eq. (5), are 1.66, 1.37, 1.55 and 1.68, respectively, which corresponds to a relative difference in rainfall rates between radar and gauges of 27 %–40 %. This order of magnitude is consistent with previous values reported in the literature. For example, earlier studies reported a 30 % underestimation of radar compared with gauges in Belgium and up to 50 % underestimation on individual events in the United States.

Despite being biased, radar and gauge measurements are rather consistent with each other in terms of their temporal structure (e.g., rank correlation values of 0.92, 0.75, 0.80 and 0.85 for Denmark, the Netherlands, Finland and Sweden, respectively). Also, a substantial part of the apparent bias is likely attributable to differences in sampling volumes. According to Eq. (5), the bias-adjustment factor ${e}^{-{\mathit{\sigma }}_{\mathit{\epsilon }}^{\mathrm{2}}/\mathrm{2}}$ is 0.63, 0.59, 0.66, and 0.70 in Denmark, the Netherlands, Finland and Sweden, respectively. The actual underlying model bias β for the four depicted events is therefore estimated to be 1.04, 0.81, 1.02 and 1.18. In other words, once the differences in scale between radar and gauge data have been accounted for, radar only appears to underestimate rainfall rates by a factor 1.04 (3.8 %) in Denmark, 1.02 (2.0 %) in Finland and 1.18 (15.3 %) in Sweden. In the Netherlands, the radar values even seem to be overestimated by a factor $\mathrm{1}/\mathrm{0.81}=\mathrm{1.23}$ (18.7 %). The fact that radar might overestimate rainfall rates compared with gauges may seem contradictory at first (given that actual values are lower) but can be explained by the fact that β also accounts for the relative variability of the radar and gauge observations. Nevertheless, β values should be interpreted very carefully as they rely on the assumption that the errors between radar and gauges are independent and log-normally distributed with median 1. Figure 4 suggests that this might not always be the case. In particular, the bias between radar and gauges appears to increase during the peaks (see Sect. 3.3 for more details). In this case, the peak intensity biases for the top events in each country were 2.17 (Denmark), 2.09 (Finland), 1.98 (Netherlands) and 1.73 (Sweden), which is consistently larger than the average bias (as measured by the G∕R ratio).

Figure 5. Radar versus gauge intensities (in mm h−1) at the highest available temporal resolution for each country (all 50 events combined). The dashed line represents the diagonal.

## 3.2 Overall agreement between radar and gauges

In the following, we consider the overall agreement between radar and gauges for each country. Figure 5 shows the rainfall intensities of radar versus gauges for each country (at the highest temporal resolution). Each dot in this figure represents a radar–gauge pair and all 50 events have been combined into the same graph. Results show a good consistency between the two sensors (i.e., rank correlation coefficients between 0.77 and 0.91). However, the intensities measured by radar are clearly lower than those of the gauges. The G∕R ratios are 1.59 for Denmark, 1.40 for the Netherlands, 1.56 for Finland and 1.66 for Sweden, corresponding to median relative differences of 37.3 %, 28.4 %, 35.9 %, and 39.7 %, respectively. In addition to the bias, we also see a significant amount of scatter with relative root mean square differences between 116.4 % and 139.1 % (depending on the country). This is characteristic of sub-hourly aggregation timescales and can be explained by the large spatial and temporal variability of rainfall and the fact that radar and gauges do not measure precipitation at the same height and over the same volumes.

Figure 6. Radar versus gauge accumulations (in millimeters) at the event scale for each country (i.e., one dot per event). The dashed line represents the diagonal.

Since it can be hard to compare gauge and radar measurements over short aggregation timescales, additional analyses were carried out to better understand how resolution affects the discrepancies between the two rainfall sensors. Figure 6 shows the scatter plot of radar versus gauge estimates when the data are aggregated to the event scale. Each dot in this graph represents the total rainfall accumulation (in millimeters) over an event. The aggregation to the event scale strongly reduces the scatter (i.e., RRMSD between 38.8 % and 47.7 %) and further increases the correlation coefficient (i.e., 0.80–0.92), making it easier to see the bias. The G∕R ratio remains the same, as values only depend on total accumulation and not on the temporal resolution at which the events are sampled. The fact that radar and gauges agree more at the event scale than at the sub-hourly scale is encouraging. However, improvements are mainly attributed to the fact that many of the large discrepancies affecting the rainfall peaks get smoothed out during aggregation. This leads to an overly optimistic assessment of the agreement between radar and gauges that is not necessarily representative of what happens during the most intense parts of the events.
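The invariance of the G∕R ratio under temporal aggregation can be illustrated with a minimal sketch. It assumes that the G∕R ratio is the ratio of total gauge to total radar accumulation and that both series are equally spaced intensities. The relative root mean square difference is written here in a common form (normalised by the mean gauge value), which is an assumption and may differ from the definition used earlier in the paper. The function names and the toy numbers are hypothetical.

```python
import math

def aggregate(intensities, k):
    """Average consecutive blocks of k time steps (intensities in mm/h).

    Trailing steps that do not fill a complete block are dropped.
    """
    n_blocks = len(intensities) // k
    return [sum(intensities[i * k:(i + 1) * k]) / k for i in range(n_blocks)]

def gr_ratio(gauge, radar):
    """Ratio of total gauge to total radar accumulation."""
    return sum(gauge) / sum(radar)

def rrmsd(gauge, radar):
    """Root mean square difference divided by the mean gauge value (assumed form)."""
    n = len(gauge)
    rmsd = math.sqrt(sum((r - g) ** 2 for g, r in zip(gauge, radar)) / n)
    return rmsd / (sum(gauge) / n)

# Toy 5 min series (mm/h, made up): radar strongly underestimates the peak.
gauge = [2.0, 10.0, 60.0, 24.0, 6.0, 2.0]
radar = [3.0, 8.0, 25.0, 20.0, 7.0, 2.0]

for k in (1, 2, 3, 6):
    g_k, r_k = aggregate(gauge, k), aggregate(radar, k)
    print(f"{5 * k:2d} min: G/R = {gr_ratio(g_k, r_k):.2f}, RRMSD = {rrmsd(g_k, r_k):.2f}")
```

In this toy case the G∕R ratio is the same at every timescale, while the scatter shrinks as the underestimated peak is averaged with its neighbours. The invariance only holds exactly when the aggregation blocks cover the whole event, which is why the block sizes above all divide the series length.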

Based on the values of the G∕R ratio in Fig. 5, the Dutch C-band radar composite has the lowest apparent bias of all products (28.4 %), followed by Finland (35.9 %), Denmark (37.3 %) and Sweden (39.7 %). However, such direct comparisons are not really fair, as they do not take into account the different spatial and temporal resolutions of the radar products, the number of radars used during the estimation and their distances to the considered rain gauges. They also ignore the fact that the top 50 events in each country do not have the same intensities, durations and spatio-temporal structures. For example, the events in Denmark are significantly more intense compared with the Netherlands, Finland and Sweden, which might explain some of the differences. Also, the longest event in the Danish database only lasted 4 h, which is shorter than for the other countries. To better understand the origin of the bias and interpret the differences between the countries, additional, more detailed analyses are necessary.

Table 3. Summary statistics for the highest aggregation timescale (all 50 events combined). Average intensity for gauges and radar μg and μr, standard deviations σg and σr, G∕R ratio, coefficient of variation, scale parameter σε and model bias β.

The first analysis we did was to estimate the model bias β in Eq. (5) under the assumption that the errors are log-normally distributed with median 1. Table 3 shows the estimated values of μg, μr, σg, σr and σε at the highest available temporal resolution for each radar product (all 50 events combined). The obtained β values are 1.04 for Denmark, 0.94 for the Netherlands, 1.11 for Finland and 1.11 for Sweden. This leads to a radically different assessment of the bias between radar and gauge values than with the G∕R ratio. According to the β values, the Danish product has the lowest model bias (3.8 %), followed by the Netherlands (−6.4 %), Finland (9.9 %) and Sweden (9.9 %). The Dutch radar product again appears to slightly overestimate the rainfall intensity, which is counter-intuitive given that the radar values are 30 %–40 % lower than the gauges on average. However, this can be explained by the fact that β is a theoretical bias that accounts for the relative variability of the rain gauge and radar observations around their respective means (see Eqs. 4–5). Products for which CVg is larger than CVr therefore see their bias reduced. This makes sense as gauge measurements are expected to have a larger coefficient of variation than radar due to their smaller sampling volume (i.e., point estimate versus areal average). Another reason is that gauges are known to suffer from relatively large sampling uncertainties at sub-hourly timescales. The fact that Denmark uses RIMCO tipping bucket gauges (as opposed to the float gauges in the Netherlands and weighing gauges in Finland and Sweden) therefore also makes a difference when calculating β. The bias-adjustment factor $\mathrm{exp}\left(\frac{-{\mathit{\sigma }}_{\mathit{\epsilon }}^{\mathrm{2}}}{\mathrm{2}}\right)$ combines all these different factors together, which leads to a fairer comparison of the different radar products.
The fact that the theoretical bias after accounting for differences in mean and variance might be as low as 10 % (despite what the G∕R ratio suggests) and that products with higher spatial/temporal resolutions seem to be affected by lower biases (in absolute value) is quite encouraging. However, one has to keep in mind that the representativity of β strongly depends on the adequacy of the model proposed in Eq. (1). Further analyses presented in the next section show that some of these assumptions might not be very realistic.
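The link between the G∕R ratio, the coefficients of variation and β can be sketched as follows. Eqs. (4) and (5) are not reproduced in this section, so the sketch rests on an assumed reading of the model: gauge values equal β times radar values times a log-normal error with median 1 that is independent of the radar values. Under that assumption, matching first and second moments gives $1+\mathrm{CV}_{\mathrm{g}}^{2}=(1+\mathrm{CV}_{\mathrm{r}}^{2})\,{e}^{{\sigma }_{\epsilon }^{2}}$, and β is the G∕R ratio multiplied by the bias-adjustment factor. All function names and the synthetic data are hypothetical.

```python
import math
import random

def mean(x):
    return sum(x) / len(x)

def cv(x):
    """Coefficient of variation (population standard deviation over mean)."""
    m = mean(x)
    var = sum((v - m) ** 2 for v in x) / len(x)
    return math.sqrt(var) / m

def model_bias(gauge, radar):
    """Return (G/R ratio, sigma_eps^2, beta) under R_g = beta * R_r * eps.

    Assumption: ln(eps) ~ N(0, sigma_eps^2), with eps independent of R_r, so
        1 + CV_g^2 = (1 + CV_r^2) * exp(sigma_eps^2)   (moment matching)
        beta = (G/R) * exp(-sigma_eps^2 / 2)
    """
    gr = mean(gauge) / mean(radar)
    sigma2 = math.log((1 + cv(gauge) ** 2) / (1 + cv(radar) ** 2))
    beta = gr * math.exp(-sigma2 / 2)
    return gr, sigma2, beta

# Synthetic check with a known bias (all values are made up for illustration).
rng = random.Random(42)
true_beta, true_sigma = 1.10, 0.5
radar = [rng.gammavariate(2.0, 5.0) for _ in range(100_000)]
gauge = [true_beta * r * rng.lognormvariate(0.0, true_sigma) for r in radar]

gr, sigma2, beta = model_bias(gauge, radar)
print(f"G/R = {gr:.2f}, sigma_eps^2 = {sigma2:.2f}, beta = {beta:.2f}")
```

In the synthetic example the G∕R ratio is noticeably larger than the prescribed β, while the moment-matched estimate recovers a value close to it. If the same relation holds for the products above, the rounded values quoted in the text (β divided by the G∕R ratio) would imply adjustment factors of roughly 0.65, 0.67, 0.71 and 0.67 for Denmark, the Netherlands, Finland and Sweden.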

## 3.3 Conditional bias with intensity

The analyses performed in Sects. 3.1 and 3.2 are useful to understand the overall agreement between radar and gauges over a large number of events, but the estimated values strongly depend on the assumption that the bias β in Eq. (1) is constant. Our initial analysis in Sect. 3.1 already showed that in reality, the bias is likely to fluctuate over time, increasing in times of heavy rain. As mentioned in the introduction, time and intensity-dependent biases in radar or gauge estimates are highly problematic because they affect the timing and magnitude of peak flow predictions in hydrological models. Here, we perform a more quantitative assessment of this effect by studying the conditional bias between radar and gauges with respect to the rainfall intensity. Conditional biases are detected and quantified on the basis of the multiplicative bias model in Eqs. (1) and (2). If our assumptions are correct and there is no conditional bias, Eq. (2) tells us that the average log ratio between rain gauge and radar estimates should be a Gaussian random variable with constant mean and variance. Moreover, this result must hold independently of the rainfall intensity Rg(t). To detect the presence of a conditional bias in the G∕R ratio, we therefore plot the values of $\mathrm{ln}\left(\frac{{R}_{\mathrm{g}}\left(t\right)}{{R}_{\mathrm{r}}\left(t\right)}\right)$ versus Rg(t) (at the highest available temporal resolution) and calculate the slope of the corresponding regression line, as shown in Fig. 7. If the slope is positive, the bias increases with intensity. The relative rate of increase (in percentage) in the G∕R ratio per mm h$^{-1}$ is then given by $100\,({e}^{m}-1)$, where m is the slope of $\mathrm{ln}\left(\frac{{R}_{\mathrm{g}}\left(t\right)}{{R}_{\mathrm{r}}\left(t\right)}\right)$ versus Rg(t).
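The slope-based diagnostic described above can be sketched in a few lines. The example assumes an ordinary least-squares fit and discards pairs in which either sensor reports zero, since the log ratio is undefined there; whether and how such pairs were filtered in the actual analysis is not stated here. The function names and the toy data are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = a + m * x; returns (a, m)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    m = sxy / sxx
    return my - m * mx, m

def conditional_bias_rate(gauge, radar):
    """Percent increase in G/R per mm/h of gauge intensity, 100 * (exp(m) - 1)."""
    # Assumption: keep only pairs with strictly positive values in both sensors.
    pairs = [(g, r) for g, r in zip(gauge, radar) if g > 0 and r > 0]
    x = [g for g, _ in pairs]
    y = [math.log(g / r) for g, r in pairs]
    _, m = fit_line(x, y)
    return 100.0 * (math.exp(m) - 1.0)

# Toy pairs in which G/R grows by exactly 2 % per mm/h (made-up numbers).
gauge = [1.0, 5.0, 10.0, 20.0, 40.0, 80.0]
radar = [g / (1.2 * 1.02 ** g) for g in gauge]
print(f"{conditional_bias_rate(gauge, radar):.2f} % per mm/h")
```

Because the toy radar values are constructed so that the log ratio is exactly linear in the gauge intensity, the fit recovers the prescribed 2 % rate. With real data the points scatter around the line and the slope carries a sampling uncertainty that this sketch does not quantify.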

Figure 7. Log ratio of gauge over radar values as a function of rain gauge intensity (in mm h$^{-1}$) for each country. The red lines represent the fitted linear regression models.

The fitted regression lines in Fig. 7 show that three out of the four main radar products exhibit a clear positive conditional bias with intensity. The only product for which the bias does not increase with intensity is the Finnish OSAPOL. Incidentally, the Finnish OSAPOL is also the only product in which heavy rainfall rates are estimated through differential phase instead of reflectivity, pointing to the advantage of polarimetry over fixed Z–R relationships. The relative rates of increase for the G∕R ratio are 1.09 % per mm h$^{-1}$ in Denmark, 0.86 % in the Netherlands, 0.09 % in Finland and 2.12 % in Sweden. This may not seem large but can make a big difference when rainfall intensities vary from 1 mm h$^{-1}$ to more than 100 mm h$^{-1}$. For example, in Denmark, the G∕R ratio (conditional on intensity) increases from 0.92 at 1 mm h$^{-1}$ to 2.69 at 100 mm h$^{-1}$. In Sweden, the conditional G∕R ratio varies from 1.49 at 1 mm h$^{-1}$ to 11.96 at 100 mm h$^{-1}$. By contrast, the conditional G∕R ratios at 100 mm h$^{-1}$ for the Netherlands and Finland only reach values of 2.48 and 2.40, respectively. The fact that both the Danish and Swedish products have large conditional biases also explains why their overall bias (as measured by the G∕R ratio without conditioning on intensity) is slightly larger than for the Netherlands and Finland. However, since large rainfall intensities are rare, the net effect of the conditional bias on the overall G∕R ratio remains rather small.
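The conditional G∕R ratios quoted for Denmark and Sweden can be reproduced approximately from the rates of increase. The sketch below assumes that the rate compounds per unit of intensity, which follows from the log-linear regression model, and it uses the rounded values given in the text. The function name is hypothetical. The Netherlands and Finland are left out because their ratios at the reference intensity are not quoted.

```python
def conditional_gr(gr_ref, rate_pct, intensity, ref_intensity=1.0):
    """G/R at `intensity`, given G/R at `ref_intensity` and a rate in % per mm/h."""
    return gr_ref * (1.0 + rate_pct / 100.0) ** (intensity - ref_intensity)

# G/R at 1 mm/h and rate of increase (% per mm/h), rounded values from the text.
quoted = {"Denmark": (0.92, 1.09), "Sweden": (1.49, 2.12)}

for country, (gr_1, rate) in quoted.items():
    print(f"{country}: G/R at 100 mm/h = {conditional_gr(gr_1, rate, 100.0):.2f}")
```

The results land close to the quoted 2.69 and 11.96. The small remaining difference for Sweden is what one would expect from rounding the rate to two decimals before compounding it over 99 units of intensity.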

The most likely explanation for the conditional bias with intensity is the fact that three out of the four main radar products use a fixed Marshall–Palmer Z–R relationship to estimate rainfall rates from reflectivity. The bias therefore increases or decreases whenever the raindrop size distribution starts to deviate significantly from Marshall–Palmer, as is usually the case during strong convective precipitation and high rainfall intensities. The mean field bias adjustments based on rain gauge data can help reduce the overall bias by tuning the prefactor in the Z–R relationship. However, mean field bias adjustments are insufficient to account for the rapid changes in raindrop size distributions in heavy rain. Previous studies suggest that the best way to mitigate biases and ensure accurate hydrological predictions is to frequently adjust the radar data over time. This might also explain why the Swedish and Danish radar products, which are corrected using daily gauge data, have a stronger conditional bias with intensity than the Dutch product, which uses hourly corrections. Another even better strategy, as demonstrated by the low conditional bias of the Finnish OSAPOL product, is to replace the Z–R relation by an $R(K_{\mathrm{dp}})$ retrieval, which is known to be less sensitive to variations in drop size distributions and calibration effects.

Figure 8. Log ratio of gauge over radar values as a function of the distance to the nearest radar. The red line represents the fitted linear regression model.

## 3.4 Other sources of bias

The conditional bias with intensity explains a lot of the differences between the radar products. However, this is only one part of the story, and other confounding factors such as the distance between the radar(s) and the gauges also need to be considered. Figure 8 shows the log ratio of gauge versus radar estimates $\mathrm{ln}\left(\frac{{R}_{\mathrm{g}}\left(t\right)}{{R}_{\mathrm{r}}\left(t\right)}\right)$ as a function of the distance to the nearest radar. Compared with intensity, the trend with distance appears to be much weaker. Out of the four considered products, only the Danish C-band exhibits a trend that is significantly different from zero (at the 5 % level). This makes sense given that the Danish product only considers data from a single radar and only applies a mean field bias correction, making it more likely to be affected by range effects such as overshooting, non-uniform beam filling and attenuation. Based on our analyses, the multiplicative bias β increases by 0.73 % per kilometer. However, since the range of distances between radar and gauges in Denmark is relatively small (from 29.2 to 74.2 km), bias values only vary from 1.06 to 1.47 at minimum and maximum distances, respectively. Distance therefore only plays a minor role in explaining the variations in bias compared with intensity. Interestingly, the composite products in the Netherlands and Finland do not seem to suffer from significant conditional biases with distance, highlighting the advantage of combining data from different radars and viewpoints to mitigate range effects. The Swedish product currently does not combine measurements from multiple radars in an optimal way, only using the measurements from the best (i.e., nearest) radar. However, the Swedish BRDC also contains an additional range-dependent bias correction (see Sect. 2.2.4) that appears to be rather efficient at removing large-scale trends with distance. 
However, the strong conditional bias with intensity in the Swedish BRDC also makes it harder to see potential range-dependent biases in the first place.
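As a consistency check on the range dependence quoted for Denmark, the bias at the maximum distance can be derived from the bias at the minimum distance. This assumes that the 0.73 % rate compounds per kilometer, in the same way as the intensity-dependent rate in Sect. 3.3, and it uses the rounded values given in the text:

$$
\beta (d) = \beta ({d}_{0})\,{(1+r)}^{d-{d}_{0}}, \qquad \beta (74.2) \approx 1.06 \times {1.0073}^{74.2-29.2} = 1.06 \times {1.0073}^{45} \approx 1.06 \times 1.39 \approx 1.47 .
$$

The result agrees with the quoted range of 1.06 to 1.47 between the nearest and the farthest gauge.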

Figure 9. Boxplots of peak intensity bias versus aggregation timescale. Each boxplot represents the 10 %, 25 %, 50 %, 75 % and 90 % quantiles for the 50 top events in each country. The horizontal lines denote the average multiplicative biases (G∕R ratio).

Table 4. Summary statistics for the highest aggregation timescale (all 50 events combined). G∕R ratio and G∕R ratio corrected for areal-reduction factor ARF, model bias β assuming log-normal distribution and relative increase in β with respect to intensity and range.

## 3.5 Agreement during the peaks

In this section, we take a closer look at how well the rainfall peaks are captured by the radar. Figure 9 shows the 10 %, 25 %, 50 %, 75 % and 90 % quantiles of peak intensity bias between radar and gauges as a function of the aggregation timescale. The dashed horizontal lines denote the average apparent bias (i.e., the G∕R ratio). We see that the Netherlands and Finland have relatively low median peak intensity biases of 1.82 and 1.88 at 10 min resolution (approximately 1.2–1.3 times higher than the average bias). Denmark and Sweden on the other hand have substantially higher median PIB values of 2.96 and 2.24 (1.86 and 1.35 times higher than the average). Moreover, the rate at which the PIB decreases with the aggregation timescale is different in each country. In Denmark and Sweden, the PIB remains well above the average bias for all aggregation timescales up to 2 h, while in the Netherlands and Finland, the PIB converges much more quickly to the mean bias (i.e., after approximately 60 min for the Netherlands and 20 min for Finland). This is no coincidence and can be explained by the fact that the Netherlands use hourly rain gauge data to bias correct their radar estimates, while the Danish and Swedish products use daily bias-adjustment factors. Earlier work showed that switching from daily to hourly mean field bias adjustments can slightly improve peak rainfall estimates but also pointed out that hourly bias corrections tend to be problematic in times of low rain rates due to the small number of tips in the gauges. Therefore, in order to make a generally applicable adjustment that works for all rain conditions, the authors argue that it is better to use daily adjustments. Here, we see that this strategy can result in a severe increase in the peak intensity bias at sub-hourly scales, with some of the radar–gauge pairs differing by more than a factor of 5. The Dutch radar product also exhibits a rapid increase in PIB at sub-hourly scales.
However, since the conditional bias with intensity is rather small, the overall G∕R ratio at 10 min resolution rarely exceeds a factor of 3. The Finnish product is interesting, as it is the only one that has not been bias corrected with gauges. Its strength is that it makes use of polarimetry (i.e., Kdp) to estimate rainfall rates during the peaks. This results in almost the same performance in terms of PIB as a traditional approach based on the Z–R relationship with hourly bias corrections, as used in the Netherlands. The only notable difference is the rate at which the peak intensity bias converges to the average bias, with the Finnish product exhibiting a lower dependence on the aggregation timescale than the Dutch product.

Figure 10. Peak rainfall intensities measured by radar and gauges as a function of the aggregation timescale for the top event in each country. The red triangles show the peak intensity bias between radar and gauges (axis on the right).

Another equally interesting result is the fact that the PIB for specific events does not necessarily decrease when the radar and rain gauge data are aggregated to a coarser timescale. Figure 10 illustrates this point by showing the PIBs for the top event in each country as a function of the aggregation timescale. The time series corresponding to these four events were already shown in Fig. 4. While the PIB in the Netherlands and Finland decays exponentially with the aggregation timescale, Denmark and Sweden exhibit a more complicated structure characterized by multiple ups and downs. Looking at event 1 for Denmark, we see that the peak intensity bias starts at 2.17 (53.9 %) at 5 min, decreases to 2.10 (52.4 %) at 10 min, increases again to 2.17 (53.9 %) at the 15 min timescale, decreases to 1.78 (43.9 %) at 35 min, only to increase again to 2.02 (50.4 %) at 45–50 min. The multiple ups and downs can be explained by the intermittent nature of this event, with four successive rainfall peaks separated by approximately 15–45 min (see Fig. 4). Each of these peaks is characterized by different random observational errors, causing extremes at certain scales to be captured better than others. The same applies to event 1 in Sweden, where the peak intensity bias starts at 1.73 (42.3 %) at 15 min, decreases to 1.67 (40.1 %) at 30 min and increases again to 1.75 (42.8 %) at 45 min. In this case, the event is less intermittent and there is only one single rainfall peak. However, Fig. 4 clearly shows three consecutive time steps during which the radar underestimates the rainfall rate. These examples show that even though, globally speaking, the average peak intensity bias between radar and gauges converges to the average G∕R ratio when the data are aggregated to coarser timescales (as shown in Fig. 9), this might not always be the case locally and does not necessarily apply to all events.
The reason for this is that the PIB depends on a multitude of confounding factors (e.g., calibration errors, natural variations in drop size distributions, range effects, wind, vertical variability, attenuation). When individual sources of error depend on each other or exhibit significant auto-correlation, their combined effect might cause the PIB to (locally) increase with the aggregation timescale. In particular, strongly auto-correlated sources of bias such as changing drop size distributions, signal attenuation or wind effects can cause the PIB to increase with the aggregation timescale.

The notion that peak intensity biases between radar and gauges can amplify when data are aggregated to coarser timescales is not new in itself but has important consequences for the representation of peak rainfall intensities in hydrological models as it affects the choice of the optimal spatial and temporal resolution at which models should be run when making flood predictions. Another important finding of our study is that single-radar products with daily rain gauge adjustments are more likely to contain increasing PIBs with the aggregation timescale than composite products with hourly bias corrections. This makes sense as mean field bias adjustments can (partly) compensate for the bias in rainfall rate due to deviations from the Marshall–Palmer drop size distribution in the ZR relationship. Similarly, radar compositing can mitigate the bias due to environmental factors such as range effects, vertical variability and attenuation. To show this, we computed, for each event, the timescale at which peak intensity bias reaches its maximum value. Figure 11 shows that in Denmark, 21 out of 50 events exhibited a maximum PIB at a scale larger than that of the highest available temporal resolution. Similarly, for the Swedish radar product, 26 out of 50 cases of locally increasing peak intensity biases with the aggregation timescale could be identified. By contrast, the Finnish and Dutch radar products, which make use of compositing and more frequent bias adjustments, only contained 14 and 8 such events, respectively. Further analysis reveals that most of the events with locally amplifying PIBs consist of two or more rainfall peaks separated by 10–30 min, with rapidly fluctuating rainfall intensities between them (i.e., high intermittency). Some events with single rainfall peaks during which radar strongly underestimated rainfall rates for two or more time steps in a row were also identified. 
However, due to the limited temporal autocorrelation in heavy rain, most peak intensity bias values reached their maximum at timescales of 30 min or less.
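The search for the timescale at which the peak intensity bias is largest can be sketched as follows. The sketch assumes that the PIB at a given timescale is the ratio of the gauge peak to the radar peak after both series have been averaged over fixed, non-overlapping blocks; the exact aggregation scheme (for example, moving windows) may differ from the one used in the analysis. The function names and the toy event are hypothetical.

```python
def aggregate(intensities, k):
    """Average consecutive blocks of k time steps; an incomplete tail is dropped."""
    n_blocks = len(intensities) // k
    return [sum(intensities[i * k:(i + 1) * k]) / k for i in range(n_blocks)]

def peak_intensity_bias(gauge, radar, k):
    """Ratio of the gauge peak to the radar peak after aggregating by k steps."""
    return max(aggregate(gauge, k)) / max(aggregate(radar, k))

def timescale_of_max_pib(gauge, radar, step_min, factors):
    """Return (timescale in minutes, PIB) at which the PIB is largest."""
    pibs = {k * step_min: peak_intensity_bias(gauge, radar, k) for k in factors}
    best = max(pibs, key=pibs.get)
    return best, pibs[best]

# Toy 5 min event (mm/h, made up): a single peak during which radar
# underestimates the intensity for two consecutive time steps.
gauge = [40.0, 38.0, 4.0, 2.0, 1.0, 1.0]
radar = [30.0, 12.0, 4.0, 2.0, 1.0, 1.0]

for k in (1, 2, 3, 6):
    print(f"{5 * k:2d} min: PIB = {peak_intensity_bias(gauge, radar, k):.2f}")

best_scale, best_pib = timescale_of_max_pib(gauge, radar, 5, (1, 2, 3, 6))
print(f"maximum PIB of {best_pib:.2f} at {best_scale} min")
```

In this toy event the PIB is larger at 10 min than at 5 min, because the second underestimated time step is merged with the peak before the ratio is taken. Counting the events for which the returned timescale exceeds the native resolution gives the kind of tally shown in Fig. 11.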

Figure 11. Aggregation timescale at which the maximum peak intensity bias between gauge and radar occurred.

Figure 12. Performance metrics for the Danish X-band radar system (top 10 events).

Figure 13. Rank correlation, relative root mean square difference, G∕R ratio and peak intensity bias (at 15 min resolution) of the national radar products and the BALTRAD composite.

4 Conclusions

The accuracy of six different radar products in four countries (Denmark, Finland, the Netherlands and Sweden) has been analyzed. Special emphasis has been put on quantifying discrepancies between radar and gauges in times of heavy rain. A relatively good agreement was found in terms of temporal consistency (correlation coefficient between 0.7 and 0.9). However, the scatter at sub-hourly timescales remains high (98 %–144 % at 5–15 min). Moreover, all six radar products exhibited a clear pattern of underestimation. The multiplicative biases at 5–15 min were between 1.20 and 1.77, suggesting that radar underestimates rainfall rates by 17 %–44 % compared with gauges. A substantial part of the bias (i.e., 10 %–30 % according to areal-reduction factors) is likely due to differences in sampling volumes. However, this remains hard to quantify precisely in the absence of dense rain gauge networks. An alternative bias model that accounts for the differences in mean and variance between radar and gauge measurements suggested that the actual bias affecting radar rainfall estimates could be as low as 10 %. Moreover, higher-resolution radar products seemed to agree better with gauges, which is encouraging. At the same time, these conclusions strongly rely on the assumption that errors are log-normally distributed and independent of intensity, which, as we have seen in this study, is likely not to be true during the peaks.

Overall, the X-band data for Denmark showed promising results, outperforming all C-band products in terms of accuracy and correlation, thereby demonstrating the value of high-resolution rainfall observations for urban hydrology. However, due to the shorter data record, only 10 events over 2 years could be considered. The polarimetric estimates from the Finnish OSAPOL project also showed promising performance, which is remarkable considering the fact that they were not adjusted by any gauges. However, it should also be pointed out that for now, the overall performance of the OSAPOL product remains similar to that of the Dutch C-band product with a fixed Z–R relationship and hourly bias correction. Interestingly, the distance between the radar and the gauges did not appear to have a strong effect on peak intensity bias. We explain this by the fact that range-dependent biases tend to be small compared with the large spatial variability of rain at the event scale. Therefore, range effects are masked by other errors and only become visible when the radar data are aggregated over the course of several days or months.

Another important finding of this paper was that the largest bias between radar and gauges in terms of peak intensities does not necessarily occur at the highest temporal sampling resolution. Depending on the autocorrelation structure of the errors and the resolution of the rain gauge data used for the adjustments, multiplicative biases may amplify over time instead of converging to the mean value. This mostly happens at sub-hourly timescales and affects roughly 40 %–50 % of all events in single-radar products and 15 %–30 % in composite products. Most of these cases were characterized by a succession of multiple rainfall peaks or, alternatively, one very intense peak of 15–30 min during which radar strongly underestimated the intensity for two or more consecutive time steps. The strong dependence of the error structure in radar data on the aggregation timescale still represents a major challenge as it limits our ability to accurately characterize rainfall extremes and uncertainties in hydrological models across scales. One way to partially mitigate this effect is to combine measurements from multiple radars. However, more research is necessary to precisely quantify this part of the error.

Finally, as with any statistical analysis, there are a few important limitations that need to be mentioned. The first is that little focus has been given to the analysis of the rain gauge data themselves. In reality, gauges also suffer from measurement uncertainties and errors, the most common being an underestimation of rainfall rates in times of heavy precipitation due to calibration issues and wind effects. No attempt has been made to correct for these additional biases nor to distinguish between gauge and radar-induced errors. Since the gauge data are likely to be underestimated as well, the actual bias between the two sensors might be larger than suspected. The second issue is the relatively short length of the observational record (10–15 years), which meant that only a small number of extreme rain events could be considered. Moreover, it is worth mentioning that some of the events in the database actually occurred on the same day but were captured by different gauges at different locations. The derived statistics might therefore be biased towards characterizing the performance of the radar during these days instead of the average performance over a large number of independent events. Another issue is the lack of a common denominator for comparing the radar products. Future studies involving identical radar systems and different levels of processing (e.g., by switching on/off individual correction schemes) would be useful to get a better understanding of the strengths and weaknesses of individual retrieval techniques within a more controlled setting. Despite all these limitations, the present study provides some important insight into the major issues affecting radar–rainfall estimates in times of heavy rain. Also, several useful strategies for mitigating errors and reducing biases were identified. Future research should focus on analyzing more radar products and identifying the most promising strategies for improving performance in each country.

Appendix A: Top 50 events for each country

Table A1. Top 50 events for Denmark.

Table A2. Top 50 events for the Netherlands.

Table A3. Top 50 events for Finland.

Table A4. Top 50 events for Sweden.

Table A5. Top 10 events for the Danish X-band product.

Data availability

The Dutch radar products are available for free in HDF5 format through the FTP of KNMI or in netCDF4 format via the Climate4Impact website. The Danish, Swedish and Finnish products are not open yet but can be made available for research purposes upon request to the authors.

Author contributions

MS coordinated the experiments, developed the theoretical formalism, performed the analyses and wrote the manuscript. JO and PB compiled the Swedish radar and BALTRAD datasets with support from DB. TN and TK produced the Finnish radar and gauge datasets with support from SP. ST, RN and JEN produced the Danish C-band and X-band radar datasets. All the authors provided critical feedback and helped shape the research, analysis and manuscript.

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

The authors would like to thank the Danish, Finnish, Swedish and Dutch Meteorological Institutes (i.e., DMI, FMI, SMHI and KNMI) for collecting and distributing the radar and gauge data used in this study.

Financial support

This research has been supported by the EU in the framework of ERA-NET Cofund WaterWorks2014 project MUFFIN (Multiscale Flood Forecasting: From Local Tailored Systems to a Pan-European Service). This ERA-NET is an integral part of the 2015 Joint Activities developed by the Water Challenges for a Changing World Joint Programme Initiative (Water JPI). The first author was supported by the Netherlands Organisation for Scientific Research NWO (project code ALWWW.2014.3). The Finnish partners were supported by the Maa- ja vesitekniikan tuki ry. foundation (grant no. 32230). The Optimal Rain Products with Dual-Pol Doppler Weather Radar (OSAPOL) project was supported by the European Regional Development Fund and Business Finland (grant no. 4459/31/2014).

Review statement

This paper was edited by Nadav Peleg and reviewed by Witold Krajewski, Miguel Angel Rico-Ramirez, and one anonymous referee.

References

Anagnostou, M. N., Kalogiros, J., Anagnostou, E. N., Tarolli, M., Papadopoulos, A., and Borga, M.: Performance evaluation of high-resolution rainfall estimation by X-band dual-polarization radar for flash flood applications in mountainous basins, J. Hydrol., 394, 4–16, https://doi.org/10.1016/j.jhydrol.2010.06.026, 2010.

Andréassian, V., Perrin, C., Michel, C., Usart-Sanchez, I., and Lavabre, J.: Impact of imperfect rainfall knowledge on the efficiency and the parameters of watershed models, J. Hydrol., 250, 206–223, https://doi.org/10.1016/S0022-1694(01)00437-1, 2001.

Aronica, G., Freni, G., and Oliveri, E.: Uncertainty analysis of the influence of rainfall time resolution in the modelling of urban drainage systems, Hydrol. Process., 19, 1055–1071, https://doi.org/10.1002/hyp.5645, 2005.

Baeck, M. L. and Smith, J. A.: Rainfall Estimation by the WSR-88D for Heavy Rainfall Events, Weather Forecast., 13, 416–436, https://doi.org/10.1175/1520-0434(1998)013<0416:REBTWF>2.0.CO;2, 1998.

Bech, J., Codina, B., Lorente, J., and Bebbington, D.: The Sensitivity of Single Polarization Weather Radar Beam Blockage Correction to Variability in the Vertical Refractivity Gradient, J. Atmos. Ocean. Tech., 20, 845–855, https://doi.org/10.1175/1520-0426(2003)020<0845:TSOSPW>2.0.CO;2, 2003.

Berg, P., Norin, L., and Olsson, J.: Creation of a high resolution precipitation data set by merging gridded gauge data and radar observations for Sweden, J. Hydrol., 541, 6–13, https://doi.org/10.1016/j.jhydrol.2015.11.031, 2016.

Berne, A. and Krajewski, W. F.: Radar for hydrology: Unfulfilled promise or unrecognized potential?, Adv. Water Resour., 51, 357–366, https://doi.org/10.1016/j.advwatres.2012.05.005, 2013.

Berne, A., Delrieu, G., Creutin, J.-D., and Obled, C.: Temporal and spatial resolution of rainfall measurements required for urban hydrology, J. Hydrol., 299, 166–179, https://doi.org/10.1016/j.jhydrol.2004.08.002, 2004.

Blenkinsop, S., Lewis, E., Chan, S. C., and Fowler, H. J.: Quality-control of an hourly rainfall dataset and climatology of extremes for the UK, Int. J. Climatol., 37, 722–740, https://doi.org/10.1002/joc.4735, 2017.

Brandes, E. A., Ryzhkov, A. V., and Zrnic, D. S.: An evaluation of radar rainfall estimates from specific differential phase, J. Atmos. Ocean. Tech., 18, 363–375, https://doi.org/10.1175/1520-0426(2001)018<0363:AEORRE>2.0.CO;2, 2001.

Bringi, V. N. and Chandrasekar, V.: Polarimetric doppler weather radar, Cambridge University Press, Cambridge, 2001.

Bruni, G., Reinoso, R., van de Giesen, N. C., Clemens, F. H. L. R., and ten Veldhuis, J. A. E.: On the sensitivity of urban hydrodynamic modelling to rainfall spatial and temporal resolution, Hydrol. Earth Syst. Sci., 19, 691–709, https://doi.org/10.5194/hess-19-691-2015, 2015.

Chandrasekar, V., Keranen, R., Lim, S., and Moisseev, D.: Recent advances in classification of observations from dual polarization weather radars, Atmos. Res., 119, 97–111, https://doi.org/10.1016/j.atmosres.2011.08.014, 2013.

Chang, M. and Flannery, L. A.: Spherical gauges for improving the accuracy of rainfall measurements, Hydrol. Process., 15, 643–654, https://doi.org/10.1002/hyp.181, 2001.

Ciach, G. J.: Local random errors in tipping-bucket rain gauge measurements, J. Atmos. Ocean. Tech., 20, 752–759, https://doi.org/10.1175/1520-0426(2003)20<752:LREITB>2.0.CO;2, 2003.

Ciach, G. J. and Krajewski, W. F.: On the estimation of radar rainfall error variance, Adv. Water Resour., 22, 585–595, https://doi.org/10.1016/S0309-1708(98)00043-8, 1999a.

Ciach, G. J. and Krajewski, W. F.: Radar-Rain Gauge Comparisons under Observational Uncertainties, J. Appl. Meteorol., 38, 1519–1525, https://doi.org/10.1175/1520-0450(1999)038<1519:RRGCUO>2.0.CO;2, 1999b.

Collier, C. G.: Flash flood forecasting: What are the limits of predictability?, Q. J. Roy. Meteor. Soc., 133, 3–23, https://doi.org/10.1002/qj.29, 2007.

Collier, C. G. and Knowles, J. M.: Accuracy of rainfall estimates by radar, part III: application for short-term flood forecasting, J. Hydrol., 83, 237–249, https://doi.org/10.1016/0022-1694(86)90154-X, 1986.

Courty, L. G., Rico-Ramirez, M. A., and Pedrozo-Acuna, A.: The Significance of the Spatial Variability of Rainfall on the Numerical Simulation of Urban Floods, Water, 10, 1–17, https://doi.org/10.3390/w10020207, 2018.

Cristiano, E., ten Veldhuis, M.-C., and van de Giesen, N.: Spatial and temporal variability of rainfall and their effects on hydrological response in urban areas – a review, Hydrol. Earth Syst. Sci., 21, 3859–3878, https://doi.org/10.5194/hess-21-3859-2017, 2017.

Cunha, L. K., Mandapaka, P. V., Krajewski, W. F., Mantilla, R., and Bradley, A. A.: Impact of radar-rainfall error structure on estimated flood magnitude across scales: An investigation based on a parsimonious distributed hydrological model, Water Resour. Res., 48, W10515, https://doi.org/10.1029/2012WR012138, 2012.

Cunha, L. K., Smith, J. A., Krajewski, W. F., Baeck, M. L., and Seo, B.-C.: NEXRAD NWS Polarimetric Precipitation Product Evaluation for IFloodS, J. Hydrometeorol., 16, 1676–1699, https://doi.org/10.1175/JHM-D-14-0148.1, 2015.

Dai, Q. and Han, D.: Exploration of discrepancy between radar and gauge rainfall estimates driven by wind fields, Water Resour. Res., 50, 8571–8588, https://doi.org/10.1002/2014WR015794, 2014.

Delrieu, G., Nicol, J., Yates, E., Kirstetter, P.-E., Creutin, J.-D., Anquetin, S., Obled, C., Saulnier, G.-M., Ducrocq, V., Gaume, E., Payrastre, O., Andrieu, H., Ayral, P.-A., Bouvier, C., Neppel, L., Livet, M., Lang, M., du Châtelet, J., Walpersdorf, A., and Wobrock, W.: The Catastrophic Flash-Flood Event of 8–9 September 2002 in the Gard Region, France: A First Case Study for the Cévennes-Vivarais Mediterranean Hydrometeorological Observatory, J. Hydrometeorol., 6, 34–52, https://doi.org/10.1175/JHM-400.1, 2005. a

Delrieu, G., Wijbrans, A., Boudevillain, B., Faure, D., Bonnifait, L., and Kirstetter, P.-E.: Geostatistical radar-raingauge merging: A novel method for the quantification of rain estimation accuracy, Adv. Water Resour., 71, 110–124, https://doi.org/10.1016/j.advwatres.2014.06.005, 2014. a, b

Dupasquier, B., Andrieu, H., Delrieu, G., Griffith, R. J., and Cluckie, I.: Influence of the VPR on High Frequency Fluctuations Between Radar and Raingage Data, Phys. Chem. Earth, 25, 1021–1025, https://doi.org/10.1016/S1464-1909(00)00146-5, 2000. a

Einfalt, T., Arnbjerg-Nielsen, K., Golz, C., Jensen, N. E., Quirmbach, M., Vaes, G., and Vieux, B.: Towards a roadmap for use of radar rainfall data in urban drainage, J. Hydrol., 299, 186–202, https://doi.org/10.1016/j.jhydrol.2004.08.004, 2004. a

Fairman, J. G., Schultz, D. M., Kirshbaum, D. J., Gray, S. L., and Barrett, A. I.: Climatology of Size, Shape, and Intensity of Precipitation Features over Great Britain and Ireland, J. Hydrometeorol., 18, 1595–1615, https://doi.org/10.1175/JHM-D-16-0222.1, 2017. a

Gill, R. S., Overgaard, S., and Bøvith, T.: The Danish weather radar network, in: Proceedings of Fourth European Conference on Radar in Meteorology and Hydrology (ERAD), Barcelona, Spain, 1–4, 2006. a, b

Goudenhoofdt, E. and Delobbe, L.: Evaluation of radar-gauge merging methods for quantitative precipitation estimates, Hydrol. Earth Syst. Sci., 13, 195–203, https://doi.org/10.5194/hess-13-195-2009, 2009. a

Goudenhoofdt, E., Delobbe, L., and Willems, P.: Regional frequency analysis of extreme rainfall in Belgium based on radar estimates, Hydrol. Earth Syst. Sci., 21, 5385–5399, https://doi.org/10.5194/hess-21-5385-2017, 2017. a, b, c

Gourley, J. J., Tabary, P., and Parent-du Chatelet, J.: Data quality of the Meteo-France C-band polarimetric radar, J. Atmos. Ocean. Tech., 23, 1340–1356, https://doi.org/10.1175/JTECH1912.1, 2006. a

Gourley, J. J., Tabary, P., and Parent-du Chatelet, J.: A fuzzy logic algorithm for the separation of precipitating from nonprecipitating echoes using polarimetric radar observations, J. Atmos. Ocean. Tech., 24, 1439–1451, https://doi.org/10.1175/JTECH2035.1, 2007. a

Gu, J.-Y., Ryzhkov, A., Zhang, P., Neilley, P., Knight, M., Wolf, B., and Lee, D.-I.: Polarimetric Attenuation Correction in Heavy Rain at C Band, J. Appl. Meteorol. Clim., 50, 39–58, https://doi.org/10.1175/2010JAMC2258.1, 2011. a

He, X., Sonnenborg, T. O., Refsgaard, J. C., Vejen, F., and Jensen, K. H.: Evaluation of the value of radar QPE data and rain gauge data for hydrological modeling, Water Resour. Res., 49, 5989–6005, https://doi.org/10.1002/wrcr.20471, 2013. a, b

Holleman, I.: Bias adjustment and long-term verification of radar-based precipitation estimates, Meteorol. Appl., 14, 195–203, https://doi.org/10.1002/met.22, 2007. a, b

Holleman, I. and Beekhuis, H.: Review of the KNMI clutter removal scheme, Tech. Rep. TR-284, Royal Netherlands Meteorological Institute KNMI, available at: https://www.knmi.nl/publications/fulltexts (last access: 15 June 2020), 2005. a

Holleman, I., Huuskonen, A., Kurri, M., and Beekhuis, H.: Operational monitoring of weather radar receiving chain using the sun, J. Atmos. Ocean. Tech., 27, 159–166, https://doi.org/10.1175/2009JTECHA1213.1, 2010. a

Huuskonen, A., Saltikoff, E., and Holleman, I.: The Operational Weather Radar Network in Europe, B. Am. Meteorol. Soc., 95, 897–907, https://doi.org/10.1175/BAMS-D-12-00216.1, 2014. a

KNMI: Handbook for the Meteorological Observation, Tech. rep., Koninklijk Nederlands Meteorologisch Instituut, De Bilt, Netherlands, available at: http://projects.knmi.nl/hawa/pdf/Handbook_H01_H06.pdf (last access: 15 June 2020), 2000. a

Koistinen, J. and Pohjola, H.: Estimation of Ground-Level Reflectivity Factor in Operational Weather Radar Networks Using VPR-Based Correction Ensembles, J. Appl. Meteorol. Clim., 53, 2394–2411, https://doi.org/10.1175/JAMC-D-13-0343.1, 2014. a

Krajewski, W. F.: Cokriging radar-rainfall and rain-gauge data, J. Geophys. Res.-Atmos., 92, 9571–9580, https://doi.org/10.1029/JD092iD08p09571, 1987. a

Krajewski, W. F. and Smith, J. A.: Radar hydrology: rainfall estimation, Adv. Water Resour., 25, 1387–1394, https://doi.org/10.1016/j.advwatres.2005.03.018, 2002. a

Krajewski, W. F., Villarini, G., and Smith, J. A.: RADAR-Rainfall Uncertainties: Where are we after Thirty Years of Effort?, B. Am. Meteorol. Soc., 91, 87–94, https://doi.org/10.1175/2009BAMS2747.1, 2010. a, b, c

Lee, G.: Sources of errors in rainfall measurements by polarimetric radar: variability of drop size distributions, observational noise, and variation of relationships between R and polarimetric parameters, J. Atmos. Ocean. Tech., 23, 1005–1028, 2006. a

Leinonen, J., Moisseev, D., Leskinen, M., and Petersen, W. A.: A Climatology of Disdrometer Measurements of Rainfall in Finland over Five Years with Implications for Global Radar Observations, J. Appl. Meteorol. Clim., 51, 392–404, https://doi.org/10.1175/JAMC-D-11-056.1, 2012. a, b

Löwe, R., Thorndahl, S., Mikkelsen, P. S., Rasmussen, M. R., and Madsen, H.: Probabilistic online runoff forecasting for urban catchments using inputs from rain gauges as well as statically and dynamically adjusted weather radar, J. Hydrol., 512, 397–407, https://doi.org/10.1016/j.jhydrol.2014.03.027, 2014. a, b

Madsen, H., Mikkelsen, P. S., Rosbjerg, D., and Harremoës, P.: Estimation of regional intensity-duration-frequency curves for extreme precipitation, Water Sci. Technol., 37, 29–36, https://doi.org/10.1016/S0273-1223(98)00313-8, 1998. a, b

Madsen, H., Gregersen, I. B., Rosbjerg, D., and Arnbjerg-Nielsen, K.: Regional frequency analysis of short duration rainfall extremes using gridded daily rainfall data as co-variate, Water Sci. Technol., 75, 1971–1981, https://doi.org/10.2166/wst.2017.089, 2017. a, b

Matrosov, S. Y., Cifelli, R., Kennedy, P. C., Nesbitt, S. W., Rutledge, S. A., Bringi, V. N., and Martner, B. E.: A comparative study of rainfall retrievals based on specific differential phase shifts at X- and S-band radar frequencies, J. Atmos. Ocean. Tech., 23, 952–963, https://doi.org/10.1175/JTECH1887.1, 2006. a

Matrosov, S. Y., Clark, K. A., and Kingsmill, D. E.: A polarimetric radar approach to identify rain, melting-layer, and snow regions for applying corrections to vertical profiles of reflectivity, J. Appl. Meteorol. Clim., 46, 154–166, 2007. a

Michelson, D.: The Swedish weather radar production chain, in: Proceedings of Fourth European Conference on Radar in Meteorology and Hydrology (ERAD), Barcelona, Spain, 382–385, 2006. a

Michelson, D., Henja, A., Ernes, S., Haase, G., Koistinen, J., Ośródka, K., Peltonen, T., Szewczykowski, M., and Szturc, J.: BALTRAD Advanced Weather Radar Networking, J. Open Res. Softw., 6, 1–12, https://doi.org/10.5334/jors.193, 2018. a, b

Nielsen, J. E., Thorndahl, S. L., and Rasmussen, M. R.: A Numerical Method to Generate High Temporal Resolution Precipitation Time Series by Combining Weather Radar Measurements with a Nowcast Model, Atmos. Res., 138, 1–12, https://doi.org/10.1016/j.atmosres.2013.10.015, 2014. a

Niemi, T. J., Warsta, L., Taka, M., Hickman, B., Pulkkinen, S., Krebs, G., Moisseev, D. N., Koivusalo, H., and Kokkonen, T.: Applicability of open rainfall data to event-scale urban rainfall-runoff modelling, J. Hydrol., 547, 143–155, https://doi.org/10.1016/j.jhydrol.2017.01.056, 2017. a

Norin, L., Devasthale, A., L'Ecuyer, T. S., Wood, N. B., and Smalley, M.: Intercomparison of snowfall estimates derived from the CloudSat Cloud Profiling Radar and the ground-based weather radar network over Sweden, Atmos. Meas. Tech., 8, 5009–5021, https://doi.org/10.5194/amt-8-5009-2015, 2015. a, b

Ntelekos, A. A., Smith, J. A., and Krajewski, W. F.: Climatological Analyses of Thunderstorms and Flash Floods in the Baltimore Metropolitan Region, J. Hydrometeorol., 8, 88–101, https://doi.org/10.1175/JHM558.1, 2007. a

Nystuen, J. A.: Relative performance of automatic rain gauges under different rainfall conditions, J. Atmos. Ocean. Tech., 16, 1025–1043, https://doi.org/10.1175/1520-0426(1999)016<1025:RPOARG>2.0.CO;2, 1999. a, b

Ochoa-Rodriguez, S., Wang, L.-P., Gires, A., Pina, R. D., Reinoso-Rondinel, R., Bruni, G., Ichiba, A., Gaitan, S., Cristiano, E., van Assel, J., Kroll, S., Murlà-Tuyls, D., Tisserand, B., Schertzer, D., Tchiguirinskaia, I., Onof, C., Willems, P., and ten Veldhuis, M.-C.: Impact of spatial and temporal resolution of rainfall inputs on urban hydrodynamic modelling outputs: A multi-catchment investigation, J. Hydrol., 531, 389–407, https://doi.org/10.1016/j.jhydrol.2015.05.035, 2015. a

Ogden, F. L. and Julien, P. Y.: Runoff model sensitivity to radar rainfall resolution, J. Hydrol., 158, 1–18, 1994. a

Otto, T. and Russchenberg, H. W. J.: Estimation of specific differential phase and differential backscatter phase from polarimetric weather radar measurements of rain, IEEE Geosci. Remote Sens. Lett., 8, 988–992, https://doi.org/10.1109/LGRS.2011.2145354, 2011. a

Overeem, A., Buishand, T. A., and Holleman, I.: Extreme rainfall analysis and estimation of depth-duration-frequency curves using weather radar, Water Resour. Res., 45, W10424, https://doi.org/10.1029/2009WR007869, 2009a. a, b

Overeem, A., Holleman, I., and Buishand, T. A.: Derivation of a 10-year radar-based climatology of rainfall, J. Appl. Meteorol. Clim., 48, 1448–1463, https://doi.org/10.1175/2009JAMC1954.1, 2009b. a, b, c, d, e

Overeem, A., Buishand, T. A., Holleman, I., and Uijlenhoet, R.: Extreme value modeling of areal rainfall from weather radar, Water Resour. Res., 46, W09514, https://doi.org/10.1029/2009WR008517, 2010. a

Peleg, N., Marra, F., Fatichi, S., Paschalis, A., Molnar, P., and Burlando, P.: Spatial variability of extreme rainfall at radar subpixel scale, J. Hydrol., 556, 922–933, https://doi.org/10.1016/j.jhydrol.2016.05.033, 2018. a

Pollock, M. D., O'Donnell, G., Quinn, P., Dutton, M., Black, A., Wilkinson, M., Colli, M., Stagnaro, M., Lanza, L. G., Lewis, E., Kilsby, C. G., and O'Connell, P. E.: Quantifying and Mitigating Wind-Induced Undercatch in Rainfall Measurements, Water Resour. Res., 54, 3863–3875, https://doi.org/10.1029/2017WR022421, 2018. a, b

Rafieeinasab, A., Norouzi, A., Kim, S., Habibi, H., Nazari, B., Seo, D.-J., Lee, H., Cosgrove, B., and Cui, Z.: Toward high-resolution flash flood prediction in large urban areas – Analysis of sensitivity to spatiotemporal resolution of rainfall input and hydrologic modeling, J. Hydrol., 531, 370–388, https://doi.org/10.1016/j.jhydrol.2015.08.045, 2015. a, b

Rickenbach, T. M., Nieto-Ferreira, R., Zarzar, C., and Nelson, B.: A seasonal and diurnal climatology of precipitation organization in the southeastern United States, Q. J. Roy. Meteor. Soc., 141, 1938–1956, https://doi.org/10.1002/qj.2500, 2015. a

Rico-Ramirez, M. A., Liguori, S., and Schellart, A. N. A.: Quantifying radar-rainfall uncertainties in urban drainage flow modelling, J. Hydrol., 528, 17–28, https://doi.org/10.1016/j.jhydrol.2015.05.057, 2015. a

Rodríguez-Iturbe, I. and Mejía, J. M.: On the transformation of point rainfall to areal rainfall, Water Resour. Res., 10, 729–735, https://doi.org/10.1029/WR010i004p00729, 1974. a

Rossa, A., Liechti, K., Zappa, M., Bruen, M., Germann, U., Haase, G., Keil, C., and Krahe, P.: The COST 731 Action: a review on uncertainty propagation in advanced hydro-meteorological forecast systems, Atmos. Res., 100, 150–167, https://doi.org/10.1016/j.atmosres.2010.11.016, 2011. a

Ruzanski, E., Chandrasekar, V., and Wang, Y. T.: The CASA nowcasting system, J. Atmos. Ocean. Tech., 28, 640–655, https://doi.org/10.1175/2011JTECHA1496.1, 2011. a

Ryzhkov, A. and Zrnic, D. S.: Assessment of rainfall measurement that uses specific differential phase, J. Appl. Meteorol., 35, 2080–2090, https://doi.org/10.1175/1520-0450(1996)035<2080:AORMTU>2.0.CO;2, 1996. a

Ryzhkov, A. V. and Zrnic, D. S.: Discrimination between rain and snow with a polarimetric radar, J. Appl. Meteorol., 37, 1228–1240, 1998. a

Saltikoff, E., Haase, G., Delobbe, L., Gaussiat, N., Martet, M., Idziorek, D., Leijnse, H., Novák, P., Lukach, M., and Stephan, K.: OPERA the Radar Project, Atmosphere, 10, 1–13, 2019. a

Schilling, W.: Rainfall data for urban hydrology: what do we need?, Atmos. Res., 27, 5–21, https://doi.org/10.1016/0169-8095(91)90003-F, 1991. a, b

Seo, B.-C., Dolan, B., Krajewski, W. F., Rutledge, S. A., and Petersen, W.: Comparison of Single- and Dual-Polarization-Based Rainfall Estimates Using NEXRAD Data for the NASA Iowa Flood Studies Project, J. Hydrometeorol., 16, 1658–1675, https://doi.org/10.1175/JHM-D-14-0169.1, 2015. a, b

Sieck, L. C., Burges, S. J., and Steiner, M.: Challenges in obtaining reliable measurements of point rainfall, Water Resour. Res., 43, W01420, https://doi.org/10.1029/2005WR004519, 2007. a, b

Smith, J. A. and Krajewski, W. F.: Estimation of the Mean Field Bias of Radar Rainfall Estimates, J. Appl. Meteorol., 30, 397–412, https://doi.org/10.1175/1520-0450(1991)030<0397:EOTMFB>2.0.CO;2, 1991. a, b

Smith, J. A., Seo, D. J., Baeck, M. L., and Hudlow, M. D.: An intercomparison study of NEXRAD precipitation estimates, Water Resour. Res., 32, 2035–2045, https://doi.org/10.1029/96WR00270, 1996. a

Smith, J. A., Baeck, M. L., Meierdiercks, K. L., Miller, A. J., and Krajewski, W. F.: Radar rainfall estimation for flash flood forecasting in small urban watersheds, Adv. Water Resour., 30, 2087–2097, https://doi.org/10.1016/j.advwatres.2006.09.007, 2007. a

Smith, J. A., Baeck, M. L., Villarini, G., Welty, C., Miller, A. J., and Krajewski, W. F.: Analyses of a long-term, high-resolution radar rainfall data set for the Baltimore metropolitan region, Water Resour. Res., 48, W04504, https://doi.org/10.1029/2011WR010641, 2012. a, b

Stevenson, S. N. and Schumacher, R. S.: A 10-Year Survey of Extreme Rainfall Events in the Central and Eastern United States Using Gridded Multisensor Precipitation Analyses, Mon. Weather Rev., 142, 3147–3162, https://doi.org/10.1175/MWR-D-13-00345.1, 2014. a

Stransky, D., Bares, V., and Fatka, P.: The effect of rainfall measurement uncertainties on rainfall-runoff processes modelling, Water Sci. Technol., 55, 103–111, 2007. a

Thomsen, R. S. T.: Drift af Spildevandskomitéens Regnmålersystem Årsnotat 2015, Tech. rep., DMI, Copenhagen, available at: https://www.dmi.dk/fileadmin/user_upload/Rapporter/TR/2016/DMI_Report_16_3.pdf (last access: 13 December 2019), 2016. a

Thorndahl, S., Nielsen, J. E., and Rasmussen, M. R.: Bias adjustment and advection interpolation of long-term high resolution radar rainfall series, J. Hydrol., 508, 214–226, https://doi.org/10.1016/j.jhydrol.2013.10.056, 2014a. a, b

Thorndahl, S., Smith, J. A., Baeck, M. L., and Krajewski, W. F.: Analyses of the temporal and spatial structures of heavy rainfall from a catalog of high-resolution radar rainfall fields, Atmos. Res., 144, 111–125, https://doi.org/10.1016/j.atmosres.2014.03.013, 2014b. a

Thorndahl, S., Nielsen, J. E., and Jensen, D. G.: Urban pluvial flood prediction: a case study evaluating radar rainfall nowcasts and numerical weather prediction models as model inputs, Water Sci. Technol., 74, 2599–2610, https://doi.org/10.2166/wst.2016.474, 2016. a

Thorndahl, S., Einfalt, T., Willems, P., Nielsen, J. E., ten Veldhuis, M.-C., Arnbjerg-Nielsen, K., Rasmussen, M. R., and Molnar, P.: Weather radar rainfall data in urban hydrology, Hydrol. Earth Syst. Sci., 21, 1359–1380, https://doi.org/10.5194/hess-21-1359-2017, 2017. a

Thorndahl, S. L., Nielsen, J. E., and Rasmussen, M. R.: Estimation of Storm-Centred Areal Reduction Factors from Radar Rainfall for Design in Urban Hydrology, Water, 11, 1120, https://doi.org/10.3390/w11061120, 2019. a, b

Tian, Y., Huffman, G. J., Adler, R. F., Tang, L., Sapiano, M., Maggioni, V., and Wu, H.: Modeling errors in daily precipitation measurements: Additive or multiplicative?, Geophys. Res. Lett., 40, 2060–2065, https://doi.org/10.1002/grl.50320, 2013. a

Vasiloff, S. V., Howard, K. W., and Zhang, J.: Difficulties with correcting radar rainfall estimates based on rain gauge data: a case study of severe weather in Montana on 16–17 June 2007, Weather Forecast., 24, 1334–1344, https://doi.org/10.1175/2009WAF2222154.1, 2009. a, b

Vejen, F.: Teknisk rapport 06-15, Nyt SVK system, Sammenligning af nedbørmålinger med nye og nuværende system, Tech. rep., DMI, Copenhagen, available at: https://www.dmi.dk/fileadmin/Rapporter/TR/tr06-15.pdf (last access: 13 December 2019), 2006. a

Villarini, G. and Krajewski, W. F.: Review of the Different Sources of Uncertainty in Single Polarization Radar-Based Estimates of Rainfall, Surv. Geophys., 31, 107–129, 2010. a, b

Villarini, G., Smith, J. A., Baeck, M. L., Sturdevant-Rees, P., and Krajewski, W. F.: Radar analyses of extreme rainfall and flooding in urban drainage basins, J. Hydrol., 381, 266–286, https://doi.org/10.1016/j.jhydrol.2009.11.048, 2010. a

Wang, Y. and Chandrasekar, V.: Algorithm for Estimation of the Specific Differential Phase, J. Atmos. Ocean. Tech., 26, 2565–2578, https://doi.org/10.1175/2009JTECHA1358.1, 2009. a

Wang, Y. T. and Chandrasekar, V.: Quantitative precipitation estimation in the CASA X-band dual-polarization radar network, J. Atmos. Ocean. Tech., 27, 1665–1676, https://doi.org/10.1175/2010JTECHA1419.1, 2010. a, b

Wessels, H. R. A. and Beekhuis, J. H.: Stepwise procedure for suppression of anomalous ground clutter, in: Proc. COST-75, Weather Radar Systems, International Seminar, Brussels, Belgium, 270–277, 1995. a

WMO: Guide to Meteorological Instruments and Methods of Observation, WMO-No.8, World Meteorological Organization, Geneva, 7th edn., 2008. a

Wójcik, O. P., Holt, J., Kjerulf, A., Müller, L., Ethelberg, S., and Molbak, K.: Personal protective equipment, hygiene behaviours and occupational risk of illness after July 2011 flood in Copenhagen, Denmark, Epidemiol. Infect., 141, 1756–1763, https://doi.org/10.1017/S0950268812002038, 2013. a

Wood, S. J., Jones, D. A., and Moore, R. J.: Accuracy of rainfall measurement for scales of hydrological interest, Hydrol. Earth Syst. Sci., 4, 531–543, https://doi.org/10.5194/hess-4-531-2000, 2000. a

Wright, D. B., Smith, J. A., Villarini, G., and Baeck, M. L.: Hydroclimatology of flash flooding in Atlanta, Water Resour. Res., 48, W04524, https://doi.org/10.1029/2011WR011371, 2012. a

Wright, D. B., Smith, J. A., Villarini, G., and Baeck, M. L.: Long-Term High-Resolution Radar Rainfall Fields for Urban Hydrology, J. Am. Water Resour. As., 50, 713–734, https://doi.org/10.1111/jawr.12139, 2014. a, b

Yang, L., Smith, J., Baeck, M. L., Smith, B., Tian, F., and Niyogi, D.: Structure and evolution of flash flood producing storms in a small urban watershed, J. Geophys. Res.-Atmos., 121, 3139–3152, https://doi.org/10.1002/2015JD024478, 2016. a

Yoo, C., Park, C., Yoon, J., and Kim, J.: Interpretation of mean-field bias correction of radar rain rate using the concept of linear regression, Hydrol. Process., 28, 5081–5092, https://doi.org/10.1002/hyp.9972, 2014. a

Young, C. B., Bradley, A. A., Krajewski, W. F., Kruger, A., and Morrissey, M. L.: Evaluating NEXRAD multisensor precipitation estimates for operational hydrologic forecasting, J. Hydrometeorol., 1, 241–254, 2000. a

Zhou, Z., Smith, J. A., Yang, L., Baeck, M. L., Chaney, M., Ten Veldhuis, M.-C., Deng, H., and Liu, S.: The complexities of urban flood response: Flood frequency analyses for the Charlotte metropolitan region, Water Resour. Res., 53, 7401–7425, https://doi.org/10.1002/2016WR019997, 2017. a

Zrnic, D. S. and Ryzhkov, A. V.: Advantages of rain measurements using specific differential phase, J. Atmos. Ocean. Tech., 13, 454–464, https://doi.org/10.1175/1520-0426(1996)013<0454:AORMUS>2.0.CO;2, 1996. a, b

Zrnic, D. S. and Ryzhkov, A. V.: Polarimetry for weather surveillance radars, B. Am. Meteorol. Soc., 80, 389–406, 1999. a