Regionalizing nonparametric models of precipitation amounts on different temporal scales

Parametric distribution functions are commonly used to model precipitation amounts corresponding to different durations. The precipitation amounts themselves are crucial for stochastic rainfall generators and weather generators. Nonparametric kernel density estimates (KDEs) offer a more flexible way to model precipitation amounts. As their name indicates, however, these models lack parameters that can easily be regionalized, which is needed to run rainfall generators at ungauged as well as gauged locations. To overcome this deficiency, we present a new interpolation scheme for nonparametric models and evaluate it for temporal resolutions ranging from hourly to monthly. In the evaluation, the nonparametric methods are compared to commonly used parametric models such as the two-parameter gamma and the mixed-exponential distribution. As water volume is considered an essential parameter for applications such as flood modeling, a Lorenz-curve-based criterion is also introduced. To improve the estimation at sub-daily resolutions, we incorporated the plentiful daily measurements into the interpolation scheme and evaluated this idea. The study region is the federal state of Baden-Württemberg in southwestern Germany, with more than 500 rain gauges. The validation results show that the newly proposed nonparametric interpolation scheme provides reasonable results and that incorporating daily values into the regionalization of sub-daily models is very beneficial.
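The Lorenz-curve-based criterion itself is defined in the research article and is not reproduced here. As a hedged sketch of the underlying idea only, the code below computes an empirical Lorenz curve of precipitation amounts: wet time steps are sorted by amount, and the cumulative share of total volume is plotted against the cumulative share of wet time steps. The comparison function `lorenz_gap` (largest vertical distance between two curves) is a hypothetical stand-in chosen for illustration, not the paper's criterion.

```python
import numpy as np


def lorenz_curve(amounts):
    """Empirical Lorenz curve of precipitation amounts.

    Wet time steps are sorted from smallest to largest amount. Returns
    p (cumulative fraction of wet time steps) and L (cumulative fraction
    of total precipitation volume), both running from 0 to 1.
    Assumes at least one wet time step.
    """
    x = np.sort(np.asarray(amounts, dtype=float))
    x = x[x > 0]                          # dry steps carry no volume
    n = x.size
    p = np.arange(n + 1) / n              # 0, 1/n, ..., 1
    L = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    return p, L


def lorenz_gap(observed, simulated, grid_size=101):
    """Hypothetical volume criterion: largest vertical distance between
    the observed and simulated Lorenz curves on a common grid."""
    grid = np.linspace(0.0, 1.0, grid_size)
    p_obs, L_obs = lorenz_curve(observed)
    p_sim, L_sim = lorenz_curve(simulated)
    return float(np.max(np.abs(np.interp(grid, p_obs, L_obs)
                               - np.interp(grid, p_sim, L_sim))))
```

For example, the amounts 0, 1, 1, 2 and 6 mm give L = 0, 0.1, 0.2, 0.4, 1.0: the largest quarter of the wet time steps delivers 60 % of the volume. A model that reproduces the bulk of the distribution but misses such heavy events would show a visible gap between the Lorenz curves.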

S1 Regionalization example
There are two general ways to obtain precipitation amount distributions at ungauged locations. The first approach interpolates the rainfall values of every time step to the target location and then estimates the distribution function from the interpolated values (values_inter). The second approach first fits a distribution function at every control location and then interpolates these distributions to the target location (cdf_inter). The cdf_inter approach is commonly accepted for obtaining precipitation amount distributions at ungauged locations for stochastic rainfall models. The two approaches are nevertheless compared in the following to illustrate the deficiencies of values_inter empirically and thereby motivate cdf_inter, the method used throughout our investigations. The same estimation errors also arise whenever rainfall values are interpolated without explicitly considering the CDF. For example, using interpolated rainfall values as input to hydrological models may bias the estimated discharge because of poor interpolation results.
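Why values_inter distorts the distribution can be shown on a small synthetic example (hypothetical data, not the Baden-Württemberg gauges). A weighted average of daily values is 0 mm only if every neighbor is dry, so the interpolated series has fewer dry days than the gauges themselves. Averaging also smooths out peaks. Equal weights stand in for the IDW weights, and the shared regional wet/dry state is an assumption that makes neighboring gauges wet on largely the same days.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_days, n_gauges = 3650, 5

# Synthetic, hypothetical data: a shared regional wet/dry state plus local
# variation, so neighboring gauges tend to be wet on the same days.
regional_wet = rng.random(n_days) < 0.5
local_wet = rng.random((n_days, n_gauges)) < 0.8
wet = regional_wet[:, None] & local_wet
amounts = np.where(wet, rng.gamma(shape=0.7, scale=6.0, size=(n_days, n_gauges)), 0.0)

weights = np.full(n_gauges, 1.0 / n_gauges)    # stand-in for IDW weights

# values_inter: interpolate every day first, then look at the distribution
values_inter = amounts @ weights

p0_gauges = (amounts == 0).mean(axis=0)        # P_0 at each control gauge
p0_values_inter = (values_inter == 0).mean()   # dry only if ALL gauges are dry

print("P_0 per gauge:", np.round(p0_gauges, 3))
print("P_0 values_inter:", round(float(p0_values_inter), 3))
print("max per gauge:", amounts.max(axis=0).round(1))
print("max values_inter:", round(float(values_inter.max()), 1))
```

In this setup each gauge is dry on roughly 60 % of the days, while the interpolated series is dry on only about half of them. Its maximum can never exceed the largest gauge maximum, because every interpolated value is a convex combination of the gauge values.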
To ensure that both approaches use identical interpolation weights λ_i for the control gauges i, a simple inverse distance weighting (IDW) is used as the interpolation technique in this example, based on Eq. S1:
\lambda_i = \frac{d_i^{-p}}{\sum_{j=1}^{n} d_j^{-p}} \qquad \text{(S1)}
where d_i is the distance between control gauge i and the respective target gauge, n is the number of control gauges and p is the IDW power parameter. In this example, IDW is preferred over ordinary kriging (OK) for two reasons. (i) Applying OK to daily precipitation values (values_inter) would raise the additional challenge of handling zero rainfall values in both the variogram estimation and the kriging itself. Since this paper does not focus on interpolating rainfall values, the simpler IDW method is used for that purpose. (ii) IDW yields the same interpolation weights for both approaches. This ensures that any difference in performance stems from the chosen interpolation scheme (cdf_inter or values_inter) rather than from the calculation of the weights. In the research article, OK is preferred over IDW because it is generally considered the better interpolation method. The nonparametric KDE with SRT bandwidth selection is applied to estimate the distribution functions at the control gauges.
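The ingredients named above can be sketched as follows, under stated assumptions. The IDW power defaults to 2, which is an assumption. SRT is taken to mean Silverman's rule of thumb for a Gaussian kernel, which is also an assumption. No boundary correction is applied at 0 mm, so a small part of the KDE mass falls below zero. The function names are hypothetical.

```python
import math

import numpy as np


def idw_weights(distances, power=2.0):
    """Normalized inverse distance weights (Eq. S1); power=2 is an assumption."""
    d = np.asarray(distances, dtype=float)
    w = d ** (-power)
    return w / w.sum()


def silverman_bandwidth(sample):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel
    (assumes a non-degenerate sample, i.e. positive spread)."""
    x = np.asarray(sample, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-0.2)


_erf = np.vectorize(math.erf)


def kde_cdf(y, sample, bandwidth):
    """CDF of a Gaussian KDE: the average of normal CDFs centered on the data."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    x = np.asarray(sample, dtype=float)
    z = (y[:, None] - x[None, :]) / (bandwidth * math.sqrt(2.0))
    return (0.5 * (1.0 + _erf(z))).mean(axis=1)
```

Because the weights depend only on distances, the same λ_i can be reused unchanged for values_inter and cdf_inter, which is the point of reason (ii) above.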
Another difference from the research article is that this regionalization example includes zero values, in order to show the advantages of interpolating distributions instead of precipitation values with respect to P_0. Zero values can be included in the interpolation of nonparametric distributions through the following steps. (i) Fit a distribution to all precipitation values, including zeros, at each gauge. (ii) Use the inverse of the fitted distribution at each gauge to estimate the quantile values for a set of quantiles (non-exceedance probabilities) covering the whole probability range (0 to 1). (iii) Interpolate the quantile values of the different gauges for each chosen quantile using the IDW weights. (iv) If a quantile lies below P_0 at some (or all) gauges, the quantile value at these gauges is 0 mm, and these zeros are simply included in the interpolation. (v) The highest quantile with an interpolated value of 0 mm at the target gauge defines P_0 at the target.
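Steps (ii) to (v) can be sketched as follows. As a simplification, the empirical distribution of the wet values stands in for the fitted KDE of step (i). The toy gauge data, the weights and the function names are hypothetical.

```python
import numpy as np


def quantile_function(sample, u):
    """Quantiles of a sample that contains dry (0 mm) time steps.

    Simplification: the empirical distribution of the wet values stands in
    for the fitted KDE. Quantiles at or below P_0 are 0 mm (step iv).
    """
    x = np.asarray(sample, dtype=float)
    u = np.asarray(u, dtype=float)
    p0 = np.mean(x == 0)
    q = np.zeros_like(u)
    above = u > p0
    if above.any():
        # rescale the non-exceedance probability to the wet-value distribution
        q[above] = np.quantile(x[x > 0], (u[above] - p0) / (1.0 - p0))
    return q


def cdf_inter(samples, weights, u):
    """Steps (ii)-(v): interpolate quantile values and derive P_0 at the target."""
    Q = np.vstack([quantile_function(s, u) for s in samples])  # gauges x quantiles
    q_target = np.asarray(weights, dtype=float) @ Q              # step (iii)
    dry = q_target == 0.0
    p0_target = float(u[dry].max()) if dry.any() else 0.0        # step (v)
    return q_target, p0_target


u = np.arange(1, 1000) / 1000.0             # quantiles 0.001, ..., 0.999
gauge_a = [0, 0, 0, 1, 2, 3, 4, 5, 6, 7]    # P_0 = 0.3
gauge_b = [0, 0, 0, 0, 0, 1, 2, 3, 4, 5]    # P_0 = 0.5
q_target, p0_target = cdf_inter([gauge_a, gauge_b], [0.5, 0.5], u)
print(p0_target)
```

With strictly positive weights, an interpolated quantile is 0 mm only where all gauges are at 0 mm. On the quantile grid, the target P_0 therefore equals the smallest gauge P_0 (here 0.3). Above that level, the interpolated quantile function stays non-decreasing because it is a weighted sum of non-decreasing functions.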
In our example, the distribution of daily rainfall values (1D) at the gauge Esslingen/Neckar is estimated from the rainfall values of 30 neighboring gauges (see Fig. S1 (a)). Fig. S1 (b) and (c) show parts of the distribution functions resulting from both methods together with the original EDF. The values_inter method has clear disadvantages. It overestimates the number of days with rainfall and thus underestimates the probability of no rainfall (Fig. S1 (b)), and it clearly underestimates the high quantile values (Fig. S1 (c)).
The cdf_inter method does not directly provide the rainfall values needed to calculate basic statistics. Random rainfall values are therefore generated with the inverse of the interpolated CDF. The number of generated values equals the number of observed daily rainfall values at the validation gauge. Table S1 lists basic statistics of the precipitation amounts for both methods and for the observations. Judged by the mean of all rainfall values (x̄), the values_inter method appears to reproduce this statistic very well. Considering the other statistics in Table S1 and Fig. S1, however, this is most likely the result of two compensating errors. The method overestimates the number of days with small rainfall amounts (see P_0) and at the same time underestimates higher rainfall intensities (see x̄_{>0} and max). This interpretation is supported by the smaller standard deviation of values_inter and by the precipitation amount distributions shown in Fig. S1. Overall, the cdf_inter method reproduces most of the listed statistics better; it only shows a tendency to overestimate high rainfall intensities.
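Generating values "with the inverse of the interpolated CDF" is inverse transform sampling: uniform random numbers are mapped through the quantile function. A minimal sketch follows, assuming the interpolated quantile function is available on a grid and interpolated linearly between grid points. The toy quantile function (P_0 = 0.6 with exponential wet amounts) and the statistics helper are hypothetical. The helper merely mirrors the kind of quantities listed in Table S1.

```python
import numpy as np


def sample_from_quantiles(u_grid, q_grid, n, rng):
    """Inverse transform sampling: map uniform draws through the
    (linearly interpolated) quantile function of the interpolated CDF."""
    return np.interp(rng.random(n), u_grid, q_grid)


def basic_stats(x):
    """Statistics in the spirit of Table S1 (P_0, means, std, max)."""
    x = np.asarray(x, dtype=float)
    wet = x[x > 0]
    return {
        "P0": float(np.mean(x == 0)),
        "mean_all": float(x.mean()),
        "mean_wet": float(wet.mean()) if wet.size else 0.0,
        "std": float(x.std(ddof=1)),
        "max": float(x.max()),
    }


# Hypothetical interpolated quantile function: P_0 = 0.6, exponential wet amounts
u_grid = np.linspace(0.0, 0.999, 1000)
p0, scale = 0.6, 5.0
q_grid = np.where(u_grid <= p0, 0.0,
                  -scale * np.log(1.0 - (u_grid - p0) / (1.0 - p0)))

rng = np.random.default_rng(seed=7)
n_observed = 3650   # would equal the number of observed daily values
stats = basic_stats(sample_from_quantiles(u_grid, q_grid, n_observed, rng))
print(stats)
```

Matching the sample size to the number of observations keeps sample-dependent statistics such as the maximum comparable between the generated and the observed series.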
Because the values_inter method has major problems reproducing the probability of zero rainfall and the shape of the distribution function, it is not recommended for rainfall across a wide range of aggregation levels. For coarser aggregations, these disadvantages may have no noticeable effect. For finer aggregations with greater skewness, however, the problems are likely to increase. This would lead to a more pronounced underestimation of high quantile values, which are usually the decisive ones for subsequent applications. As the cdf_inter method yields better results for the basic rainfall volume statistics, it appears to be the better choice for interpolating precipitation amount models.

S3 Usage of daily values for sub-daily values: empirical cross validation (corresponds to Section 9 of the research article)
To assess the benefit of daily observations for sub-daily distribution functions with the rescaling procedure described in Section 9 of the research article, a cross validation is performed using only the high-resolution gauges, each of which is treated in turn as a daily gauge. The resulting sub-daily statistics of the scaled values at these pseudo-daily gauges are compared to their original sub-daily values by calculating the mean squared errors over all gauges. The scaled nearest neighbor values are compared to the unscaled nearest neighbor values and to interpolated rainfall values. The interpolation is done by OK with ten neighbors using a single variogram model. In the cross validation, the nearest neighbor is defined as the closest gauge with at least 50 % overlapping data. For the OK interpolation of the rainfall values, again only this overlapping period is used.
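The nearest-neighbor rule above (closest gauge with at least 50 % overlapping data) can be sketched as follows. The text does not specify how overlap is measured. This sketch assumes it is the share of the target's valid time steps that are also valid at the candidate. The data layout and function name are hypothetical, and the rescaling procedure of Section 9 is not reproduced.

```python
import numpy as np


def nearest_neighbor(target_xy, target_series, candidates, min_overlap=0.5):
    """Closest candidate gauge with sufficient data overlap.

    candidates: list of (name, (x, y), series) with NaN marking missing data.
    Overlap (an assumed definition) is the share of the target's valid time
    steps that are also valid at the candidate. Returns (name, mask), where
    mask marks the common period, or None if no candidate qualifies.
    """
    target_valid = ~np.isnan(target_series)
    best = None
    for name, (x, y), series in candidates:
        common = target_valid & ~np.isnan(series)
        if common.sum() < min_overlap * target_valid.sum():
            continue                                    # too little overlap
        dist = np.hypot(x - target_xy[0], y - target_xy[1])
        if best is None or dist < best[0]:
            best = (dist, name, common)
    return None if best is None else (best[1], best[2])
```

Returning the mask of the common period makes it straightforward to restrict any subsequent comparison, or the OK interpolation, to exactly the overlapping time steps, as the text requires.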
Fig. S3 shows the results for quantile values; the standard deviation, the mean values and QV_th were investigated as well. The cross-validation results for the different statistics are very similar: for all of them, the scaled nearest neighbor values (NNS) give the best results in both summer and winter. Daily gauges therefore appear to be useful for interpolating sub-daily nonparametric and parametric models.
Figure S3: Mean squared errors (mse) of the quantile values for discrete quantiles (in steps of 0.001) greater than Q_th (see Table 1 in the research article) (a) and greater than 0.995 (b), in winter (dotted) and summer (dashed), for nearest neighbor (NN), scaled nearest neighbor (NNS) and OK-interpolated rainfall values over different aggregations. The mean squared error over the discrete quantiles is first calculated for each gauge and then averaged over the whole study region.
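The error measure in the Fig. S3 caption can be sketched as follows: a mean squared error over the discrete quantiles (steps of 0.001) strictly above a threshold, computed per gauge and then averaged over all gauges. The function names are hypothetical. Quantiles are taken with NumPy's default linear interpolation, which is an assumption because the quantile definition is not stated.

```python
import numpy as np


def quantile_mse(observed, estimated, q_min, step=0.001):
    """MSE between observed and estimated quantile values over the discrete
    quantiles q_min + step, q_min + 2*step, ..., 1 - step (strictly > q_min)."""
    k_start = int(round(q_min / step)) + 1
    k_end = int(round(1.0 / step))                  # exclude the quantile 1.0
    probs = np.arange(k_start, k_end) * step
    diff = np.quantile(observed, probs) - np.quantile(estimated, probs)
    return float(np.mean(diff ** 2))


def regional_mse(pairs, q_min, step=0.001):
    """Average of the per-gauge MSEs over all gauges in the study region."""
    return float(np.mean([quantile_mse(o, e, q_min, step) for o, e in pairs]))
```

For panel (b), `q_min=0.995` evaluates the quantiles 0.996 to 0.999. Averaging per-gauge errors first gives every gauge the same weight, regardless of how many quantiles or time steps it contributes.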