Parametric distribution functions are commonly used to model precipitation amounts corresponding to different durations. The precipitation amounts themselves are crucial for stochastic rainfall generators and weather generators. Nonparametric kernel density estimates (KDEs) offer a more flexible way to model precipitation amounts. As their name implies, these models have no parameters that can be easily regionalized, which makes it difficult to run rainfall generators at ungauged as well as gauged locations. To overcome this deficiency, we present a new interpolation scheme for nonparametric models and evaluate it for different temporal resolutions ranging from hourly to monthly. During the evaluation, the nonparametric methods are compared to commonly used parametric models like the two-parameter gamma and the mixed-exponential distribution. As water volume is considered to be an essential parameter for applications like flood modeling, a Lorenz-curve-based criterion is also introduced. To improve the estimates at sub-daily resolutions, we incorporated the plentiful daily measurements into the interpolation scheme and evaluated this approach. The study region is the federal state of Baden-Württemberg in the southwest of Germany with more than 500 rain gauges. The validation results show that the newly proposed nonparametric interpolation scheme provides reasonable results and that the incorporation of daily values in the regionalization of sub-daily models is very beneficial.

Rainfall time series of differing temporal resolutions are needed for various
applications like water engineering design, flood modeling, risk assessments
and ecosystem and hydrological impact studies

For modeling precipitation, one crucial variable is the precipitation amount,
which follows a certain distribution. Distributions of the daily precipitation
amounts are strongly right skewed, with many small values and few large
values

Interpolate the precipitation amounts from observation points for every time step to the target location(s) and set up a distribution with the interpolated values.

Fit a distribution function to the precipitation amounts separately for each gauge and interpolate the distribution functions to the target location(s).

In most stochastic rainfall models, theoretical parametric distribution
functions are fitted to the empirical values using, e.g., the exponential
distribution or the two-parameter gamma distribution

In the present work, we introduce a regionalization strategy for
nonparametric distributions and compare it to the traditional
regionalization of parametric distributions for varying temporal resolutions
from hourly to monthly scales. The common procedure to interpolate parametric
distribution functions is outlined as follows.

Fit a parametric distribution (e.g., a gamma or exponential distribution) at each sampling site to the empirical distribution function (EDF).

Interpolate the moment(s) or parameter(s) of the fitted parametric distribution.

Set up the theoretical cumulative distribution function (CDF) at every interpolation target with the interpolated moment(s) or parameter(s).
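The three steps above can be sketched for the exponential distribution, whose single parameter equals the mean. This is only a minimal illustration: the function names, the sample amounts and the interpolation weights are hypothetical, and in the study the weights would come from ordinary kriging rather than being prescribed.

```python
import numpy as np

def fit_exponential_scale(amounts):
    """Step 1: the sample mean estimates the scale of an exponential distribution."""
    return float(np.mean(amounts))

def interpolate_parameter(site_params, weights):
    """Step 2: weighted linear combination of the site parameters.
    The weights are assumed to sum to one, as kriging weights in OK do."""
    return float(np.dot(weights, site_params))

def exponential_cdf(x, scale):
    """Step 3: theoretical CDF at the target using the interpolated parameter."""
    x = np.asarray(x, dtype=float)
    return 1.0 - np.exp(-np.maximum(x, 0.0) / scale)

# Hypothetical precipitation amounts (mm) at three gauges
gauges = [np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0]), np.array([0.5, 1.5])]
scales = [fit_exponential_scale(g) for g in gauges]      # 2.0, 4.0, 1.0
weights = np.array([0.5, 0.3, 0.2])                      # assumed, not kriged
target_scale = interpolate_parameter(scales, weights)    # 0.5*2 + 0.3*4 + 0.2*1
target_cdf_at_5mm = exponential_cdf(5.0, target_scale)
```

For distributions with several parameters, each parameter (or moment) would be interpolated separately in the same way.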

The newly proposed procedure for the nonparametric distribution functions is
the following.

Fit a nonparametric distribution to log-transformed rainfall values using a Gaussian kernel.

Estimate the interpolation (kriging) weights with the precipitation values of a certain quantile.

Apply these weights to the values of certain discrete quantiles.

Linearly interpolate the remaining quantile values to receive a continuous CDF for all target locations.
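The last two steps of this list can be illustrated with a short sketch. It assumes that the quantile values at the gauges and the interpolation weights are already available; the probability levels, the numbers and the function names are hypothetical. The weights are taken to be non-negative and to sum to one, which keeps the interpolated quantile values monotonically increasing.

```python
import numpy as np

def interpolate_quantiles(site_quantiles, weights):
    """Apply one set of weights to all discrete quantiles.
    site_quantiles has shape (n_sites, n_levels), weights has shape (n_sites,)."""
    return np.asarray(weights, dtype=float) @ np.asarray(site_quantiles, dtype=float)

def cdf_from_quantiles(x, quantile_values, probs):
    """Continuous CDF by linear interpolation between the discrete quantiles."""
    return np.interp(x, quantile_values, probs)

probs = np.array([0.5, 0.75, 0.9, 0.99])        # hypothetical probability levels
site_q = np.array([[1.0, 2.0, 4.0, 10.0],       # quantile values (mm) at gauge A
                   [2.0, 4.0, 8.0, 20.0]])      # quantile values (mm) at gauge B
weights = np.array([0.5, 0.5])                  # assumed weights for one target
q_target = interpolate_quantiles(site_q, weights)
p_at_4_5mm = cdf_from_quantiles(4.5, q_target, probs)
```

Note that `np.interp` holds the CDF constant outside the range of the supplied quantiles, so the lowest and highest probability levels bound the result.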

In

After describing the study region Baden-Württemberg in Sect.

The study region is the federal state of Baden-Württemberg, which is
located in the southwest of Germany. The Black Forest mountain range in the
west and the Swabian Jura (Swabian Alb), extending from southwest to
northeast, exhibit the highest elevations in Baden-Württemberg. The lifting
of large-scale moist air masses across the mountainous regions causes higher
rainfall amounts on the windward side and lower amounts on the leeward side.
In the summer months, slopes with differing inclinations lead to a warming of
the air that triggers convection, leading to a greater number of
showers and thunderstorms over the mountainous regions. Rainfall therefore
depends on elevation, with seasonal differences. The rain-bearing westerly
winds lead to high rainfall amounts in the Black Forest. The relatively lower
altitude of the Swabian Jura results in lower rainfall amounts, as it lies in
the rain shadow of the Black Forest

The locations of the high-resolution (hourly and 5 min;

The years from 1997 to 2011 are chosen as the investigation period, as the German
Meteorological Service (DWD) set up many new rain gauges in 1997. A
relatively homogeneous data set is obtained by only choosing gauges with
observation periods greater than or equal to 5 years, which also provide rainfall
measurements for at least 80 % of the time steps within their observation
period. We had access to (i) 242 hourly and 5 min
resolution and (ii) 347 daily gauges available in the study region, with 80 sites
having both daily and high-resolution instruments. The observations are
provided by the DWD and the Environmental Agency of Baden-Württemberg
(LUBW). The high-resolution rain gauges are mostly equipped with tipping
buckets and gravimetric measurement devices
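The two selection criteria (an observation period of at least 5 years and measurements for at least 80 % of its time steps) could be checked per gauge as in the sketch below. The function name, the NaN convention for missing values and the number of time steps per year are assumptions made for illustration.

```python
import numpy as np

def is_eligible(values, steps_per_year, min_years=5.0, min_coverage=0.8):
    """Check one gauge series covering its own observation period.
    Missing measurements are assumed to be stored as NaN."""
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return False
    years = values.size / steps_per_year
    coverage = np.mean(~np.isnan(values))   # fraction of time steps with data
    return bool(years >= min_years and coverage >= min_coverage)

# Hypothetical daily series: 6 years, about 9 % of the days missing
series = np.ones(6 * 365)
series[:200] = np.nan
keep = is_eligible(series, steps_per_year=365)
```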

Modeling precipitation amounts in our context means estimating the distribution
functions. The usage of these distribution functions includes the implicit
assumption of temporally independent and identically distributed (i.i.d.)
variables. This assumption is generally accepted for daily rainfall as the
autocorrelation of consecutive nonzero daily precipitation is relatively
small and usually of less importance. For higher temporal resolutions, such
as hourly, autocorrelation needs to be incorporated in the model

For applications of rainfall estimates, like hydrological or hydraulic
modeling, the correct representation of small rainfall values is not
necessary, as their contribution to decisively high discharge rates is rather
small. Furthermore, tipping bucket gauges yield inaccurate estimates, especially
for low rainfall values

Therefore, the quantile threshold (

After arranging the

The

The hourly threshold quantile values (QV

The basic rainfall information of the study region for different
aggregations (agg):

Based on the hourly values (1 h) of the high-resolution data set, the aggregated
rainfall values of different temporal resolutions are obtained: 2-hourly
(2 h), 3-hourly (3 h), 6-hourly (6 h) and 12-hourly (12 h). Through the aggregation
of the daily values (1 d) in the daily data set, 5-daily (5 d) and monthly (m)
values are obtained. In order to exclude small values and still consider the
values producing a high percentage of the water volume, the
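The aggregation from hourly values to coarser sub-daily resolutions amounts to summing consecutive, non-overlapping blocks. The following sketch shows one way to do this; the handling of incomplete blocks and of missing values is an assumption, since it is not specified here, and monthly sums would need calendar-aware grouping instead of fixed block lengths.

```python
import numpy as np

def aggregate(values, factor):
    """Sum consecutive non-overlapping blocks of `factor` time steps.
    Trailing steps that do not fill a block are dropped (assumption);
    a block containing NaN yields NaN."""
    values = np.asarray(values, dtype=float)
    n_blocks = values.size // factor
    return values[: n_blocks * factor].reshape(n_blocks, factor).sum(axis=1)

# Hypothetical hourly amounts (mm) for half a day
hourly = np.array([0.0, 0.2, 1.0, 0.0, 0.0, 0.4, 0.0, 0.0, 2.0, 0.1, 0.0, 0.0])
two_hourly = aggregate(hourly, 2)
six_hourly = aggregate(hourly, 6)
```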

For the estimation of the basic statistics in Table

This section focuses on the spatial dependence of the precipitation amount
distributions, as the applied interpolation technique of ordinary kriging
(OK) is based on the assumption that the variable of interest (the CDF) is
more likely to be dissimilar with increasing distances. For the purpose of
describing the development of the distribution functions in space, the test
statistic

The

For the calculations of

In the following subsections, nonparametric and parametric models for
precipitation amounts at single sites are introduced. Before estimating the
nonparametric or parametric distributions at each observation gauge, the
observations smaller than QV

The nonparametric KDEs for the precipitation amount distributions were previously
used and are described for the daily precipitation amounts in

The estimation of

To model the right-skewed precipitation amounts with their bounded support on

In this work, the symmetric Gaussian kernel with a prior transformation of
data to logarithms is chosen, as this is an implicit adaptive kernel method
with increasing bandwidths for increasing values and therefore alleviates the
need to choose variable bandwidths with skewed data
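A minimal sketch of such an estimator is given below: the CDF of a Gaussian KDE built on the log-transformed amounts, evaluated in the original units. The function name and the example values are hypothetical. A constant bandwidth in log space corresponds to kernels whose width in the original units grows with the amount.

```python
import math
import numpy as np

# Standard normal CDF, vectorized with the error function from the standard library
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def log_kde_cdf(x, sample, bandwidth):
    """CDF of a Gaussian KDE fitted to log(sample), evaluated at amounts x > 0.
    F(x) = mean over i of Phi((ln x - ln x_i) / h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    log_sample = np.log(np.asarray(sample, dtype=float))
    z = (np.log(x)[:, None] - log_sample[None, :]) / bandwidth
    return _norm_cdf(z).mean(axis=1)

# Hypothetical precipitation amounts (mm) above the threshold
amounts = [0.6, 0.9, 1.4, 2.5, 7.0, 18.0]
cdf_values = log_kde_cdf([1.0, 5.0, 20.0], amounts, bandwidth=0.5)
```

Because the kernel is placed on the logarithms, the estimate assigns no probability mass to negative amounts.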

Finally, the bandwidth

The simplest and most widely used selection method is Silverman's rule
of thumb
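One widely cited form of Silverman's rule of thumb for a Gaussian kernel is h = 0.9 min(s, IQR/1.34) n^(-1/5), with the sample standard deviation s and the interquartile range IQR. The sketch below implements this form; whether the study uses exactly this variant is an assumption. In the present context it would be applied to the log-transformed amounts.

```python
import numpy as np

def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth: 0.9 * min(std, IQR / 1.34) * n**(-1/5)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    sigma = sample.std(ddof=1)
    q75, q25 = np.percentile(sample, [75, 25])
    spread = min(sigma, (q75 - q25) / 1.34)   # robust against heavy tails
    return 0.9 * spread * n ** (-0.2)

# Hypothetical precipitation amounts (mm); the bandwidth refers to log space
amounts = np.array([0.6, 0.9, 1.4, 2.5, 7.0, 18.0])
h_log = silverman_bandwidth(np.log(amounts))
```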

The second method is a plug-in approach developed by

Instead of minimizing the mean integrated squared error,

Within the parametric procedure, five different parametric distributions are
used to model the precipitation amounts of all aggregations in this study.
The most commonly used models are the exponential distribution and the two-parameter
gamma distribution

For the exponential distribution with the parameter

For the two-parameter gamma distribution, they are

For the two-parameter Weibull distribution,

The mixed-exponential distribution exhibits the following functions:

The generalized Pareto distribution exhibits the following
PDF

The parametric distributions with more than two parameters are not considered, as
this would complicate the regionalization of the distributions due to the
dependencies among the parameters. For the three-parameter mixed-exponential
distribution, the parameter

In order to estimate the optimal parameter sets of the presented parametric
distributions for each rainfall gauge and temporal resolution, the method of moments (MOM) and the maximum
likelihood method (MLM) using a numerical maximization via a simplex algorithm
are applied. The MLM is applied to all
mentioned parametric distributions. In the special case of the mixed-exponential distribution, the parameter
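As an illustration of the MLM with a simplex search, the sketch below fits a three-parameter mixed-exponential distribution with the Nelder-Mead method of `scipy.optimize.minimize`. It is not the authors' implementation: the parameter names (alpha, beta1, beta2), the unconstrained reparameterization, the starting values and the synthetic data are assumptions, and the amounts are taken to be shifted so that the support starts at zero.

```python
import numpy as np
from scipy.optimize import minimize

def mixed_exp_nll(theta, x):
    """Negative log-likelihood of the mixed-exponential distribution.
    theta = (logit of alpha, log of beta1, log of beta2) keeps the search unconstrained."""
    alpha = 1.0 / (1.0 + np.exp(-theta[0]))
    beta1, beta2 = np.exp(theta[1]), np.exp(theta[2])
    pdf = alpha / beta1 * np.exp(-x / beta1) + (1.0 - alpha) / beta2 * np.exp(-x / beta2)
    return -np.sum(np.log(pdf + 1e-300))   # small offset guards against log(0)

def fit_mixed_exp(x):
    """Maximum likelihood fit with the Nelder-Mead simplex algorithm."""
    x = np.asarray(x, dtype=float)
    start = np.array([0.0, np.log(0.5 * x.mean()), np.log(2.0 * x.mean())])
    res = minimize(mixed_exp_nll, start, args=(x,), method="Nelder-Mead")
    alpha = 1.0 / (1.0 + np.exp(-res.x[0]))
    return alpha, np.exp(res.x[1]), np.exp(res.x[2]), res.fun

# Synthetic amounts drawn from two exponential components (illustration only)
rng = np.random.default_rng(0)
sample = np.concatenate([rng.exponential(1.0, 300), rng.exponential(8.0, 100)])
alpha_hat, beta1_hat, beta2_hat, nll_hat = fit_mixed_exp(sample)
```

The MOM estimates, where they exist in closed form, could serve as starting values for the simplex search instead of the ad hoc values used here.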

The EDFs of the hourly

In order to establish the basis of the proposed regionalization procedure for
nonparametric models and to get a more detailed idea of the spatial
relationship of the distribution functions, the EDFs of the hourly and monthly
rainfall intensities from the gauge at Stuttgart/Schnarrenberg and its five
closest gauges are plotted in Fig.

The mean rank correlations

A more global look at the spatial relation between different EDFs can be
obtained with the Spearman's rank correlation

Figure

In Table

The control quantiles (

In the following, the regionalization of the point models in order to obtain the precipitation amount models at ungauged locations is described. The regionalization method OK is introduced first. Then, the approaches to regionalize the parametric and nonparametric distributions are explained.

As only a short overview of OK will be given, the interested reader is
referred to the common geostatistical literature, like

Gaussian model:

Spherical model:

Exponential model:

Matern model (
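For orientation, the first three variogram models can be written as short functions. The parameterization below (sill c, range parameter a, no nugget) is one common textbook convention and may differ from the one used in the study by a scaling of the range; the Matern model is omitted because it requires modified Bessel functions.

```python
import numpy as np

def gaussian(h, sill, range_):
    """Gaussian model: c * (1 - exp(-(h/a)^2))."""
    h = np.asarray(h, dtype=float)
    return sill * (1.0 - np.exp(-(h / range_) ** 2))

def spherical(h, sill, range_):
    """Spherical model: c * (1.5 h/a - 0.5 (h/a)^3) for h < a, and c beyond the range."""
    r = np.asarray(h, dtype=float) / range_
    return np.where(r < 1.0, sill * (1.5 * r - 0.5 * r ** 3), sill)

def exponential(h, sill, range_):
    """Exponential model: c * (1 - exp(-h/a))."""
    h = np.asarray(h, dtype=float)
    return sill * (1.0 - np.exp(-h / range_))

# Hypothetical separation distances (km), sill and range
distances = np.array([0.0, 5.0, 10.0, 50.0])
gamma_sph = spherical(distances, sill=2.0, range_=10.0)
```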

The next step within OK is solving the corresponding equation system to
estimate an interpolated value at an unobserved location
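The standard ordinary kriging system, with a Lagrange multiplier enforcing weights that sum to one, can be solved as in the sketch below. The variogram parameters, gauge coordinates and observed values are hypothetical, and no nugget effect is considered.

```python
import numpy as np

def exponential_variogram(h, sill=1.0, range_=10.0):
    """Exponential variogram without nugget (assumed for illustration)."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / range_))

def ok_weights(coords, target, variogram):
    """Solve the OK system and return the weights and the Lagrange multiplier."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = variogram(dist)     # variogram between all observation pairs
    lhs[n, n] = 0.0                   # last row and column enforce sum(weights) = 1
    rhs = np.ones(n + 1)
    rhs[:n] = variogram(np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1))
    solution = np.linalg.solve(lhs, rhs)
    return solution[:n], solution[n]

coords = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]   # gauge coordinates (km)
values = np.array([2.0, 4.0, 3.0])                # observed variable at the gauges
weights, multiplier = ok_weights(coords, [3.0, 3.0], exponential_variogram)
estimate = float(weights @ values)
```

Without a nugget, the estimate at an observation location reproduces the observed value, which gives a quick plausibility check of the implementation.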

As already outlined in the Introduction, either the parameters

Kernel-smoothed distribution functions do not provide a parameter that can be
interpolated; thus, a procedure other than that for the parametric distributions
needs to be applied. When the spatial relation of the rainfall EDFs in
Sect.

For all gauges, the quantile values QV

With these QV

Considering these additional constraints, the OK equation system is solved
with a SciPy implementation

As the high-resolution rain gauge monitoring network in the study area is
quite sparse and the corresponding time series are often incomplete, it would
be useful to include more dense and complete secondary information in the
interpolation of the sub-daily distributions. Therefore, the applicability of
the daily values to improve their interpolation is investigated, as the daily
monitoring network has a higher density. The simple disaggregation strategy
(rescaled nearest neighbor) of

1. Choose a daily target gauge and allocate to it the sub-daily rainfall values of the closest high-resolution gauge (in terms of horizontal distance).

2. Aggregate the sub-daily values of the high-resolution gauge to daily values and, for each day, take the ratio of the daily value measured at the target gauge to this aggregated daily value as the scaling factor.

3. Multiply all of the sub-daily values of the nearest gauge by this scaling factor. The scaling factor changes from day to day and simply ensures that the daily sums of the disaggregated sub-daily values at the target gauge equal the daily values measured at the target.

4. Repeat steps 1 to 3 for all daily gauges.

5. Calculate the sub-daily statistic of interest from these scaled values at every daily gauge and incorporate them in the interpolation procedure.
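The rescaling for a single daily target gauge can be sketched as follows. The function name and the numbers are hypothetical. Days on which the nearest high-resolution gauge is dry receive a scaling factor of zero here, which is an assumption; such days cannot be disaggregated with this gauge.

```python
import numpy as np

def rescale_nearest_neighbor(neighbor_subdaily, target_daily, steps_per_day=24):
    """Scale the sub-daily values of the nearest gauge so that their daily sums
    match the daily values observed at the target gauge."""
    neighbor = np.asarray(neighbor_subdaily, dtype=float).reshape(-1, steps_per_day)
    target_daily = np.asarray(target_daily, dtype=float)
    neighbor_daily = neighbor.sum(axis=1)
    # Daily scaling factor; left at zero where the neighbor recorded no rain
    factor = np.divide(target_daily, neighbor_daily,
                       out=np.zeros_like(target_daily), where=neighbor_daily > 0)
    return (neighbor * factor[:, None]).ravel()

# Two hypothetical days with four time steps per day for brevity
neighbor = [0.0, 1.0, 1.0, 0.0,   0.0, 0.0, 2.0, 2.0]
target = [4.0, 2.0]               # daily sums measured at the target gauge
scaled = rescale_nearest_neighbor(neighbor, target, steps_per_day=4)
```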

The applicability of this procedure is tested with a cross-validation, which is described in Sect. S3. For the incorporation of the daily values within the regionalization of the parametric and nonparametric sub-daily distributions, a special regionalization technique is not needed. The rescaling method (NNS) is applied to all available daily gauges. If for a certain day no hourly values are available for the closest gauge, the next closest gauge is used for the rescaling of that day in order to increase the sub-daily sample size at the daily gauge. After obtaining the sub-daily values at the daily gauges, they are simply treated as additional control points for the regionalization.

This section is divided into three parts. In Sect.

The precipitation amount models at point locations and their regionalization are evaluated with two different quality measures. Both measures must be based on the CDF rather than the PDF, as the interpolation of the nonparametric distributions only provides CDFs for ungauged locations.

The most common goodness-of-fit test to estimate the quality of fitted distributions is the Kolmogorov–Smirnov test. As distributions of precipitation amounts are positively skewed, most of the values are small or medium values, which leads to the highest gradient of the CDF for these values. The largest difference between the fitted and the empirical CDF is therefore most likely to occur in this range and would govern the Kolmogorov–Smirnov statistic. However, these medium values are less important than the higher precipitation amounts for most of the precipitation model applications.

For this reason, the Cramér–von Mises criterion as a more integral measure
and a Lorenz-curve-based measure, which allows for conclusions about the
representation of the water volume, are used. The Cramér–von Mises
criterion
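For reference, the standard one-sample Cramér–von Mises statistic can be computed as in the sketch below. Whether the study uses this exact form or a normalized variant is an assumption; the function name and test data are hypothetical.

```python
import numpy as np

def cramer_von_mises(sample, cdf):
    """W^2 = 1/(12 n) + sum_i (F(x_(i)) - (2 i - 1) / (2 n))^2 for the sorted sample."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return 1.0 / (12.0 * n) + np.sum((cdf(x) - (2.0 * i - 1.0) / (2.0 * n)) ** 2)

# A sample that matches the uniform CDF on [0, 1] as closely as possible
sample = [0.125, 0.375, 0.625, 0.875]
w2_good = cramer_von_mises(sample, lambda x: np.clip(x, 0.0, 1.0))
w2_poor = cramer_von_mises(sample, lambda x: np.clip(x, 0.0, 1.0) ** 2)
```

Smaller values indicate a closer agreement between the model CDF and the empirical distribution over its whole range, not only at the point of maximum deviation.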

As already mentioned in Sect.

The differences in the Lorenz curves are only estimated for values greater
than QV
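An empirical Lorenz curve relates the cumulative share of events to the cumulative share of the water volume they produce. The sketch below computes it for values above a threshold; the function name and data are hypothetical, and the criterion that compares the model curve with the empirical curve is not reproduced here.

```python
import numpy as np

def lorenz_curve(amounts, threshold=0.0):
    """Return the cumulative share of events and of volume for amounts above the threshold."""
    x = np.sort(np.asarray(amounts, dtype=float))
    x = x[x > threshold]
    share_of_events = np.arange(1, x.size + 1) / x.size
    share_of_volume = np.cumsum(x) / x.sum()
    return share_of_events, share_of_volume

# Hypothetical precipitation amounts (mm)
events, volume = lorenz_curve([4.0, 1.0, 2.0, 1.0])
```

In this example the largest event alone contributes half of the total volume, which illustrates why the upper part of the distribution matters for volume-based applications.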

To determine an overall performance ranking for the remaining models, the arithmetic mean and the median over the number of gauges for
both measures of quality (the Cramér–von Mises criterion
The mean and median of the two quality measures

The performance ranking numbers of the precipitation amount models for the pointwise estimations. The underlined numbers indicate the best parametric (P) and nonparametric (NP) models. The bold numbers indicate the best overall model.

To combine the four statistics (the mean and median of

With the ranking numbers, the best-performing precipitation amount model is
estimated for each season and temporal resolution. Among the nonparametric
methods (NP), Silverman's rule of thumb (SRT) and the plug-in approach of

Empirical (data), nonparametric (SRT) and parametric
(Mixed-Exp) CDF and Lorenz curve examples for the hourly (1 h,

The performance ranking of the different methods is quite similar in winter
and summer. The nonparametric methods always lead to better performances
concerning the Cramér–von Mises criterion

The parameter estimation through MOM in combination with the Weibull distribution performs better for the higher aggregations, which exhibit more symmetric distributions. For the daily and sub-daily aggregations, the MLM parameter estimation in combination with the mixed-exponential distribution mostly leads to the best results.

The overall performance is best with the mixed-exponential distribution for temporal resolutions between 2 hours (2 h) and 1 day (1 d) in both seasons. For the hourly distribution (1 h), the nonparametric models show the best overall performance in the summer season and the third-best performance after the generalized Pareto (Pareto-MLM and Pareto-MOM) distribution in the winter season. For the monthly resolution (m), the Weibull distribution exhibits the best overall performance in both seasons. For the 5-daily resolution, the MOM estimation provides the best result in winter (Pareto-MOM) and summer (Weibull-MOM).

The locations of the two 2-fold cross-validation samples for the sub-daily

In order to estimate the quality of the regionalized precipitation amount
models, a 2-fold cross-validation (split sampling) is used. Two equally sized
samples of observation points are randomly generated (Fig.
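A random split into two samples of (nearly) equal size can be generated as sketched below. The function name and the fixed seed are illustrative assumptions; each sample serves once for calibration and once for validation.

```python
import numpy as np

def two_fold_split(n_gauges, seed=0):
    """Randomly assign gauge indices to two disjoint samples of nearly equal size."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_gauges)
    half = n_gauges // 2
    return np.sort(order[:half]), np.sort(order[half:])

sample_1, sample_2 = two_fold_split(347, seed=42)   # e.g., the daily gauges
folds = [(sample_1, sample_2), (sample_2, sample_1)]  # (calibration, validation)
```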

Following the results of the pointwise estimation in the previous section,
only the Weibull-MOM and the Mixed-Exp-MLM models among the parametric models
are investigated for the regionalization, as they show good performance for
differing aggregations. They are both investigated for all aggregations to
test the difference for interpolated moments or parameters, except for the
monthly aggregation, for which only the Weibull distribution is investigated.
In order to regionalize the Weibull-MOM model, the mean and standard
deviation are spatially interpolated. For the regionalization of the
Mixed-Exp-MLM model, the parameters

Illustrations for the kriging procedure of the nonparametric
distributions with the daily values (1 d) in the summer season using calibration
sample 1 (see Fig.

As the two nonparametric approaches SRT and SJ show very similar results
during the pointwise estimation, only the SRT approach is interpolated. For
the regionalization of the nonparametric model, the QV

The performance ranking numbers for the 2-fold cross-validation of the regionalized precipitation amount models in the winter season. The underlined numbers indicate the best parametric (P) and nonparametric (NP) models. The bold numbers indicate the best overall model for each validation sample and temporal resolution.

The first step during the regionalization procedure is the estimation of the
theoretical variograms. The interpolation variables of the three
precipitation amount models, for which theoretical variograms need to be
estimated for the two seasons and eight temporal resolutions, are as follows.

P-Mixed-Exp-MLM:

P-Weibull-MOM: mean, standard deviation.

NP-SRT: QV

During the estimation of the parameters of the Weibull distribution with MOM,
QV

The performance ranking numbers for the 2-fold cross-validation of the regionalized precipitation amount models in the summer season. The underlined numbers indicate the best parametric (P) and nonparametric (NP) models. The bold numbers indicate the best overall model for each validation sample and temporal resolution.

It is difficult to compare the spatial persistence of

The regionalization of the precipitation amount models is evaluated with the
same quality measures as the pointwise estimation, the Cramér–von Mises
criterion

OK-MOM: OK of the Weibull distribution fitted with MOM.

OK-MLM: OK of the mixed-exponential distribution fitted with MLM.

OK-MOM Daily: OK of the Weibull distribution including the scaled NNS values of the daily gauges (only for the sub-daily aggregations).

OK-MLM Daily: OK of the mixed-exponential distribution including the scaled NNS values of the daily gauges (only for the sub-daily aggregations).

The interpolation approaches for the nonparametric models are as follows.

PK-NP: PK of the nonparametric models, which are estimated using SRT.

PK-NP Daily: PK of the nonparametric models including the scaled NNS values of the daily gauges (only for the sub-daily aggregations).

In Fig.

In Tables

Comparing the nonparametric interpolation approaches with the parametric interpolation approaches shows that the nonparametric approach performs best for hourly (1 h) values for both calibration samples in both seasons. This is in line with the pointwise estimations, for which the nonparametric approaches also produced very good results for the hourly resolution in both seasons.

Using the scaled values of the daily gauges is clearly beneficial: the best-performing method is almost always one of the approaches that incorporate these values, the only exception being the 12 h aggregation in the summer season.

As a benchmark, the interpolation results are also shown for the parametric and nonparametric estimates of the nearest neighbors (NN) and additionally using scaled daily gauges for the sub-daily aggregations (NNS). Among the benchmark methods, the NNS approaches perform better than the simpler NN approaches for the sub-daily aggregations, except for the 12-hourly (12 h) resolution in summer. Since the best interpolation approach almost always, with only three exceptions, performs better than the best nearest neighbor approach, the regionalization of the distributions seems to be worthwhile.

Comparing different modeling schemes for the precipitation amounts at point
locations (see Table

The regionalization of the precipitation amount models showed (see Tables

Using daily gauges as auxiliary information for sub-daily resolutions is very beneficial, as was suggested by our data analysis in Sect. S3 and is confirmed by the evaluation of the regionalization.

In general, the regionalization of the distributions seems to be worthwhile as it nearly always performs better than the nearest neighbor (horizontal distance) approaches, which would be the simplest estimate. As lower rainfall values were excluded from this study due to their minor importance and measurement errors, the results are not directly comparable to those of most of the other publications within this research field.

The difficulty for nonparametric distributions in representing water volumes
may be reduced by using the Epanechnikov kernel with finite support as
proposed by

The sub-daily precipitation data sets used here were obtained
from the LUBW during various research projects and, as far as the authors
know, are not available to the public. Therefore, they cannot be provided by the
authors. The daily data set was downloaded from the WebWerdis homepage
(

The authors declare that they have no conflict of interest.

The work of many people developing different libraries in the Python
programming language