Geostatistical radar-raingauge combination with nonparametric correlograms: methodological considerations and application in Switzerland

. Modelling spatial covariance is an essential part of all geostatistical methods. Traditionally, parametric semi-variogram models are ﬁt from available data. More recently, it has been suggested to use nonparametric correlograms obtained from spatially complete data ﬁelds. Here, both estimation techniques are compared. Nonparametric correlograms are shown to have a substantial negative bias. Nonetheless, when combined with the sample variance of the spatial ﬁeld under consideration, they yield an estimate of the semivar-iogram that is unbiased for small lag distances. This jus-tiﬁes the use of this estimation technique in geostatistical


Introduction
Raingauges yield comparatively accurate measurements of precipitation at a given location, but even dense networks of gauges cannot fully capture the spatial variability of precipitation fields on subdaily timescales.In contrast, weather radars can provide dense measurements over an entire region and at high temporal resolution.Locally, however, radar measurements tend to be associated with very large uncertainties, in particular in mountainous terrain.Methods that formally combine radar and raingauge measurements aim at improved spatial precipitation estimates exploiting the Published by Copernicus Publications on behalf of the European Geosciences Union.
Even within the area of geostatistical methods, a wide range of choices have to be made when planning for a particular application.These choices regard, for example, the actual combination method (e.g., kriging with external drift, cokriging), the kriging neighbourhood (global vs. local), the technique used to estimate the parameters of the geostatistical model (e.g.least-squares, maximum-likelihood estimation), and the transformation of the precipitation variable.In addition to these issues, there are several options for modelling spatial dependencies in the precipitation data.Correlograms (or semivariograms) used for kriging are customarily one-dimensional, but two-or higher-dimensional correlation maps are also used and are one way of taking spatial anisotropy into account.Furthermore, correlogram models can be parametric or nonparametric, they can be obtained from the radar or the raingauge data, and they can be estimated flexibly on a case-by-case basis or with data from a longer period of time.
Recently, nonparametric correlograms based on spatially complete radar rainfall fields have been used in combining radar and raingauge data (Cassiraga et al., 2004;Velasco-Forero et al., 2009).The estimation of nonparametric correlograms is fast and robust (in particular, no parametric model has to be fit) and anisotropy is naturally taken into account.The objective of this study is to compare the estimation of nonparametric correlograms with the traditional estimation of semivariograms, and to test their application in the geostatistical combination of hourly raingauge and radar data in Switzerland.Additionally, the present application tests in how far geostatisitical methods that traditionally rely, implicitly or explicitly, on a Gaussian data model, can be applied to highly non-Gaussian and non-continuous hourly precipitation data in complex terrain.This paper describes one of several current activities in the MeteoSwiss project Combi-Precip, which aims at the operational provision of spatial precipitation estimates for Switzerland on the subdaily timescale based on the combination of radar and raingauge measurements.
The structure of this paper is as follows: Sect. 2 introduces the study domain and data, compares the modeling of spatial dependence with the nonparametric correlogram and traditional parametric semivariograms, presents the geostatistical combination (Kriging) techniques tested here, and how the quality of both the estimated precipitation fields and the estimated uncertainty in these fields is evaluated.Thereafter, Sect. 3 presents several examples and a systematic evaluation of the combination methods.Section 4 concludes this study.

Study area and data
The study area is Switzerland and has a surface area of 41 285 km 2 .We combine raingauge and radar data on the hourly timescale.On this timescale, data from 75 automatic raingauges of the SwissMetNet (SMN) are available.These gauges provide measurements at 10-minute intervals in real time.They are fairly homogeneously distributed throughout the country, but remote areas and high elevations are somewhat underrepresented.The gauge locations are indicated in Fig. 5.
Radar data are taken from a composite of three Me-teoSwiss radars (see Germann et al., 2006, Fig. 1, for the radar locations).The composite is available at 5-min intervals as a gridded field of 1 km resolution covering Switzerland and adjacent areas.The construction of the radar composite is discussed in Germann et al. (2006).In particular, the radar precipitation field is adjusted to gauge measurements using a single factor for each of the three contributing radars.The factor is determined from radar-gauge agreement after integration over a large time window (6 months) and several gauges in the vicinity of the radar.This is a "climatological" bias correction; it involves a small subset of the gauge network considered in this study, and does not correct for the substantial biases that can occur in the radar composite on the hourly timescale.Further details concerning the characteristics of the two measurement platforms and uncertanties can be found in Sevruk (1985); Frei et al. (2006);Germann et al. (2006); MeteoSwiss (2006).
Apart from measured data, we use synthetic data to illustrate the behaviour of semivariogram and correlogram estimators in Sect.2.2.These data follow a one-dimensional Gaussian random process with unit variance and correlation function ρ(u) = exp − u φ , where u is the lag distance and φ a constant that determines the decorrelation length.The distance u for which ρ(u ) = 0.05 is referred to as the practical range of the process and u ≈ 3φ.Samples from this process are generated by means of Cholesky decomposition (see Wood and Chan, 1994;Ribeiro Jr and Diggle, 2001;Diggle and Ribeiro Jr, 2007, for details).

Estimation of parametric semivariograms
The semivariogram is the traditional tool for modelling spatial dependence in geostatistical applications.The semivariogram of a spatial process Z is defined as (for greater detail see Schabenberger and Gotway, 2005, whose notation we largely follow): where Z(s i ), Z(s j ) denote values of the process at locations s i , s j .For a second-order stationary process Z, it can be shown that (2) and furthermore where σ 2 = C(0) = Var(Z) and ρ(s i − s j ) are the variance and the correlation function of the process Z, and E(•) denotes the expected value.
The widely-used Matheron-estimator for the semivariance reads (we denote estimators with a hat to distinguish them from theoretical quantities): where N (s i − s j ) denotes the set of all pairs of observations at a given lag distance and |N(s i −s j )| is the number of such pairs.For complete radar grids of dimensions where k, l, ... are the components of the lag distance vector in units of the grid spacing.The customary procedure for estimating a semivariogram model is illustrated by means of synthetic data in Fig. 1ac. Figure 1a shows a single realization of a one-dimensional Gaussian process with variance 1 and exponential correlation function (the practical range equals 0.6 for this process).The sample semivariogram (or the so-called semivariogram cloud) is shown in Fig. 1b.It shows semivariogram ordinates for all pairs of observations.Since these values scatter substantially, the sample variogram is usually smoothed by calculating the estimate in Eq. ( 4) after pooling the semivariogram ordinates into a number of lag-distance classes.This yields the so-called empirical semivariogram shown in Fig. 1c (open circles).Finally, a parametric model is fit to the empirical semivariogram.Here, a curve-fitting technique (n-weighted least squares, see Diggle and Ribeiro Jr, 2007, Sect. 5) has been used to estimate an exponential semivariogram model (dashed line in Fig. 1c).Equation ( 3 difference between the estimated and the theoretical correlation is due to sampling variability and a bias of the estimator and will be discussed later.There are a number of reasons for using a parametric model.First, the parametric models

Estimation of nonparametric correlograms
The nonparametric estimate of the correlation function is given by are the sample (also called plug-in) mean and variance, and N is the number of observations (e.g., radar grid points).This estimator can be conveniently computed in terms of the discrete Fourier transform (DFT).In fact, the Wiener-Khinchin theorem affirms that the magnitude of the DFT of the standardized observations is the spectral representation of the correlation estimate computed in Eq. ( 5).Thus, Eq. ( 5) can be obtained rather simply by computing the DFT, multiplying with the complex conjugate and computing the inverse DFT of the product.This has two main advantages.First, the fast Fourier transform (FFT) allows computing Eq. ( 5) much more rapidly than by means of explicit summation.Therefore, the complete radar grid can be taken into account.In contrast, the complete semivariogram estimator Eq. (4) cannot be conveniently computed for sizeable two-dimensional radar grids, and is practically obtained from "thinned-out" subsamples of the entire field (see Appendix A, Fig. A1, for an example).Second, the estimated correlation function has, by construction, a real and positive spectral density.According to Bochner's theorem, it is therefore a positive definite function (termed "licit" in Yao and Journel, 1998).No further fitting of a parametric covariance model or manipulation of the spectral density is necessary.
In practice, the mechanics of the FFT requires that the data be padded with zeros, and to switch to the so-called wraparound order of spectral densities/lag distances and back.This is illustrated in Fig. 2 by means of a one-dimensional data sample; the details are explained in Press et al. (1992, Chapt. 13).The data sample of length N = 140 is shown in (Fig. 2a).The mean is subtracted and zeros are padded such as to give a padded data vector (Fig. 2b) whose length is equal to the smallest power of 2 larger than or equal to 2N; here equal to 512.Application of the FFT, multiplication with the complex conjugate, and normalization of the power spectral densities yields the result shown in Fig. 2c.The power spectral densities are obtained in the typical wraparound order, i.e. the left part of the spectrum corresponds to the zero frequency and positive frequencies, and the right part of the spectrum to negative frequencies (in reverse order).The spectrum is real, positive, and symmetric with respect to the zero frequency.Finally, the inverse FFT yields the estimate of the correlation function (Fig. 2d).
The nonparametric estimate Eq. ( 5) of the correlation function for the synthetic one-dimensional data sample of Fig. 1a is shown in Fig. 1d (dotted line).

Comparison of estimators and bias correction
Both estimates of the correlogram function in Fig. 1d exhibit shorter ranges than the theoretical correlation.Of course, this could be completely due to sampling variability and we cannot conclude from the estimates for a single realization (Fig. 1a) on the behaviour of the estimators.Therefore, we extend the experiment as follows: for each of three Gaussian processes with unit variance and exponential correlation function with practical ranges of 0.2, 0.6, and 1.5, we draw 100 realizations and estimate a parametric (exponential) semivariogram model and the nonparametric correlation for each of the realizations.Each realization is sampled in the domain [0,1].The median estimated parametric model for the process with practical range 0.2 is shown by the black dashed line in Fig. 3.This line is very close to the theoretical correlation (solid black line).As a matter of fact, the estimator Eq. ( 4) is known to be unbiased.For finite-size samples of correlated data, however, it is only approximately unbiased.In the present example, the positive autocorrelation causes the variance of the process (the semivariogram sill) to be underestimated.As a consequence, also the range of the semivariograms is underestimated.This effect is the more pronounced the larger the practical range is compared to the domain size, i.e. keeping the domain size constant (here equal to 1), the bias will be larger for larger ranges (red and blue dashed lines in Fig. 3).
The dotted lines in Fig. 3 show the nonparametric correlation estimates from Eq. ( 5) based on the same 100 realizations of the three Gaussian processes.For small lags and a practical range of 0.2, the estimate (black dotted line) is still fairly close to the theoretical correlation.If the practical range is on the order of the domain size, however, the nonparametric correlation is strongly biased towards too small values (red and blue dotted lines).The bias in the nonparametric correlogram estimate is much larger than in the corresponding parametric estimate.(Note: at least for small lags, the different normalizations N (s i − s j ) vs. N in Eqs.(4) and (5) are only a minor contribution to the difference between both estimates.) In order to understand this observation, we rewrite Eq. (5) as follows: For lag distances that are much smaller than the domain dimensions, we can approximate Thus, and finally Red and blue lines: the same for processes of larger practical ranges (0.6,1.5).All dashed and dotted lines show the median of estimates of 100 realizations of the Gaussian process.Solid line: theoretical correlation (for all ranges; the abscissa is scaled by the practical range).
the extrapolation performed by fitting the parametric semivariogram model.This explains the larger bias of the estimator in Eq. ( 5) compared to Eq. ( 4).
In the present context, the more important consequence of Eq. ( 6) is, however, that the nonparametric correlogram estimator Eq. ( 5) and the plug-in variance Ĉ(0) combine such as to yield an estimate of the semivariogram γ that is approximately unbiased for small lag distances.This is the justification for using these estimators for geostatistical prediction as done here as well as in earlier studies (notably Velasco-Forero et al., 2009).The semivariance provides a description of both the spatial dependence and the variance of the spatial field, and completely determines (jointly with the actual values of the predictors) the solution of geostatistical prediction (Kriging).
Kriging will be the focus of the remainder of this paper.Before, we briefly digress and show how the bias of the estimator in Eq. ( 5) can be mitigated in situations where this is of interest.Given an alternative estimate σ 2 of the variance, assumed to be superior to the sample variance Ĉ(0), the corresponding estimate of the correlation function is according to Eqs. ( 3) and (6): For the synthetic data of our introductory example (Fig. 1a), we have used the sill of the parametric semivariogram (Fig. 1c) for σ 2 in Eq. ( 7) and the corrected correlation function obtained in this way is the dash-dotted line in Fig. 1d.
Repeating the experiment described at the beginning of this section with the bias-corrected estimator Eq. ( 7) yields the results shown in Fig. 4. Indeed, the correction works and the bias-corrected correlograms are very close to the parametric correlograms for small lag distances.With increasing lag distance, the approximation the bias correction is based on deteriorates.This can be seen for the example with largest practical range (blue dotted line in Fig. 4).We have tested the calculation of bias-corrected correlograms not only for synthetic data but also for gridded radar precipitation fields.The test confirms that the bias correction works, i.e. that the bias-corrected nonparametric correlograms agree much better with the parametric correlograms than the uncorrected nonparametric correlograms (not shown).

Kriging formulations
In this study, we test four Kriging variants.We compare the use of the classical semivariance estimator and of nonparametric correlograms in ordinary Kriging (OK), and test nonparametric correlograms in two versions of Kriging with external drift (KED).Here, the OK variants are gauge interpolations where the radar composite is only used to model the correlogram.Thus, the OK variants are illustrative prototype methods for comparing the use of parametric and nonparametric correlograms in Kriging, rather than "genuine" radarraingauge combination methods.In contrast, the KED methods more fully exploit the radar information and are candidates for operational merging techniques.
Hydrol.Earth Syst.Sci., 15, 1515-1536, 2011 www.hydrol-earth-syst-sci.net/15/1515/2011/In all Kriging applications, we are given k = 1,...,K gauge measurements Z G (s k ) and l = 1,...,L radar measurements Z R (s l ) at the grid points of the radar composite.We associate all raingauges with the nearest radar pixel.The prediction locations coincide with the grid of the radar composite; for clearness we use superscripts whenever we refer to prediction locations, e.g., s l .Throughout this study, we work with a global Kriging neighbourhood and do not apply any variable transformation to the predictors.Below we shortly describe how to calculate OK and KED predictions and variances; for more detailed descriptions of the methods the reader is referred to standard texts (e.g., Cressie, 1993;Wackernagel, 2003).

Ordinary Kriging
In all Kriging formulations, the process of the interpolated field Z(s) is modelled as the sum of a stochastic part, which is a second-order stationary process Y (s), and a deterministic part.In OK, the deterministic part is assumed to be a constant mean field a, i.e.
The OK prediction is a weighted average of raingauge values and the optimal weights λ l k are the solution of the following systems of equations: where we have introduced the shorthand notation Ĉkk = Ĉ(0) ρ(s k − s k ) for covariances between gauge locations, Ĉl k = Ĉ(0) ρ(s k −s l ) for covariances between gauge and prediction locations, and µ l denotes a Langrange multiplier that ensures k λ l k = 1.The OK variance is given by The covariances Ĉ in Eqs. ( 10) and ( 11) correspond to the stochastic model part Y (s).In OK, it is natural to use available measurements of Z for the estimation of the spatial covariance structure of Y .Here, the covariances are estimated from the radar composites.We use a classical parametric semivariogram fit with anisotropy (see Appendix A) as well as the nonparametric correlogram estimate (Eq.( 5), Sect.2.2.2).The corresponding OK versions are denoted by OK p and OK np .

Kriging with external drift
In KED, the deterministic part of the model is supposed to be a linear function of an auxiliary field (here, the radar composite Z R ): Just as in OK, the prediction is a weighted mean of raingauge values as in Eq. ( 9), but here the weights are the solution of the following systems of equations: where we write R k = Z R (s k ) and R l = Z R (s l ) for the radar values at the gauge and at the prediction locations.The additional Lagrange multiplier ensures that k λ l k R k = R l .The KED variance is given by A well known problematic issue in the application of KED is that there is no straightforward choice for which data to use to estimate the covariance structure of Y (s).The estimate would have to be based on residuals between an (a priori unknown) linear function of the radar field and an (a priori unknown) merged interpolated field.An elegant solution to this problem is to fit the parameters of the stochastic and the deterministic part of the model jointly by means of maximum-likelihood methods (Diggle and Ribeiro Jr, 2007;Erdin, 2009); yet for sparse gauge networks and in situations with few wet radar-raingauge pairs, this estimation might not be very robust (In how far this is a problem for operational implementations is open.These methods are being tested in a separate activity within the CombiPrecip project.).
Here, we follow a different solution suggested by Velasco-Forero et al. ( 2009) that consists of the following steps: 1.A spatially complete rainfall field is estimated by OK np using a sparse set of radar values sampled at the gauge locations.
2. Use the residuals between the radar field and the prediction from step 1 to estimate the nonparametric correlogram and the variance of Y (s).
3. Use the correlogram and variance obtained in step 2 for the KED prediction according to Eqs. ( 13) and ( 14).
We refer to this method as KED OK .Obviously, the residuals used to estimate the spatial covariance structure in KED OK are chosen pragmatically for the lack of better alternatives.Therefore, we also test the following extension of the KED OK method:   2005-08-21 17:00-18:00 UTC  1. Use KED OK to obtain a preliminary prediction.
2. Use the residuals between the radar field and the prediction from step 1 to estimate the nonparametric correlogram and the variance of Y (s).
3. Use the correlogram and variance obtained in step 2 for the KED prediction according to Eqs. ( 13) and ( 14).This is our second KED variant and we refer to it as KED KED .The estimation of the covariance of Y (s) is still pragmatic (we use the preliminary KED OK prediction as a substitute of the interpolated target field and neglect the linear function of the radar field), but we hypothesize that it yields a better description of the covariance of Y (s) than the OK residuals used in KED OK .

Test cases
Examples of results from the Kriging variants will be shown for three test cases.Figure 5 shows raingauge (left) and radar (right) measurements for episodes of 1 h duration during these cases.
Test case 1 (Fig. 5a, b) is from the August 2005 floods that affected several European countries.In Switzerland, this event caused six casualties and the highest flood-related damage on record since 1972.Raingauges registered totals of up to 170 mm in 24 h. Figure 5a, b as well as figures for other hourly intervals and daily totals (not shown) show that radar accumulations during this event systematically underestimate gauge measurements.A narrow band of intense precipitation in the northeast of Switzerland, which can be seen on the radar composite (Fig. 5b), causes large differences in measured precipitation between nearby gauges.
Hydrol.Earth Syst.Sci., 15,2011 www.hydrol-earth-syst-sci.net/15/1515/2011/ Test case 2 (Fig. 5c, d) is characterized by intense, shortlived, and localized precipitation cells in the northern part of Switzerland.One of these cells gained a certain fame because of its interference with the football match Switzerland-Turkey at EURO 2008 in Basel.As far as the raingauge and radar accumulations for the corresponding hour are concerned, however, the effect of this cell is not very pronounced.The Basel gauge (at the northern border of Switzerland, Fig. 5c) merely registered a value 1 mm for that hour.Much larger hourly accumulations can be seen in two separate regions in the northwest and the northeast of Switzerland.Quite clearly, the gauge network is not sufficiently dense to capture the local precipitation maxima within these regions that are evident on the radar composite (Fig. 5d).
During the event of test case 3, heavy convective hailstorms moved over the Swiss Plateau at a speed of more than 60 kmh −1 .The comparison of the daily radar and gauge data (not shown) shows that the spatial pattern is very similar between both measurements, but the magnitude of the radar measurements is considerably higher than that of the gauges.This is a well-known phenomenon for radar measurements of hail (Doviak and Zrnc, 1993).Again, the gauges of the sparse real-time network do not capture the regions of strongest precipitation for the selected hour (Fig. 5e, f).A closer inspection of the radar composite shows an artificially rugged structure due to the fact that precipitation cells displace substantially between consecutive full scan periods of the radar (see also Fabry et al., 1994).

Cross validation
We use cross validation for the quantitative evaluation of the different merging techniques.One gauge is removed from the data in turn, and the prediction of the method under consideration is then compared to the value measured by the removed gauge.Since correlograms or semivariograms are estimated from the radar field for all methods considered here, the correlograms and semivariograms are the same in cross validation as for the complete set of gauges.In the KED OK and KED KED methods that involve auxiliary initial steps for the construction of a residual field used to estimate nonparametric correlograms, the cross-validated gauge is only removed in the final Kriging step.
This kind of leave-one-out cross validation with comparison to gauges is probably the most popular procedure in the evaluation of combination techniques for raingauge and radar data.Among the numerous studies that have applied cross validation are Seo (1998), Haberlandt (2007), and DeGaetano and Wilks (2009).Nevertheless, some critical issues should be born in mind in this analysis: -Gauge values are assumed to be true values at their specific locations, but include measurement errors for several reason (Sevruk, 1985).These errors are assumed to be small compared to the prediction errors in the precipitation fields at short time scales.For particular cases (snowfall, strong wind), however, these errors can be substantial and should be considered in principle (Although a quantitative correction is hardly possible.).
-A representative spatial distribution of gauges is necessary to assess the average performance of a method in the study area.As far as the distribution over different parts of the country is concerned, the MeteoSwiss raingauge network reasonably meets this requirement.
Remote and high-altitude locations, however, are somewhat underrepresented.
-The spatial and temporal support of radar and raingauges is different.Spatially, raingauges can be approximated as point measurements, whereas the radar values correspond to averages over the volume of a grid cell.This yields to a smoothing of radar values compared to gauges (Zawadzki, 1975).Thus, differences between raingauge and radar measurements are not solely due to radar errors, but also due to differences in representativeness.
-Additional uncertainty is introduced by associating the location of a raingauge with the centre of the nearest radar grid cell (nearest-neighbour approximation).
These issues illustrate that care should be excercised when interpreting radar-raingauge differences.Nevertheless, under the assumption that these effects lead primarily to a random component in the radar-raingauge differences, comparisons over a large sample of raingauges still provide useful guidance on the relative performance of different merging techniques.

Quality measures
Skill statistics are calculated from gauge observation/crossvalidation prediction pairs {Z i , Ẑi }, where i,...,I enumerates all such pairs either for a single test case or for an extended validation period.We use the following skill measures: 1.The BIAS assesses overall systematic errors of a method.We express it in terms of a logarithmic scale, as customary in radar meteorology, 2. The root-mean-square error (RMSE) is a widely-used skill measure to assess the overall quality of a method.We use it on a square-root scale, 4. SCAT (Germann et al., 2006) evaluates the performance of a method to quantify precipitation for locations where rain is actually predicted and observed.SCAT is based on the cumulative error distribution function (CEDF), defined as the contribution to total precipitation as a function of the logarithmic prediction-observation ratio (in dB) at locations where both, observation and prediction are wet (≥ 0.5 mm).SCAT is defined as half the distance between the 16 % and 84 % quantiles of the CEDF, which makes it robust to outliers with large overor underestimation.The observed CEDF points are interpolated linearly to determine the required quantiles; SCAT = 1 2 (CEDF 84 − CEDF 16 ) . (18) 5. The Hansen-Kuipers Discriminant (HK) is a skill score to assess dichotomous predictions.In our context, it can be used to measure the ability of a method to distinguish between dry and wet areas.We define a dry observation to correspond to < 0.5mm and a wet observation to ≥ 0.5mm.Observation-prediction pairs {Z i , Ẑi } can then be used to construct a 2×2 contigency table (Table 1) and HK is calculated as This is equal to the Probability of Detection (POD) minus the Probability of False Detection (POFD), and −1 ≤ HK ≤ 1. HK = 0 means that the forecast is as skillful as a random forecast, HK = 1 is a perfect forecast, and a negative HK implies a forecast worse than random.
We compute all of the above skill measures for the three test cases, and throughout an extended evaluation period.All hourly intervals in 2008 with at least one wet gauge (≥ 0.5mm) and without missing values in the radar composite are included into the extended evaluation.We find that the first four scores, especially MAD, are hard to interpret if many observation-prediction pairs with very small values are included into the evaluation.Therefore, we only include pairs with gauge observation Z ≥ 0.5mm in the calculation of the scores 1-3.For these three scores, this leaves 52/13/30 pairs of values for test cases 1/2/3.The calculation of HK is based on all observation-prediction pairs (75 for the test cases).As far as the results for the extended evaluation period are concerned, the scores are calculated from a large number of pairs (37 416 for BIAS, RMSE, MAD, and SCAT; and 222 013 for HK).The constraint of allowing radar fields without any missing radar pixel only reduces the evaluation period to 10 months (there are missing values in April and May 2008).Nonetheless, the results from the extended evaluation are highly significant and approximately represent averages across all seasons.

Evaluation of Kriging uncertainty
A potential advantage of geostatistical merging techniques is that they are based on a stochastic concept.They not only yield an interpolated field, but also an estimate of the uncertainty in this interpolation at each grid point.In particular, following the cross-validation approach described in Sect.2.4.2, a measure of uncertainty -the cross-validation Kriging variance -can be calculated at the location of the removed gauge.This can be used to assess how useful the uncertainty estimate provided by the different methods is.More specifically, we test if the Kriging variance along with a Gaussian assumption on the distribution of errors can be used to construct an accurate confidence interval at a point.To this end, we calculate for each gauge and throughout the extended evaluation period a z-score Ẑi − Z i / σi , where Ẑi and σ 2 i are the cross-validation prediction and variance, and Z i is the value measured by the removed gauge.Then, the frequency of threshold exceedances of z can be compared with the frequency that is expected under the assumption of a standard Gaussian distribution of z.

Test case 1
The methods OK p and OK np are compared in Fig. 6 for test case 1 (see Fig. 5a, b for the corresponding raingauge and radar input data).We first discuss the results of method OK p shown on the left.The empirical semivariogram obtained from the thinned-out radar field (Fig. 6a) clearly captures the anisotropy of the rainfall field for this case.It can be seen that the orientation of the axis with strongest spatial dependence changes with the lag distance.It is aligned in a southwestnortheasterly direction for all lags, but is much more zonally (west-east) oriented for small lags than for large lags.accomodate different directions of anistropy.Instead, the fitted semivariogram model is a compromise between different directions (expressed by means of a single anisotropy angle) and strengths (expressed by means of a single anisotropy ratio).In this particular example, the fit result appears to be influenced more by the larger lag distances than by the values of the empirical semivariogram near the origin.In Ordinary Kriging from a sparse network, the semivariogram model has a very strong influence on the Kriging prediction, and indeed we can clearly see its imprint in Fig. 6d.The comparison to the radar field for this case (Fig. 5b) suggests that the OK p prediction does not represent the spatial characteristics for this rainfall field very well.The dominant rainfall patterns and their orientation, in particular the narrow band of intense precipitation in the northeast of Switzerland, are not captured.
The result of the nonparametric correlogram fit -converted into a semivariogram using the plug-in estimate of the variance of the radar field -is shown in Fig. 6c.The nonparametric semivariogram naturally represents the change of the anisotropy angle with lag distance and no decisions about which lags to give preference have to be made, since the nonparametric semivariogram is used directly in Kriging.Even though radar information is only incorporated via the semivariogram, the OK np prediction (Fig. 6e) is able to reproduce the narrow precipitation band and compares much better to the original radar field.The cross-validation results corroborate what the visual inspection of the results suggests for this example: all skill measures yield a better score for OK np than for OK p (Table 2).
For both the OK p and the OK np methods we see that the Kriging prediction is strongly influenced by the semivariogram throughout the domain.Even in areas with no apparent anisotropy in the centre/southwest of Switzerland, the Kriging prediction exhibits a strong anisotropy due to the fact that the semivariogram estimate is largely determined by the radar measurements in the northeast of the country.This is the downside of estimating a semivariogram globally for the whole domain as done throughout this study.Figure 7 shows results from the two KED variants.The nonparametric semivariograms determined from radar residuals (Fig. 7a,b) are quite different from one another in terms of their description of the range and variance of the residual field.This comes as no surprise, as the residual fields used for correlogram estimation are obtained in rather different ways (see Sect. 2.3.2).Nonetheless, the Kriging predictions from both methods look quite similar (Fig. 7c, d).This is due to the fact that the radar field is used as external drift variable here and the correlogram is less decisive for the predicted field than in OK.Both methods yield merged fields that exploit the fine-scale spatial detail of the radar composite, but are also much closer in magnitude to the values registered by the raingauges.In spite of the qualitative similarity of the predictions of the KED methods, the comparison of both methods in terms of cross-validation analysis gives a quite clear result for this case: all four skill measures that depend on the precipitation amount at wet gauges favour KED KED (Table 2), only the distinction of wet and dry locations is slightly better for KED OK as expressed by the higher HK score.In fact, this property of the precipiation field is better represented in the original radar field than in any of the radarraingauge combinations.This agrees with the findings of Erdin (2009)  improved estimates in many respects; but that the distinction of wet and dry areas is best in the pure radar measurements.

Test case 2
The isolated patches of high precipitation in the radar composite for test case 2 (Fig. 5d) suggest that the spatial dependence is of short range.This is well captured by the parametric semivariogram fit (Fig. 8b) and also by the semivariogram based on the nonparametric correlogram estimate (Fig. 8c).
While the parametric semivariogram is completely isotropic, the nonparametric semivariogram exhibits some short-range anisotropy the orientation of which again changes with lag distance.The dominant anisotropy of this semivariogram at short lags appears to reproduce that of three adjacent patches of high precipitation at the northern border of Switzerland.All in all, however, anisotropy is not as important here as in the other two test cases as evident in the OK predictions (Fig. 8d, e).The fit of a simple monotonously decaying function (here exponential) for the parametric semivariogram yields a smoother prediction by OK p than by OK np .The more irregular character of the OK np prediction better agrees with the original radar composite, even though, of course, the precise location and shape of precipitating areas is not captured well since the radar is not a proper predictor variable in the OK methods.The skill measures for OK p and OK np do not differ greatly for this test case.Compared to other methods/test cases, both OK methods exhibit a large negative bias, arguably due to the fact that areas of high precipitation are simply "overlooked" in interpolation from a sparse gauge network and for a case with small-scale precipitation patterns.
As in the previous test case, the skill measures somewhat favour the KED KED prediction over the KED OK prediction (Table 2).Both KED methods score distinctively better than the OK methods, and in particular they have a much smaller bias because of their better exploitation of the radar information.The KED OK and KED KED predictions (Fig. 9c, d case, and also quite similar to one another.Consequently, the evaluation results are also quite similar for these three fields.

Test case 3
The spatial dependence structure for this test case is largely determined by three bands of intense precipitation that are aligned fairly similarly in a southwest-northeasterly direction (Fig. 5f).Here, the assumption of a domain-wide spatial dependence model and the description of anisotropy in terms of a single anisotropy ratio and angle, appears to be much better suited than for the test cases discussed above.Indeed, the parametric semivariogram model agrees quite well with the semivariogram based on the nonparametric correlogram estimate for small lag distances (Fig. 10b, c), even though the range is larger for the latter.Consequently, also the OK predictions are all in all rather similar (Fig. 10d, e), yet for OK np the impact of individual raingauge values on the interpolated field can be discerned at larger lag distances.This can be seen clearly for the station Bern ("BER") in the centre/west of Switzerland that registered an accumulation of 14 mm of rain during the hour of this test case (Fig. 5e) and the band of high precipitation to its west in the OK predictions (Fig. 10d,  e).This band is more pronounced for the OK np prediction than for OK p and remarkably similar in shape to the corresponding band actually observed in the radar composite (Fig. 5f).The cross-validation results for the two OK predic-tions (Table 2) are in line with the apparent similarity of the two fields: while some of the skill measures yield favourable results for OK np (BIAS, MAD), others give preference to OK p (SCAT, HK).
As in the previous test cases, the KED combination methods succeed to incorporate the fine-scale spatial detail of the radar and at the same time correct for the substantial bias (here positive) of the radar with respect to the gauges.This is evident from the inspection of the predicted fields (Fig. 11c,  d) and also from the quantitative evaluation (Table 2).In cross-validation, the KED methods score better than the OK predictions and also than the original radar field, even in terms of the distinction of dry and wet locations measured by HK.While for the previous test cases the KED KED predictions receive higher scores than KED OK predictions, there is no clear picture for this test case.The semivariograms for KED OK and KED KED (Fig. 11a, b), determined from different residual fields, have an anisotropy similar to the semivariograms for OK p and OK np (Fig. 10b, c), but quite naturally, the semivariogram sills and ranges are smaller than for the OK semivariograms, which are based on the original radar field.This observation holds also for the other two test cases.

Systematic evaluation
As illustrated for the test cases above, each case is unique and it is not possible to draw conclusions on the average Hydrol.Earth Syst.Sci., 15, 1515Sci., 15, -1536Sci., 15, , 2011 www.hydrol-earth-syst-sci.net/15/1515/2011/ performance of a merging technique from the consideration of one or a few examples.Therefore, we have also conducted the evaluation through an extended period of time both for the Kriging prediction (best estimate) and the Kriging variance as described in Sects.2.4.2-2.4.4.Due to the comparatively slow estimation of the empirical semivariogram in the OK p method, this method is not included into the extended evaluation.

Kriging best estimate
The evaluation results for the original radar field and the three merging techniques are summarized in Table 3.The skill measures in this table assess the performance of the techniques in average terms across a large number of cases as well as across different regions of Switzerland.The evaluation results are unequivocal.The average performance of the merging techniques is better than that of the radar alone, the KED methods outperform the OK np method, and KED KED performs better than KED OK .With the exception of a higher bias for OK np than for the pure radar, all skill measures improve from left to right in the columns of Table 3. Somewhat surprisingly, this includes the HK score, i.e. the distinction of wet and dry conditions.The radar has been shown to represent this feature of the precipitation field very well for a number of relevant test cases (see Erdin, 2009 and also Table 2).In the present "climatological" evaluation, the skill of the radar deteriorates because there are some regions in Switzerland (in the south of the country; in the Valais and Grisons cantons), where the currently available radar composite is of comparatively lower quality due to the very complex topography of these regions and their relatively long distance from the nearest radar.In particular, the radar misses   many precipitation occurrences in these regions, which appears to be the reason for the rather low HK score.(In fact, the spatially varying radar skill also partly explains the differences between the radar skill for the different test cases.The radar skill is much higher for test case 2 than for the other two cases, arguably because in this case it did not rain in regions where the radar skill is typically rather low.).
Figure 12 shows the results of the extended evaluation for the RMSE on a station by station basis.In accord with the above discussion, the radar skill is comparatively low in remote regions of complex topography (Fig. 12a).The OK np method (Fig. 12b) introduces an improvement in the skill that is fairly homogeneous throughout the country.It is interest-ing to compare the two KED methods the RMSE of which is shown in Fig. 12c, d.Both methods have markedly better skill (lower RMSE) than OK np in regions where the radar performs well, in the Swiss middleland in the centre/north of the country.For KED KED however, RMSE is also substantially reduced in the mountainous regions in the south of Switzerland, much more so than for KED OK .This observation appears to justify the hypothesis we made when introducing the KED KED method in Sect.2.3.2.In regions where the radar is a powerful predictor variable, the stochastic part of the geostatistical model is less decisive for the prediction skill and, accordingly, the KED OK and KED KED methods perform similarly.In regions where the quality of the radar is lower, the quality of the prediction will depend more on the specification of the stochastic part of the model.This is consistent with the clearly superior performance of KED KED in these regions.

Kriging uncertainty
The mere inspection of the Kriging variances for individual test cases reveals considerable shortcomings.An example for test case 1 and the OK and KED methods is provided in the gauge locations due to the fact that no nugget effect is taken into account by the correlograms used herein.
The results of evaluating the Kriging variance for the OK np method are shown in Fig. 14.In the cross-validation analysis, we have computed a series of z-scores for each gauge as explained in Sect.2.4.4. Figure 14a shows for each gauge, how often the z-score is found to be smaller than the 5 %quantile of the standard Gaussian distribution, which is approximately equal to −1.64.This corresponds to situations where the value predicted by OK np substantially underestimates the actually observed value (by more than 1.64 times the Kriging standard deviation).Under the assumptions of a well estimated Kriging variance and a Gaussian distribution of errors, this is expected to occur in 5 % of the cases.But in fact the probability of underestimating is drastically larger than expected under these assumptions (Fig. 14a).In other words, if a confidence interval was constructed from the Kriging variance and under the assumption of Gaussian error distribution, the upper end of this confidence interval would be too small.The OK np method misses the peaks of the gridded field more often than suggested by the geostatistical model.As far as overestimations are concerned, the observed occurrence frequencies differ substantially from the expected value at a few gauges, yet this effect is less severe than for the underestimations (Fig. 14b).
We have also assessed the uncertainty estimates provided by the KED OK and KED KED methods.The results (not shown) are very similar to those shown for OK np in Fig. 14, but the overestimation occurrence frequencies are somewhat 'worse' (higher) than the corresponding frequencies shown in Fig. 14a for OK np .Several reasons must be expected to contribute to the poor quality of the uncertainty estimate of the methods tested here.In all methods, we have used the radar and raingauge data as is, i.e. we have made no effort to apply a variable transformation (e.g., the Box-Cox transformation) such as to make the residuals of the geostatistical models follow a Gaussian distribution.In fact, given the high skewness of hourly precipitation data it would be quite surprising to find that the residuals of the untransformed data are Gaussian.It is our current working hypothesis, that the missing data transformation is a major reason for the poor uncertainty estimate.Accordingly, the choice of an appropriate transformation family and the estimation of the transformation from the data on a case-by-case basis constitute a separate effort within the CombiPrecip project running in parallel to this study.Further issues that may contribute to the deterioration of the uncertainty estimate are the quality of the approximation in Eq. ( 6) and, for the KED methods, the pragmatic choice of residual fields used to estimate the nonparametric correlogram.The fact that the assessment of the Kriging uncertainties yields rather similar results for the OK np , KED OK , and KED KED methods, indicates that the effect of the last issue is comparatively small.

Findings summary and discussion
In this study, we have tested using nonparametric correlograms for the construction of hourly gridded precipitation fields obtained from the geostatistical combination of raingauge and radar data in Switzerland.
First, the estimation of a parametric semivariogram, as customary in geostatistical applications, has been compared with the estimation of nonparametric correlograms.Application of the two estimation techniques to synthetic data of   known correlation structure has shown that the nonparametric correlograms may severly underestimate the decorrelation length (the range of the correlogram).This estimation bias is greater, the smaller the dimensions of the data sample are in relation to the actual range of the spatial dependence.The bias in the correlogram is mostly due to the fact that the correlogram estimate is based on the sample ("plug-in") variance as an estimate of the variance of the spatial field.For positively correlated data, the sample variance may substantially underestimate the process variance, much more so than the semivariogram sill traditionally used in geostatistical applications.
It is important to note, however, that the above does not preclude the nonparametric estimation of correlograms from being used in geostatistical prediction (Kriging).We have also shown that the nonparametric correlogram and the sample variance (both substantially biased in many cases) combine into an estimate of the semivariogram that is approximately unbiased for small lag distances.This provides the justification for using nonparametric correlograms here as well as in previous studies, since the values of the semivariogram at small lags are known to be decisive for the Kriging prediction.
Hydrol.Earth Syst.Sci., 15, 1515Sci., 15, -1536Sci., 15, , 2011 www.hydrol-earth-syst-sci.net/15/1515/2011/ We have also compared the nonparametric correlogram estimation with the traditional semivariogram estimation by using them in Ordinary Kriging of gauges for three Com-biPrecip test cases.The estimation of nonparametric correlograms (more precisely, of semivariograms based on the nonparametric correlograms) is very attractive from an operational point of view since (i) the entire spatially complete radar field can be taken into account, (ii) no parametric semivariogram model has to be fitted, and (iii) the estimation of the nonparametric correlogram is fast and robust.For the three test cases considered here, OK in terms of the nonparametric correlogram (method OK np ) successfully captures the spatial dependence structure of the radar field.A quantitative comparison of the OK np method with OK in terms of the traditional parametric semivariogram (method OK p ) has shown that OK np performs similarly to or better than OK p for the three test cases.
Furthermore, two variants of Kriging with external drift have been tested in this study.The first variant is the method suggested by Velasco-Forero et al. (2009), termed here KED OK .The second variant builds on the KED OK method and constructs a more realistic residual field used to estimate the nonparametric correlogram of the stochastic part of the KED model.Both variants have been assessed by means of cross-validation and a range of skill measures through an extended evaluation period of one year.The results clearly show that KED KED yields better merged precipitation fields than KED OK on average.Additionally, the extended evaluation shows that both KED methods perform better than the original radar composite or the gauge interpolation OK np .
We have also assessed the uncertainty estimate provided by the OK np , KED OK , and KED KED methods.All three methods underestimate the precipitation amount more often than expected from the Kriging variances and the assumption of a Gaussian error distribution.Consequently, uncertainty estimates for the methods presented here should be based on the empirical error distribution rather than on the Kriging variances.
A number of issues have not been addressed in this study and remain for current and future work.It is conceivable that the failure of the uncertainty estimate of the present implementations is largely due to the fact that no variable transformation is applied to the precipitation data.Finding a suitable transformation on a case-by-case basis constitutes a part of the CombiPrecip project in its own right.Given the practical advantages of the nonparametric correlogram estimation, it appears promising to test if a data transformation can be incorporated into the methods presented here (in particular KED KED ).
As explained in Sect.2.4.2, the cross-validated gauge is only removed in the final Kriging step in cross validation of the KED OK and KED KED methods.We do not expect this to have a substantial influence on the evaluation results.Even though somewhat involved, the whole purpose of steps 1 and 2 of the KED OK and KED KED methods is to yield residual fields from which the nonparametric correlograms for Y (s) (Eqs.8 and 12) can be estimated.The presence or absence of a single gauge should only have a minor influence on the general character of these correlograms.On a similar note, Erdin (2009) found for the combination of daily Swiss radar and raingauge data, that re-estimating a parametric semivariogram for each gauge removed in leave-one-out cross validation differs negligibly from the result obtained using the full set of raingauges.Nonetheless, the sparser gauge network used herein, the characteristics of the spatial distribution of hourly precipitation, and the ability of nonparametric correlograms to capture much of the actual spatial structure, might have some impact on how a single gauge can influence the estimated correlogram.We plan to quantify this in future implementations.
A potential disadvantage of nonparametric correlograms is that they can only be used when a complete spatial field is available for estimating the spatial dependence structure of the stochastic part of the geostatistical model under consideration.While this is straightforward for Ordinary Kriging, we have shown that in other Kriging formulations the choice of an appropriate field is much less clear.Moreover, the methods tested in this study assume that the estimation of the spatial dependence structure and the parameters of the deterministic part of the geostatistical model can be carried out in two consecutive independent steps.This appears to be at odds with modern geostatistical estimation techniques (maximumlikelihood or reduced maximum likelihood), where both the parameters of the deterministic part of the model and of the spatial covariance structure are estimated jointly.An iterative approach such as the KED KED method suggested here may be a first step towards methods that self-consistently estimate both the deterministic part of the geostatistical model and the spatial covariance of the stochastic part, but still take advantage of the computational convenience offered by the nonparametric estimation of correlograms.

Appendix A Estimation of parametric semivariograms with anisotropy
The procedure used for the estimation of parametric semivariograms with anisotropy consists of five steps.It is illustrated in Fig. A1 for test case 3.
1.A random subsample of radar pixels is drawn from the full composite for the sake of computational feasibility.
In the present implementation, we draw a larger sample if there are more pixels with zero precipitation in the composite.This is to avoid poor sampling of wet areas in situations where these areas are small.In the example provided here, 3370 radar pixels are sampled (Fig. A1a).

Fig. 8 .Fig. 8 .
Figure7shows results from the two KED variants.The nonparametric semivariograms determined from radar residuals (Fig.7a,b) are quite different from one another in terms of their description of the range and variance of the residual field.This comes as no surprise, as the residual fields used for correlogram estimation are obtained in rather different ways (see Sect. 2.3.2).Nonetheless, the Kriging predictions from both methods look quite similar (Fig.7c, d).This is due to the fact that the radar field is used as external drift variable here and the correlogram is less decisive for the predicted field than in OK.Both methods yield merged fields that exploit the fine-scale spatial detail of the radar composite, but are also much closer in magnitude to the values registered by the raingauges.In spite of the qualitative similarity of the predictions of the KED methods, the comparison of both methods in terms of cross-validation analysis gives a quite clear result for this case: all four skill measures that depend on the precipitation amount at wet gauges favour KED KED (Table2), only the distinction of wet and dry locations is slightly better for KED OK as expressed by the higher HK score.In fact, this property of the precipiation field is better represented in the original radar field than in any of the radarraingauge combinations.This agrees with the findings ofErdin (2009) who showed for daily fields and several Swiss test cases that raingauge-radar combination techniques yield

Fig. 14 .
Fig. 14.Assessment of the Kriging variance for the OK np method.At each gauge location the color and numbers (%) show (a) the relative frequency of underestimating the precipitation by 1.64 standard deviations or more, (b) the relative frequency of overestimating the precipitation by 1.64 standard deviations or more.The analysis includes all gauge observations ≥ 0.5 mm.
that they fulfill the property of positive definiteness of the covariance matrix.Correlation functions with this property can be used in geostatistical prediction (Kriging; see relevant texts such as Schabenberger and Gotway, 2005, for details).Additionally, the parametrization further smoothes the empirical semivariogram and allows estimation of the correlation at unobserved lag distances.

Table 2 .
Cross-validation skill measures for the original radar field and different merging techniques for the three test cases.

Table 3 .
Cross-validation skill measures for the original radar field and different merging techniques for the extended evaluation in 2008.