Multiscale evaluation of the Standardized Precipitation Index as a groundwater drought indicator

The lack of comprehensive groundwater observations at regional and global scales has promoted the use of alternative proxies and indices to quantify and predict groundwater droughts. Among them, the Standardized Precipitation Index (SPI) is commonly used to characterize droughts in different compartments of the hydro-meteorological system. In this study, we explore the suitability of the SPI to characterize local- and regional-scale groundwater droughts using observations at more than 2000 groundwater wells in geologically different areas in Germany and the Netherlands. A multiscale evaluation of the SPI is performed using the station data and their corresponding 0.5° gridded estimates to analyze the local and regional behavior of groundwater droughts, respectively. The standardized anomalies in the groundwater heads (SGI) were correlated against SPIs obtained using different accumulation periods. The accumulation periods needed to achieve maximum correlation exhibited high spatial variability (ranging from 3 to 36 months) at both scales, leading to the conclusion that an a priori selection of the accumulation period (for computing the SPI) would result in inadequate characterization of groundwater droughts. Applying uniform accumulation periods over the entire domain significantly reduced the correlation between the SPI and SGI (≈ 21–66 %), indicating the limited applicability of the SPI as a proxy for groundwater droughts even at long accumulation times. Furthermore, the low hit rates (0.3–0.6) and high false alarm ratios (0.4–0.7) at the majority of the wells and grid cells demonstrated the low reliability of groundwater drought predictions using the SPI. The findings of this study highlight the pitfalls of using the SPI as a groundwater drought indicator at both local and regional scales, and stress the need for more groundwater observations and for accounting for regional hydrogeological characteristics in groundwater drought monitoring.


Introduction
Drought as a natural hazard is often associated with high socio-economic losses and damage to ecosystems (Wilhite, 2000). Many of these drought effects are not directly caused by rainfall deficits, but are related to below-average storage conditions in surface water, reservoirs, and groundwater that are the consequences of the propagation of a meteorological drought into the hydrological system (Tallaksen and Van Lanen, 2004; Mishra and Singh, 2010; Sheffield and Wood, 2011; Seneviratne et al., 2012). Due to a lack of large-scale groundwater and surface water observations, most scientists and water resources managers interested in drought predictions have to rely on proxy data to quantify storage conditions.
One widely used approach is to use drought indices based solely on precipitation (e.g., the Standardized Precipitation Index; SPI), because precipitation records generally have the good spatial coverage and long observation periods required for drought analysis. It is then assumed that by computing the SPI over longer timescales (e.g., 3, 6, 12 or more months), it mimics the filtering effect of catchment storage conditions and hence captures the smooth precipitation deficits typical of hydrological (groundwater) droughts (Seneviratne et al., 2012; Joetzjer et al., 2013; Li and Rodell, 2015). Although the SPI is recognized as an effective meteorological drought index (Hayes et al., 2010) due to its relative ease of computation and comparability across climates, some studies have questioned its application for groundwater drought monitoring because the translation of precipitation deficits into hydrologic (groundwater) droughts is nonlinear (Bloomfield and Marchant, 2013; Teuling et al., 2013; Van Loon et al., 2014). Both catchment and climate characteristics such as the differences in underlying soil, terrain, vegetation and geological properties, precipitation seasonality, snowmelt timing, and the availability of atmospheric water supply and demand (evapotranspiration) control the development of hydrologic droughts and the resulting drought characteristics (Bloomfield and Marchant, 2013; Haslinger et al., 2014; Van Loon et al., 2014; Stoelzle et al., 2014; Van Loon, 2015). Recently, Vicente-Serrano et al. (2010) introduced the Standardized Precipitation Evapotranspiration Index (SPEI) with a similar multitemporal characteristic to the SPI, but accounting for both the atmospheric water supply (precipitation) and evaporative demand (potential evapotranspiration). The SPEI can account for the influence of temperature variability and is thus better suited than the SPI for drought studies under global warming conditions. In regions with high precipitation variability (e.g., humid areas), both the SPI and SPEI are expected to generally exhibit a similar behavior, albeit with slight differences between them during a specific calendar month and time period (Vicente-Serrano et al., 2012).
Another approach to quantifying drought is based on the use of large-scale gridded data products, e.g., from hydrologic models or satellites (e.g., Sheffield et al., 2004; Andreadis et al., 2005; Vidal et al., 2010; Samaniego et al., 2013; van Huijgevoort et al., 2013; Prudhomme et al., 2014; Mo and Lettenmaier, 2013; Nijssen et al., 2014; Hao et al., 2014; Li and Rodell, 2015; Damberg and AghaKouchak, 2014; Wanders et al., 2015; AghaKouchak et al., 2015). An extensive multi-model study (Prudhomme et al., 2014), for example, projected increases in hydrological drought severity in many areas around the world. This approach also has limitations in its application to local- to regional-scale hydrological drought monitoring because of scale mismatches and some issues in the correct representation of storage in models (Gudmundsson et al., 2012; Van Loon et al., 2012; Tallaksen and Stahl, 2014). The importance of spatial variation in groundwater drought conditions resulting from complexity in subsurface conditions is increasingly recognized (Peters et al., 2006; Bloomfield and Marchant, 2013; Stoelzle et al., 2014).
While many studies have focused on analyzing the propagation of meteorological droughts through hydrologic systems for improved process understanding of the evolution of hydrologic (groundwater) droughts (e.g., Eltahir and Yeh, 1999; Peters et al., 2003, 2005, 2006; Tallaksen et al., 2006, 2009; Weider and Boutt, 2010; Vicente-Serrano et al., 2012; Bloomfield and Marchant, 2013; Haslinger et al., 2014; López-Moreno et al., 2013; Van Loon et al., 2014), there is still a lack of comprehensive observation-based studies to verify whether hydrological drought proxies, like precipitation-based indices (SPI) and gridded data products, are suitable for groundwater drought monitoring at the regional to local scales relevant for water management. In recent years there have been some efforts to analyze the relationship between meteorological and groundwater-based drought indices (Bloomfield and Marchant, 2013; Folland et al., 2015; Bachmair et al., 2015). Bloomfield and Marchant (2013), for example, introduced the Standardized Groundwater level Index (SGI), similar to the SPI, and found a site-specific relationship between the two indices. Their study was, however, limited to the analysis of local-scale behavior of groundwater droughts at 14 sites across the UK.
In this data-based exploratory study, we tested the suitability of the SPI for characterizing groundwater droughts using observations at more than 2000 groundwater wells located in Germany and the Netherlands. We used this large set of groundwater wells to comprehensively analyze the local to regional behavior of groundwater droughts and to investigate the scale mismatch between local- and regional-scale estimates. A focus on groundwater was preferred to other hydrological variables because of the immense multi-sectoral importance of the resource (Famiglietti, 2014). Given the widespread availability and usage of precipitation-based drought indices, we hypothesize that if adequate accumulation periods and lead times are applied to the precipitation signal, the observation-based SPI can predict groundwater droughts. In this first attempt we carried out a quantitative evaluation of the performance of the widely used SPI for groundwater drought monitoring on a local to regional scale using a large collection of groundwater well records, focusing on the statistical skill and not on the causal factors. The results of this study will give water-sector practitioners and managers insight into the precautions required if they are to use local precipitation data or large-scale gridded estimates to characterize groundwater droughts.

Study area and data
The study was performed using monthly groundwater observations from two hydro-geologically different regions located in southern Germany and the central Netherlands (Dutch province of Gelderland) with 1991 and 49 groundwater wells, respectively (Fig. 1). The Dutch region is characterized by a maritime climate and the wells are located on relatively low terrain, but with large spatial differences in unsaturated zone and groundwater conditions. The German wells are located in a region with hilly to mountainous terrain, less oceanic influence on climate and a wide range of unconsolidated and consolidated geological formations. The monthly groundwater data for the German wells were collected from the Bavarian Environment Agency (LfU Bayern) and the State Institute for Environment, Measurements and Nature Conservation Baden-Württemberg (LUBW). The data for the Dutch wells were acquired from the Dutch institute TNO (www.dinoloket.nl/).
To be able to attribute groundwater level changes to climatic causes, it was necessary to exclude the possibility that these changes are a consequence of anthropogenic influences such as pumping or hydraulic structures. Therefore, those wells that exhibit obvious signs of anthropogenic influences were excluded from the analysis. In general, in both regions it can be expected that the effects of groundwater withdrawals are only local, as water consumption constitutes only a very minor portion of the potentially available water resources (precipitation minus evapotranspiration). In the German study area it is estimated that only about 3 % of the potentially available water is used (Nickel et al., 2005). Irrigation is not widely applied. Moreover, the groundwater withdrawal in the region is relatively constant all year round, as it is mainly for domestic and industrial use without peak loads in specific seasons. Notably, the German wells are located in quite densely populated regions (approximately 15 million inhabitants), and groundwater forms the main source of drinking water. This, however, did not have a large impact on the presented analysis since the observation wells used in this analysis are typically located far away from pumping wells. Thus, fluctuations in groundwater levels can be mainly attributed to weather/climate and not to fluctuations in groundwater use.
The majority of the wells (around 90 %) are located in shallow aquifers with an average depth to the water table within 20 m below the ground surface (see Fig. 3a for the well distribution). The length of records varied from well to well, with a minimum of 10 years (Fig. 1) starting from the year 1951 for the German wells and 1988 for the Dutch wells. It should be noted that the 10-year criterion does not meet the recommended minimum of 30 years (McKee et al., 1993; Guttman, 1999) for estimating drought indices (i.e., SPI or SGI). However, a longer cutoff of 30 years would lower the number of qualifying wells significantly. For example, all the Dutch wells would have been excluded under this criterion (Fig. 1). The limited availability of in situ groundwater data records as well as the variable record lengths are inevitable problems in performing groundwater drought studies over a large domain (see, e.g., Peters et al., 2006; Weider and Boutt, 2010; Li and Rodell, 2015). We nevertheless tested the reliability of our results against this data availability issue (as discussed later in Sect. 3.1).
The sampling time interval of groundwater observations varied from well to well and also within a single well from one time period to another, at daily, weekly, and monthly time intervals. For example, the original data for German wells were measured on at least a weekly time interval until about 1990; from then on a steadily increasing number of observations switched to daily measurements. Roughly from 2000 onwards, all stations have provided data at a daily time interval. To harmonize these disparate data sets at a common timescale, we performed the analysis at a monthly timescale wherein shorter timescale data sets were averaged to produce the monthly groundwater time series. The missing groundwater observations were left out of the analysis (i.e., left missing). Finally, we considered only those wells that have at least 10 years of valid monthly records (i.e., without missing values).
R. Kumar et al.: Groundwater drought prediction using the SPI
The daily precipitation time series at every well were extracted from gridded estimates computed from the available rain gauge network (Samaniego et al., 2013; ten Broek et al., 2014). The underlying point measurement data from about 5600 rain gauges for Germany and 51 rain gauges for the Netherlands were acquired from the German Meteorological Service (DWD) and the Royal Netherlands Meteorological Institute (KNMI), respectively. Interested readers may refer to Samaniego et al. (2013) and ten Broek et al. (2014) for more details on the processing of the precipitation data sets for the German and Dutch regions, respectively. The monthly total precipitation was then computed from the respective daily estimates to match the temporal resolution of groundwater records. Additionally, prior to the SPI calculations, the precipitation time series was filtered based on the temporal availability of groundwater records to ensure the comparability between the two time series. In other words, the months with missing groundwater records were also set to missing in the precipitation time series. This filtering step was, however, applied only after the accumulation of precipitation over longer timescales (for any selected period, e.g., 3, 6, and 12 months) had been performed. In this way, we ensured the consistency of the longer timescale (accumulated) SPI estimates, as well as their compatibility with the availability of groundwater records, such that both variables had the same sample size for the estimation of the corresponding drought indices (i.e., SPI and SGI).
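The order of operations described above (accumulate first, mask afterwards) can be illustrated with a minimal sketch. The function names and the toy numbers are hypothetical and not taken from the study; precipitation is assumed to be complete, and missing groundwater months are represented by `None`.

```python
from typing import List, Optional

def accumulate(precip: List[float], window: int) -> List[Optional[float]]:
    """Trailing sum over `window` months; None until a full window exists."""
    out: List[Optional[float]] = []
    for t in range(len(precip)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(precip[t + 1 - window : t + 1]))
    return out

def mask_like(accumulated: List[Optional[float]],
              groundwater: List[Optional[float]]) -> List[Optional[float]]:
    """Set accumulated precipitation to missing where groundwater is missing."""
    return [a if g is not None else None
            for a, g in zip(accumulated, groundwater)]

# Hypothetical monthly precipitation totals (mm) and groundwater heads (m).
precip = [50.0, 20.0, 80.0, 10.0, 60.0, 40.0]
heads = [3.1, None, 2.9, 2.8, None, 3.0]

acc3 = accumulate(precip, 3)    # accumulation uses the complete record
masked = mask_like(acc3, heads) # masking happens only afterwards
```

Masking before accumulating would instead corrupt every window that overlaps a missing groundwater month, which is why the filtering is applied last.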

Drought indices
The Standardized Precipitation Index (SPI) was developed by McKee et al. (1993) to characterize the wetness and dryness conditions of a region based on the departure of the monthly precipitation estimate from its (average) normal value. The SPI can be estimated for different timescales by accumulating the monthly precipitation over different periods, typically 3, 6, 12, 24, or 36 months (see McKee et al., 1993, for a detailed treatment). In most applications of the SPI, an analytic distribution function (e.g., the gamma) is fitted to the long-term precipitation record for a given accumulation period, and then the corresponding cumulative probability distribution is computed. Finally, the cumulative probability distribution is transformed to the standard normal distribution to estimate the SPI (McKee et al., 1993; Guttman, 1999). Any month with an SPI value below (above) zero is assumed to reflect dry (wet) conditions. Fitting theoretical distribution functions to data is potentially problematic because it is difficult to determine the structural form of the distribution function in advance. For example, Guttman (1999) found for the SPI that the Pearson type III distribution was the best universal model based on a large set of US data sets, whereas Lana et al. (2001) found that data from Catalonia in Spain could best be modeled with the Poisson-gamma distribution. Additional problems may arise if the data exhibit multi-modality.
To avoid these problems and minimize the uncertainty associated with the selection and estimation of parametric distribution functions, we used a non-parametric kernel density estimator to compute the cumulative probability distributions of the precipitation and groundwater data. The kernel density $\hat{f}(x)$ is given as

$$\hat{f}(x) = \frac{1}{n h} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right),$$

where $h$ represents the bandwidth, $K(x)$ the kernel smoothing function, $x_1, \ldots, x_n$ the set of variables of interest (i.e., precipitation or groundwater level), and $n$ the sample size. We used the Gaussian kernel in this study because of its unlimited support and estimated the bandwidth $h$ by an optimization against a cross-validation error estimate (see Samaniego et al., 2013, for details). The distribution functions and the corresponding bandwidths were estimated for each well and calendar month separately. The resulting quantiles, bounded on [0, 1], are denoted hereafter as the SPI and SGI for precipitation and groundwater, respectively. The quantile-based index has been used in several recent drought studies (Sheffield et al., 2004; Andreadis et al., 2005; Vidal et al., 2010; Samaniego et al., 2013), and can be easily transformed to the unbounded range of the standard normal distribution (Vidal et al., 2010). SPI and SGI values below (above) 0.5 denote dry (wet) conditions. Compared to the absolute values of groundwater heads (precipitation estimates), the transformed SGI (SPI) values better facilitate comparison across space and season (Sheffield et al., 2004). We note that our approach of estimating the SPI time series differs from the more conventional approach of fitting a defined distribution function to the precipitation time series and then estimating the corresponding SPI (Guttman, 1999; Hayes et al., 2010). A non-parametric approach was used here to avoid the problem of assigning a unique distribution function to all data sets (as mentioned above), and to ensure consistency in the estimation of drought indices for the precipitation and groundwater time series (i.e., both variables use a similar approach so that the resulting drought indices fall within the same range [0, 1]). We note that many recent drought studies have adopted a non-parametric approach for the estimation of drought indices (see, e.g., Andreadis et al., 2005; Vidal et al., 2010; Bloomfield and Marchant, 2013; Samaniego et al., 2013; Hao et al., 2014). Bloomfield and Marchant (2013), for example, had difficulties in identifying a unique best distribution function that fits all groundwater records at various locations, and even at a given location the fitted distribution function varied from one calendar month to another. Here we adopted their approach and estimated precipitation and groundwater drought indices (SPI and SGI) through a non-parametric method.
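As an illustration of the quantile-based index, the sketch below integrates a Gaussian kernel density estimate to obtain a value in (0, 1) for each observation of one well and one calendar month. It is a simplified sketch, not the study's implementation: the bandwidth here comes from a Silverman-style rule of thumb, which is an assumption standing in for the cross-validated bandwidth, and the function names and sample heads are hypothetical.

```python
from statistics import NormalDist, stdev

_STD_NORMAL = NormalDist()  # standard normal, used as kernel CDF and for inversion

def rule_of_thumb_bandwidth(sample):
    """Silverman-style bandwidth; an assumed stand-in for cross-validation."""
    return 1.06 * stdev(sample) * len(sample) ** (-1 / 5)

def kernel_cdf(x, sample, h):
    """Integral of the Gaussian kernel density estimate from -inf to x."""
    return sum(_STD_NORMAL.cdf((x - xi) / h) for xi in sample) / len(sample)

def quantile_index(series, h=None):
    """Quantile-based index in (0, 1) for every value of `series`."""
    if h is None:
        h = rule_of_thumb_bandwidth(series)
    return [kernel_cdf(x, series, h) for x in series]

# Hypothetical January groundwater heads (m a.s.l.) at a single well.
january_heads = [412.3, 411.8, 412.9, 411.2, 412.5, 410.9, 412.0, 411.5]

sgi = quantile_index(january_heads)
# Optional transform to the unbounded standard normal scale.
z_scores = [_STD_NORMAL.inv_cdf(q) for q in sgi]
```

Applying the same function to accumulated precipitation for the same calendar month would give the corresponding SPI, so that both indices share the [0, 1] range.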

Experimental setup and evaluation criteria
In this study, we explored the ability of the SPI to characterize the local- and regional-scale behavior of groundwater droughts. Therefore, we carried out our analysis at two disparate scales, denoted hereafter as the point and grid scales.
The point-scale analysis was performed on a well-by-well basis using the available SPI and SGI time series at each well. We based this analysis on the assumption that the zone of influence for changes in groundwater levels is limited to the area directly surrounding the well, as most of the wells are located within shallow aquifers (see Fig. 3 for the distribution of average depth to the water table across the investigated wells). We note that the approach chosen here is consistent with that commonly used in previous groundwater studies (e.g., Bloomfield and Marchant, 2013; Li and Rodell, 2015).
The grid-scale analysis, on the other hand, was carried out using the monthly estimates of drought indices gridded at a 0.5° spatial resolution, the scale that is commonly used in regional- and global-scale drought studies (e.g., Sheffield et al., 2004; Andreadis et al., 2005; Gudmundsson et al., 2012; Seneviratne et al., 2012; Van Loon et al., 2012; Tallaksen and Stahl, 2014; Wanders et al., 2015). The gridded fields of drought indices were estimated using a procedure similar to the ones employed in creating multi-model drought indices (see, e.g., Mo and Lettenmaier, 2013; Nijssen et al., 2014). Following this procedure, the individual estimates of a well-specific drought index were combined into a single grid-representative estimate by averaging the drought index from those wells that lie within the selected grid cell. The resulting monthly estimates at each grid cell were then converted into a percentile-based drought index following the (non-parametric kernel) density estimator approach illustrated in Sect. 2.2 for the well-specific data sets. The number of qualifying wells with at least 10 years of records per grid cell varied across the study domain between 1 and 261 for Germany and between 1 and 14 for the Netherlands, with median values of around 21 and 5 wells, respectively. We note that, as a preliminary investigation towards the regional assessment of groundwater droughts, we used a widely adopted ensemble mean approach to estimate 0.5° gridded fields of the SPI and SGI. In this simple approach, we used data from all available wells that fall within a particular grid cell, without accounting for the differences in sample size (i.e., the number of wells within a grid cell), to create the gridded estimates of the SPI and SGI. We analyzed the effect of these differences in sample size on the gridded SPI and SGI skill scores (reported in Sect. 3.4).
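The ensemble-mean gridding step can be sketched as follows. This is only an illustration under stated assumptions: the cell-assignment rule (flooring coordinates to a 0.5° lattice), the function names, and the well coordinates and index values are hypothetical, and the subsequent re-conversion of the cell averages into a percentile-based index is not shown.

```python
import math
from collections import defaultdict

def cell_of(lat, lon, res=0.5):
    """Integer (row, col) of the res-degree cell containing the point."""
    return (math.floor(lat / res), math.floor(lon / res))

def grid_mean(wells, res=0.5):
    """Average a drought index over all wells falling in the same grid cell.

    `wells` is a list of (lat, lon, series); each series holds one value
    per month, with None marking a missing month. All wells in a cell are
    weighted equally, regardless of how many wells the cell contains.
    """
    by_cell = defaultdict(list)
    for lat, lon, series in wells:
        by_cell[cell_of(lat, lon, res)].append(series)

    gridded = {}
    for cell, members in by_cell.items():
        monthly = []
        for values in zip(*members):
            valid = [v for v in values if v is not None]
            monthly.append(sum(valid) / len(valid) if valid else None)
        gridded[cell] = monthly
    return gridded

# Three hypothetical wells; the first two share one 0.5-degree cell.
wells = [
    (48.1, 11.2, [0.2, 0.4, None]),
    (48.3, 11.4, [0.4, 0.6, 0.5]),
    (48.7, 11.2, [0.9, 0.8, 0.7]),
]
gridded = grid_mean(wells)
```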
To assess qualitatively how well the SPI characterizes the SGI, we first examined the spatio-temporal relationship between the two indices based on a cross-correlation analysis. Here we considered the entire spectrum [0, 1] of the SPI and the SGI, without distinguishing between dry and wet regimes. In other words, this part of the analysis was conducted using the entire time series of the SPI and SGI, covering the whole spectrum of hydro-meteorological conditions from extremely dry to very wet. The analysis was performed separately for the point- and grid-scale data sets, with different accumulations and lags of the SPI ranging from 1 to 48 months. In this analysis, we used the Spearman rank correlation coefficient (r) as a non-parametric measure to quantify the strength of a monotonic relationship between the SPI and SGI. Since the propagation of precipitation signals to the groundwater is highly nonlinear, the rank correlation was preferred over the traditional Pearson (linear) correlation coefficient in this analysis. The goal here was to identify which accumulations and lags of the SPI are required to align the signals of precipitation with the groundwater heads, and how they varied in space for both point and gridded data sets.
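The search over accumulations and lags can be sketched as below. This is a simplified, dependency-free illustration rather than the code used in the study: the helper names are hypothetical, the lag convention (precipitation leading groundwater by `lag` months) is an assumption, and in the toy example raw accumulated precipitation stands in for the SPI and a synthetic 3-month sum stands in for the SGI.

```python
def ranks(values):
    """Ranks starting at 1, with ties receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def best_accumulation_and_lag(spi_by_accumulation, sgi, lags):
    """Return (r_max, accumulation, lag) maximizing the rank correlation."""
    best = None
    for acc, spi in spi_by_accumulation.items():
        for lag in lags:
            pairs = [(spi[t - lag], sgi[t]) for t in range(len(sgi))
                     if t - lag >= 0
                     and spi[t - lag] is not None and sgi[t] is not None]
            if len(pairs) < 3:
                continue
            r = spearman(*zip(*pairs))
            if best is None or r > best[0]:
                best = (r, acc, lag)
    return best

def trailing_sum(p, w):
    return [sum(p[t + 1 - w:t + 1]) if t + 1 >= w else None
            for t in range(len(p))]

# Toy data: the "groundwater" series is exactly a 3-month precipitation sum.
precip = [55, 12, 80, 33, 95, 7, 61, 48, 22, 70, 15, 88,
          40, 5, 66, 91, 28, 53, 17, 76, 36, 99, 9, 62]
candidates = {a: trailing_sum(precip, a) for a in (1, 3, 6)}
gw = trailing_sum(precip, 3)
r_max, acc_opt, lag_opt = best_accumulation_and_lag(candidates, gw, lags=(0, 1, 2))
```

In practice the candidate series would be SPI values computed per calendar month for each accumulation period, and the search would run over 1 to 48 months.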
In the subsequent analysis, we focused on assessing the ability of the SPI to detect groundwater droughts based on the SGI. A drought is defined as occurring when an index (i.e., the SPI or SGI) falls below a certain threshold (τ), taken here as 0.2 following previous studies (Sheffield et al., 2004; Andreadis et al., 2005; Vidal et al., 2010; Samaniego et al., 2013). According to the drought classification scheme used by the US Drought Monitor (USDM; http://droughtmonitor.unl.edu), more severe and extreme drought conditions occur when the indices fall below τ values of 0.1 and 0.05, respectively. The reliability of groundwater drought predictions made by the SPI for different drought classes can be assessed using probabilistic scores based on the probability of detection or hit rate (H) and the false alarm ratio (F). Following the (2 × 2) contingency table, the hit rate (H) is given by

$$H = \frac{a}{a + c},$$

and the false alarm ratio (F) is represented as

$$F = \frac{b}{a + b},$$

where a, b, and c are the hits, false alarms and misses, respectively (Wilks, 2011). In our case, the hit rate H is the fraction of all groundwater drought events correctly predicted by the SPI (i.e., the ratio of the number of times the SPI predicts a groundwater drought when the SGI indicates the occurrence of one, to the total number of times the SGI indicates drought conditions). The false alarm ratio F represents the fraction of forecasted drought events that were false alarms (i.e., the ratio of the number of times the SPI predicts a groundwater drought when the SGI does not indicate one, to the total number of times the SPI predicts droughts). The best scores for H and F are 1 and 0, respectively, and the worst values are 0 and 1, respectively. More recently, Haslinger et al. (2014) used a similar approach, based on hit rates (H), to assess the ability of the SPI and other atmospheric indices (e.g., SPEI and PDSI) to detect low-flow events in Austrian catchments.
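The two scores follow directly from counting hits (a), false alarms (b) and misses (c), with H = a / (a + c) and F = b / (a + b) as implied by the verbal definitions above. The sketch below is a minimal illustration; the function name and the example index values are hypothetical.

```python
def drought_skill(spi, sgi, tau=0.2):
    """Hit rate H and false alarm ratio F of SPI-based drought detection.

    A month counts as drought when the index falls below `tau`. The SGI is
    treated as the observation and the SPI as the prediction. Months with a
    missing value (None) in either series are skipped.
    """
    hits = false_alarms = misses = 0
    for p, g in zip(spi, sgi):
        if p is None or g is None:
            continue
        predicted, observed = p < tau, g < tau
        if predicted and observed:
            hits += 1
        elif predicted:
            false_alarms += 1
        elif observed:
            misses += 1
    # Scores are undefined (None) when the denominator is zero.
    H = hits / (hits + misses) if hits + misses else None
    F = false_alarms / (hits + false_alarms) if hits + false_alarms else None
    return H, F

# Hypothetical quantile-based index values for seven months.
spi = [0.10, 0.15, 0.50, 0.05, 0.80, 0.18, 0.30]
sgi = [0.12, 0.40, 0.10, 0.08, 0.90, 0.19, 0.15]
H, F = drought_skill(spi, sgi)  # 3 hits, 1 false alarm, 2 misses
```

Passing `tau=0.1` or `tau=0.05` evaluates the more severe drought classes in the same way.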

Cross-correlation analysis between the SPI and SGI
The results of the cross-correlation analysis between the SGI and SPI at different accumulations and lags revealed a large degree of spatial variability in the accumulation period A required to achieve the maximum correlation r_m at both point and grid scales (Fig. 2). The A value corresponding to r_m is referred to hereafter as the "optimal" accumulation period. The estimates of A across the majority of wells and grid cells (> 90 %) in both study regions varied broadly between 3 and 36 months, with an overall median value of around 6–12 months. The relatively large variation of A values across the investigated wells signified the importance of the underlying climate, soil, vegetation, and aquifer properties in modulating the precipitation signals for groundwater flows.
Our preliminary analysis indicated that the wells located in comparatively very thick unsaturated zones or with deeper groundwater tables exhibited on average longer accumulation periods, and vice versa (Fig. 3a). For example, the higher accumulation values (> 24 months) in the middle of the Dutch (Gelderland) region are due to the presence of a relatively thicker unsaturated zone, up to 30 m deep (Fig. 2). Consistent with the theoretical expectation, a similar relationship between the accumulation periods and the depth to water table was reported recently by Li and Rodell (2015) when analyzing groundwater droughts at wells located in the Mississippi River basin and nearby regions. In general, deeper groundwater tables (or thicker unsaturated zones) cause more attenuation of the high-frequency precipitation signals and require longer accumulations of precipitation to properly align with the smoothed variability of groundwater signals (Barthel, 2011). On the other hand, a shallower groundwater table responds more quickly to high-frequency precipitation events, and the variability of the groundwater anomalies is better explained by the shorter timescales of the SPI. There were, however, exceptions to this general behavior, and the temporal dynamics of groundwater indices (SGI) at some wells in shallower aquifers exhibited a better correlation with a longer timescale SPI, going up to 48 months (Fig. 3a). This highlighted the need to take into account other hydrogeological and well-specific information like aquifer release and storage characteristics, perforation type, and borehole location (Bloomfield and Marchant, 2013; Stoelzle et al., 2014).
We also examined the role of geological characteristics in the spatial variability of the optimal accumulation period (A) and maximum correlation (r_m) between the SPI and SGI. For this purpose, the underlying hydraulic conductivity values of the uppermost aquifer were extracted from the available large-scale hydro-geological map of Germany (HUEK200, available at a scale of 1 : 200 000). The wells were grouped into four dominant conductivity classes: high (> 10⁻³ m s⁻¹), medium (10⁻³–10⁻⁵ m s⁻¹), low (10⁻⁵–10⁻⁷ m s⁻¹), and very low (< 10⁻⁷ m s⁻¹). Results of this analysis indicate that there is no clear trend in the optimal accumulation period (A) between the SPI and SGI over these classes (Fig. 3b). The correspondence between the optimal SPI and SGI appears to be relatively weaker at wells located in aquifers with lower conductivity, as indicated by a relatively lower value of the maximum correlation (r_m). The optimal accumulation periods (A) appeared on average longer for the wells located in the medium to low aquifer permeability classes than for the very low conductivity class, for which one could have expected the largest smoothing (or attenuation) of precipitation signals. These seemingly contradictory results indicated that the influence of local geological conditions on the propagation of precipitation signals to groundwater flows cannot be assessed by looking at single factors (here aquifer conductivity) alone. We note that other geological parameters such as transmissivity and horizontal extent of an aquifer, which are not readily available, would have been more adequate in characterizing the aquifer response time (e.g., Kraijenhoff-van de Leur, 1958; Gelhar, 1993). Also, other local factors such as depth to the groundwater and properties of the unsaturated zone play an important role, and their contribution is neither linear nor independent. It adds to the complexity of this problem that data on local conditions are only available from rather coarse large-scale hydro-geological maps (e.g., the HUEK200 map), with possibly large deviations from the actual well-specific conditions. These issues thus require careful and detailed analyses that are beyond the scope of this study. We note that the focus of this study was not on identifying potential factors or relationships explaining the spatial variability of accumulation periods. Nevertheless, we emphasize that the results of the analysis presented above (Fig. 3a) showed the opportunity for establishing a first-order regional relationship between the accumulation period and the average depth to water table, for which global estimates are now becoming available (Fan et al., 2013).
The lag times (L) leading to maximum correlation (r_m) between the SPI and SGI showed limited spatial variability across the majority of wells and grid cells, with values generally close to zero (Fig. 2b). This implied that the temporal anomalies of the groundwater heads (SGI) at those locations are aligned with those of the (accumulated) precipitation (SPI). Results of our analysis did indicate a substantial variation in the maximum correlation (r_m) across the investigated wells, pointing to the lack of a uniformly strong relationship between the SPI and SGI (Fig. 2c). The r_m values ranged between 0.40 and 0.87 for the majority of German wells, and between 0.47 and 0.87 for the Dutch wells, with overall median r_m values of around 0.68 and 0.70, respectively. A relatively weaker correlation between the SPI and SGI was found for wells located in a shallower aquifer, where the average depth to the water table is less than 5 m (Fig. 3b). The r_m value estimated across these shallower wells was on average around 0.64, whereas for wells located in a relatively deeper aquifer (with water table depth > 5 m) the average r_m was 0.72. This trend of the correlation (r_m) with the average water table depth was, however, not as strongly pronounced as in the case of the accumulation period (Fig. 3a and b).
We also tested the reliability of the above results against the data availability issue. The A and r_m values obtained across all wells were grouped into three categories according to their available record lengths (i.e., 10–20, > 20–30, and > 30 years). Both the spread and the average behavior of the optimal accumulation period (A) and the maximum correlation (r_m) were comparable across the groups of wells with different record lengths (Fig. 3c and d). This shows that the results presented above are reliable and are not contingent on the selection of wells with either short or long record lengths. We also emphasize here that our results are not biased by the selected statistical criterion (i.e., rank correlation). Similar results (not shown here) were obtained using other criteria such as the Pearson correlation coefficient and the mean absolute error; both exhibited substantially large (small) variations in the accumulation (lag) periods across the analyzed wells and grid cells.

SPI with spatially uniform accumulation periods
The spatial variation in the optimum accumulation periods shown in Fig. 2 demonstrates that no single representative value is applicable over the entire domain. A noticeable reduction in the correlation values (r) between the SGI and SPI was observed when a uniform accumulation period was applied to all wells or grid cells (Fig. 4). For instance, the correlation estimated across the investigated wells on average decreased from the r_m value of 0.67 to 0.23, 0.46, 0.53, 0.50, 0.44, and 0.27 for the 1, 3, 6, 12, 24, and 48 months of uniform accumulations, respectively. Around 10 to 65 % of the wells had a notably low correlation with r values less than 0.3. The gridded data sets exhibited slightly better correspondence between the SPI and SGI than the point ones, with the correlation dropping from the maximum (r_m) of 0.73 to 0.26, 0.54, 0.62, 0.64, 0.56, and 0.35 for the 1, 3, 6, 12, 24, and 48 months of uniform accumulations, respectively. In this case, nearly 5-60 % of the grid cells exhibited notably low correlation values below 0.3. Among the different uniform accumulation periods, the strongest correlation between the SGI and SPI was observed for 6-12 months of accumulations, while the weakest link was found for the 1-month precipitation accumulation (Fig. 4). These results suggested that the changes in the monthly groundwater levels cannot be explained by the month-to-month precipitation variability; rather, the smoothed response of groundwater requires the contribution from seasonal to annual precipitation. These results were consistent with findings of other recent studies performed in different regions (e.g., Bloomfield and Marchant, 2013; Li and Rodell, 2015), and the findings here support the general notion of the groundwater system acting as a low-pass filter, responding to moderate climate forcings (e.g., Eltahir and Yeh, 1999; Weider and Boutt, 2010).
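The comparison across uniform accumulation periods can be illustrated with a minimal sketch. It is not the processing chain used in this study: it assumes that a rank correlation computed directly on accumulated precipitation and groundwater heads is an acceptable stand-in for the correlation between the SPI and SGI, which ignores any calendar-month-wise standardization of the indices. The function names (rolling_sum, spearman, correlation_by_accumulation) and the synthetic series are hypothetical and serve only as an example.

```python
import random


def rolling_sum(x, window):
    """Trailing sums over `window` months; None until enough months exist."""
    out = [None] * len(x)
    s = 0.0
    for i, v in enumerate(x):
        s += v
        if i >= window:
            s -= x[i - window]
        if i >= window - 1:
            out[i] = s
    return out


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def correlation_by_accumulation(precip, head, periods=(1, 3, 6, 12, 24, 48)):
    """Rank correlation between accumulated precipitation and heads per period."""
    result = {}
    for p in periods:
        acc = rolling_sum(precip, p)
        pairs = [(a, h) for a, h in zip(acc, head)
                 if a is not None and h is not None]
        result[p] = spearman([a for a, _ in pairs], [h for _, h in pairs])
    return result


# Synthetic example (assumption): heads follow the 12-month mean precipitation
# plus a small amount of noise.
rng = random.Random(0)
precip = [rng.uniform(0.0, 100.0) for _ in range(600)]
head = [None if a is None else a / 12 + rng.gauss(0.0, 2.0)
        for a in rolling_sum(precip, 12)]
r_by_period = correlation_by_accumulation(precip, head)
best_period = max(r_by_period, key=r_by_period.get)
```

In this synthetic setting the correlation peaks at the accumulation period used to generate the heads and falls off for shorter and longer periods, which mirrors qualitatively the behavior reported in Fig. 4.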
The discrepancy between the SGI and the SPI was further quantified using the mean absolute error (E) criterion to provide a quantitative estimate of the error E in the units of the SPI or the SGI (i.e., between 0 and 1). The resulting E value for both point and gridded data sets on average ranged between 0.17 and 0.26 for different accumulation periods of the SPI (Fig. 4). These were quite substantial errors considering that the threshold used to distinguish between a drought and no-drought event is usually taken as 0.2 for the quantile-based drought indices (Sheffield et al., 2004; Andreadis et al., 2005; Vidal et al., 2010; Samaniego et al., 2013). In this case, even the minimum mean absolute error (E) estimates corresponding to the spatially varying optimal accumulation periods were fairly large, with an average estimate of around 0.15-0.16 for the point and gridded data sets. These high degrees of discrepancy between the SGI and the SPI clearly indicated the inability of the precipitation-based drought index to adequately characterize groundwater drought events.
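As a small worked illustration of the error criterion, the following sketch computes the mean absolute error between two quantile-based index series and relates it to the drought threshold. The function name mean_absolute_error and the four example values are hypothetical; the only assumption taken from the text is that both indices lie between 0 and 1 and that the threshold is 0.2.

```python
def mean_absolute_error(spi, sgi):
    """Mean absolute difference between two index series on [0, 1].

    Months where either value is missing (None) are skipped.
    """
    pairs = [(p, g) for p, g in zip(spi, sgi)
             if p is not None and g is not None]
    if not pairs:
        raise ValueError("no overlapping months")
    return sum(abs(p - g) for p, g in pairs) / len(pairs)


# Hypothetical monthly values, for illustration only.
spi_example = [0.10, 0.50, 0.90, 0.30]
sgi_example = [0.30, 0.40, 0.70, 0.50]
tau = 0.2  # drought threshold quoted in the text

error = mean_absolute_error(spi_example, sgi_example)
error_relative_to_threshold = error / tau
```

An error of the same order as the threshold means that a month classified as drought by one index can easily sit on the other side of the threshold for the other index, which is the concern raised above.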
Results of our analysis also showed a relatively larger spread in both statistical criteria (i.e., r and E) estimated for the point data sets as compared to the gridded ones (Fig. 4). This once again emphasizes the importance of local-scale heterogeneities in propagating the precipitation signals to groundwater. Clearly, the high variability of precipitation and groundwater anomalies exhibited at a point scale is smoothed out at a grid scale due to the spatial averaging, which resulted in a better correspondence between the gridded indices at a regional scale (Fig. 4). Despite the better agreement, the error between the gridded SGI and SPI at any of the uniform or optimal accumulation periods remained substantially high, with an average value of at least 0.15, an error level that is comparable to the threshold value (τ = 0.2) used to classify droughts.
Overall, the above-presented results signified the importance of identifying an appropriate drought timescale, i.e., the optimal accumulation period of precipitation-based drought indices that is best correlated with impact variables (e.g., streamflow or groundwater levels indicating hydrological or groundwater drought indices). The application of a single accumulation period over the entire domain or among different impact variables could induce large errors and therefore is not recommended. The diversity of relationships that are usually recorded between drought indices and impact variables stresses the need for initially testing the best timescales of a drought index to determine possible impacts. It is however noted that such analysis would require good-quality impact variable data sets and, for many regions for which we need reliable and accurate data sets (e.g., on groundwater drought information), these observations are often not readily available. Nevertheless, the issue of analyzing an appropriate drought timescale is not only specific to the groundwater system, but is also relevant for several other hydrological and ecological systems (e.g., Pasho et al., 2010; Vicente-Serrano et al., 2011, 2012; López-Moreno et al., 2013; Vicente-Serrano et al., 2013; Haslinger et al., 2014; Bachmair et al., 2015; Van Loon, 2015).

Temporal evolution of the SPI and SGI
Figure 5 shows the exemplary time series of the SGI and SPI at 6 and 12 months of the accumulation periods for all wells and grid cells and their respective spatial averages for an overlapping period of 1995-2006. The SGI estimates for both point and gridded data sets exhibited higher spatial variability that cannot be adequately represented by their respective SPIs regardless of the accumulation periods used. This pointed out the enhanced role of soil, vegetation, and hydrogeological properties in propagating the precipitation signal through the subsurface. These observations were consistent with the findings of Weider and Boutt (2010), who also found that groundwater levels in New England have higher (spatial) variability in their responses than other hydro-meteorological variables, including precipitation and streamflows.
The SPI and the SGI for large-scale drought events like those of 1996 and 2003 showed a remarkable regional difference between German and Dutch wells (Fig. 5). A drought is defined when the indices (e.g., SPI) fall below a threshold (τ) value of 0.2. For instance, the regionally averaged SPI estimates indicated the most severe and extended (prolonged) droughts during the 1996 event for Dutch wells, which was not so strongly pronounced at the German wells (Fig. 5). The opposite behavior was, however, noticed for the 2003 drought event, where the SPI pointed towards more severe drought situations at the German wells than at the Dutch wells. The regional differences were also apparent in the anomalies of groundwater heads (SGI), with German wells showing on average a relatively smoother groundwater response compared to the highly fluctuating and variable groundwater anomalies at the Dutch wells (Fig. 5). In comparison to the SPI, the regionally averaged SGI exhibited far less severe drought conditions, although there were some wells at which the drought severity based on the SGI and SPI were comparable (Fig. 5). This is in accordance with a well-known phenomenon of the drought attenuation while propagating through subsurface media and the groundwater compartment of the terrestrial water cycle (Hisdal and Tallaksen, 2000; Van Loon, 2015). Notably, the 1996 and 2003 drought events that appeared in the averaged SPI at the Dutch and the German wells, respectively, were not so strongly pronounced in their respective SGI estimates to characterize these events as severe large-scale groundwater droughts.
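The threshold-based drought definition used above can be sketched as a simple run detection on a monthly index series. This is an illustrative helper rather than the procedure applied in this study: the function name drought_spells is hypothetical, and a month is counted as a drought month when the index is less than or equal to τ, which is an assumption about how the boundary value is treated.

```python
def drought_spells(index, tau=0.2):
    """Return (start_month, duration) for each run of months with index <= tau.

    Missing values (None) are treated as non-drought months and therefore
    end any ongoing spell (an assumption made for this sketch).
    """
    spells = []
    start = None
    for t, value in enumerate(index):
        in_drought = value is not None and value <= tau
        if in_drought and start is None:
            start = t                           # a new spell begins
        elif not in_drought and start is not None:
            spells.append((start, t - start))   # the spell ended last month
            start = None
    if start is not None:                       # spell still running at the end
        spells.append((start, len(index) - start))
    return spells


# Hypothetical monthly index values, for illustration only.
example_index = [0.50, 0.15, 0.10, 0.30, 0.19, 0.60]
example_spells = drought_spells(example_index)
```

Applying the same helper to an SPI series and to the corresponding SGI series gives two lists of spells whose onsets and durations can be compared directly, which is one simple way to see the attenuation of precipitation droughts in the groundwater compartment.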
The above-presented results underpin the inability of the SPI to satisfactorily track the drought events in the groundwater compartment even when applied at longer timescales. The propagation of precipitation deficits to hydrologic (groundwater) droughts is largely controlled by both catchment and climatic characteristics such as terrain and geological properties, seasonality, snowmelt timing, and atmospheric water supply and evaporative demand. This results in a pronounced variation in hydrologic (groundwater) drought (see, e.g., Peters et al.; Bloomfield and Marchant, 2013; Haslinger et al., 2014; Vicente-Serrano et al., 2012; ... et al., 2013; Van Loon et al., 2014; Stoelzle et al.; Van Loon, 2015). Regarding the aspect of climatic characteristics, the SPI, which fully accounts for the atmospheric water supply side, does not include the effects of the evaporative water demand that could be a determining factor in a (groundwater) drought analysis (Vicente-Serrano et al.; Teuling et al., 2013). A meteorological drought index such as the SPEI (Vicente-Serrano et al., 2010), which accounts for both atmospheric water supply and evaporative demand, is expected to be better suited for characterizing hydrologic (groundwater) droughts. However, results of our preliminary investigation (not shown here) indicated that there was not much benefit in using the SPEI over the SPI for the drought analysis in the study regions. This could be because these regions are characterized by high precipitation variability that dominates the effect of temperature variability (expressed in the evaporation term of the SPEI). We however recognize that both meteorologically based drought indices may show a slight difference during some specific (summer) months and time periods. We note that our study mainly focused on assessing the skill of the SPI, and the evaluation of other drought indices (e.g., the SPEI or model-based indicators), which in itself would be an interesting research work, is beyond the scope of the current study.

Skill of the SPI in predicting groundwater droughts
The skill of the SPI in predicting groundwater droughts was assessed using probabilistic scores based on the hit rate (H) and false alarm ratio (F) (see Sect. for their description). The results shown in Fig. 6 for H indicate that for a drought threshold τ of 0.2, the SPI was only able to correctly predict three out of five (H ≥ 0.6) SGI-based droughts at less than 12 % of the German and 16 % of the Dutch wells for any of the two uniform accumulation periods (6 and 12 months) of the SPI. Even in the case of the SPI corresponding to the spatially varying optimal accumulation period (Fig. 6a), only 21 and % of the German and Dutch wells had an H score greater than 0.6, respectively. The low reliability of the groundwater drought predictions using the SPI was also confirmed from the F scores, shown in Fig. 7, for which at least three in every five events (F > 0.6) were wrongly predicted at around 50 % of the wells for both uniform accumulation periods (6 and 12 months) of the SPI. In this case, around 30 % of the wells in both regions exhibited a high false alarm ratio (F > 0.6) for groundwater drought predictions using the SPI with the optimal accumulation periods (Fig. 7a).
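The two scores can be illustrated with a minimal contingency-table sketch. It assumes the common definitions that match the verbal reading given in the text, namely H as the fraction of SGI-based drought months that the SPI also flags, and F as the fraction of SPI-flagged drought months that are not droughts in the SGI. The exact formulation used in this study is given in the methods section and may differ in detail; the function name skill_scores and the example values are hypothetical.

```python
def skill_scores(spi, sgi, tau=0.2):
    """Hit rate H and false alarm ratio F of the SPI against the SGI.

    A month counts as drought when the index is <= tau (assumption).
    H = hits / (hits + misses); F = false alarms / (hits + false alarms).
    Returns NaN for a score whose denominator is zero.
    """
    hits = misses = false_alarms = 0
    for p, g in zip(spi, sgi):
        if p is None or g is None:
            continue                      # skip months with missing data
        predicted = p <= tau              # drought according to the SPI
        observed = g <= tau               # drought according to the SGI
        if predicted and observed:
            hits += 1
        elif observed:
            misses += 1
        elif predicted:
            false_alarms += 1
    nan = float("nan")
    hit_rate = hits / (hits + misses) if hits + misses else nan
    false_alarm_ratio = (false_alarms / (hits + false_alarms)
                         if hits + false_alarms else nan)
    return hit_rate, false_alarm_ratio


# Hypothetical monthly values, for illustration only.
sgi_example = [0.10, 0.15, 0.50, 0.05, 0.60, 0.18, 0.70, 0.90]
spi_example = [0.10, 0.40, 0.10, 0.10, 0.15, 0.30, 0.80, 0.90]
H, F = skill_scores(spi_example, sgi_example)
```

In this example two of the four SGI-based drought months are detected and two of the four SPI-flagged months are false alarms. Lowering tau to 0.1 or 0.05 evaluates the same series for severe or extreme drought conditions.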
Although the skill of the SPI for the gridded data sets was better than that of the well-specific ones, both the H and F scores for the gridded data were far from their best scores (Figs. 6 and 7). Overall, the H score on average ranged between 0.52 and 0.58 for the optimal and uniform accumulation periods of the SPI, and the corresponding F score varied between 0.44 and 0.50. These results clearly highlighted the limited skill of the gridded SPI in capturing regional-scale groundwater droughts with either optimal or uniform accumulation periods of the SPI. Furthermore, the grid cells for which the SPI and SGI were constructed based on the point-scale data of very few underlying wells (< 3) exhibited slightly lower H (and higher F) scores compared to the others. This lower correspondence between the SPI (optimal) and SGI was noticed in a few grid cells (7 out of a total of 69 cells). For the remaining grid cells, there was no systematic pattern of improvement or deterioration in the skill scores with the increasing number of underlying wells, which indicated that the difference in the number of underlying wells among the grid cells had a relatively minor to no effect on the results presented here for the regional-scale groundwater drought analysis.
Results of the further analysis for predicting more severe and extreme groundwater drought conditions also revealed significantly poor skill of the SPI at both point and grid scales (Fig. 8). For example, the SPI predictions based on 6 and 12 months of uniform accumulation for the severe groundwater drought conditions (τ = 0.1) exhibited an average hit rate H of around 0.26 (i.e., only one in every four events is correctly predicted) for the point data sets, and around 0.33 (i.e., only one in every three events is correctly predicted) for the gridded data sets. The corresponding average F score was quite high, around 0.79 (i.e., nearly four in every five events predicted are false alarms) and 0.67 (i.e., two in every three events predicted are false alarms), respectively. Even with the spatially varying optimal accumulation period, the overall skill of the SPI was poor, with an average H score of 0.30 and 0.39 for the point and gridded data sets, respectively. The corresponding F score was 0.72 and 0.60, respectively.
The performance of the SPI further deteriorated drastically for the predictions of the extreme groundwater drought conditions (τ = 0.05), regardless of the accumulation periods and spatial resolution of the data sets (Fig. 8). These results highlighted the limited reliability of the SPI for predicting groundwater droughts at different severity levels. Among other things, these levels are used for watching (or tracking) the onset, development, and termination of drought events, which are essential elements of any effective drought monitoring system (e.g., USDM). The skillful predictions of these drought conditions are of critical importance because planners and water managers need to know, for example, the onset of droughts to take appropriate drought mitigation actions to reduce damages (Hayes et al., 2010).

Conclusions
In this study we assessed the ability of the precipitation-based drought index (SPI) to characterize groundwater droughts at more than 2000 wells located in two regions in Germany and the Netherlands. These two groundwater networks consisting of a large number of wells and available records allowed us to quantitatively evaluate the skill of the SPI for groundwater drought monitoring at a local and regional scale using the well-specific and 0.5° gridded data sets, respectively. On the basis of this data-based exploratory analysis, we found that the precipitation needs to be accumulated over several (3-24) months to (temporally) align the SPI with the SGI time series at both local and regional scales, reflecting the significantly smoothed response of groundwater to precipitation signals. Despite this alignment and with a relatively fair degree of correlation, the SPI lacked the skill to predict groundwater droughts based on the SGI. The necessary accumulation periods varied considerably in space, however, and were not known beforehand. We found that the thickness of the unsaturated zone (expressed here as the average depth to the water table) partly but not entirely controlled the spatial variation of the accumulation period. The groundwater levels at the wells located in relatively deeper aquifers exhibited on average a stronger correlation with longer timescale SPIs, and vice versa. There was, however, considerable noise in this relationship, and further studies are required to investigate the possible role of other land-surface and hydro-geological properties, including aquifer storage and transmission characteristics.
The application of the uniform accumulation periods over the entire domain significantly reduced the correlation between the SPI and SGI, indicating the limited applicability of the SPI as a proxy for groundwater droughts even at long accumulation times. The differences between the SGI and SPI at both point and gridded data sets were substantially high and generally comparable to the often used threshold value (τ = 0.2) to classify droughts. Based on the results of this multiscale analysis, the assumption of an average smoothing of precipitation to mimic groundwater response during droughts is highly unrealistic.
Depending on the region, the severity of SPI-based drought events differed greatly from those based on the SGI. In some cases the SPI-based extreme droughts (e.g., 1996 or 2003) only showed up in some groundwater wells but not in the spatially averaged SGI, indicating the enhanced role of the subsurface medium in modulating the precipitation signal. Future studies may look into disentangling the roles of the individual subsurface medium attributes and climatic factors. The predictions of groundwater droughts at different severity levels are crucial for water utilities and regulators for planning (e.g., management of abstraction rates and tariffs) and decision-making (e.g., restricting water usage and rationing). The results of the probabilistic scores based on the hit rate and the false alarm ratio clearly indicated the inability of the SPI to capture these aspects of drought conditions; the SPI would therefore be inadequate for monitoring and planning purposes. While these categorical contingency table-based skill scores clearly outlined the limitations of the SPI in detecting the SGI-based groundwater drought events, more insights could be gained by analyzing the differences among different drought characteristics (e.g., duration, severity and maximum intensity) derived from the SPI and SGI time series. Future studies may therefore look into these aspects.
In addition to the analysis focusing on the correspondence between the SPI and SGI over their entire ranges [0, 1], representing both dry and wet conditions, we put specific emphasis on assessing the skill of the SPI in predicting groundwater drought conditions at different severity levels (i.e., SGI ≤ 0.2 or 0.1 or 0.05). These analyses have allowed us to gain more insights into the limitations of a precipitation-based drought index to properly identify the groundwater droughts. Based on the results obtained in this study, the hypothesis that the observation-based SPI can adequately predict groundwater droughts could not be supported for the analyzed point and gridded data sets.
The evidence presented in this study regarding the inability of the SPI to characterize groundwater drought events at both local and regional scales calls for a different observation-based indicator like the SGI. If, for data availability reasons, the precipitation-based drought indicator is used for groundwater drought studies, the aforementioned limitations should be borne in mind. We stress the need to put more effort into the collection and collation of groundwater data, so that groundwater observations become available on a global scale to characterize groundwater drought and the availability of subsurface water resources during drought, at spatial scales small enough to be relevant for water resources management. Finally, in this study we screened our observational wells to keep human influence on groundwater levels minimal, but we note that anthropogenic changes in land use and water use in most of the world today are contributing to the discrepancy between the SPI and SGI. This human influence cannot and should not be disregarded by only using the SPI to characterize hydrological drought, because it creates a false image of the drought situation on the ground and its impact on people (Van Loon et al., 2016).

Figure 1. The locations of (a) German and (b) Dutch wells overlaid on the respective terrains. The marker colors show the number of months N_m with available records during the periods 1950-2013 and 1988-2013 for German and Dutch wells, respectively.

Figure 2. The (a) optimal accumulation A (month) and (b) lag periods L (month) required to obtain the (c) maximum correlation r_m (-) between the SGI and SPI at point and gridded (0.5°) scales for German (top) and Dutch (bottom) data sets.

Figure 3. Box-and-whisker plots of the optimal accumulation period A (top) and the maximum correlation r_m (bottom) estimated for a group of wells with varying depth to water tables (left: a), aquifer hydraulic conductivity classes (middle: b), and record lengths (right: c). Results shown for the aquifer hydraulic conductivity corresponding to the German wells are grouped into four distinct classes: high (> 10⁻³ m s⁻¹), medium (10⁻³–10⁻⁵ m s⁻¹), low (10⁻⁵–10⁻⁷ m s⁻¹), and very low (< 10⁻⁷ m s⁻¹). The percentage of wells falling within each group is indicated at the top of every plot.

Figure 4. The correlation r (top) and the mean absolute error E (bottom) estimated between the SGI and SPI of the 1, 3, 6, 12, 24, and 48 months of uniform accumulations for the point and gridded data sets. Their respective maximum (r_m) and minimum (E_m) estimates corresponding to the optimal accumulation periods of the SPI are also shown in the leftmost of the panels. Summary statistics are provided as an average ± 1 standard deviation, and the entire range is depicted as filled bars in the background.

Figure 5. The monthly time series of the 6- and 12-month point (light blue) and gridded (light pink) SPI and the respective spatial averages (dark blue and dark red) for (a) German and (b) Dutch data sets. The plots also show the SGI series and their averages. The black line indicates the threshold of 0.2.

Figure 6. The hit rate (H) to detect SGI-based groundwater droughts using the SPI with the (a) optimal accumulation period and (b, c) 6 and 12 months of uniform accumulation periods at the point and gridded scales for German (top) and Dutch (bottom) data sets. A threshold value τ of 0.2 is used to identify drought events.

Figure 7. The false alarm ratio (F) to detect SGI-based groundwater droughts using the SPI with the (a) optimal accumulation period and (b, c) 6 and 12 months of uniform accumulation periods at the point and gridded scales for German (top) and Dutch (bottom) data sets. A threshold value τ of 0.2 is used to identify drought events.

Figure 8. The hit rate (H) and the false alarm ratio (F) averaged over all investigated (a) wells and (b) grid cells to detect SGI-based groundwater droughts using the SPI with the optimal accumulation and 6 and 12 months of uniform accumulation periods for varying levels of threshold value τ (0.2, 0.1, and 0.05) used to identify drought events.