Gridded statistical downscaling methods are the main means of preparing climate model data to drive distributed hydrological models. Past work on the validation of climate downscaling methods has focused on temperature and precipitation, with less attention paid to the ultimate outputs from hydrological models. Also, as attention shifts towards projections of extreme events, downscaling comparisons now commonly assess methods in terms of climate extremes, but hydrologic extremes are less well explored. Here, we test the ability of gridded downscaling models to replicate historical properties of climate and hydrologic extremes, as measured in terms of temporal sequencing (i.e. correlation tests) and distributional properties (i.e. tests for equality of probability distributions). Outputs from seven downscaling methods – bias correction constructed analogues (BCCA), double BCCA (DBCCA), BCCA with quantile mapping reordering (BCCAQ), bias correction spatial disaggregation (BCSD), BCSD using minimum/maximum temperature (BCSDX), the climate imprint delta method (CI), and bias corrected CI (BCCI) – are used to drive the Variable Infiltration Capacity (VIC) model over the snow-dominated Peace River basin, British Columbia. Outputs are tested using split-sample validation on 26 climate extremes indices (ClimDEX) and two hydrologic extremes indices (3-day peak flow and 7-day peak flow). To characterize observational uncertainty, four atmospheric reanalyses are used as climate model surrogates and two gridded observational data sets are used as downscaling target data. The skill of the downscaling methods generally depended on reanalysis and gridded observational data set. However, CI failed to reproduce the distribution and BCSD and BCSDX the timing of winter 7-day low-flow events, regardless of reanalysis or observational data set. Overall, DBCCA passed the greatest number of tests for the ClimDEX indices, while BCCAQ, which is designed to more accurately resolve event-scale spatial gradients, passed the greatest number of tests for hydrologic extremes. Non-stationarity in the observational/reanalysis data sets complicated the evaluation of downscaling performance. Comparing temporal homogeneity and trends in climate indices and hydrological model outputs calculated from downscaled reanalyses and gridded observations was useful for diagnosing the reliability of the various historical data sets. We recommend that such analyses be conducted before such data are used to construct future hydro-climatic change scenarios.
Water resources infrastructure is designed to accommodate
hydrologic extremes such as floods and droughts (Cunderlik and Ouarda, 2009;
Cunderlik et al., 2004; Ouarda et al., 2006). The frequency and magnitude of
extreme hydrologic events such as floods and droughts have changed with
climate and there is broad agreement that changes will continue with
projected increases in greenhouse gases (IPCC, 2013). The direction and
magnitude of change is not uniform across the globe, but is regionally
specific, distinguishable by hydrologic regime and by local changes to
temperature and precipitation (Cunderlik and Ouarda, 2009; Monk et al., 2011;
Sheffield et al., 2012; Stahl et al., 2010, 2012). For example, in Canada,
floods in snowmelt-dominated regimes decreased in magnitude, while floods in
rainfall-fed regimes had no significant trend over 1974 to 2003 (Cunderlik
and Ouarda, 2009). Conversely, Canadian annual low-flow indices showed
spatially uniform decreases over 1970 to 2005 (Monk et al., 2011). Thus,
future changes in hydrologic extremes need to be estimated at regionally
relevant resolutions (
Global climate models (GCMs) are one of our only tools for projecting the
future climate, but they operate at scales too coarse
(
Gridded statistical downscaling methods provide a computationally efficient
and effective means of producing plausible hydro-climatology from a large
ensemble of GCMs (Salathe et al., 2007; Salathé, 2005; Wood et
al., 2004). A number of studies have compared multiple statistical
downscaling methods for use in climatological or hydrological projections.
Maurer and Hidalgo (2008) compared constructed analogues (CA) and bias
correction spatial disaggregation (BCSD) using the National Centers for
Environmental Prediction/National Center for Atmospheric Research
Reanalysis I (NCEP1) (Kalnay et al., 1996) as a surrogate GCM. Methods were
comparable in producing precipitation and temperature at a monthly and
seasonal level, but skilfully downscaled daily data depended on the ability
of the climate model to show daily skill. Bürger et al. (2012a) compared
five methods for their ability to represent climatic extremes including BCSD
and expanded downscaling (XDS). The fixed diurnal temperature range in BCSD
was seen as a shortcoming in Bürger et al. (2012a). XDS performed best,
passing 48 % of single tests on average for 27 Climate Indices of
Extremes (ClimDEX), with BCSD close behind, passing 45 % (Bürger et
al., 2012a). Pierce et al. (2013) found that projected increases in annual
precipitation versus decreases in California were due to disagreements in the
occurrence of the heaviest precipitation days (
The first generation National Centers for Environment Prediction/National Center for Atmospheric Research Reanalysis I (NCEP1) reanalysis (Kalnay et al., 1996) is often used as a surrogate GCM when testing downscaling techniques (Bürger et al., 2012a; Gutmann et al., 2014; Maurer et al., 2010), primarily because of its long record length. Recently new reanalysis products have come online, bringing to light possible issues with NCEP1, such as a spurious pattern in precipitation fields at high latitudes (Sheffield et al., 2012), and lack of skill in producing daily air temperature at high altitudes versus other reanalyses (Hofer et al., 2012). Reanalyses differ due to variations in assimilated observational data, assimilation methods, representations of surface and boundary layer processes, physics packages, and dynamical cores, and the resulting uncertainty in output fields can be considerable, especially for climatic extremes (Sillmann et al., 2013a). For instance, discrepancies between reanalyses for some climate extreme indices, such as frost days in some regions, are sometimes as large as the typical inter-model spread of the Coupled Model Intercomparison Project ensembles (Sillmann et al., 2013a). These differences arise because near surface temperature and precipitation extremes are calculated from variables that are relatively poorly constrained by observations in reanalyses. Additionally, non-stationarity exists in some reanalysis products because they amalgamate observational data sets from different sources over time (Donat et al., 2014). In the context of historical validation of downscaling methods, statistical downscaling methods may perform poorly simply because reanalysis outputs are not stationary over the calibration and validation periods (Maurer et al., 2013). All of these factors suggest that multiple reanalysis products should be used as GCM surrogates to ensure methods are not failing due to irreparable errors in reanalyses, and also to explore the variability in results due to reanalysis uncertainty.
Gridded climate observations underpin hydrologic projections. They are used
to calibrate the downscaling technique and the hydrologic model, serving as
targets and inputs, respectively. Gridded observations are commonly evaluated
via comparison with station observations (Hutchinson et al., 2009; Werner et
al., 2015), intercomparison with other gridded observations (Eum et
al., 2014), or by using them to drive a hydrologic model and comparing
outputs to observed water balance fluxes and streamflow over large basins
(Livneh et al., 2013; Maurer et al., 2002). We know that statistical
downscaling methods perform poorly when non-stationarity occurs between the
calibration and validation periods (Maurer et al., 2013), but we have not
evaluated how apparent non-stationarity caused by natural climate variability
(Huang et al., 2014; Maraun, 2012) is amplified or diminished with methods
used to create gridded observations, which could also affect the success of
downscaling methods. Furthermore, stationarity in mean annual precipitation
and temperature does not dictate stationarity in climatic or hydrologic
extremes. Not all, but some, previous studies have included as many years as
possible in the calibration, with the goal of maximizing the available
historical record available for resampling in the temporal disaggregation
step applied in BCSD (Bürger et al., 2012a; Salathé, 2005; Werner,
2011). This approach is also supported by other studies that found bias
correction is more robust for larger samples from longer time series,
especially for extremes such as flood events (Huang et al., 2014; Themeßl
et al., 2011). The pros and cons of this extended calibration period have not
been fully evaluated. This investigation will help the hydrologic modelling
community build a better evaluation system for gridded observations to ensure
their strength not only for projections of mean monthly changes over large
basins
When used to make climate change projections, distributed hydrologic models
such as VIC are best driven with gridded daily data, which is usually
produced via gridded statistical downscaling techniques such as BCSD, CA, and
BCCA, three gridded methods that have been tested to date. Applying BCSD
using minimum and maximum monthly temperature instead of mean monthly
temperature has not been tested and may correct some issues with diurnal
temperature range (Bürger et al., 2012a). It is important to note that
the effect of BCSD on daily temperature range (DTR) when used with daily data
and ways to ensure minimum temperature is less than maximum temperature has
been tested by Thrasher et al. (2012) and is not the focus of this study. A
few other methods have been developed recently that warrant investigation.
These include double bias corrected constructed analogue (DBCCA), which is
similar to BCCA but applies a second quantile mapping bias correction as a
post-processing step to correct drizzle and other residual biases (Maurer et
al., 2010). Additionally, the climate imprint delta method (CI) (Hunter and
Meentemeyer, 2005) and the “reverse” BCSD (similar to SDBC in Ahmed et
al., 2013, and AR in Gutmann et al., 2014), which we refer to as bias
corrected climate imprint (BCCI) due to its use of CI for interpolation, have
not been explored for their applicability to hydrology. A recently developed
hybrid of BCCA and BCCI, referred to as BCCAQ (Cannon et al., 2015; Murdock
et al., 2014), has the potential to be an improvement versus other gridded
statistical downscaling techniques and has not been tested with hydrologic
extremes. This work will also help to inform use of the resulting BCSD and
hydrologic model output provided by the Pacific Climate Impacts Consortium
(PCIC;
The ClimDEX indices are recommended by the World Meteorological Organization
Expert Team on Climate Change Detection and Indices (ETCCDI) (Zhang et
al., 2011) as a means of summarizing daily temperature and precipitation
statistics, focusing particularly on aspects of climate extremes. They have
been developed to allow seamless comparison of climate conditions on an
international basis. There are many projects applying the ETCCDI indices to
detect changes in extremes historically (e.g. Sillmann et al., 2013a), to
project future changes (e.g. Sillmann et al., 2013b), and to provide future
changes via data portals to allow local analysis
(
The objectives of this study are the following.
To compare several reanalyses in the study region against two gridded
observation data sets. To test the ability of the BCCA, DBCCA, BCCI, CI, BCSD (mean temperature),
BCSDX (minimum and maximum temperature), and BCCAQ downscaling techniques to
simulate 26 ClimDEX indices using four reanalyses and two gridded
observations. To test the ability of the BCCA, DBCCA, BCCI, CI, BCSD (mean temperature),
BCSDX (minimum and maximum temperature), and BCCAQ downscaling techniques
when used to force the VIC hydrologic model, to simulate 3-day peak-flow and
7-day low-flow indices using four reanalyses and two gridded observations. To learn more about the strengths and weaknesses of two gridded observations
for use with hydrologic modelling. To see whether the strength of a method to downscale for climate extremes relates
to abilities for use with hydrologic extremes.
The Peace River basin will be the focus of this work. The snow-dominated
regime of this basin makes the findings of this work applicable to many
mid-latitude areas. The Peace River is located in interior north-eastern BC
and encompasses the 101 000
The Peace River basin (above Taylor, BC) study area analysed for ClimDEX indices (black boundary) and the five sub-basins investigated for hydrologic extremes, including the Finlay River above the Akie River (FINAK), the Ingenika River above the Swannell River (INGEN), the Parsnip River above the Misinchinka River (PARMS), the Peace River above the Pine River (PEAPN), and the Peace River at Bennett Dam (BCGMS).
Annual daily hydrograph 1985–1995 for the (top) Ingenika and (bottom) BCGMS hydrometric sites.
Two daily, gridded observational data sets were available over the study
area. The first was generated for BC for application with the Variable
Infiltration Capacity (VIC) macro-scale distributed hydrologic model
following the methods of Maurer et al. (2002) and Hamlet and Lettenmaier
(2005). Daily gridded surfaces of minimum and maximum temperature and daily
precipitation accumulation were produced at the spatial resolution of
The second data set was created for all of Canada using the Australian
National University Spline (ANUSPLIN) implementation of trivariate thin plate
smoothing splines (Hutchinson et al., 2009). The Canada-wide ANUSPLIN
observational data set was created at a
Four atmospheric reanalysis products were selected to span a range of
complexity and spatial resolution. Chosen methods include NCEP1, European
Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis 40 (ERA40),
ECMWF Re-Analysis Interim (ERAInt), and the National Oceanic and Atmospheric
Administration – Cooperative Institute for Research in Environmental
Sciences 20th Century Reanalysis V2 (20CR). NCEP1 is a popular reanalysis
product applied in the validation of statistical downscaling techniques
(Bürger et al., 2012a; Maurer et al., 2010). It spans the period from
1948 to the present, is
Availability of gridded observations and reanalyses.
Seven statistical approaches are selected based on their wide use and/or potential strength in downscaling coarse-scale models to gridded observations for representing extremes. BCSD has been applied across North America (Maurer and Hidalgo, 2008; Salathé, 2005; Schnorbus et al., 2014; Wood et al., 2002, 2004). Monthly minimum temperature, maximum temperature, and precipitation from GCMs or reanalyses are bias corrected, using quantile mapping, against gridded observations aggregated to the large-scale model grid. Bias corrected, spatially disaggregated monthly data are temporally disaggregated to a daily time step via random sampling of historical months. Days in the selected month are rescaled (multiplicative for precipitation and additive for temperature) to match the bias corrected monthly precipitation and average temperature (Fig. 3a). Two variations of BCSD are tested; one derives minimum and maximum temperature from mean temperature in the coarse-scale model by assuming a uniform monthly diurnal temperature range (BCSD); the other uses monthly minimum and maximum temperature directly from the large-scale model (BCSDX).
Two constructed analogue downscaling approaches are tested: BCCA and DBCCA (Maurer et al., 2010). BCCA bias corrects the large-scale temperature and precipitation using quantile mapping, as in BCSD, except on daily rather than monthly large-scale data. In the constructed analogue (CA) component, a library of observed daily coarse-resolution and corresponding high-resolution climate patterns of the variable to be downscaled is built (Hidalgo et al., 2008). Daily data are downscaled by selecting 30 days from the coarse-scale library that have the closest similarity to a given simulated day; optimal weights are determined via ridge regression and the 30 corresponding fine-scale library patterns are combined using the same weights (Maurer et al., 2010). In the DBCCA technique, a second quantile mapping bias correction is then applied at the fine scale to fix drizzle and other biases caused by the linear combination of daily fields in the CA step (Fig. 3a).
Two climate imprint methods are tested, the CI delta method (Hunter and Meentemeyer, 2005) and bias corrected CI (BCCI), which applies quantile mapping to the interpolated series from CI (Fig. 3a). For the imprint methods, long-term averages (i.e. 30 years) from the fine-scale data provide a “spatial imprint” that is used to represent environmental gradients. The ratio of daily to average monthly values is multiplied by the fine-scale monthly values for a location to get the daily precipitation. This is similar for minimum and maximum temperature, except values are calculated as the difference between the monthly mean and the daily value (Hunter and Meentemeyer, 2005).
While BCCI applies quantile mapping as a post-processing step to the interpolated fine-scale outputs from the CI method, BCCAQ is a post-processed version of BCCA where the final quantile mapping bias correction is based on BCCI. First, the BCCA and BCCI algorithms are run independently, and then BCCAQ corrects BCCA with BCCI. The daily BCCI outputs at each fine-scale grid point are reordered within a given month according to the daily BCCA ranks. Because the optimal weights used to combine the analogues in BCCA are derived on a day-by-day basis, without reference to the full historical data set, the algorithm may be prone to Huth's paradox, wherein models that are calibrated based on short-term variability may be biased and fail to produce realistic long-term trends (Benestad et al., 2008; Huth, 2004). Reordering data for each fine-scale grid point within a month effectively breaks the overly smooth representation of sub reanalysis-grid-scale spatial variability inherited from BCCI (Maraun, 2013), thereby resulting in a more accurate representation of event-scale spatial gradients; this also prevents the downscaled outputs from drifting too far from the BCCI long-term trend. Over longer timescales, the spatial variability of BCCAQ converges to that of BCCI.
Statistical methods are calibrated from 1950 to 1990 for 20CR and NCEP1, and from 1958 to 1990 and from 1979 to 1990 for ERA40 and ERAInt, respectively (Table 2). Calibration periods were selected to include the longest overlapping record between the gridded observation and reanalyses to replicate the approach taken in Werner et al. (2011). Thus, the 20CR and NCEP1 reanalyses results will serve to evaluate the gridded observations and these two reanalyses, and also to validate the calibration–validation approach taken with BCSD for a series of studies conducted in this region (Bürger et al., 2012a, b; Schnorbus et al., 2014; Shrestha et al., 2012; Werner et al., 2013). The resulting modelling framework for these two gridded observations, four reanalysis products, and seven gridded statistical downscaling techniques is displayed in Fig. 3b. All statistical downscaling methods use precipitation and temperature as predictors and predictands.
ClimDEX is a common climate indices package that computes values for 27 core
indices based on daily precipitation and minimum and maximum temperature
(Karl et al., 1999; Peterson, 2005; and
Calibration and validation periods for downscaling methods by reanalyses.
Hydrologic projections for the Peace River basin are derived using the
Variable Infiltration Capacity (VIC) model (Liang et al., 1994, 1996). The
VIC model is a spatially distributed macro-scale hydrologic model that was
originally developed as a soil–vegetation atmosphere transfer scheme for
general circulation models. It has been used to evaluate climate change
impacts on global river systems (Nijssen et al., 2001) and in the mountainous
western United States and BC (Elsner et al., 2010b; Hamlet and Lettenmaier,
2005, 2007; Schnorbus et al., 2014; Shrestha et al., 2012). Its spatially
distributed nature makes it suitable for capturing regional variation in the
hydrologic cycle due to topographic, physiographic, and climatic controls.
The VIC model is also process based, allowing for a more plausible
extrapolation of hydrologic processes into future climate regimes (Leavesley,
1994). The VIC model is applied at a resolution of
Metadata for five select sub-basins of the Peace River basin.
Calibration and validation statistics for five select sub-basins of the Peace River basin under the under VIC Forcings and ANUSPLIN gridded observational data sets including the Nash–Sutcliff efficiency score (NS), the Nash–Sutcliff efficiency score of the log-transformed discharge (LNS), and the percent volume bias error (%VB).
The Finlay River above Akie River, Ingenika River above Swannell River,
Parsnip River above Misinchinka River, and Peace River above Pine River
sub-basins of the Peace River were calibrated to observations from Water
Survey of Canada (Fig. 1). Peace River at Bennett Dam was calibrated to
naturalized flow provided by BC Hydro. The sub-basins range in drainage area
from 4200 to 83 900
There are several daily streamflow metrics that are useful for water resources design and management, which are also ecologically relevant (Monk et al., 2011; Richter et al., 1996; Shrestha et al., 2014b). A recent intercomparison of statistical downscaling techniques for use with daily streamflow investigated the hydrologic extremes 3-day peak flow and 7-day low flow (Maurer et al., 2010). To build on that study we investigate the strength of seven downscaling methods for the same metrics using 3-day peak flow to represent flood and, 7-day low flow, drought. Two low-flow periods are investigated because the lowest discharge takes place in the months of October–April in sub-basins of the Peace River (Fig. 2) and summer low flows (July–September) are of interest to agriculture and ecology. Hydrologic models can have low flows in different seasons than observations due to their poor parameterization of baseflow conditions and because calibration approaches favour good performance for peak flow (Najafi et al., 2011). This issue can be exaggerated by downscaling approaches (Shrestha et al., 2014b). Thus, narrowing the window over which low flows are accessed is important to prevent low flows in one season being compared to low flows in another. Peak flows are analysed between May and July.
The seven statistical downscaling methods vary in their approach, which can result in differing strengths and weaknesses. We chose our statistical tests to fully evaluate these downscaling techniques for the climate and hydrologic results and to follow the framework of Bürger et al. (2012a). The time period for calibration of the downscaling techniques was selected to match Bürger et al. (2012a) (pre-1991, depending on the availability of the reanalyses). Longer calibration periods available for NCEP1 and 20CR were also seen as favourable when applying bias correction based downscaling methods, especially when working with extremes (Huang et al., 2014; Themeßl et al., 2011), and assisted with evaluating the two gridded observations. Validation was set to 1991–2005 to accommodate the overlap of available reanalyses, gridded observations, and observed streamflow records. ERA40 is an exception, with the last full year of available record for 2001. Validation results for ERA40 are provided for 1991–2001.
All statistical tests used in this study are conducted at the 5 % significance level, meaning that the tests are conducted in such a way that rejection of the null hypothesis is expected to occur in 5 % of tests when the null hypothesis is true. Statistical hypothesis testing with absolute certainty is impossible. The choice of significance level reflects a balance between the rate at which false rejection of the null hypothesis is expected to occur (so-called type I error) and the rate at which a given test will correctly reject the null hypothesis when it is false (the so-called power of the test), with the choice of a more conservative significance level, such as 1 %, leading to lower power in exchange for a lower type-I error rate (e.g. von Storch and Zwiers, 1999).
Two statistical tests are applied to the ClimDEX results over the Peace River basin: the Kolmogorov–Smirnov (KS) test and the test for Pearson's correlation. The KS test is used to see how well the distribution of climate indices for the statistically downscaled reanalyses matches the distribution of those calculated from the gridded observations used as downscaling targets. The KS test is a nonparametric test of the equality of continuous one-dimensional probability distributions. Here, it is used to compare two samples, namely annual climate indices for the statistically downscaled reanalyses and the associated gridded observation. The KS test statistic is used to quantify the distance between empirical distribution functions of these two samples. The null hypothesis is that the two samples are drawn from the same distribution. The distributions considered under the null hypothesis have to be continuous distributions, but are otherwise unrestricted. While some of the climate indices are not strictly continuous (e.g. frost days), asymptotic critical values may still be used in the presence of a small number of ties (Janssen, 1994). Pearson's correlation is used to test the temporal correspondence between the annual climate indices for the statistically downscaled reanalyses and the associated gridded observation. Pearson's product moment correlation coefficient is used to measure the linear correlation between climate indices from downscaled reanalyses and indices from observations. The null hypothesis is that the downscaled and observed samples are not linearly correlated.
The 101 000
The KS test and the test for Pearson's correlation were applied to the 3-day peak flow and 7-day low flow in winter and summer for hydrologic data from the five sub-basins of the Peace River. In this case, with the KS test the null hypothesis is that the distribution of the hydrologic extremes created by driving the VIC model with the statistically downscaled reanalyses are drawn from the same sample as those derived from driving the VIC model with the two gridded observations. The null hypothesis for Pearson's correlation is that the hydrologic extremes created by driving VIC with downscaled reanalyses versus gridded observations are not linearly correlated.
Four reanalyses (NCEP1, ERA40, ERAInt, and 20CR) are compared to two gridded
observations (VIC Forcings and ANUSPLIN) over the Peace River basin. Daily
precipitation, minimum temperature, and maximum temperature are converted to
total monthly precipitation and average monthly temperatures over the
1950–2005. Average minimum and maximum temperatures in ANUSPLIN and VIC
Forcings are similar from year to year in most months (Figs. 4 and 5).
However, prior to 1970, ANUSPLIN can be up to 5
Monthly average minimum temperature by gridded observations (VIC Forcings and ANUSPLIN) and reanalysis (NCEP1, ERA40, ERAInt, 20CR) over the Peace River basin.
There is a warm bias in minimum temperature in 20CR and ERA40 from May to
November and a cool bias in NCEP1 from March to October relative to gridded
observations (Fig. 4). The biases in NCEP1 tend to be greater over part of
the record in some months, such as from 1970 to
This confirms that near surface temperature and precipitation values from the selected reanalyses have different characteristics due to their different resolutions, model physics, and contributing data in the Peace River basin. The two gridded observations also displayed some dissimilarity in time. Differences between these four reanalyses in this particular region should act as a stringent test of the downscaling techniques applied. However, we expect that the time-dependent differences between gridded observations and NCEP1 for minimum and maximum temperature, and precipitation, will reduce the success rate of any of the downscaling techniques (Maurer et al., 2013). Nevertheless, we carry NCEP1 through the analysis to quantify the impacts of using a potentially flawed reanalysis and also to evaluate VIC Forcings and ANUSPLIN over their full record (1950–2005) with two reanalyses (NCEP1 and 20CR).
Mean annual ClimDEX values for VIC Forcings and ANUSPLIN averaged over the Peace River basin.
Downscaled minimum temperature, maximum temperature, and precipitation from
seven gridded downscaling methods, two gridded observations, and four
reanalyses were used to generate 26 ClimDEX indices. Results were compared to
the indices generated from the respective gridded observations at their
native resolution (VIC Forcings (
Monthly average maximum temperature by gridded observations (VIC Forcings and ANUSPLIN) and reanalysis (NCEP1, ERA40, ERAInt, 20CR) over the Peace River basin.
In the calibration (1950–1990) and validation (1991–2005) periods, the VIC
Forcings and ANUSPLIN data sets are similar for most temperature based
indices and show some large differences for precipitation based indices
(Table 5), Namely, PRCPTOT, annual total wet day precipitation
(
Summary of the number of tests passed for Pearson's correlations and similarity in distributions (KS test) based on the Walker field significance test between ClimDEX indices for downscaled reanalyses versus target gridded observation over the Peace River basin for 1991–2005 (1991–2001 ERA40), summarized by gridded observation, reanalysis, and the downscaling method. Max indicates the maximum possible tests to pass in that category.
Number of tests passed for each ClimDEX index for VIC Forcings and ANUSPLIN for 1991–2005 (1991–2001 in ERA40).
Summary of the number of tests passed for Pearson's correlations and similarity in distributions (KS test) based on the Walker field significance test between ClimDEX indices for downscaled reanalyses versus target gridded observation over the Peace River basin for 1991–2005 (1991–2001) for reanalysis (ERA40) versus the downscaling method for each gridded observation.
Monthly total precipitation by gridded observations (VIC Forcings and ANUSPLIN) and reanalysis (NCEP1, ERA40, ERAInt, 20CR) over the Peace River basin.
Irrespective of downscaling method or reanalysis, those methods calibrated and validated against the ANUSPLIN gridded observations were more successful versus those based on VIC Forcings overall (Table 6), although there were some cases where VIC Forcings passed more tests than ANUSPLIN (Table 8). For example, under the BCCA method, precipitation amounts on extremely wet days (R95p) for all reanalyses based on VIC Forcings failed the Walker field significance test for the Pearson correlation, while those for ANUSPLIN passed (Fig. 7). (Note: time series shown are averages of all of the VIC Forcings or ANUSPLIN cells in the Peace basin, while the significance of results was based on the Walker field significance of the correlation tested on each grid cell in the basin.) The largest differences in the number of tests passed primarily occur for precipitation based indices where ANUSPLIN passes more than VIC Forcings. VIC Forcings passes 29 more tests than ANUSPLIN for DTR (Table 7). This result is not unexpected because the differences between the calibration and the validation period are precipitation related in VIC Forcings and temperature related in ANUSPLIN (Table 5). Step changes in daily temperature range (DTR) from 1950 to 2005 are apparent in ANUSPLIN (Fig. 8). DTR is a strong driver of snowpack generation and melt, and errors in simulating realistic DTR could affect hydrologic modelling results.
The sequencing of precipitation indices, such as CWD, PRCPTOT, R10mm, R20mm, R95p, R99p, Rx1day, Rx5day, and SDII, is most difficult to replicate for all methods, especially under VIC Forcings. VIC Forcings has a higher station density than ANUSPLIN because it includes stations from BC Hydro, the BC Ministry of Forests Lands and Natural Resource Operations, and the Ministry of Environment's BC River Forecast Centre Snow Survey Network in addition to those from Environment Canada (Werner et al., 2015). The BC Hydro network provided a large number of stations in the Peace River basin, most of which were not available until the 1980s (Werner et al., 2015). The increase in the number of stations after 1980 in the VIC Forcings likely resulted in more complex spatial patterns in precipitation, despite the monthly temporal adjustment, because it is designed to maintain spatial variability (Hamlet and Lettenmaier, 2005). Increased spatial variability in the validation period, coupled with a different interpolation method in VIC Forcings, could have made precipitation patterns harder to replicate with downscaling. If we are going to rely on these data sets to investigate changes to extreme climate and hydrology, we should develop a way to maintain temporal and spatial homogeneity for daily values while allowing data sets to reflect natural trends. Minimizing homogeneity problems throughout the record is favourable when using gridded observations to calibrate statistical downscaling methods (Gutmann et al., 2014; Livneh et al., 2013; Maurer et al., 2002).
Considering results for all downscaling methods and both gridded
observations, results based on ERAInt had the highest score of all four
reanalyses for the Pearson correlation and KS tests combined (Table 6).
ERAInt results matched sequencing of events most often, as indicated by
frequent rejection of the null hypothesis for the Pearson correlation test
(Fig. 7; Table 8), and ERA40 results matched distributions most often
according to the KS test (Fig. 9; Table 8). The zero-correlation null
hypothesis was rejected when comparing ERAInt for the ANUSPLIN and VIC
Forcings gridded observations for the number of heavy precipitation days
(R10mm), but was not rejected with other reanalyses (Fig. 7). ERA40 and
ERAInt monthly average minimum and maximum temperature and total
precipitation matched those of the gridded observations most closely (see
Sect. 4.1). ERAInt is the highest resolution (1.5
The highest ranked downscaling method based on the combined results for field significance of Pearson's correlation and the KS test for all gridded observations, reanalyses, and ClimDEX indices was DBCCA (Table 6). It tied for highest rank with CI for correlation, while BCCAQ superseded all other methods for distribution. Bias remains in results of the BCCA method for precipitation due to the linear combination of fine-scale analogues and uncorrected “drizzle” and related biases (Guttmann et al., 2014). All downscaling methods, except CI, include a quantile mapping bias correction step and are expected to do well in matching distributions with their respective gridded observation. All methods except CI pass 86 % or more of the tests for distribution (KS test), while CI passes 78 %. The correlation of DTR was a problem for all the downscaling methods and both gridded observations (Fig. 8) and for distribution based on ANUSPLIN (except BCCAQ), but not when based on VIC Forcings. BCCAQ in combination with ANUSPLIN matched DTR distributions for ERAInt, ERA40, and 20CR when all other methods failed, which points to the success of its approach of post-processing BCCA with a final quantile mapping bias correction based on BCCI. As mentioned above, DTR is an important driver in snowpacks. Additionally, it plays a key role in evaporation (Sheffield et al., 2012). Rates of evaporation are an important component of projecting future water availability and drought (Sherwood and Fu, 2014). Therefore, accurately downscaling DTR should be a priority. Including minimum and maximum monthly temperature predictors in BCSDX did not improve the correlation of DTR as was hypothesized in previous studies (Bürger et al., 2012a).
Field significant correlations based on the Walker field significance test over the Peace River basin between ClimDEX indices for downscaled reanalysis versus target gridded observation, VIC Forcings (left) and ANUSPLIN (right), by the downscaling method for 1991–2005 (1991–2001 ERA40). Dark grey boxes indicate cases in which the null hypothesis is rejected at the 5 % significance level.
Time series of average DTR from VIC Forcings (left) and ANUSPLIN (right) for NCEP1 (top), 20CR (second), ERA40 (third), and ERAInt (bottom) downscaled using BCCA, DBCCA, BCCI, CI, BCSD, BCSDX, and BCCAQ over the Peace River basin.
Field significant similarities in distributions based on the Walker field significance test over the Peace River basin between ClimDEX indices for downscaled reanalysis versus target gridded observation, VIC Forcings (left), and ANUSPLIN (right), by the downscaling method for 1991–2005 (1991–2001 ERA40). Dark grey boxes indicate cases in which the null hypothesis is not rejected at the 5 % significance level.
The previous section shows how raw reanalyses and observations differ in the Peace River basin and how downscaled reanalyses can differ in their representation of climate extremes when calibrated to one gridded observation versus another. NCEP1 has routinely been used to compare the performance of statistical downscaling methods in terms of climate and hydrologic extremes (e.g. Bürger et al., 2012a and Maurer et al., 2010). We thus continue our comparison of multiple gridded observations, reanalyses, and downscaling techniques for hydrologic extremes. Results are compared for 15 years from 1991 to 2005 (inclusive) for the five sub-basins, except for ERA40 (11 years; 1991–2001). We evaluate methods for their ability to replicate the timing (Pearson's correlation) and distribution (KS test) of the 3-day peak flow, 7-day low flow in summer, and 7-day low flow in winter.
Irrespective of reanalysis or downscaling method, VIC hydrologic model
simulations based on the VIC Forcings gridded observations passed 8 %
more tests than those based on the ANUSPLIN gridded observations (Table 9),
whereas for the ClimDEX indices ANUSPLIN passed 7 % more tests than VIC
Forcings (Table 6). The difference in the number of tests passed is not
great. Therefore, the success of the downscaling methods does not depend
strongly on which of the gridded observations is applied overall. However,
the greater number of tests passed for hydrologic modelling with the VIC
Forcings gridded observations could relate to VIC Forcings being created at
the native resolution of the VIC hydrologic model (1/16
When considering results regardless of gridded observation or downscaling technique, the number of tests passed under ERA40 was the highest overall (Table 9). Additionally, the number of tests passed for Pearson's correlation and the KS test were both highest for ERA40. The truncated validation period for ERA40, 1990–2001 versus 1990–2005 for other reanalyses, could have avoided some challenging hydrologic extreme events in 2002–2005. However, ERAInt, which was validated over 1990–2005, passed nearly the same number of tests as ERA40. Thus, the shorter calibration period in ERA40 and ERAInt avoids step changes in the gridded observations and reanalyses prior to 1958. Peculiarities with the gridded observations were apparent from 1950 to 1958 for the monthly average minimum and maximum temperatures (Figs. 4 and 5) and for the DTR and SDII ClimDEX indices (Figs. 8 and 10). Avoiding these years could have reduced artefacts in the downscaled products and hydrologic model results. Nevertheless, many studies have demonstrated that ERA40 and ERAInt are superior products versus NCEP1 (Donat et al., 2014; Ma et al., 2008, 2009; Sillmann et al., 2013a). In our own analysis ERA40 and ERAInt have similar timing and magnitude in minimum and maximum temperature and precipitation (Figs. 4, 5, and 6) to the gridded observations when NCEP1 and 20CR do not. These results confirm that downscaling methods will succeed when applied to reanalyses that have correct timing, magnitude, and trends such as ERA40 and ERAInt, more so than when applied to reanalyses such as NCEP1 and 20CR that have irregular step changes (Maraun, 2013). We should be able to assume that although the biases in GCMs will be greater than those found in reanalyses, they are consistent over time. The strength of downscaling methods when downscaling ERA40 and ERAInt versus NCEP1 and 20CR was also found with the ClimDEX indices.
Summary of the number of tests passed for Pearson's correlations and similarity in distributions (KS test) based on the Walker field significance test between hydrologic extremes for downscaled reanalyses versus target gridded observation over the Peace basin for 1991–2005 (1991–2001 ERA40), summarized by gridded observation, reanalysis, and downscaling method. Max indicates the maximum possible tests to pass in that category.
Number of basins where the null hypothesis that the downscaled and observed (VIC Forcings and ANUSPLIN) derived 3-day peak flows are not linearly correlated was rejected and the number of basins where the null hypothesis that the downscaled and observed based distributions are drawn from the same sample was not rejected, by downscaling method/reanalysis combinations for 1991–2005 (1991–2001 ERA40).
Time series of average SDII from VIC Forcings (left) and ANUSPLIN (right) for NCEP1 (top), 20CR (second), ERA40 (third), and ERAInt (bottom) downscaled using BCCA, DBCCA, BCCI, CI, BCSD, BCSDX, and BCCAQ over the Peace River basin.
The BCCAQ method was the best overall performer for the three hydrologic extremes. It was the best method according to Pearson's correlation and tied for second place with DBCCA and BCCI, after BCSD and BCSDX, for the KS test. BCSD and BCSDX passed the fewest number of tests for correlation, while CI passed the fewest for distribution. In the case of ClimDEX, BCCAQ ranked third after BCCA and BCCI. The strength of the BCCAQ method when tested in terms of basin-wide hydrologic modelling and hydrologic extremes, rather than in terms of ClimDEX indices at individual grid cells, comes from the maintenance of daily spatial patterns resulting from the combination of BCCA and BCCI methods. Event-scale spatial gradients and magnitudes are preserved by reordering the BCCI outputs based on the rank order structure from BCCA. In effect, this removes the overly smooth representation of sub reanalysis-grid-scale variability from BCCI (Maraun, 2013) and largely corrects remnant biases in magnitude from BCCA (Guttmann et al., 2014). Spatial covariability is much more relevant in hydrologic modelling than the comparison of climate indices between products on a grid cell to grid cell basis. This method is also better at maintaining long-term trends, which might explain failed tests in some of the sub-basins when downscaling NCEP1 and 20CR, which, as shown earlier, exhibit inhomogeneities between calibration and validation periods. BCCAQ could be failing for the “right reason” when the trend in VIC Forcings or ANUSPLIN for a given metric is opposite that in NCEP1 or 20CR. BCCAQ is the only method to pass the Pearson correlation and KS test in all five sub-basins when downscaling ERA40 or ERAInt to VIC Forcings or ANUSPLIN for all three hydrologic extremes. BCCAQ has overcome some of the challenges of BCCA that Maurer et al. (2010) would not have been able to find using NCEP1 alone as surrogate GCM. It is also more successful than the BCCI method, which is analogous to the statistical downscaling and bias correction (SDBC) method in Ahmed et al. (2013) and asynchronous regression (AR) in Gutmann et al. (2014), by avoiding overestimates of extreme events at aggregate scales (Maraun, 2013).
Boxplots, time series, and distributions of 3-day peak flow in the spring months (May–July) for NCEP1, 20CR, ERA40, and ERAInt in the BCGMS basin based on VIC Forcings (top) and ANUSPLIN (bottom). Legend same as Fig. 9.
The BCSD methods pass the most tests for distribution for all basins and reanalyses, while they fail more tests than any other downscaling method for correlation due to their reliance on random sampling of historical months when temporally disaggregating from the monthly to daily time step (Table 6). Thus, these methods will get the frequency and magnitude of events correct, but will get the timing of when these events occur wrong. Again, including the minimum and maximum temperatures from the large-scale model (reanalysis) does not improve the number of tests passed with BCSDX versus BCSD. For 3-day peak flow (Table 11; Fig. 11) and 7-day low flow in summer (Table 10; Fig. 12) these methods pass the majority of tests for correlation. Very few tests are passed for correlation in 7-day low flow in winter (Table 12; Fig. 13). Winter low flows are challenging to monitor and to model. There could be ice on the river causing the stage–discharge relationships to be incorrect. Also, as mentioned, models are not parametrized or calibrated to best represent base flow. However, BCSD and BCSDX have more trouble than any of the other downscaling methods. Due to the resampling of daily events from the historical gridded observations there can be precipitation occurring in combination with temperatures warm enough to generate runoff (Fig. 14). This is because of the stochastic resampling of the historical precipitation, but is also related to temperature since runoff is occurring when conditions should be near freezing. Additionally, the random selection of months from the historical record can lead to large discontinuities across month boundaries, such as in December–January (Fig. 14). This is when it is important to get daily events from the GCM or reanalyses (e.g. as in the CI, BCCI, BCCA, DBCCA, and BCCAQ methods). As calibrated, the VIC model is known to have limited performance for low flows and additional errors were suspected to have been contributed by BCSD in downscaled 20C3M GCM results (Shrestha et al., 2014b). Some sharp spikes on the rising limb of the hydrograph suggest rain-on-snow events caused by the downscaling-driven results that are not displayed in the runs based on gridded observations. The CI method is the closest to the delta method that we have investigated. The median and ranges for CI are much lower for winter 7-day low flow (not shown). The poorer performance of the CI method for the KS test is due to the lack of quantile mapping bias correction in this method.
As in Table 10 but for summer 7-day low flow.
Time series of 7-day low flow in the summer months (July–September) for NCEP1, 20CR, ERA40, and ERAInt in the BCGMS basin based on VIC Forcings (top) and ANUSPLIN (bottom). Legend same as Fig. 9.
As in Table 10 but for winter 7-day low flow.
Time series of 7-day low flow in the winter months (November–April) for NCEP1, 20CR, ERA40, and ERAInt in the BCGMS basin based on VIC Forcings (top) and ANUSPLIN (bottom). Legend same as Fig. 9.
Time series of daily streamflow in the BCGMS basin as driven by ANUSPLIN (base) and ERA40 downscaled to ANUSPLIN with the BCCA, DBCCA, BCCI, CI, BCSD, BCSDX, and BCCAQ methods over 1991–2005.
We have tested the applicability of seven techniques for downscaling coarse-scale climate models in terms of ClimDEX indices and hydrologic extremes. The seven approaches investigated include several methods commonly used in hydrologic modelling. Some of these had been explored before (i.e. BCSD and BCCA), but not using multiple reanalyses. Choice of reanalysis was found to affect the number of tests passed for a given downscaling technique. Downscaling methods were more successful under ERA40 or ERAInt than they were under NCEP1 or 20CR. The quality of reanalyses and gridded observations changed over the calibration period due to changes in availability of satellite/radiosonde data and station observations. NCEP1, the reanalysis used as a surrogate GCM in many previous downscaling intercomparisons, had an obviously erroneous step change in temperature over the Peace River basin. Between the calibration and the validation period, changes in ClimDEX indices were greater for precipitation with VIC Forcings but greater for temperature with ANUSPLIN. Thus, trends in ClimDEX indices differed in these gridded observations. ANUSPLIN passed 5 % more tests than VIC Forcings, mostly for precipitation-related ClimDEX indices. Through this work we learned a lot about these gridded observations and discovered evaluation procedures that will be useful for future studies.
BCSDX, DBCCA, and BCCAQ downscaling methods had not been evaluated in terms of ClimDEX indices and hydrologic extremes before now. The BCSDX method included minimum and maximum temperature from the reanalyses instead of mean as is done in BCSD, but this did not improve its ability to resolve temperature indices, such as diurnal temperature range or hydrologic extremes. DBCCA was an improvement over BCCA and passed the greatest number of tests for the ClimDEX indices. The double bias correction proved capable of reducing some of the drizzle and remnant bias in precipitation amounts found in BCCA. The BCCAQ method, which combines BCCA and BCCI, performed well in terms of number of tests passed for the ClimDEX indices, but it really shone for use with modelling hydrologic extremes. In this context, it exceeded all other methods. BCCAQ provides a more accurate representation of event-scale spatial gradients, removing the overly smooth representation of sub reanalysis-grid-scale variability inherited from BCCI and correcting biases from BCCA. These attributes are important for simulating the climate events that occur over a basin that drive runoff. All methods passed correlation and distribution tests for 3-day peak flow and 7-day low flow in summer for the majority of sub-basins and reanalyses. BCSD and BCSDX failed all or most correlation tests and CI failed all or most distribution tests for 7-day low flow in winter. Based on results from this study, use of a daily downscaling method, such as BCCAQ, in conjunction with a rigorously constructed and validated observational data set, is recommended to supplement the existing hydrologic modelling efforts at PCIC and improve projections of hydrologic extremes.
We can build on this work to develop tools that predict changes to hydrologic extremes from changes in climate extremes without the direct application of a hydrologic model. Similar emulations have been made by drawing on the relationship between GCMs and hydrologic model projections (Schnorbus and Cannon, 2014) and by identifying relationships between GCMs and RCMs (Li et al., 2011). The next step is to identify which of the 26 ClimDEX indices are predictors of 3-day peak flow and 7-day low flow and avoid those downscaling methods that simulate them poorly.
We are grateful to three anonymous reviewers for providing valuable feedback.
We thank David Bronaugh for developing and making available the climdex.pcic
R package that expedited this work. Belaid Moa's assistance with the Compute
Canada/WestGrid/University Systems is also appreciated. Hailey Ekstrand
provided useful assistance with hypsometric information for the five
sub-basins of the Peace River basin upstream of Taylor. Thoughtful and
thorough reviews were provided by Markus Schnorbus and Francis Zwiers of PCIC
that greatly improved this work. We appreciate the easy access to reanalyses
provided by