Interactive comment on “Assessing climate change impacts on daily streamflow in California: the utility of daily large-scale climate data” by E

I am very grateful to the reviewers for taking the time to review the manuscript and for the constructive criticism, which will lead to the improvement of the paper. This paper describes a new downscaling method that uses daily large-scale GCM data instead of monthly GCM data. "The paper addresses a relevant scientific question" with interesting results (reviewer 2). "The result is a compact description of a study that provides new results of interest for HESS in the scope of hydrological climate impact studies" (reviewer 1). The manuscript is highly rated by the reviewers.


Introduction
As climate change science matures and is better able to estimate the regional magnitudes of potential climate change, estimates of local and regional impacts to the resources at risk are of increasing interest (IPCC, 2007a). One common issue facing all regional assessments of climate change impacts is that general circulation model (GCM) outputs are at too coarse a spatial scale for direct use in impact models. Regional studies, such as those examining hydrologic impacts of climate change, thus rely on spatial downscaling to translate the large-scale climatic shifts projected by GCMs to scales more representative of local areas of interest (Christensen et al., 2007).
The recent availability of large databases of raw GCM outputs in a consistent format (Meehl et al., 2007) has facilitated the use of multiple GCMs and greenhouse gas emissions scenarios in impact studies. The greatest value from studies of multiple GCM runs is that model-to-model, scenario-to-scenario, and even chaotic realization-to-realization uncertainties in the physical response of the climate system to changing greenhouse gas concentrations, the primary sources of uncertainty in climate impacts analysis (Fowler and Ekström, 2009), can be quantified to some degree. Furthermore, a multimodel ensemble consistently outperforms any individual model in detection and attribution studies (Brekke et al., 2008; Gleckler et al., 2008; Pierce et al., 2009). Considering many future projections of climate in a regional impacts study requires that the downscaling procedure be computationally very efficient. This generally limits these studies to statistical downscaling techniques, where some large-scale signal is related statistically to local climate, as opposed to regional climate simulations, where a dynamical model of regional climate is used to simulate local climate. When downscaling from GCM-scale climate simulations to regional scales to study hydrologic impacts, the most desirable downscaling methods can translate the changes in climatic extreme events simulated by GCMs to the local scale needed by hydrological models. While dynamic downscaling, using a regional climate model (RCM) driven at the boundary by a GCM, has been used in the Western US to produce physically realistic projections of changes in hydrologic extremes (Kim et al., 2002; Snyder and Sloan, 2005), these types of models are still too computationally intensive to be applied to a large ensemble of GCM output to characterize uncertainties associated with inter-GCM variability and different emission scenarios.
For this reason, more computationally efficient statistical downscaling approaches will continue to serve as the methodological workhorse for downscaling ensembles of long climate simulations. Statistical methods, which build statistical relationships between GCM-scale climate features and fine-scale climate and apply those to future projections, have been more widely applied than dynamical downscaling in studies of hydrologic impacts of climate change over the western United States (Christensen and Lettenmaier, 2007; Maurer et al., 2007; Payne et al., 2004; Wood et al., 2004). In most applications the focus has been on monthly, seasonal or annual hydrologic changes, and generally only monthly GCM output was used. Some efforts have used daily GCM output to study extremes in this region (e.g., Dettinger et al., 2004), though the approach has generally been to downscale GCM output directly to specific weather stations. To characterize projected changes in both seasonal conditions and extremes for larger watersheds or over continental areas, a downscaling method should be able to generate gridded fields of downscaled daily climate, to capture the spatial structure of climate features. Achieving this using daily GCM output was a motivation for the development of the constructed analogues (CA) approach.
In a prior effort (Maurer and Hidalgo, 2008) the CA approach was contrasted with the bias-correction/spatial disaggregation (BCSD) statistical downscaling approach, with each applied over the Western US for downscaling large-scale observationally derived reanalysis data as a surrogate GCM. The methods take different approaches to downscaling daily extreme precipitation and temperature. CA downscales each day's output from the GCM simulation, capturing projected changes in daily weather events that sum together to reflect long-term climate changes, while BCSD works with GCM monthly output, then randomly selects a month from the historical record and rescales its daily precipitation and temperature to match the projected monthly values. Each can downscale to a gridded field over a wide region, maintaining the spatial correlations of the hydroclimatic conditions that drive hydrologic impacts. The BCSD method has been found to perform well when compared to several statistical and dynamic downscaling methods in the context of assessing hydrologic impacts. The CA method has also been shown to exhibit considerable skill in reproducing daily precipitation and temperature statistics. Both methods have been widely used in regional studies in the United States and globally (Barnett et al., 2008; Cayan et al., 2009; Das et al., 2009; Girvetz et al., 2009; Hayhoe et al., 2007; Maurer et al., 2009).
Both methods are capable, to some degree, of capturing projected changes in extremes. They have been shown to produce similar downscaling skill for many measures of temperature and precipitation extremes (Maurer and Hidalgo, 2008). In that study, both CA and BCSD exhibited limited skill, attributed to substantial large-scale precipitation biases, for both wet and dry daily precipitation extremes and the difference [...]
3. Are there opportunities to combine the best attributes of the methods to improve downscaling performance?
Ultimately, the goal of this study is to address question 3, by identifying, testing, and developing an improved statistical downscaling method capable of skillfully downscaling extreme hydroclimate, while being applied at regional to continental scales. To do this we refine the prior analysis in Maurer and Hidalgo (2008) to evaluate how differences in the downscaling approaches propagate through the hydrologic system, and to determine whether improvements in downscaling methods, especially in the context of simulating hydrologic extremes, may be possible.

Methods and data
The approach for this study follows that of Maurer and Hidalgo (2008), in which the National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP/NCAR) reanalysis (Kalnay et al., 1996) was used as a surrogate for general circulation model (GCM) output, as others have done (e.g., Widmann et al., 2003). The benefit of using reanalysis rather than GCM output is that biases are expected to be lower, since atmospheric observations are assimilated in the reanalysis framework. In addition, the data assimilation process produces year-to-year and day-to-day correspondence to observed climate and weather that an unconstrained GCM would not, making it more defensible to compare downscaling performance against observations. We downscale the reanalysis daily and monthly precipitation and temperature using different techniques, and use the downscaled data to drive a hydrology model.
The hydrologic model skill is evaluated by comparing these simulations to the hydrologic model output produced by driving the hydrology model directly with the gridded observed precipitation and temperature of Maurer et al. (2002). Results are compared using a two-sample Kolmogorov-Smirnov (KS) test (Wilks, 2006) at a 0.05 significance level.
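As an illustration of the comparison described above, the following is a minimal sketch of a two-sample KS test, not the code used in the study. It assumes the large-sample critical value D_crit = c(alpha) * sqrt((n + m) / (n * m)) with c(alpha) = sqrt(-0.5 * ln(alpha / 2)); the function names `ks_statistic` and `ks_reject` are hypothetical.

```python
import math

def ks_statistic(a, b):
    """Largest vertical distance between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the ECDFs.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_reject(a, b, alpha=0.05):
    """Return (reject, D, D_crit) using the large-sample approximation (assumed here)."""
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))  # about 1.358 for alpha = 0.05
    d_crit = c_alpha * math.sqrt((n + m) / (n * m))
    d = ks_statistic(a, b)
    return d > d_crit, d, d_crit

# Illustrative values only: 22 annual values per sample, as for 22 water years.
obs_driven = list(range(22))
shifted = [x + 100 for x in obs_driven]
same = ks_reject(obs_driven, obs_driven)
different = ks_reject(obs_driven, shifted)
```

With 22 values in each sample, the approximate critical distance is about 0.41, so two distributions are flagged as different only when their empirical CDFs are separated by more than that at some point. For small samples an exact p-value may be preferable to this approximation.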

Reanalysis as a surrogate GCM
NCEP/NCAR reanalysis (Kalnay et al., 1996) data include daily and monthly precipitation and temperature on a T62 Gaussian grid (approximately 1.9° square), a resolution comparable to recent GCMs. Reanalysis is often held up as an example of the best possible historical GCM output (Reichler and Kim, 2008), which makes it appropriate for use in this study, as the focus is on how downscaling approaches distinguish themselves in the presence of large-scale skill. As noted by Maurer and Hidalgo (2008), because reanalysis temperature is strongly connected to observations, the comparisons of temperature skill will reflect differences almost exclusively in the downscaling techniques. However, because precipitation observations are not assimilated into reanalysis estimates, the intercomparison will reflect differences between the downscaling methods, plus influences of the reanalysis precipitation biases and errors. The precipitation and temperature daily variability in the reanalysis has been shown to be realistic in many locations in the Western US (Widmann and Bretherton, 2000), and so the existence of skill in daily statistics of large-scale climate model output (in this case, reanalysis) will be a major factor potentially distinguishing the downscaling methods compared in this study.
Following Maurer and Hidalgo (2008), we divide the second half of the 20th century into two periods, with 1950-1976 representing "observations" used as the sample catalog from which model estimates are derived, and 1977-1999 representing "projections" for which the model estimates are derived and against which they are verified.
The later period exhibits small but statistically significant differences in both temperature and precipitation compared to the early period, with 1977-1999 being generally wetter and warmer over the study [...] (Mantua and Hare, 2002), there are also changes in the sources of observations assimilated in the NCEP/NCAR reanalysis beginning in 1979 (Kistler et al., 2001). These differences provide the opportunity to assess the performance of the downscaling techniques under a climate that, while not dramatically different, is statistically significantly different.

Downscaling techniques
The two primary downscaling techniques used in this study are constructed analogues (CA; Hidalgo et al., 2008; van den Dool, 1994) and bias correction and spatial downscaling (BCSD; Wood et al., 2004). These are described and contrasted in detail by Maurer and Hidalgo (2008). The most important distinction between the two methods is that, by using daily reanalysis (or GCM) output, CA retains the daily sequencing of weather events from the coarse resolution, while in BCSD only monthly reanalysis averages are used, with daily patterns reconstructed by randomly resampling a historic month and scaling its daily precipitation and temperature values to match the monthly projected values. Where a climate model exhibits skill in simulating daily variability, CA would in theory be capable of capturing that skill, while BCSD would reflect historical intra-month variability. Thus, for daily statistics, the two methods will be expected to distinguish themselves only inasmuch as the large-scale climate model exhibits skill at the daily time scale.
Another distinction between BCSD and CA has been observed in areas near coasts and other areas with sharp climate gradients at a scale much finer than the large-scale climate model output being downscaled. While BCSD reproduces climatological patterns of precipitation and temperature, projected changes tend to be smooth spatially. CA, by contrast, captures changes in day-to-day variability, which can evolve differently than the large-scale forcing, and thus CA can produce sharper spatial gradients of precipitation and temperature changes than BCSD.
A further distinction between CA and BCSD that bears on the analysis that follows is that CA builds relationships between large-scale climate anomalies and fine-scale anomalies using gridded observations, and then applies those relationships to large-scale reanalysis (or GCM) anomalies. BCSD first bias corrects the large-scale monthly reanalysis data, using a quantile-mapping approach (Panofsky and Brier, 1968), so that for each month there is a statistical match (for the observed period) for all statistical moments to those of large-scale observations, and the bias-corrected monthly data are then spatially downscaled. The implication is that while CA accounts for potential biases in the mean by using anomalies, higher order biases in reanalysis spatial or temporal variability feed directly into the CA downscaled results in ways that BCSD explicitly corrects and avoids.
To assess the ability to downscale to the watershed scale, daily downscaled meteorology is used to drive the Variable Infiltration Capacity (VIC) hydrologic model (Cherkauer et al., 2003; Liang et al., 1994). VIC is a spatially distributed hydrology model that solves the energy and water budgets at the land surface. It has been widely applied in forecasting and climate change analyses on spatial scales ranging from watershed to continental areas (Abdulla et al., 1996; Maurer, 2007; Maurer and Lettenmaier, 2003; Nijssen et al., 1997; Wood et al., 2002). In this study, we apply the VIC model at the same resolution (1/8 degree, approximately 12 km) and with the same parameterization as was used in several prior studies of the area (Barnett et al., 2008). The VIC model output is processed through a stream routing network following Lohmann et al. (1996), which is used to generate simulated flow at the stream gauge locations listed in Table 1.
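The BCSD step that reconstructs daily values from a monthly projection, described earlier in this section, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the study's implementation: it assumes precipitation is rescaled multiplicatively and temperature is shifted additively so that the monthly total and monthly mean match the targets, and the function name `disaggregate_month` and its arguments are hypothetical.

```python
import random

def disaggregate_month(target_precip_total, target_temp_mean, historic_months, rng):
    """Build daily series for one projected month from a randomly chosen historic month.

    historic_months: list of (daily_precip, daily_temp) pairs, all for the same
    calendar month. Scaling rules are assumptions made for this sketch.
    """
    if target_precip_total > 0:
        # A completely dry historic month cannot be rescaled to a wet target.
        candidates = [m for m in historic_months if sum(m[0]) > 0]
    else:
        candidates = list(historic_months)
    daily_precip, daily_temp = rng.choice(candidates)

    hist_total = sum(daily_precip)
    scale = target_precip_total / hist_total if hist_total > 0 else 0.0
    shift = target_temp_mean - sum(daily_temp) / len(daily_temp)

    new_precip = [p * scale for p in daily_precip]  # keeps the historic day-to-day sequence
    new_temp = [t + shift for t in daily_temp]
    return new_precip, new_temp

# Illustrative values only (four-day "months" to keep the example short).
historic = [([0.0, 10.0, 0.0, 30.0], [5.0, 7.0, 9.0, 11.0]),
            ([2.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0])]
precip, temp = disaggregate_month(80.0, 10.0, historic, random.Random(0))
```

Because the daily sequence is borrowed from the historical record, the number of wet days and the timing of events within the month reflect the sampled historic month, not the projected period, which is the property contrasted with CA above.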

Results and discussion
Large-scale skill in reanalysis temperature data is well established, since observations of temperature are assimilated. This skill has been demonstrated for monthly data as well as for daily statistics. While precipitation is less well simulated in reanalysis, being model output rather than assimilated data, some skill is evident. We summarize below the ability to recover fine-scale precipitation and temperature statistics from the large-scale reanalysis, assess how the differences in downscaling skill affect hydrology, and develop a method for combining positive attributes of the two methods to improve downscaling skill.
Monthly and daily skill for downscaling precipitation and temperature using the two downscaling methods were analyzed in a prior study (Maurer and Hidalgo, 2008), which forms the basis for the current study. Monthly downscaling skills for CA and BCSD were found in that study to be comparable, as were their skill levels for daily extreme precipitation amounts (which were generally low for both methods, reflecting the lack of skill in precipitation simulation at the large native reanalysis scale). However, CA demonstrated better skill, based on correlation and r² values, at some locations in downscaling some of the daily statistics, such as sequences of wet and dry days, and high and low temperature extremes, where the large-scale reanalysis data contain greater skill.
While correlations (and r² values) were higher for CA than for BCSD for some variables, correlation analysis is unable to pick up systematic biases in the large-scale data. For example, while Maurer and Hidalgo (2008) show comparable correlations with observations for both CA and BCSD downscaled reanalysis for monthly, daily, and extreme wet and dry precipitation amounts, Fig. 2 [...] precipitation intensity, especially January in the Pacific Northwest (PNW) and the Sierra Nevada in California, two features emerge.
Most notably, CA shows a large negative bias in precipitation intensity in California, and a positive bias in the PNW. This bias in downscaled CA precipitation intensity in regions with relatively low precipitation is similar to the well-documented "drizzle" bias typical in GCMs (Iorio et al., 2004; Mearns et al., 1995), where weak precipitation events are overly common. Figure 3 illustrates that while reanalysis produces average precipitation intensities (for a grid point over central California) that appear reasonable, the frequency of occurrence of events at the lowest intensities is oversimulated. Approximately 40% of the daily January observations (from Maurer et al. (2002), aggregated to the reanalysis grid resolution) show zero precipitation (Fig. 3, center panel, where the "OBS" line intersects the ordinate at a value of 0.4), while all days in the reanalysis have some precipitation (same panel, the dashed line never intersects the ordinate). At higher precipitation intensities there is a similar bias, with observed data indicating approximately 1% of daily values above 9 mm, while reanalysis shows 4% of daily precipitation above this level and 1% of daily precipitation above 16 mm.
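The quantities discussed in this paragraph (fraction of dry days, mean intensity on wet days, and the fraction of days above a level such as 9 mm) can be computed from a daily series as in the sketch below. The function name `wet_day_stats`, its thresholds, and the sample values are hypothetical and chosen only to show how two series with the same total can differ in intensity.

```python
def wet_day_stats(daily_precip, wet_threshold=0.0, exceed_level=9.0):
    """Summarize precipitation occurrence and intensity for a pool of daily values (mm)."""
    n = len(daily_precip)
    wet = [p for p in daily_precip if p > wet_threshold]
    return {
        "dry_fraction": (n - len(wet)) / n,
        "mean_wet_intensity": sum(wet) / len(wet) if wet else 0.0,
        "exceed_fraction": sum(1 for p in daily_precip if p > exceed_level) / n,
    }

# Illustrative values only: both series total 40 mm over 10 days.
observed_like = [0, 0, 0, 0, 5, 10, 0, 0, 20, 5]    # many dry days, fewer but stronger events
drizzle_like = [1, 1, 1, 1, 1, 1, 1, 1, 10, 22]     # rain every day, mostly very light

obs_stats = wet_day_stats(observed_like)
model_stats = wet_day_stats(drizzle_like)
```

In this toy case the "drizzle" series has no dry days and a much lower mean wet-day intensity even though its total matches, which is the kind of bias that a correction of the mean alone would not remove.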

Downscaling meteorology for assessing hydrologic impacts
By working with anomalies, CA effectively removes the biases in reanalysis mean precipitation and mean temperatures. However, it is evident from the biases in precipitation intensity that, especially in light of our interest in hydrologic extremes, accounting only for mean biases at the large scale is inadequate. We introduce here a third downscaling approach, which applies the initial large-scale bias correction step of BCSD prior to applying the CA method. We refer to this approach as BCCA.
The bias correction employed in BCCA is conceptually identical to that in BCSD, using the same quantile mapping approach. However, rather than apply this to monthly precipitation and temperature, the quantile mapping is used for all daily (precipitation, maximum and minimum temperature) values within each month. For example, all daily precipitation observations (aggregated to the reanalysis spatial scale) for all Januarys in the 27-year "observed" period are lumped into one pool to create a distribution of daily precipitation observations for January of n = 27 × 31 = 837 days. The pool of n daily [...] in the reanalysis time series, where the precipitation value is converted to a quantile using the cumulative distribution for reanalysis, and that quantile is then drawn from the observed cumulative distribution to obtain a new, bias-corrected precipitation value for that day. For example, if reanalysis simulates a very small precipitation amount that is exceeded 90% of the time (for that month), and observations show 30% of the days with no precipitation, the small amount of reanalysis precipitation will be re-mapped to a value of zero. In this way, the bias-corrected daily data will match the observations for the number of rainy days and the average rainfall intensity (for the observed period). Finally, since all biases are explicitly corrected in BCCA, the constructed analogues are then developed on absolute values rather than anomalies, which contrasts with the use of anomalies in the original CA. Since the biases in reanalysis temperatures are much smaller, in a relative sense, than precipitation biases, the discussion below focuses on bias correction of precipitation.
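The daily quantile mapping described above can be sketched with empirical distributions as follows. This is a minimal illustration, not the authors' code: the function name `quantile_map` is hypothetical, the pools are tiny made-up samples, and values outside the range of the training pool are simply clamped to the observed extremes, which is an assumption of this sketch.

```python
import bisect

def quantile_map(value, model_pool, obs_pool):
    """Map a model value to the observed value at the same empirical quantile."""
    model_sorted = sorted(model_pool)
    obs_sorted = sorted(obs_pool)
    # Rank of the value in the model pool (number of model values <= value).
    rank = bisect.bisect_right(model_sorted, value)
    # Same non-exceedance fraction in the observed pool, using integer ceiling division.
    n_model, n_obs = len(model_sorted), len(obs_sorted)
    idx = (rank * n_obs + n_model - 1) // n_model - 1
    idx = min(max(idx, 0), n_obs - 1)  # clamp values outside the training range
    return obs_sorted[idx]

# Illustrative pools for one calendar month (mm/day). The model rains every day;
# 30% of the observed days are dry.
model_pool = [0.2, 0.5, 0.7, 1.0, 1.5, 2.0, 3.0, 5.0, 8.0, 12.0]
obs_pool = [0.0, 0.0, 0.0, 0.5, 1.0, 2.0, 4.0, 6.0, 9.0, 15.0]

smallest = quantile_map(0.2, model_pool, obs_pool)   # exceeded 90% of the time -> 0.0
corrected = [quantile_map(v, model_pool, obs_pool) for v in model_pool]
```

Mapping the training-period model pool through this function reproduces the observed pool, so the corrected series has the observed number of dry days by construction. In practice the pools would hold all daily values for a calendar month over the training period (837 values for January in 1950-1976).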
While the bias correction included with BCCA forces the cumulative distribution function to match observations for the historical (observed) period, some biases due to the downscaling methods remain. Figure 2 shows the comparison of BCCA to observations, after downscaling to the 1/8 degree spatial resolution. The high bias in precipitation intensity in the PNW was successfully reduced by the bias correction process in BCCA, showing an improvement over CA. This indicates that large-scale bias may have been the primary factor for bias in downscaled precipitation intensity in this area. The bias toward underestimation of precipitation intensity with CA over California is not removed by the bias correction, suggesting that some bias is introduced in the CA method in this region. Although the underestimation by CA and BCCA in California is largest during the rainy season, January and March in Fig. 2, [...] large, is small relative to the mean observed intensity in these months (in the leftmost column of Fig. 2).
While the daily bias correction ensures that the cumulative distribution of daily precipitation (or maximum and minimum temperature) values will exactly match the observed distribution for all daily values for any month, it does not explicitly force the monthly distributions to match. In other words, by assembling all January daily values for 1950-1976 for a reanalysis grid cell into a single cumulative distribution function (as in Fig. 3), the bias correction only guarantees that the entire set of daily values (for this example, 837 days) will match the statistics for the set of 837 days for the observations. There is no guarantee that the distribution of monthly precipitation values (for example, 27 January average precipitation values) is also improved. However, as illustrated in Fig. 4, the monthly values are also largely corrected for their biases at all quantiles when the daily values are bias corrected.
This indicates that the modeled precipitation variability in reanalysis at the daily scale within a month is consistent with observations, inasmuch as the monthly bias is largely addressed by the daily bias correction.
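To make the distinction between the pooled daily distribution and the monthly distribution concrete, the sketch below groups a daily series both ways for the 1950-1976 training period. The helper names `pool_daily_by_month` and `monthly_means` are hypothetical, and the constant placeholder series stands in for real data.

```python
from collections import defaultdict
from datetime import date, timedelta

def pool_daily_by_month(start, values):
    """Pool daily values by calendar month across all years (used for daily quantile mapping)."""
    pools = defaultdict(list)
    for offset, value in enumerate(values):
        day = start + timedelta(days=offset)
        pools[day.month].append(value)
    return dict(pools)

def monthly_means(start, values):
    """Mean of the daily values for each individual (year, month)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for offset, value in enumerate(values):
        day = start + timedelta(days=offset)
        sums[(day.year, day.month)] += value
        counts[(day.year, day.month)] += 1
    return {key: sums[key] / counts[key] for key in sums}

start = date(1950, 1, 1)
n_days = (date(1977, 1, 1) - start).days
series = [1.0] * n_days  # placeholder daily values, not real data

january_pool = pool_daily_by_month(start, series)[1]
january_means = [v for (year, month), v in monthly_means(start, series).items() if month == 1]
print(len(january_pool), len(january_means))  # 837 27
```

The daily correction constrains only the first of these two samples (837 values); the second (27 monthly means) depends on how the daily values are arranged within each year, which is why its agreement with observations has to be checked separately, as in Fig. 4.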

Impact of downscaling approaches on daily hydrology
Prior to analyzing daily metrics, we assessed the ability of each downscaling method to reproduce annual flow volumes for the projected 1977-1999 period at each gauge site listed in Table 1. For BCSD, three sites had distributions of annual flow volumes that differed from the annual flow volumes produced by the hydrologic model simulation driven by observations. Similarly, CA differed at four of the stream gauge sites. BCCA, by contrast, produced a distribution of annual flow volumes that was indistinguishable from the observation-driven hydrologic model run, showing substantially improved downscaling skill even for annual measures of performance.
Three daily-scale streamflow metrics are evaluated in this study: center timing (CT), 3-day peak flow, and 7-day low flow. Center timing is defined as in Stewart et al. (2005) [...] verification (or projection) period of 1977-1999 the metrics are calculated and then the results are assembled into distributions for each metric. These distributions are compared among downscaling methods and with the simulation using gridded 1/8 degree observations to drive the VIC model.
Figure 5 shows the performance of the three downscaling techniques along with the observations-based streamflow simulation for the CT statistic. Since CT in snowmelt-dominated basins tends to be driven more by temperature than precipitation, the distributions of CT simulated by CA are able to capture the skill in daily temperature present in the reanalysis (since temperature observations are assimilated in the reanalysis product, bias is relatively low). CA appears to perform better than BCSD at several locations, for example OROVI, NF AM, and LK MC. What this demonstrates is that there is skill in simulating CT at many sites with BCSD, which assumes the distribution of daily values within any month is statistically the same for the observed (or training) period of 1950-1976 as for the later projected period of 1977-1999. The CA method, by contrast, recognizes changes in the occurrence of large-scale climate patterns at the daily scale, and produces downscaled daily values that reflect them, allowing the specific variations within months in each given year to change in the projected period, which results in improved skill at some locations. BCCA does not appear to differ greatly from CA at most locations, suggesting that the relatively low bias in reanalysis temperature causes the bias correction step to have a relatively small effect on this temperature-driven statistic.
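Two of the three metrics named at the start of this section can be sketched as moving-average extremes. This is an assumption made for illustration: the 3-day peak flow is taken here as the maximum 3-day mean flow within a water year and the 7-day low flow as the minimum 7-day mean flow, and the function names are hypothetical. Center timing is not sketched because its definition follows Stewart et al. (2005).

```python
def moving_average(flows, window):
    """Means of every run of `window` consecutive daily flows."""
    if window > len(flows):
        raise ValueError("window is longer than the flow series")
    total = sum(flows[:window])
    means = [total / window]
    for k in range(window, len(flows)):
        total += flows[k] - flows[k - window]  # slide the window forward by one day
        means.append(total / window)
    return means

def peak_flow_3day(flows):
    """Largest 3-day mean flow in the series (assumed definition)."""
    return max(moving_average(flows, 3))

def low_flow_7day(flows):
    """Smallest 7-day mean flow in the series (assumed definition)."""
    return min(moving_average(flows, 7))

# Illustrative daily flows for a short record: baseflow of 10 with one storm peak.
flows = [10] * 10 + [40, 70, 40] + [10] * 10
peak = peak_flow_3day(flows)   # (40 + 70 + 40) / 3 = 50.0
low = low_flow_7day(flows)     # 10.0
```

Applied to each water year of simulated daily streamflow, functions like these would yield one value per year per metric, and those annual values form the distributions compared between downscaling methods.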
It should be noted that the small (0.2 °C) but significant domain-average temperature difference between the 1950-1976 and 1977-1999 periods is dwarfed by the large warming projected for later in the 21st century for this region, of up to 4.5–6 °C. Thus, as the climate diverges from the historical record to a greater degree, it would be expected that the difference in skill between BCSD and the analogue-based methods (CA and BCCA) could become more stark.
Figure 6 shows the CT values for each water year for the "projected" period of 1977-1999 at the NF AM site for BCSD, CA, and BCCA relative to the observations-driven simulated CT values. This supports the observation in the prior paragraph, where the [...] scale data into a downscaling technique. The first three columns of Table 2 summarize the KS test performed to determine whether the 22 simulated CT values from each of the three downscaling methods can be assumed to be drawn from the same distribution as the observations-driven values, with 95% confidence. This verifies that CA outperforms BCSD, providing a statistically significant improvement at two locations (NF AM and LK MC). BCCA is generally as good as or better than CA, producing CT values with a distribution statistically indistinguishable from the CT values from the observationally driven hydrologic simulation at all sites.
Figure 7 shows the distribution of 3-day peak flow values across the 22 water years from 1978-1999 at each site. The statistical test results for peak flows are in columns 4-6 in Table 2. In contrast to the CT measure, 3-day peak flow is much more strongly driven by precipitation, which is less well represented in reanalysis, and thus would be expected to benefit from the bias correction to a greater degree than temperature-driven phenomena. Figure 7 shows that CA has a tendency to somewhat underpredict peak flow at most locations.
BCCA produces visibly better simulated peak flow values than CA at many sites (e.g., NF AM, FOL I, DPR I, MILLE, KINGS), with BCCA distributions showing a closer match to observations than CA. Surprisingly, Table 2 shows that peak flows derived using downscaled meteorology from all three techniques are statistically indistinguishable from those driven by observations at all sites at 95% confidence, so while BCCA appears to be an improvement over CA, the [...] adequately, if not as well as possible, for supporting hydrologic skill of this peak flow statistic.
Figure 8 illustrates the performance of BCCA and BCSD at one site relative to the hydrologic simulation using gridded observations, showing one wet year and one dry year. For the wet year both peak flows and low flows are captured relatively well, compared to the observations-driven simulation, for both BCSD and BCCA. BCCA shows temporal correspondence to the simulation driven by observations, demonstrating that, even though the large-scale reanalysis precipitation is numerical model output rather than assimilated observations and has well-known biases, the bias correction procedure employed here recovers the daily signal present in the observations. BCSD, by design, has no correspondence to the sequencing in the daily observations-driven simulation. However, even with its random generation of daily sequences within any month, BCSD does produce numbers and magnitudes of peak flows that resemble the observations-driven peak flows. The flows during the dry year show similar patterns to the wet year, though one example of the shortcoming of selecting random daily sequences in BCSD is seen in October-November, where BCSD shows too many smaller peak flows, whereas BCCA concentrates the flow on one larger peak event, better matching the observations-driven peak flow. The difficulty in matching the very low flows during May-June in the dry year by both downscaling procedures suggests this issue lies with biases in the large-scale reanalysis that are not accommodated by the bias correction procedure, since BCSD and BCCA use conceptually different spatial downscaling techniques and a similar bias appears with both methods.
Simulating 7-day low flows with downscaled meteorology is more problematic, as shown in Fig. 9 and columns 7-9 of Table 2. Several sites exhibit a distribution of low flows that is statistically different for both BCSD and CA downscaling approaches from low flows simulated using gridded observed meteorology. As with peak flows, CA appears to have a tendency to produce low flows that are lower than observed at many sites. While BCSD produces reasonable values at some sites, low flows are overpredicted in some locations, especially apparent at NF AM and FOL I. BCCA, by contrast, appears to produce low flow values that are closer to those produced by the observationally driven simulation. Table 2 bears these observations out, showing that at two sites BCSD produces low flows different from observations, and at four sites CA produces different values from observations, with high statistical confidence. For the low flow distribution, BCCA is again statistically indistinguishable from observationally driven hydrology at all sites. It is evident that the choice of downscaling method may influence results more for low flows than for other measures of streamflow. A factor contributing to this may be the relatively greater reanalysis skill (lower biases compared to reanalysis precipitation) for daily temperature, allowing the bias correction to have a greater effect. Since low flows would be affected by evapotranspiration more than peak flows, a better representation of daily temperatures, more closely resembling observations, would improve skill for the BCCA method.
As a postscript, the improvement seen in applying the bias correction to large-scale daily forcing data raised the question of whether a post-downscaling bias correction, applied using the same quantile mapping approach at the 1/8 degree spatial scale, could provide additional improvement in simulated hydrology. We conducted this experiment using both the BCCA and the BCSD downscaled meteorology, performing quantile mapping bias correction of daily precipitation, and maximum and minimum temperatures, again using 1950-1976 as the "observed" period and 1977-1999 as "projections." We found no consistent improvement in the simulated hydrologic measures [...]

Summary and conclusions
We statistically downscaled NCEP/NCAR reanalysis precipitation and temperature over the Western US using three different methods and drove a hydrology model with the resulting sets of downscaled meteorology. The historic record was divided into an "observed" period of 1950-1976 and "projections" from 1977-1999. Streamflow was estimated at 11 sites across California, and these were analyzed to determine the ability to estimate three streamflow statistics important to hydrology: seasonal timing, peak flow, and low flow. One method, BCSD, uses monthly large-scale output, and rescales a historic month to estimate daily variability within each month. A second method, CA, uses daily large-scale output to downscale daily precipitation and temperature to a 1/8 degree grid. The third method, a new hybrid called BCCA, combines the bias correction step of BCSD with the daily downscaling of CA.
We found that daily large-scale skill can be effectively downscaled from the large scale to the regional scale to simulate these streamflow statistics. Reanalysis assimilates daily temperature observations, and thus has some large-scale skill for temperature, though reanalysis precipitation is solely model output and is prone to substantial biases. The timing of the annual hydrograph was captured by all downscaling methods at most locations, though the hybrid BCCA method was the only one to perform well at all sites. For downscaling meteorology to generate extreme peak flows (3-day annual peaks), all methods performed well at all sites. The annual flow volume was reproduced with better skill by the hybrid BCCA method than by either the BCSD or CA methods, showing that the improvement with the BCCA method is also evident at temporal scales longer than daily.
Low flows were more difficult to capture with the downscaled data. While most of the streamflow sites included in our study had low flows simulated with downscaled data that were statistically indistinguishable from those derived when driving the hydrology model with observations, BCSD and CA had shortcomings. As with the seasonal flow timing statistic, the BCCA method outperformed both the BCSD and the CA methods, statistically matching observationally driven low flows at all sites.
In summary, to downscale large-scale climate data to generate estimates of extreme hydrologic events, downscaling daily large-scale output can provide measurable improvements in regional hydrologic skill, exceeding that of simply assuming that variability within a month will be similar to historical variability. However, without a bias correction step to correct large-scale biases (which can only be expected to be worse in free-running GCMs than in the data-assimilation constrained reanalysis model outputs), the skillful signal in the daily data was less likely to be exhibited in the downscaled data and the resulting hydrology. The bias correction step, applied to daily large-scale meteorology prior to downscaling, produced some significant improvements in skill in simulating hydrologic extremes. The biases exhibited at the large scale are in both mean and variability; thus working with anomalies (as in the CA method) is not adequate to compensate for large-scale biases, but the quantile mapping approach used in BCCA appears more promising.
Fig. 9. Same as Fig. 7, but for the 7-day low flow. As in Fig. 7, note the y-axes have different scales for each panel.