A Large-Scale, High-Resolution Hydrological Model Parameter Data Set for Climate Change Impact Assessment for the Conterminous US

To extend geographical coverage, refine spatial resolution, and improve modeling efficiency, a computationand data-intensive effort was conducted to organize a comprehensive hydrologic data set with post-calibrated model parameters for hydro-climate impact assessment. Several key inputs for hydrologic simulation – including meteorologic forcings, soil, land class, vegetation, and elevation – were collected from multiple best-available data sources and organized for 2107 hydrologic subbasins (8-digit hydrologic units, HUC8s) in the conterminous US at refined 1/24 ◦ (∼ 4 km) spatial resolution. Using high-performance computing for intensive model calibration, a high-resolution parameter data set was prepared for the macro-scale variable infiltration capacity (VIC) hydrologic model. The VIC simulation was driven by Daymet daily meteorological forcing and was calibrated against US Geological Survey (USGS) WaterWatch monthly runoff observations for each HUC8. The results showed that this new parameter data set may help reasonably simulate runoff at most US HUC8 subbasins. Based on this exhaustive calibration effort, it is now possible to accurately estimate the resources required for further model improvement across the entire conterminous US. We anticipate that through this hydrologic parameter data set, the repeated effort of fundamental data processing can be lessened, so that research efforts can emphasize the more challenging task of assessing climate change impacts. The pre-organized model parameter data set will be provided to interested parties to support further hydro-climate impact assessment.


Introduction
With the advance of high-performance computing and more abundant historical observation data, hydrologists and water resource engineers are better equipped to improve the scale, resolution, and accuracy of hydrologic simulation. Depending on the need, a suitable hydrologic model may range from a statistical model (e.g. artificial neural network, ANN) or a conceptual model (e.g. bucket model) to a more sophisticated model that can simulate a series of hydrological processes based on physical mechanisms. However, although a statistical model can generally simulate hydrologic variables well with fewer predictors, the assumption of stationarity may be questionable in a changing environment in which many hydrologic processes are expected to be disrupted (Milly et al., 2008). Under such conditions, historical relationships may not provide fully accurate information about future streamflow and water availability. One example is ANN (and the various related machine learning algorithms). Although these types of advanced statistical methods are extremely powerful in forecasting reservoir outflows with minimal observation, the physical relationships among various predictors cannot be interpreted (Govindaraju and Rao, 2000); this hinders their direct extension across different locations and climate patterns. Therefore, these methods may not be suitable choices for climate-related research.
Unlike statistical models, process-based models are theoretically justifiable for climate research. Since most hydrologic mechanisms are simulated through deterministic laws, the assumption of stationarity is a lesser issue. However, a high number of observations and parameters is required to drive process-based models. For instance, a distributed Published by Copernicus Publications on behalf of the European Geosciences Union.
L@ cp:J rainfall-runoff model may require fine-resolution inputs of vegetation, precipitation, temperature, solar radiation, topography, and many other soil properties to simulate various hydrologic processes such as evapotranspiration, infiltration, vegetation root absorption, snowmelt, and runoff generation. With increasing complexity in model resolution, scale, and processes, the required computational resources also increase exponentially. As a result, it is generally more challenging to conduct a process-based hydrologic simulation for a large study area with fine spatial resolution. A tradeoff usually must be made between scale and resolution because of resource limitations.
Numerous studies have investigated the hydrological impacts of climate change in the US using process-based models (Mote et al., 2005;Christensen et al., 2004;Payne et al., 2004;Maurer et al., 2002;McCabe and Hay, 1995;Hamlet and Lettenmaier, 1999;Wolock and McCabe, 1999;Ashfaq et al., 2010). Output from global climate models (GCMs) is usually downscaled, bias corrected, and used in conjunction with hydrologic models to assess future water availability. However, owing to resource limitations, many hydro-climate impact assessments either are focused on smaller US regions or provide lower spatial resolution (Christensen et al., 2004;Payne et al., 2004;Maurer et al., 2002;McCabe and Hay, 1995). Repeated efforts may be needed for fundamental data processing and model calibration, and these may unavoidably reduce the amount of computing time available for the more challenging task of assessing climate change impacts. To extend geographical coverage, refine spatial resolution, and make hydro-climate impact assessment more efficient, a comprehensive set of calibrated physical parameters is desired that can provide the most up-to-date, highresolution watershed soil, vegetation, elevation, and other hydrologic characteristics. If a fine-resolution hydrological model parameter data set could be pre-organized, generally calibrated, and constantly updated, it would enable numerous researchers to easily extend hydro-climate impact assessment efforts to different watersheds.
One major structural difference between climate and hydrologic models is their respective requirements for vertical and horizontal resolution. Although spatial resolution is relatively less critical for current GCMs, they do need more vertical layers to better simulate the boundary layer and tropospheric processes that govern complex land-atmosphereocean interactions at varying timescales. However, horizontal resolution is the dominant factor for hydrologic models, since complicated topography and heterogeneous land surface characteristics have the greatest impacts on model performance. Previous studies have shown that runoff is sensitive to spatial variations in soil properties, precipitation inputs, and topography (Haddelenad et al., 2002;Nijssen et al., 2001;Sharif et al., 2007;Dooge and Bruen, 1997;Merz and Plate, 1997;Shah et al., 1996;Wolock and Price, 1994) and finer resolution data input would result in better model performance in comparing simulated and observed discharge (Shrestha et al., 2006). Additionally, fine-resolution hydrological model parameter data sets are needed to explore tasks such as identifying climatic controls on the spatial variability of hydrologic parameters; performing statistical analysis between climatic variables and land surface processes and states; and developing subgrid parameterization approaches for hydrologic parameters, particularly in wet and dry conditions. Therefore, to enhance the performance of hydrologic simulations for climate change impact assessment, one would need to simultaneously improve both spatial resolution (for hydrologic research needs) and geographical coverage (for climate change research needs). Maurer et al. (2002) has developed a long-term hydrologically consistent data set for the conterminous US at 1/8 • (∼ 12 km) spatial resolution for water and energy budget studies. The data set was recently extended by Livneh and Lettenmaier (2013) for the time period of 1915-2011 with a refined 1/16 • (∼ 6 km) resolution. With the continuous improvement of spatial resolution through regional climate models (e.g. North American Regional Climate Change Assessment Program and Coordinated Regional Climate Downscaling Experiment), refined climate projections will soon become available. A corresponding enhancement of hydrologic models is hence required for detailed hydro-climate impact assessment.
Given the motivation, a data-and computation-intensive effort was performed in this study. A widely used hydrologic model, variable infiltration capacity (VIC), was chosen as the baseline hydrologic model in this initial effort. Using multiple state-of-the-art geospatial data sets for the conterminous US, various key hydrologic model inputs -including topography, soil characteristics, vegetation, land surface classification, meteorological forcing, and runoff observation -were organized in a nationally consistent 1/24 • (∼ 4 km) grid and grouped by the US Geological Survey (USGS) 8digit hydrologic unit code (HUC8 or subbasin; Seaber et al., 1987). Since precipitation and other hydrologic variables are highly resolution dependent (Gao et al., 2006), the refinement from the commonly used 1/8 • (∼ 12 km) resolution (e.g. Maurer et al., 2002;Mitchell et al., 2004) will allow more flexibility in characterizing detailed hydrologic responses, particularly for extreme hydrologic events. In addition, to spatially increase the model accuracy across different geographical locations, each HUC8 subbasin was calibrated independently through a computationally intensive exercise. By using a multi-site (watershed) calibration approach, we hope that the model performance can be consistently improved across various subbasins. This approach of refining model performance uniformly across all HUC8 subbasins is new and different from previous studies (e.g. Maurer et al., 2002). A parallelization scheme was developed for each of the 2107 HUC8 subbasins to greatly reduce the required computation time. Using high-performance supercomputing, an exhaustive model calibration exercise was then performed to improve the model accuracy across the conterminous US. The calibrated model was evaluated using USGS observed runoff, and the evaluation showed that the parameter data set results in satisfactory performance for most US HUC8 subbasins. In Sect. 2, the data source and methodology used to develop the parameter data sets are described. The current model performance is presented in Sect. 3, and a summary and conclusions are provided in Sect. 4.

Study area and assessment grids
The effect of spatial resolution on hydrologic modeling is illustrated in Fig. 1 for the Ashley-Brush Subbasin in Utah (HUC8 ID 14060002). Based on the 1/3 arcsec-resolution (∼ 10 m) USGS national elevation dataset (NED;Gesch et al., 2002), the average elevation was computed for both 1/24 • (∼ 4 km) and 1/8 • (∼ 12 km) grids. Flow direction and flow accumulation were then derived using standard geographical information system (GIS) software. Grids with flow accumulation values of greater than 20 in Fig. 1a and greater than 2 in Fig. 1b are marked in grey, illustrating the locations of rivers seen by computer models. Three USGS gauge stations (09261700, 09263500, and 09271550) are also marked in Fig. 1. Whereas the 1/24 • grids can capture two streams, the 1/8 • grids fail to depict the system. Without additional information, the hydrological model might consider both outlets (09263500 and 09271550) as being located on the same river, which would directly affect the model calibration and validation. In addition, most of the 1/8 • grids flow to the neighboring subbasin instead of to the two outlets because of the insufficient spatial resolution for this watershed.
Therefore, to refine hydro-climate assessment from the regional to the watershed scale, spatial resolution is key. After evaluating the available resources, it was decided to select 4 km as the targeted resolution, which would require 9 times the computational resources needed for the commonly used 12 km grids, plus some additional effort for large data set management and quality control. The 4 km grids selected herein in fact follow the same configuration as the Parameterelevation Regressions on Independent Slopes Model meteorological data sets (PRISM; Daly et al., 2002), with the north boundary extending to 53 • N to cover the entire Columbia River basin on the Canadian side. Since PRISM is recognized as the most accurate grid-based monthly meteorological observation of precipitation and temperature based on ground-based meteorological stations, it is convenient to use the same grid configuration for future model evaluation and comparison. Each 4 km grid was given a unique identifier and further labeled with USGS HUC8 IDs. While our focus in this paper is on the conterminous US, some headwater basins in Canada (e.g. British Columbia) and Mexico also contribute to the downstream hydrology and should eventually be considered. As a placeholder, the watershed boundary outside the conterminous US was obtained from the USGS National Hydrography Dataset Plus, version 1 (EPA/USGS, 2010). When the forcing and parameter data are available, they are also organized in a consistent format. Further simulation and calibration for the non-US region will be conducted in future studies. Overall, there are ∼ 480 000 grid points and 2107 HUC8s in 18 hydrologic regions (HUC2) in the conterminous US. HUC2 and HUC8 are illustrated in Fig. 2.

Meteorological forcing
"Meteorological forcing" refers to the required meteorological inputs for hydrologic modeling, including precipitation, temperature, wind speed, and others. Conventionally, these values are looked up from surface weather observation stations (e.g. National Weather Service Cooperative Observer Program) and then spatially interpolated for further hydrologic application. However, given the heavy data processing requirements, such an approach is not applicable for largescale hydrologic simulation; therefore, pre-processed gridbased observations are primarily used. Several meteorological forcing data sets are commonly used in hydrologic studies for the conterminous US. These data sets are either fully based on observation or partially assimilated through weather forecasting models. Four were selected for consideration in this study: PRISM, Maurer, Daymet, and North American Regional Reanalysis (NARR). PRISM provides accurate grid-based observations of precipitation and temperature by considering topographical effects and some other adjustment factors (Daly et al., 2002(Daly et al., , 2008. It is available on a monthly timescale from 1895 to the present and provides 1/24 • (∼ 4 km) spatial resolution (http://www.prism.oregonstate.edu/). Maurer et al. (2002) is a widely used forcing data set for hydrologic studies . It is based on observation and is available on a daily timescale from 1950 to 2010 in 1/8 • (∼ 12 km) spatial resolution. Targeted for fine-scale ecological studies, Daymet (Thornton et al., 1997) is another commonly used meteorological data set based on groundbased meteorological stations (White et al., 2006;Keane et al., 2008;Manter et al., 2005). Daymet is available on a daily timescale from 1980 to the present at a projected 1 km spatial resolution (http://daymet.ornl.gov/dataaccess). NARR (Mesinger et al., 2006) is an assimilated meteorological reanalysis data set that provides a complete set of meteorological variables (e.g. pressure, wind). It is available on a 3 h timescale from 1979 to the present at a 36 km horizontal grid spacing. Some other new meteorological forcing data sets are also available (e.g. Abatzoglou, 2013).
To compare differences, the four selected meteorological forcing data sets were processed to a consistent format on the 4 km grids described in Sect. 2.1. Spatial interpolation was performed for both Maurer and NARR from 1/8 • (∼ 12 km) and 36 km to 1/24 • (∼ 4 km) for direct comparison. The 3 h wind speed from the lowest layer in NARR was used to calculate mean daily wind speed for comparison with the wind speed provided by Maurer et al. (2002).
Spatial aggregation was performed for Daymet to gather information from 1 km to 1/24 • (∼ 4 km) horizontal grid spacing. Overall, daily precipitation and minimum and maximum temperatures are available from 1980 to 2010 for Maurer, Daymet, and NARR; but daily wind speed is available only for Maurer and NARR. An overall comparison is presented in Sect. 3.1. Methodologically speaking, aggregation is considered to be more justifiable than interpolation since it does not result in information loss. The direct interpolation of Maurer and NARR to 1/24 • (∼ 4 km) resolution, while sufficient for the purpose of comparison at the HUC8 scale in Sect. 3.1, is questionable for 1/24 • (∼ 4 km) simulation unless other factors such as topography and elevation are also considered during interpolation. Since the main purpose of this study is not to create a new forcing data set, Daymet was considered to be the most appropriate choice for 1/24 • (∼ 4 km) simulation given its finer 1 km spatial resolution. Daymet may be further improved through monthly scale bias adjustment (e.g. Ashfaq et al., 2010), but that was not done in the current study. For non-US regions in which Daymet is unavailable, NARR information was used.

Soil parameters
Soil parameters are mainly used to describe the process of infiltration and base flow generation in hydrologic modeling. Given their heterogeneous nature and the lack of an effective remote sensing method, soil parameters remain the most uncertain of all parameters. Intensive hydrological model calibration is usually performed on soil parameters to improve overall model performance. In this initial effort, the conterminous US soil data set (CONUS-SOIL; Miller and White, 1998) was used to provide the required soil information for hydrologic modeling. CONUS-SOIL was derived from the State Soil Geographic data set (Schwarz and Alexander, 1995). It provides commonly used soil characteristics, such --as texture, bulk density, and porosity, arranged in 11 standard layers ranging from 0 to 2.5 m in depth and is specifically aimed at hydro-climate applications. The CONUS-SOIL data set is available in 1 km spatial resolution and is provided in common GIS formats (e.g. raster or polygon). Each CONUS-SOIL grid is spatially joined to the 4 km grids (described in Sect. 2.1) so that the required soil characteristics can be summarized efficiently for further hydrologic application. Future effort will be invested in collecting soil characteristics for non-US regions that are not covered by CONUS-SOIL.

Vegetation parameters
In considering the water budget for a large study domain, uptake and evapotranspiration from vegetation is a critical factor, since it has a significant influence on the seasonality of the simulated hydrology. Because of the rapid improvement in remote sensing data over the past decade, historical surface vegetation can now be more effectively captured to support large-scale hydro-climate simulation.
In this study, both the University of Maryland (UMD) land cover classification (Hansen et al., 2000) and the National Aeronautics and Space Administration Moderate Resolution Imaging Spectroradiometer (MODIS) model 15A2 leaf area index (LAI) information were imported. The UMD land cover classification consists of 14 categories of vegetation (e.g. evergreen needleleaf forest, mixed forest) and is available at 1 km spatial resolution in GeoTIFF format. To link the 1 km land cover classification to the 4 km grids used in this study, a conversion table for these two grid systems was developed. Both grids were first converted to polygons and then spatially intersected to form a massive table that contains the overlapping surface area of each spatial unit and the unique identifiers from both data sets. This massive conversion table was then used to summarize the portion of the UMD land cover classification in each 4 km grid efficiently.
To capture the seasonal pattern of surface vegetation, the MODIS15A2 LAI was included. LAI, defined as the green leaf area per unit of ground area (leaf area/ground area), is a widely used dimensionless canopy index. Depending on the type of vegetation, LAI may show significant seasonal trends. The historical time series of LAI were constructed using MODIS remote sensing data. The MODIS15A2 information is available every 8 days and is stored in HDF format at approximately 1 km spatial resolution in sinusoidal projection. For consistency with the UMD land classification, the MODIS LAI values were spatially interpolated to the UMD 1 km grids. The interpolated 8 day LAI values were then aggregated for each month and linked to the 4 km grids via the same conversion table developed for the UMD grids. Overall, the monthly LAI time series are organized from 2003 to the present and can be used to support various hydrologic applications. The overall statistics are summarized in Sect. 3.2.

Elevation and topography
Elevation and topography have a significant influence on surface hydrology. For instance, slope directly affects flow velocity, and local topographical depressions may create impoundments and delay surface runoff. Snow accumulation and snowmelt are also closely related to elevation. To refine the spatial resolution of a hydro-climate assessment, a fine-resolution elevation data set is needed. In this study, the 1/3 arcsec-resolution (∼ 10 m) USGS NED (Gesch et al., 2002) was used for the conterminous US. NED is a seamless data set with the best available raster elevation data in the US. Similar to the treatment of UMD grids, each NED grid was labeled with a unique 4 km grid identifier. Average elevation, average slope, and the histogram of the elevation at each 4 km grid were then computed. For regions outside the US, the 90 m Shuttle Radar Topography Mission elevation was used instead (Farr et al., 2007). The pre-organized information can be processed efficiently for further applications.

Observed streamflow and runoff
Historical hydrologic observations are required for model calibration and validation. Two types of observations, streamflow and runoff, can be used to support hydrologic model calibration. Gauge-based streamflow observations directly measure flow discharge at a specific river section. Comprehensive daily flow observations can be obtained from the USGS National Water Information System (NWIS) for more than 22 000 current and retired gauge stations throughout the US.
Another observational product, the USGS WaterWatch runoff (Brakebill et al., 2011), was found to be more useful in this study. Derived from the comprehensive NWIS gauge observation, WaterWatch runoff is the assimilated time series of flow per unit of area calculated for each conterminous HUC8 subbasin. For each HUC8 subbasin, multiple NWIS gauge stations located within or downstream of the HUC8 were used to estimate the runoff generated locally at each HUC8. The contributing drainage areas (both gaugeto-HUC8 and HUC8-to-gauge) were converted as weighting factors to merge runoff time series from all stations. As a result, gauges with drainage coverage most similar to that of the particular HUC8 received the highest weights. Therefore, the influence of highly regulated gauge stations (usually with large drainage coverage across multiple HUC8s) could be reduced. This approach may effectively assimilate streamflow observations from multiple gauge stations as a consistent areal HUC8 runoff measurement that has a unit similar to that for precipitation (depth/time). WaterWatch runoff is available monthly from 1901 to the present. Note that WaterWatch runoff is based on an earlier version of watershed boundaries and was found to be slightly different from the new watershed boundaries adopted in Sect. 2.1. Using the polygon shapefiles from both versions of the watershed 72 A. A. Oubeidillah et al.: A large-scale, high-resolution hydrological model parameter data set boundaries, a conversion table based on overlapping drainage areas was developed to adjust the WaterWatch runoff to a consistent watershed boundary for further comparison.
Although WaterWatch runoff is a promising new data set for hydro-climate studies and has been found useful in Ashfaq et al. (2013) and Sale et al. (2012), there are some questions and limitations that deserve further exploration. For instance, for HUC8 subbasins with very few gauge stations (e.g. arid regions in the central US), data accuracy and quality would be an issue. In addition, since the exact weighting factors for each station are not published by USGS, it would be difficult to examine and further quantify the influence of highly regulated stations (e.g. reservoir or diversion) in WaterWatch runoff. Nevertheless, for the purposes of initial model development and calibration, WaterWatch should be an ideal choice given its consistent format across various subbasins. Future detailed studies on the accuracy of Water-Watch runoff would be highly useful.

Baseline hydrologic model application
In this study, the widely used VIC model (Nijssen et al., 1997;Liang et al., 1994Liang et al., , 1996Cherkauer et al., 2002) was selected as the baseline hydrologic model for the conterminous US. VIC has been successfully tested in a many hydrologic studies and large river networks (Gao et al., 2010;Ashfaq et al., 2010;Su et al., 2005;Nijssen et al., 1997Nijssen et al., , 2001Bowling et al., 2004;Lohmann et al., 1998). The current VIC model studies were mostly conducted at 1/8 • spatial resolution (∼ 12 km). Given its wide acceptance and the fact that it can be directly implemented for parallel computing, the VIC model was considered the most suitable baseline hydrologic model for this initial effort. VIC is a process-based hydrological model that simulates evapotranspiration, snow pack, surface runoff, base flow, and other hydrologic mechanisms at daily or subdaily time steps. The water and energy balances are solved for multiple elevation bands and vegetation types, which allows the model to capture the subgrid-scale variability of these land surface features. The model simulates all processes in each grid cell independently in an equally spaced grid. Infiltration and runoff are estimated using the VIC model curve, which uses the soil moisture content of the upper two soil layers to approximate the spatial variability of surface saturation. The empirical Arno curve is used to generate base flow based on the soil moisture content in the bottom layer (Cherkauer and Lettenmaier, 2003). A routing algorithm external to the VIC model can then be used to simulate the streamflow at a specified location by routing runoff and base flow from each grid cell (Lohmann et al., 1998).
VIC requires a large number of parameters, including soil, vegetation, elevation, and daily meteorological forcings, at each grid cell. By taking daily precipitation, maximum/minimum temperature, and wind speed as inputs, VIC computes potential evapotranspiration using the Penman-Monteith equation (see Maidment, 1993). Other forcings, including shortwave and long-wave radiation, relative humidity, and vapor pressure, are estimated using algorithms from MTCLIM (Kimball et al., 1997;Thornton and Running, 1999) at subdaily time steps (3 h). Additionally, subdaily temperatures are estimated within the model as a parameterization of maximum/minimum temperature (Bohn et al., 2013).
For soil physical properties, the CONUS-SOIL information was divided into three layers covering the total depth from 0 to 2.5 m (CONUS-SOIL layers 1 and 2 to VIC model layer 1, CONUS-SOIL layers 3-7 to VIC model layer 2, and CONUS-SOIL layers 8 and 9 to VIC model layer 3). The 1 km CONUS-SOIL information was then aggregated to 4 km grids. When a three-layer configuration was used, a total of 53 soil parameters was required, including saturated hydrologic conductivity, initial soil moisture, bulk density, layer thickness, fraction of soil moisture at wilting point, and some other conceptualized parameters such as the variable infiltration curve parameter (b infilt ), which required further calibration. Although CONUS-SOIL contains a number of different soil characteristics, only a few are directly called out in the VIC model soil parameter file (e.g. bulk density). Most VIC soil parameters are derived from porosity and soil texture class according to a standard index table provided by the VIC modeling group. A few other non-soil parameters requested in the soil parameter file -such as the average annual air temperature, average annual precipitation, average elevation, and slope (used to derive the maximum velocity of the base flow) -were derived from Daymet and NED. If the CONUS-SOIL information was totally unavailable for a specific grid point, the information from the nearest grid point was used instead.
The VIC model vegetation parameter file describes the number and percentage of vegetation types in each grid cell. The conversion table described in Sect. 2.4 was used to efficiently summarize the UMD vegetation classification to the format requested by VIC. To improve the characterization of surface vegetation, a data-intensive enhancement was applied to import the monthly LAI observation from MODIS at each 4 km grid. The monthly LAI was first computed from the 2003-2008 MODIS at each 1 km UMD grid and then converted to the subgrid vegetation information in the VIC model format. Although the time frame between UMD (before 2000) and MODIS LAI (2003 is somewhat inconsistent, given that there is no suitable alternative, both data sets are considered to be the best proximity for the actual vegetation class and LAI. To better represent snow accumulation and snowmelt, subgrid elevation band information can be set up (i.e. fractions of the grid area with their corresponding mean elevations). Since NED provides a much finer spatial resolution (10 m) than the 4 km grid, the subgrid elevation band can be described in considerable detail. However, the elevation band should also be considered in terms of the required computation resources. It was found that the required

Calibration through high-performance computing
Although the process-based models incorporated various explicit physical mechanisms, given the complexity of hydrologic phenomena, parts of the processes still relied on conceptual statistical parameterization. As a result, several nonphysically based parameters required further calibration before a model could be put to use. Calibration was also required for those parameters with high measurement uncertainty (e.g. most of the soil parameters), since they may affect the performance of hydrologic modeling significantly. Nevertheless, a full hydrologic calibration is extremely resource intensive, especially when the model application requires fine resolution and sophisticated mechanisms. As recommended by other studies, calibrating the model at multiple locations at a small scale may improve streamflow predication (Chien et al., 2013;Zhang et al., 2008;Wang et al., 2012;Jetten et al., 2003). For example, Jetten et al. (2003) conclude that predicting the hydrologic response at a single watershed outlet may result in the phenomenon of "predicting the correct result for the wrong reasons" and suggested using multisite calibration within a watershed to reduce the possibility of accurate simulation at a single watershed outlet from a combination of locally inaccurate simulations. For this reason, we performed a first-order modeling calibration for each HUC8 consistently through a computationally exhaustive algorithm to improve the overall model performance. This large-scale calibration was mainly targeted at narrowing the possible range of suitable parameter values in each HUC8. Depending on the needs of future research, further fine calibration can be performed efficiently. Following the sensitivity analysis by Demaria et al. (2007), five sensitive VIC parameters were selected for calibration, including the variable infiltration curve parameter (b infilt ), exponent of the Brooks-Corey drainage equation (exp), thickness of soil layer 2 (thick 2 ), fraction of the maximum velocity of base flow at which nonlinear base flow begins (Ds), and fraction of maximum soil moisture at which nonlinear base flow occurs (Ws). Although other VIC parameters could also be important (e.g. thickness of soil layer 3, thick 3 ), they were not considered in the current effort given the computational resource limitations. Although the soil parameters were obtained with a pre-specified soil depth, the thickness of soil layer 2 (root layer) was treated as a parameter and can be changed during calibration. Given that the USGS Wa-terWatch runoff can provide an estimate of local runoff at each HUC8, our calibration was performed by matching the simulated total monthly runoff (base flow + surface runoff) to the observed WaterWatch monthly runoff. In other words, the offline routing model was not used, a decision similar to that of Demaria et al. (2007). For each parameter, three combinations, including the upper and lower bounds and the suggested default values, were chosen for calibration (Table 1). A total of 243 (3 5 ) parameter scenarios were then prepared. The VIC model simulation was driven by Daymet daily meteorological forcing (precipitation and minimum/maximum temperature) and NARR daily wind speed from 1980 to 2008. Year 1980 was treated as the model startup period, 1981-2000 as the calibration period, and 2001-2008 as the validation period. The VIC model simulation was performed in 3 h time steps using the energy and water balance mode. VIC version 4.1.1 was used in the current study, and the recently released VIC 4.1.2 will be incorporated after the model development work has been stabilized.
To effectively manage the data flow, all forcing, soil, vegetation, global parameter, and output flux files were organized in separate HUC8 folders. Depending on the total watershed area, all grid points within a HUC8 were subdivided into 16, 32, or 48 computation units. Each computational unit had separate global, soil, vegetation, and elevation parameter files and could be executed independently. Computation was performed using Oak Ridge National Laboratory's Titan supercomputer, a Cray XK7 system with 18 688 computational nodes, each equipped with four quad-core CPUs and two GPU cards. The extensive simulation exhausted ∼ 1.5 million CPU-hours (i.e. the number of CPUs multiplied by the average hours used by each CPU), approximately 171 calendar years if done by a single-core desktop machine. Note that we had originally planned to use a 10layer elevation band during calibration, but it would have resulted in a huge increase in the required computational time (an estimated 15 million CPU hours, approximately) over our allowable resource. Given that the current calibration was targeted for total monthly runoff and was less affected by the elevation band, a one-layer elevation band was used. When hydro-climate projections are produced for future research, multiple elevation bands will be implemented. Four statistical matrices, the coefficient of determination (R 2 ), Nash-Sutcliffe model efficiency coefficient (Nash), mean absolute error (MAE), and root mean square error (RMSE), were used to evaluate model performance. The matrices are summarized in Table 2, in which O t and Y t represent the observed and modeled monthly total runoff from month 1 to n, and O and Y represent the mean of O t and Y t . For each HUC8, daily total runoff was computed by summing base flow and surface runoff at each 4 km grid and then aggregated up to calculate the HUC8 monthly runoff (Y t ). The observed monthly runoff (O t ) from the USGS Wa-terWatch was then used for model evaluation. In addition to runoff analysis, the simulated 1 April snow water equivalent (SWE) was compared with the snow course observations. The results are reported in Sect. 3.4.

Differences among forcing data sets
To understand the differences among four selected forcing data sets (Maurer, PRISM, Daymet and NARR), an overall comparison was performed (illustrated in Fig. 3). Average daily maximum temperature (T max ), daily minimum temperature (T min ), annual total precipitation (P ), and average wind speed (W ) from 1980 to 2008 for each of the 2107 HUC8s were computed for comparison. The correlation coefficients among the HUC8 average values were also computed. Given PRISM's high accuracy (Daly et al., 2008), it is placed on the x axis as the target for comparison. Figure 3a illustrates the difference for T max . Although both Maurer and Daymet are close to the PRISM observation, NARR seems to be much warmer in most of the HUC8s. The difference is not as significant for T min in Fig. 3b, where most of the data sets are similar to each other; NARR is slightly warmer than the Daymet and Maurer data sets. The warm bias of NARR was also reported by Royer and Poirier (2010). Additionally, Daly et al. (2008) showed a cold bias in the Daymet data set compared with the PRISM January minimum temperature, whereas differences between PRISM and Daymet July maximum temperatures are relatively small. For the difference of P , a consistent observation can be made in Fig. 3c. Both Daymet and Maurer are closer to PRISM, but NARR is more divergent than the other data sets. Daly et al. (2008) showed that the differences of P , T min and T max between Daymet and PRISM were more prominent in the western US than in the east. They also indicated that Daymet tended to be drier than PRISM in some locations and wetter in others; this was because Daymet did not resolve rain shadows well owing to an inability to recognized topographic facets. Given that wind speed is available only for Maurer and NARR, only one set of points is plotted in Fig. 3d. A significant difference can be seen in the two data sets, with a correlation coefficient of around 0.28.
A further comparison of high/low daily quantiles is shown in Fig. 4. The mean annual 95 % quantile of daily maximum temperature (T max,95 , Fig. 4a), 5 % quantile of daily minimum temperature (T min,05 , Fig. 4b), 95% quantile of daily precipitation (P 95 , Fig. 4c), and 95% quantile of daily wind speed (W 95 , Fig. 4d) from 1980-2008 are plotted for each HUC8. Since PRISM is available only in a monthly scale, it is not included in the comparison. For T max,95 , Daymet remains similar to Maurer, but the difference between NARR and others becomes more significant. The comparison of T min,05 is similar to that of T min (Fig. 3b), but some HUC8s seem to be much cooler in Maurer than in Daymet. For high precipitation P 95 , although Daymet, Maurer, and NARR are generally linearly correlated, Daymet was Hydrol. Earth Syst. Sci., 18, 67-84  found to consistently provide a higher value than Maurer and NARR. That is not a surprise, since precipitation is highly resolution dependent (Gao et al., 2006); therefore, the finer resolution Daymet should report precipitation extremes more accurately at the 4 km scale. In other words, even using the same ground-based observations, the precipitation extremes calculated at 12 km resolution will be smoothed further than those calculated at 4 km; and it is unlikely that such spatial variability can be faithfully reconstructed from coarser resolution to finer resolution. For W 95 , the comparison is very similar to that for W (Fig. 3d).
To understand the differences, the geographical wind speed patterns are plotted in Fig. 5. Clearly, the inconsistency should be from the differences in the original data sources. The wind speed provided in Maurer's data set was calculated from the coarser-resolution reanalysis data set and hence shows a smoother pattern in Fig. 5b. Given that NARR can provide a more delicate local wind speed pattern, the NARR wind speed was chosen as the default wind speed in this study.

Monthly and annual statistics of LAI
To examine the variability of the MODIS LAI, the monthly and annual average LAI are plotted in Fig. 6. For each UMD class, the mean monthly MODIS LAI values were calculated for the entire conterminous US from January 2003 to December 2008. The monthly average LAI values are shown in Fig. 6a, in which the highest LAI values are found for evergreen broadleaf, deciduous broadleaf and mixed forest, and the lowest LAI values for bare ground, open shrubland, and closed shrubland. In terms of seasonal pattern, the LAI values for evergreen broadleaf are consistently high across all seasons. Both deciduous broadleaf and mixed forest have the strongest seasonal variation and can be larger than evergreen broadleaf during summer. The annual averages are plotted in Fig. 6b and further broken down to 18 conterminous US hydrologic regions in Fig. 7. Given that the annual variability is not significant in both figures, it should be justifiable to calculate the required VIC monthly vegetation parameters by averaging the monthly MODIS LAI values from 2003 to 2008. This simplification is needed because the current VIC Hydrol. Earth Syst. Sci., 18, 67-84, 2014 www.hydrol-earth-syst-sci.net/18/67/2014/ model does not allow dynamic vegetation simulation. It is interesting to note that the LAI values reported in the urban and built land class are not among the smallest. Those results may be explainable by considering that the findings are based on 1 km grid resolution, and at that resolution, many suburban areas are still covered by vegetation.

Difference between runoff aggregation and routing
To simulate streamflow in the VIC model, a separate routing model developed by Lohmann et al. (1998) was required. The routing model simulated a channel network with a number of nodes, each of which represented information from a grid cell. A unit hydrograph was then used to route the simulated surface runoff and base flow through a channel network using a linearized St. Vennant's equation (Lohmann et al., 1998). The routing model required five types of input: flow direction, grid area fraction, flow velocity, watershed boundary mask, and gauge locations. Whereas flow direction, grid area fraction, and watershed boundary can be derived from digital elevation models, flow velocity and unit hydrograph involve larger uncertainty and cannot be easily estimated. As discussed in Sect. 2.1, the resolution of digital elevation models also has a significant influence on the accuracy of river networks.
Although the major purpose of a routing model is to account for the travel time of river flow, this step may reasonably be skipped in smaller watersheds (i.e. when travel time is short) by using a simpler runoff aggregation method (Demaria et al., 2007). To evaluate the difference, a comparative analysis was performed for two randomly selected USGS gauge stations. Both gauge 01047000 in HUC01030003 and 02342500 in HUC03130003 have experienced little or no anthropogenic disruption and have complete records from 1980 to 2008. Routing models were set up to calculate streamflow at these two gauge stations. The simulated total monthly runoff (surface runoff + base flow) and monthly average streamflow are illustrated in Fig. 8; runoff observations were taken from WaterWatch and gauge observations from NWIS. The correlation coefficient (ρ) between the observed and simulated runoff/streamflow was also calculated. Generally speaking, the model performance showed large similarities for both approaches, with correlation coefficients between simulation and observation varying from 0.8 to 0.9. Although the travel time was not modeled in the runoff aggregation approach, the extra uncertainty induced by the routing model was also avoided (e.g. flow speed, routing resolution), so pros and cons exist for both methods. Given that our main objective was to provide a first-order calibrated hydrological parameter data set to expedite further efforts at fine calibration, it is more efficient and consistent to calibrate VIC for each HUC8 through the runoff aggregation approach. Also, since it is extremely time-consuming to develop reasonable routing models for all HUC8s in the US, the runoff aggregation approach is an easier alternative than spatially examining the model performance for a great number of watersheds in the US.

Overall model performance
Overall model performance is illustrated in Fig. 9. For each HUC8, the WaterWatch-observed and VIC-simulated total annual runoff (base flow+surface runoff) were computed for both calibration and validation periods. The correlation coefficients ρ between observed and simulated HUC8 annual runoff are 0.954 in the calibration period and 0.940 in the validation period, which is satisfactory overall. The results represent an improvement from 0.906 using an uncalibrated 4 km data set (i.e. with default parameters) and from 0.877 (Fig. 10) using the Ashfaq et al. (2010) 12 km data set, both with a much larger spread. To examine the model performance in different seasons, Fig. 9 is further broken down into winter (December-January-February, DJF), spring (March-April-May, MAM), summer (June-July-August, JJA), and fall (September-October-November, SON) in Fig. 11. It can be observed that the current parameter set works best in winter, with ρ = 0.966 in the calibration period and ρ = 0.954 in the validation period. A strong dry winter bias is observed in some HUC8s, which require further calibration for future application. Following winter, the model performance in spring and fall remains good, with ρ greater than 0.929 in the calibration period and 0.908 in the validation period. The worst model performance is observed in summer, with ρ = 0.866 in the calibration period and 0.830 in the validation period. The worse performance in summer is not surprising. Since we are calibrating monthly time series across different seasons, the Nash-Sutcliffe coefficient will be controlled more by wetter months (winter) than drier months (summer). If the future focus is on summer months, different objective functions should be designed and used for further model adjustment.
In Fig. 9, it can be seen that the current parameter sets overestimate runoff in multiple drier HUC8s. To spatially examine the model performance for the entire conterminous US, the observed and simulated annual runoff, R 2 , Nash, MAE, and RMSE for each HUC8 are illustrated in Fig. 12. The median values of the HUC8 evaluation matrices in each hydrologic region are also summarized in Table 3. From Fig. 12a and b, it can be seen that VIC generally captures the spatial patterns of WaterWatch runoff. However, the simulated runoff is higher in many HUC8s, especially in very dry regions such as the Rio Grande (HUC 13), Lower Colorado (HUC 15), Texas (HUC 12), Great Basin (HUC 16), and Arkansas White-Red (HUC 11). To enable a closer look, both R 2 (Fig. 12c) and Nash (Fig. 12d) between the observed and simulated monthly runoff are illustrated. Clearly, while the current parameter sets may provide satisfactory results for wetter regions, it is challenging to capture the monthly runoff time series in drier regions. Further studies in dry regions are required, since the suitable parameter values and model setup may have exceeded the currently suggested ones. This issue may be also related to meteorological forcing. It was noticed that Daymet generally provided higher precipitation (∼ 25 % greater than PRISM) in dry HUC8s; and under such a situation, the VIC model cannot be further improved unless precipitation is reduced in proportion to the PRISM values. Other factors such as groundwater that could not be simulated by VIC, and/or managed flow that was not captured by WaterWatch runoff, could also affect this erroneous runoff simulation. Nevertheless, since these regions are fairly dry, they in fact have smaller MAE (Fig. 12e) and RMSE (Fig. 12f) compared with wet regions. Therefore, the overall impact of wet bias in drier regions may not be significant since it is on a small scale.
To evaluate the model performance for other variables, the simulated 1 April SWE was compared with the observed snow course data used by Mote et al. (2005). Focusing on the 1981-2000 period, 784 snow stations with complete annual 1 April SWE observations were selected. For each station, the simulated 1 April SWE at the nearest grid was looked up. Since the point observation may be on a different scale from the grid-based SWE, the correlation coefficient between observation and simulation is computed for evaluation. The results are summarized in Fig. 13. Figure 13a plots the histogram of ρ and shows that high correlation coefficients can be seen in most of the stations. To examine the statistical significance, the histogram of the p value is plotted in Fig. 13b. The p value is less than 0.05 for nearly 700 stations (i.e. correlation is statistically significant under 5 % significance level), which suggests that the simulation may capture the annual trend for most of the stations. To check the spatial pattern, ρ values are further plotted in Fig. 13c. Generally speaking, except for eastern Wyoming, northern Colorado, and northern California, the simulated SWE showed good correlation with observations. As mentioned, the current simulation was conducted using one elevation band (for efficiency of model calibration). With an increase in elevation bands in future simulations, the performance of snow simulation could be further improved.
The overall improvement from the computationally intensive calibration exercise is illustrated in Fig. 14. Focusing on Nash (Fig. 14a) and RMSE (Fig. 14b), the cumulative percentage of HUC8s is plotted. In terms of Nash, around 20 % of HUC8s (∼ 450 HUC8s) were improved from less than 0.5 to greater than 0.5. In terms of RMSE, the modeling errors were on average reduced by ∼ 5 mm yr −1 for most of the HUC8s. While the best parameters at each HUC8 were identified, the performance of all examined combinations of parameters was also recorded to support further assessment. Calculating the sensitivity and trends for each parameter could make it possible to achieve further  model improvements with fewer iterations and computational hours. In other words, the proposed hydrological data set provides not only the currently best available parameters but also the tested parameter sensitivity to support further fine calibration.

Summary and conclusion
This study introduces an effort to prepare a comprehensive hydrological model parameter data set for large-scale, high-resolution climate change impact assessment. Several key inputs for hydrologic simulation -including meteorologic forcings, soil, land class, vegetation, and elevationwere collected and organized in refined 4 km grids. Using high-performance computing, a spatially consistent calibration was performed for the VIC model. The VIC simulation was driven by Daymet daily meteorological forcing and was evaluated by USGS WaterWatch runoff observations for 2107 HUC8 subbasins in the conterminous US. Overall, 1.5 million CPU-hours were used to develop a postcalibrated model parameter data set to support fine-scale future hydro-climate assessments. The pre-organized model parameter data set will be provided to interested parties to support further hydro-climate impact assessment. Although model calibration may yet be required for particular model applications, it is hoped the pre-organized data set will help reduce the amount of effort needed for basic data preparation and organization. Depending on the specific needs, the parameter data set can then be further calibrated effectively.
As a result of this exhaustive calibration exercise, it is now possible to more accurately estimate the resources required for further model improvement across the entire conterminous US. Calibrating a hydrologic model consistently for various watersheds can also increase understanding of the strengths and limitations of a particular hydrologic model across different climate regions. It can help in determining the best combination of hydrologic model and meteorological forcing data set for specific regions. Although the extensive model calibration was performed for the VIC model, the computation and data framework were designed in a flexible manner so that other suitable hydrologic models could be incorporated in the future. By including multiple hydrologic model choices, we hope that it can provide flexibility for further application and increase understanding of the modeling uncertainty associated with different hydrologic models in hydro-climate impact assessment.