Physically based models provide insights into key hydrologic processes but are associated with uncertainties due to deficiencies in forcing data, model parameters, and model structure. Forcing uncertainty is enhanced in snow-affected catchments, where weather stations are scarce and prone to measurement errors, and meteorological variables exhibit high variability. Hence, there is limited understanding of how forcing error characteristics affect simulations of cold region hydrology and which error characteristics are most important. Here we employ global sensitivity analysis to explore how (1) different error types (i.e., bias, random errors), (2) different error probability distributions, and (3) different error magnitudes influence physically based simulations of four snow variables (snow water equivalent, ablation rates, snow disappearance, and sublimation). We use the Sobol' global sensitivity analysis, which is typically used for model parameters but adapted here for testing model sensitivity to coexisting errors in all forcings. We quantify the Utah Energy Balance model's sensitivity to forcing errors with 1 840 000 Monte Carlo simulations across four sites and five different scenarios. Model outputs were (1) consistently more sensitive to forcing biases than random errors, (2) generally less sensitive to forcing error distributions, and (3) critically sensitive to different forcings depending on the relative magnitude of errors. For typical error magnitudes found in areas with drifting snow, precipitation bias was the most important factor for snow water equivalent, ablation rates, and snow disappearance timing, but other forcings had a more dominant impact when precipitation uncertainty was due solely to gauge undercatch. Additionally, the relative importance of forcing errors depended on the model output of interest. Sensitivity analysis can reveal which forcing error characteristics matter most for hydrologic modeling.

Physically based models allow researchers to test hypotheses about the role
of specific processes in hydrologic systems and how changes in environment
(e.g., climate, land cover) may impact key hydrologic fluxes and states

Detailed studies of forcing uncertainty are fewer in number than studies of
parametric and structural uncertainty

Previous work on forcing uncertainty in snow-affected regions has yielded
basic insights into how forcing errors propagate to model outputs and which
forcings introduce the most uncertainty in specific outputs. However, these
studies have typically been limited to (1) empirical/conceptual models

The purpose of this paper is to use global sensitivity analysis to assess how
specific forcing error characteristics influence outputs of a physically
based snow model. To our knowledge, no previously published study has
investigated this topic in snow-affected regions. It is unclear how
(1) different error types (bias vs. random errors), (2) different error
distributions, and (3) different error magnitudes across all forcings affect
model output. The impact of forcing errors on models can be tested by
corrupting forcings with specified characteristics (e.g., artificial biases
and random errors) and quantifying the impact on model outputs
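Such corruption of a forcing time series can be sketched as follows (a minimal illustration with arbitrary error magnitudes, not the exact scheme used in this study):

```python
import numpy as np

rng = np.random.default_rng(42)
n_hours = 24 * 30
# synthetic hourly air temperature (deg C) standing in for an observed forcing
temp_obs = 5.0 * np.sin(2 * np.pi * np.arange(n_hours) / 24.0) - 2.0

bias = 1.5    # systematic (additive) error, drawn once per simulation
sigma = 0.8   # standard deviation of the hourly random errors
noise = rng.normal(0.0, sigma, size=n_hours)  # random errors, redrawn each time step

temp_corrupted = temp_obs + bias + noise
```

The model is then run with `temp_corrupted` in place of `temp_obs`, and the change in the outputs is attributed to the imposed bias and random errors.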

In our view, it is important to clarify the relative impact of specific error
characteristics on modeling applications, so as to prioritize future research
directions, improve understanding of model sensitivity, and address
questions related to network design. For example, given budget constraints,
is it better to invest in a heating apparatus for a radiometer (to minimize
bias due to frost formation on the radiometer dome) or in a higher quality
radiometer (to minimize random errors associated with measurement precision)?
Additionally, it is important to contextualize different meteorological data
errors, as these errors are usually studied independently of each other

The overarching research question is: how do assumptions regarding forcing
error characteristics impact our understanding of uncertainty in physically
based model output? Using the

Basic characteristics of the snow study sites, ordered from left to right by increasing elevation.

We selected four seasonally snow covered study sites (Table

The sites had high-quality observations of model forcings at hourly time
steps. Serially complete published data sets are available at CDP, RME, and
SASP (see citations above). At IC, data were available from multiple
co-located stations

We considered only 1 year for analysis at each site (Table

Before conducting the sensitivity analysis, we adjusted the available
precipitation data at each site with a multiplicative factor to correct for
potential undercatch errors
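In code form, such an adjustment is a single multiplicative rescaling (the multiplier value below is hypothetical, not a value used at any of the study sites):

```python
import numpy as np

precip_obs = np.array([0.0, 1.2, 0.4, 0.0, 2.1])  # hourly precipitation (mm), illustrative
multiplier = 1.1  # hypothetical site-specific correction factor (>1 increases totals)
precip_adj = multiplier * precip_obs  # uniform multiplicative undercatch correction
```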

The initial discrepancies between modeled and observed SWE (prior to applying the above precipitation multipliers) may have resulted from deficiencies in the measured forcings, model parameters, model structure, and measured verification data, so our decision to apply precipitation multipliers required justification. Manual observations of SWE (e.g., snow surveys, snow pits) generally supported the automatically collected SWE observations (no figures shown), and thus the differences between observed and modeled SWE likely did not stem from issues in the verification data. Sites where we decreased the precipitation data (CDP and RME) were also the warmer sites and experienced more mixed rain–snow events in the winter. Hence, we considered multiple hypotheses to explain the SWE differences at these sites: (1) the choice of rain–snow parameterization, (2) the choice of parameters (e.g., threshold temperatures) for the rain–snow parameterization, and (3) the quality of the forcing data (e.g., precipitation). For these warmer sites, an exploratory analysis revealed that either (1) or (3) could explain the SWE differences, but auxiliary data (e.g., precipitation phase data) were not available to discriminate between these hypotheses. Choosing a different rain–snow parameterization might minimize the SWE differences at the warmer sites but would not rectify the SWE differences at the colder sites (IC and SASP), where most winter precipitation falls as snow. Therefore, the most straightforward and consistent approach was to adjust the precipitation data and leave the native UEB parameterizations intact. It was beyond the scope of this study to optimize model parameters and unravel the relative contributions to uncertainty from factors other than the meteorological forcings. Nevertheless, we suggest these precipitation adjustments minimally affected the sensitivity analysis, as we did not quantitatively compare the model outputs to the observed response variables (e.g., SWE).

The UEB is a physically based, one-dimensional snow model

With each UEB simulation, we calculated four summary output metrics: (1) peak
(i.e., maximum) SWE, (2) mean ablation rate, (3) snow disappearance
date, and (4) total annual snow sublimation. The first three metrics are
important for the timing and magnitude of water availability and
identification of the snowpack regime

UEB model parameters used in all simulations and sites.

Details of error types, distributions, and uncertainty ranges for the five scenarios.

Scenarios of interest and the type, distribution, and magnitude of
errors considered in each. NB considers normally (or lognormally) distributed
biases with error magnitudes found in the field. NB

To test how error characteristics in forcings affect model outputs, we
examined five scenarios (Fig.

Forcing data inevitably have some (unknown) combination of bias and random
errors. However, hydrologic sensitivity analyses have tended to focus more on
bias with little or no attention to random errors

Table

In their recent review of global sensitivity analysis applications in
hydrological modeling,

We designed the UB scenario with the naive hypothesis that the probability
distribution of biases was uniform for all six meteorological variables. In
contrast, error distributions (Table

We considered three magnitudes of forcing uncertainty (Table

Consideration of error magnitudes was achieved in each scenario by assigning
a range to each error probability distribution (see Sect.
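Concretely, a sample drawn uniformly on [0, 1] can be mapped onto any assumed error distribution through that distribution's inverse CDF; a sketch using the Python standard library (the default ranges below are hypothetical, not the ranges from the scenarios):

```python
from statistics import NormalDist

def to_normal_bias(u, mean=0.0, sd=1.0):
    """Map a uniform [0, 1] sample onto a normally distributed bias (inverse CDF)."""
    return NormalDist(mean, sd).inv_cdf(u)

def to_uniform_bias(u, lower=-2.0, upper=2.0):
    """Map a uniform [0, 1] sample onto a uniformly distributed bias (UB scenario)."""
    return lower + u * (upper - lower)
```

Changing the scenario (e.g., NB versus UB) then amounts to swapping the mapping function while the underlying unit-hypercube sample stays the same.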

We considered field uncertainties in all forcings in NB, NB

In contrast, scenario NB_lab assumed laboratory levels of uncertainty (i.e.,
measurement accuracy) for each forcing.

Numerous approaches that explore uncertainty in numerical models have been
developed in the literature of statistics

One can visualize any hydrology or snow model (e.g., UEB) as

Sobol' sensitivity analysis uses variance decomposition to attribute output
variance to input uncertainty. First-order and higher-order sensitivities can
be resolved; here, only the total-order sensitivities were examined (see
below) for clarity and because the resulting first-order sensitivity indices
were typically comparable to the total-order sensitivity indices (e.g.,
83 % of all cases had total-order and first-order indices within 10 % of
each other), suggesting minimal error interactions. The Sobol' method is
advantageous in that it is model independent, can handle non-linear systems,
and is among the most robust sensitivity methods

Within the Sobol' global sensitivity analysis framework, the total-order
sensitivity index (
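In standard variance-based notation (with $X_{\sim i}$ denoting all input factors except $X_i$; the notation here follows the usual Sobol'/Saltelli formulation rather than the paper's own symbols), the total-order index is

```latex
S_{T_i}
  = \frac{\mathbb{E}_{X_{\sim i}}\left[\operatorname{Var}_{X_i}\left(Y \mid X_{\sim i}\right)\right]}{\operatorname{Var}(Y)}
  = 1 - \frac{\operatorname{Var}_{X_{\sim i}}\left[\mathbb{E}_{X_i}\left(Y \mid X_{\sim i}\right)\right]}{\operatorname{Var}(Y)}
```

i.e., the share of output variance that would remain, on average, if every factor except $X_i$ were fixed.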

Conceptual diagram showing methodology for imposing errors on the
forcings with error parameters (

A number of numerical methods are available for evaluating sensitivity
indices, and most adopt a Monte Carlo approach

To test the reliability of

Figure

Step 1: generate an initial (

Step 2: in each simulation, map the input factor sample of each forcing error
parameter (

Step 3: in each simulation, perturb (i.e., introduce artificial errors) the
observed time series of the

where

Step 4: input the

Step 5: save the model outputs for each simulation (Fig.

Step 6: calculate
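The six steps can be sketched end to end with a purely illustrative two-factor toy model standing in for UEB, using Saltelli-style sampling and the Jansen estimator for the total-order indices (one common estimator; the paper's exact estimator may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_model(x):
    """Stand-in for a UEB run: responds strongly to x[0], weakly to x[1]."""
    return np.sin(x[0]) + 0.1 * x[1] ** 2

k = 2      # number of input factors (e.g., forcing error parameters)
n = 4096   # base sample size

# Step 1: two independent sample matrices on the unit hypercube
A = rng.random((n, k))
B = rng.random((n, k))

# Matrices A_B^(i): copies of A with column i replaced by column i of B
AB = [A.copy() for _ in range(k)]
for i in range(k):
    AB[i][:, i] = B[:, i]

# Steps 2-5: map unit samples onto error parameters, perturb the forcings,
# run the model, and save the outputs (folded into toy_model here for brevity)
scale = lambda u: -np.pi + 2.0 * np.pi * u  # map [0, 1] onto [-pi, pi]
YA = np.array([toy_model(scale(row)) for row in A])
YAB = [np.array([toy_model(scale(row)) for row in AB[i]]) for i in range(k)]

# Step 6: Jansen estimator for the total-order sensitivity index of each factor
var_y = np.var(YA)
ST = [float(np.mean((YA - YAB[i]) ** 2) / (2.0 * var_y)) for i in range(k)]
```

As expected for this toy model, the estimated total-order index for the first (dominant) factor is much larger than for the second.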

Number of samples (

Observed (black line) and modeled SWE (color density plot)
at the four sites across the five uncertainty scenarios (see Fig.

Distributions of model outputs (rows) at the four study sites
(columns) arranged by scenario. For each scenario, the circle is the mean and
the whiskers show the range encompassing 95 % of the simulations (see
Table

Figure

Large uncertainties in SWE were evident, particularly in NB, NB

NB and NB

NB and UB yielded generally very different model outputs
(Figs.

Contrasting NB and NB_gauge, NB_gauge had a lower uncertainty range in
SWE and slightly higher mean peak SWE at all sites (Figs.

Relative to NB, NB_lab had smaller uncertainty ranges in all model outputs
(Figs.

Total-order sensitivity indices (

Model sensitivity as a function of forcing error type. Shown are the
total-order sensitivity indices (

We first focus on sensitivity to forcing bias, as this error type was common
to scenarios NB and NB

We hypothesized that the snow model outputs would have higher sensitivity to
biases than to random errors in the forcings. The results of our analysis
generally supported this hypothesis. Across all outputs and sites,

While there was general correspondence between NB and NB

We hypothesized that the assumed probability distribution of errors would
alter the relative hierarchy of forcing biases. However, the results did not
consistently support this hypothesis (Fig.

For a few specific forcings and outputs, the selected probability
distribution played a role in model sensitivity to that type of forcing bias.
For example, assumption of a uniform probability distribution (UB) for
forcing errors enhanced the sensitivity of sublimation to

Same as Fig.

We hypothesized that the relative magnitude of forcing errors would exert a
strong control on model sensitivity. Comparing NB to NB_gauge and to NB_lab
generally supported this hypothesis (Fig.

Same as Fig.

While

Whereas NB_gauge demonstrated that reducing the magnitude of forcing
uncertainty in one factor (i.e., precipitation) was sufficient to change
which factors were most and least important, NB_lab showed that changing the
magnitude of forcing uncertainty in all terms could yield a substantially
different pattern of model sensitivity (Fig.

Variation of daily SWE sensitivity to forcing bias based on
site (columns) and error scenario (rows). The normalized range (where
1

The above results sequentially compared sensitivity indices from different
error scenarios to NB in order to ascertain how different assumptions
regarding error types, probability distributions, and magnitudes translated to changes in model
sensitivity. To summarize the relative controls of these three forcing error
characteristics on model sensitivity, we calculated daily sensitivity indices
of modeled SWE to forcing biases at each site and scenario
(Fig.

Comparing the broad patterns in the time varying

After error magnitudes, the next most important determinant of model
sensitivity was the probability distribution of forcing errors (compare
Fig.

Here we examined the sensitivity of physically based snow simulations to
forcing error characteristics (i.e., types, probability distributions, and
magnitudes) using the Sobol' global sensitivity analysis. A key result is
that among these three characteristics, the magnitude of biases had the most
significant impact on UEB simulations (Figs.

The results supported our hypothesis that the magnitude of biases strongly
influences the relative importance of forcing errors. The three magnitudes of
uncertainty considered (NB, NB_gauge, and NB_lab) all resulted in different
patterns in model sensitivity to forcing biases, and these patterns also
varied with the output of interest (Fig.

The dominant effect of

The contrast between scenarios NB, NB_gauge, and NB_lab highlights that
selection of the error ranges is a critical step in sensitivity analysis.
However, we recognize that there is some subjectivity in the specification of
these ranges. Quantification of errors in forcing estimation methods is best
achieved through comparisons with surface observations

The results did not universally support our hypothesis that the assumed
probability distribution of biases was important to the relative ranking of
forcing errors. The relative consistency in the dominant forcing errors
between NB and UB may have emerged because the probability distributions of
all six forcing biases varied together between these two scenarios (i.e., all
forcing biases were uniform in UB and either normal or lognormal in NB).
While we did not conduct additional tests, we suspect that changing the
probability distribution of just a single forcing error (e.g.,

The similarity of results between scenarios NB and UB conforms to findings in
previous studies

The results were consistent with our hypothesis that the snow model is more
sensitive to biases than to random errors in the forcings. While previous
investigations supported this idea for shortwave and longwave forcings in
physically based snow models

Uncertainty ranges (95 % intervals) in

Our central argument at the outset was that forcing uncertainty may be
comparable to parametric and structural uncertainty in snow-affected
catchments. To support our argument and to place our results in context, we
compare our results at CDP in 2005–2006 to

A limitation of the analysis is that the impact of forcing error
characteristics on model behavior is evaluated through the lens of a single
sensitivity analysis method and a single snow model. It is possible that
alternative sensitivity analysis methods might yield different results than
the Sobol' method, as suggested in previous studies

Generalizing the relationship between model sensitivity and site climate is a
research topic of high interest. Although we found similarities in model
sensitivity to specific forcing errors across sites (e.g., high sensitivity
to

While the Sobol' method is often considered the “baseline” method in global
sensitivity analysis, we note that it comes at a relatively high computational
cost (1 840 000 simulations across four sites and five error scenarios),
which may be prohibitive for many modeling applications (e.g., for models of
higher complexity and dimensionality). For context, the typical time required
for a single simulation was 1.4 s, resulting in a total computational expense
of 720 h (30 days) across all scenarios.
Examination of the convergence rates indicated that most sensitivity indices
stabilized after one-third of the simulations completed and hence the same
results could have been found using significantly fewer simulations (no
figures shown). Ongoing research is developing new sensitivity analysis
methods that compare well to Sobol' but with reduced computational demands

The question remains: what can be done about forcing errors in hydrologic
modeling? First, the results suggest model-based hypothesis testing must
account for uncertainties in forcing data. The results also highlight the
need for continued research in constraining

Application of the Sobol' sensitivity analysis framework across sites in contrasting snow climates reveals that forcing uncertainty can significantly impact model behavior in snow-affected catchments. Model output uncertainty due to forcings can be comparable to or larger than uncertainty due to model structure. Furthermore, this work demonstrates that sensitivity analysis can be applied to understand the role of specific error characteristics in model behavior. Key considerations in model sensitivity to forcing errors are the magnitudes of forcing errors and the outputs of interest. For the physically based snow model tested, random errors in forcings are generally less important than biases, and the probability distribution of biases is relatively less important to model sensitivity than the magnitude of biases.

The analysis shows how forcing uncertainty might be included in a formal sensitivity analysis framework through the introduction of new parameters that specify the characteristics of forcing uncertainty. The framework could be extended to other physically based models and sensitivity analysis methodologies and could be used to quantify how uncertainties in model forcings and parameters interact. Based on this framework, it would be interesting to assess the interplay between coexisting uncertainties in forcing errors, model parameters, and model structure, and to test how model sensitivity changes in relation to all three sources of uncertainty.

M. Raleigh was supported by a post-doctoral fellowship in the Advanced Study Program at the National Center for Atmospheric Research (NCAR). J. Lundquist was supported by NSF (EAR-838166 and EAR-1215771). The manuscript was improved thanks to thoughtful comments from F. Pianosi, J. Li, R. Essery, R. Rosolem, A. Winstral, and one anonymous reviewer. Thanks to M. Sturm, G. Shaver, S. Bret-Harte, and E. Euskirchen for assistance with Imnavait Creek data, S. Morin for assistance with Col de Porte data, D. Marks for assistance with Reynolds Mountain data, C. Landry for assistance with Swamp Angel data, and E. Gutmann and P. Mendoza for feedback. The authors also thank R. Essery for sharing model output for the comparisons in Fig. 9. For Imnavait Creek data, we acknowledge US Army Cold Regions Research and Engineering Laboratory, the NSF Arctic Observatory Network (AON) Carbon, Water, and Energy Flux monitoring project and the Marine Biological Laboratory, Woods Hole, and the University of Alaska, Fairbanks. Imnavait Creek data sets were provided in part by the Institute of Arctic Biology, UAF, based upon work supported by the National Science Foundation under grant no. 1107892. The experiment was improved thanks to conversations with D. Slater. The National Center for Atmospheric Research is sponsored by the National Science Foundation. Edited by: R. Woods