Role of forcing uncertainty and background model error characterization in snow data assimilation

Accurate specification of the model error covariances in data assimilation systems is a challenging issue. Ensemble land data assimilation methods rely on stochastic perturbations of input forcing and model prognostic fields for developing representations of input model error covariances. This article examines the limitations of using a single forcing dataset for specifying forcing uncertainty inputs for assimilating snow depth retrievals. Using an idealized data assimilation experiment, 5 the article demonstrates that the use of hybrid forcing input strategies (either through the use of an ensemble of forcing products or through the added use of the forcing climatology) provide a better characterization of the background model error, which leads to improved data assimilation results, especially during the snow accumulation and melt time periods. The use of hybrid forcing ensembles is then employed for assimilating snow depth retrievals from the AMSR2 instrument over two 10 domains in the Continental U.S. with different snow evolution characteristics. Over a region near the Great Lakes where the snow evolution tends to be ephemeral, the use of hybrid forcing ensembles provides significant improvements relative to the use of a single forcing dataset. Over the Colorado Headwaters characterized by large snow accumulation, the impact of using the forcing ensemble is less prominent and is largely limited to the snow transition time periods. The results of the arti15 cle demonstrate that improving the background model error through the use of a forcing ensemble enables the assimilation system to better incorporate the observational information.


Introduction
Land data assimilation (DA) methods combine observations of land surface conditions from remote sensing platforms or ground measurements with model forecasts to produce temporally and spatially continuous estimates of land surface fields.The merging of the observations and model forecasts is conducted by weighting them appropriately based on their respective sources of errors.As a result, the skill of the DA systems is critically reliant on the accurate specification of errors in observations and model background.
Despite their importance, the specification of input error covariances is challenging (Dee, 1995;Derber and Bouttier, 1999;Reichle, 2008;Reichle et al., 2008).The sources of errors in observations include instrument errors, deficiencies of the observation operators (such as radiative transfer models) and representativeness issues from differences in spatial scales (Kumar et al., 2012).Similarly, uncertainties in model parameters, forcing inputs and deficiencies in model physics contribute to the model background errors.The model error covariance specifications are often made through idealized experiments using analysis of assimilation increments and innovations (Kumar et al., 2008(Kumar et al., , 2009)).Comparison of model simulations against independent observations is another approach for developing these specifications.However, given the lack of representativeness of the point-scale in situ measurements and the significant heterogeneity of the land surface, developing spatially distributed estimates of these model error covariances are difficult.As noted in Reichle (2008), the specification of input error covariances remains a subjective process in current land data assimilation systems.
Ensemble data assimilation techniques such as the ensemble Kalman filter (EnKF) are widely used in land data assimilation applications (Crow and Wood, 2003;Reichle et al., 2007Reichle et al., , 2010;;Kumar et al., 2009Kumar et al., , 2014;;De Lannoy et al., 2012).The EnKF, a Monte Carlo variant of the Kalman filter, uses an ensemble of model trajectories to represent the model error structures.The model error covariance is diagnosed as the sample covariance of the ensemble of model forecasts.The ensemble is typically created by adding stochastic noise to the meteorological forcing, propagated to the model fields through the nonlinear land surface model (LSM).In addition, stochastic perturbations are also commonly applied to the model prognostic fields.
Perturbations are sampled from randomly generated noise and are directly applied to the forcing and model prognostic fields.The typical approach is to employ either normally distributed additive perturbations or lognormally distributed multiplicative perturbations, depending on the variable.For example, multiplicative perturbations are normally used for fields such as precipitation, since the use of additive noise could generate unphysical values (less than zero) or consistent positive biases during periods where precipitation is absent.In addition, to avoid introducing systematic biases in the perturbed fields, the ensemble mean of the perturbations are normally constrained to zero and one, for additive and multiplicative perturbations, respectively.
In this article, we examine how the reliance on ensemble perturbations of forcing fields to develop the background model error impacts the performance of data assimilation.Most land data assimilation systems use a single data source as the forcing input and the input forcing uncertainty is characterized by perturbing the meteorological fields from this single data source.The accuracy of the model error covariance therefore greatly depends on the accuracy of the forcing input (Reichle and Koster, 2003).For example, in a case where precipitation inputs are underestimated, the forcing uncertainty characterized by the resulting ensemble will lead to the underestimation of the model error covariance.In contrast, alternate strategies, such as the added use of the forcing climatology or multiple forcing data sources, are likely to provide better representations of the forcing uncertainty and a better characterization of the background model error.In this article, we examine the impact of such factors in the context of snow data assimilation case studies.
The article presents two sets of experiments: (1) an idealized experiment to demonstrate the impact of model error covariance underestimation and (2) a "real" data assimilation scenario, where snow depth retrievals (Oki et al., 2010;Kachi et al., 2013) from the Advanced Microwave Scanning Radiometer 2 (AMSR2) aboard the Global Change Observation Mission -Water (GCOM-W) satellite are used.The assimilation of AMSR2 data is conducted over two different domains in the continental USA with different snow evo-lution characteristics.The different nature of the snow evolution in these domains is used to investigate the impact of background model error representations in snow data assimilation.All experiments described in this article are conducted using the NASA Land Information System (LIS; Kumar et al., 2006), which is an observation-driven land surface modeling and data assimilation system.The data assimilation subsystem in LIS (Kumar et al., 2008) contains algorithms such as the EnKF and supports the assimilation of data from a variety of satellite sensors (Reichle et al., 2010;Liu et al., 2013Liu et al., , 2015;;Kumar et al., 2014Kumar et al., , 2015Kumar et al., , 2016)).

Ensemble Kalman filter and background error covariance representation
The filtering class of data assimilation algorithms seeks the best estimate of the posterior state conditioned on the past observations, using the statistics of the uncertainties in the model and observations.The Kalman filter (KF) is an optimal estimator for linear dynamical systems driven by Gaussian noise.The EnKF is a reduced-rank variant of the KF, which assumes normality of model and observation errors and typically requires the use of a small number of ensembles to represent these error structures (Reichle, 2008).
EnKF is a sequential data assimilation approach, where the algorithm alternates between a forecast step and an analysis step.In the forecast step, an ensemble of model states is propagated forward in time using the LSM.This is followed by an analysis step where the model forecast is updated based on observations.The analysis step is written in the general form as where x b is the background model state vector, x a is the analyzed state vector, y is the observation vector and H k is the observation operator that relates the model states to the observations.The subscript k indicates time and the superscripts b and a refer to the state estimates, before and after the update, respectively.K k is the gain matrix, which represents the weighting factor that determines the degree to which the model forecast is adjusted towards the observation.K k is expressed as where R k and P b k are the observation and forecast model error covariances, respectively (exponent T refers to the transpose of a matrix).The model error covariance is computed as the sample covariance of the model ensemble.
EnKF relies on the second-order statistics of the noise simulated by ensemble perturbations in the model and observations (drawn from Gaussian distributions), to characterize their probability density functions.The accuracy of the sampled model error covariance, in particular, is dependent on the size of the ensemble and the presence of model errors (Li et al., 2009).Prior studies have used techniques such as covariance inflation (Anderson and Anderson, 1999) to deal with the covariance underestimation.These techniques, however, require significant tuning and rely on the assumption that the observation error covariances are known (Miyoshi and Yamane, 2007).In addition, these inflation techniques are ineffective when the model errors are significant and the resulting model error covariances are close to zero.In the examples below, the impact of underestimating the background model error for snow data assimilation is examined.
Figure 1 shows a schematic of three strategies that are used to examine the issue of model covariance underestimation in this article.The first strategy (A), which is the typical practice in land data assimilation systems, is to use a single forcing dataset to drive the ensemble.The small perturbations applied to the input forcing variables help in simulating the ensemble spread.In the second strategy (B), the ensemble is forced with both the given forcing and a climatology of that forcing.The added use of the forcing climatology helps in incorporating the representation of average conditions within the ensemble and in reducing the covariance underestimation due to the reliance and limitations of a single dataset.In the third approach (C), the model ensemble is driven using an ensemble of forcing products from different sources, providing a more realistic representation of the input forcing uncertainty.Note that small perturbations to the forcing variables are also applied to (B) and (C) forcing data to augment the ensemble spread.

Assessing the impact of model error covariance underestimation through idealized experiments
In this section, we present an idealized snow depth DA experiment to demonstrate the importance of accurately characterizing the input model error covariances.We employ snow depth as the measurement variable as most passive microwave retrieval algorithms (Chang et al., 1987;Kelly et al., 2003;Kelly, 2009), compute snow depth first and derive the snow water equivalent (SWE) through a climatological snow density (Brown and Braaten, 1998;Krenke, 1998) assumption.In addition, most in situ observations of snow are also available as depth measurements, allowing for a more straightforward evaluation of the results from the model and DA integrations.The synthetic experiment is conducted at the Niwot Ridge site in Colorado (40.03 • N, 105.5 • W), which is part of the NRCS Snow Telemetry (SNOTEL) network.All model simulations are conducted using the Noah land surface model version 3.3 (Ek et al., 2003).The DA experiment is set up as an identical twin experiment (Kumar et al., 2009) with the following structure: first, the Noah LSM is run forced with meteorology from the North American Land Data Assimilation System Phase 2 (NLDAS-2; Xia et al. ( 2012)), and is assumed to represent the "true" state of snow depth evolution at this location.This model integration is termed as the control or "truth" simulation.Next, a set of synthetic snow depth observations is simulated from this control run by introducing realistic retrieval errors.Similar to the strategies used in previous studies (Kumar et al., 2008), to account for the limitations of the passive microwave sensors in retrieving snow depth under dense canopies, the observations are masked out when green vegetation fraction values used in the model are greater than 0.6.In addition, observations are degraded by introducing multiplicative random noise with standard deviation of 0.05 to simulate the errors in the snow depth retrievals.An open loop (OL) integration is conducted using the same LSM, but forced with a different meteorology from the Agricultural Meteorology (AGRMET) model of the US Air Force 557th Weather Wing (formerly the Air Force Weather Agency).A data assimilation integration is then conducted by incorporating the simulated observations into the OL configuration using a one-dimensional EnKF (Reichle et al., 2002).The modeled estimates from the OL and DA integrations are compared against the true fields from the control run to evaluate the impact of assimilation.An ensemble size of 20 is used in the integrations with perturbations applied to both meteorological forcing inputs and model prognostic fields to simulate the background model error.Multiplicative perturbations are applied to the precipitation and downwards shortwave fields with a mean of 1 and standard deviations of 0.3 and 0.5, respectively.Additive perturbations with a standard deviation of 50 W m −2 are applied to the long-wave radiation fields.The Noah LSM model fields of SWE and snow depth are perturbed with multiplicative noise of 0.01 and 0.02, respectively.Time series correlations are imposed via a first-order regressive model (AR(1)) with a timescale of 24 h for forcing variables and 12 h for the model fields.The perturbations to the forcing fields are applied hourly, whereas the model prognostic fields are perturbed at 3 h intervals, similar to the configurations used in Kumar et al. (2015Kumar et al. ( , 2014)).
Figure 2 shows a time series of the snow depth fields from model integrations for the 2012-2013 winter season, from the control, synthetic observations (OBS), open loop simulation forced with a single meteorological dataset (OL_FSNGL), and the data assimilation integration that as- similates observations into the OL_FSNGL configuration (DA_FSNGL).Note that the OL_FSNGL configuration includes the ensemble perturbations to the forcing and model state fields, to exclude any changes in model skill introduced by the perturbations in the evaluation of the DA results.The control and OL_FSNGL runs are significantly different in their simulation of snow depth for this winter season.The OL_FSNGL-based snow depth estimates vastly underestimate the snow evolution, likely due to the underestimation of precipitation in the AGRMET data at this location.The assimilation of the observations helps in significantly improving the OL_FSNGL representation, especially during the peak winter months of January through March.The simulation of snow depth during the snow accumulation time periods and the snowmelt time periods, however, shows significant differences relative to the control simulation, though synthetic observations of snow depth exist during these time periods.As shown in Fig. 2, the snow accumulation in the OL_FSNGL simulation is significantly delayed relative to the control.The input model error covariances (P b k ), therefore, remain close to zero until mid-December 2012, when non-zero snow depth estimates are observed in the OL_FSNGL configuration.These model errors result in the gain matrix (K k ) being zero when the background model error variances are zero.As a result, no non-zero analysis increments are generated from the DA analysis and no changes in the snow depth fields from DA are observed until mid-December, 2012.In contrast, during the peak winter months, the snow depth estimates from DA_FSNGL are closer to the control simulation, as the availability of a non-zero model error covariance allows DA to compute positive analysis increments.Further, the DA_FSNGL integration also fails to capture the late season snow events (late April and early May 2013), as the deficiencies in the background model error result in the inability of the analysis step to produce meaningful analysis increments.
In the above example, the main source of the model deficiencies is the errors in the forcing inputs, as the same model is used in the control and open loop integrations.Two variants of this experiment are conducted by (1) using the forcing climatology in combination with the input forcing to specify the ensemble (EXP-FCLIM) and (2) using an ensemble of forcing datasets to drive the ensemble (EXP-FENS).A climatological forcing dataset is developed by averaging the forcing inputs (used in OL_FSNGL) at each forcing time step across 4 years (2012 to 2015).In EXP-FCLIM, the forcing climatology is used to drive 10 of the 20 ensemble members with the remaining 10 driven by the OL_FSNGL forcing data.In EXP-FENS, four different forcing datasets (different from the data used in the control) are used to drive the model ensemble.The forcing products used in EXP-FENS include the Global Data Assimilation System (GDAS; Derber et al., 1991) operational outputs from NOAA/NCEP, the Modern Era Retrospective analysis for Research and Applications, version 2 (MERRA-2; Bosilovich et al., 2017) data, the European Center for Medium Weather Forecasting (ECMWF; Molteni et al., 1996) and AGRMET datasets.Each forcing data are used to drive five ensemble members within the 20 member ensemble.As before, perturbations are applied to both forcing and model states.These strategies assume that a better representation of the forcing uncertainty and model error covariance can be developed by augmenting the ensemble through the use of multiple data sources.
Panel (a) in Fig. 3 shows the time series of snow depth from open loop and DA integrations from the EXP-FCLIM and EXP-FENS experiments, panel (b) shows comparisons of the snow depth ensemble spread from DA integrations and panel (c) shows comparisons of the analysis increments in snow depth from DA.In the EXP-FCLIM experiment, it can be noted that the added use of forcing climatology with the OL_FSNGL forcing is helpful in increasing the ensemble spread in the DA integrations without a significant change to the mean snow-depth estimates.Subsequently, the improved background model error representation leads to improved DA performance, as the DA_FCLIM-based estimates are improved relative to the DA_FSNGL estimates.The improvements are more apparent during the snow accumulation (December-January) and melt (April-May) time periods, though they are significantly underestimated relative to the control.Quantitatively, the root mean square error (RMSE) in the OL_FSNGL and OL_FCLIM integrations for the Oct 2012 to Jun 2013 time period is 85 mm.The DA_FSNGL integration with a single forcing dataset has a RMSE of 55 mm and the added use of the forcing ensemble helps in further reducing the overall RMSE to 48 mm in the DA_FCLIM integration.
Comparatively, the use of an ensemble of forcing products provides significantly improved performance in the assimilation of synthetic observations.First, a significant portion of the bias in the snow depth estimates is reduced by the forcing ensemble-based open loop (OL_FENS).The cumulative RMSE of the OL_FENS integration is 56 mm.The use of the forcing ensemble then helps in improving the DA simulations (DA_FENS), as it shows a closer match with the control relative to all other DA integrations.In particular, DA_FENS shows improvements in the accumulation (November-December) and snowmelt (March-April) periods, and provides a low RMSE of 29 mm, for the time period of Oct 2012 to June 2013.
Comparisons of the analysis increments from DA integrations shown in panel (c) indicate the time periods where the impact of the background model error is more significant.Generally, the analysis increments from DA_FSNGL and DA_FCLIM are similar, except during the snow accumulation and melt-time periods.Comparatively, larger differences in the analysis increments between the DA_FSNGL and DA_FENS integrations are observed, with more prominent differences seen during the accumulation and melt periods.During these times, larger analysis increments are observed in the DA_FCLIM and DA_FENS integrations, reflective of the ability of these configurations to respond to observations due to the improved background model error.It can also be noted that the analysis increments during the peak snow season are generally smaller in DA_FENS and DA_FCLIM integrations compared to that of DA_FSNGL, indicating the contribution of the hybrid forcing inputs for reducing the significant biases in the assimilation system.

Impact of forcing ensemble in the assimilation of AMSR2 snow depth retrievals
The idealized experiments presented in the previous section demonstrate that the use of hybrid forcing ensemble strategies is helpful in providing a better characterization of the forcing uncertainty and the background model error.We extend this approach to a "real" data assimilation scenario where passive microwave snow depth observations from the AMSR2 instrument are employed.These retrievals, available from 2012 July onwards, are obtained from the Japan Aerospace Exploration Agency (JAXA; http://suzaku.eorc.jaxa.jp/GCOM_W/data/data_w_index.html).In all the integrations assimilating AMSR2 retrievals, the standard deviation of the observation error is assumed to be 50 mm.Note that we use a higher value of observation error standard deviation than that reported by Kachi et al. (2013), based on the previous snow DA studies (Liu et al., 2013(Liu et al., , 2015;;Kumar et al., 2014Kumar et al., , 2015) ) that generally assume low skill for passive microwave snow depth retrievals.Land surface model simulations using the Noah LSM (version 3.3) are conducted over two regional model domains in the continental USA (Fig. 4) at 25 km spatial resolution: (1) a region centered around the Great Lakes (GL) and (2) a domain centered around the Colorado headwaters (CH).The snow evolution in the GL region tends to be ephemeral, wet and shallow, whereas the CH region is a high-terrain domain with complex topography and large seasonal snowpacks.The impact of different background model error representations on the assimilation of AMSR2 data is examined over these two domains with contrasting snow development and melt characteristics.Similar to the synthetic data assimilation experiment presented in Sect.3, the model simulations are conducted with a single meteorological forcing dataset, a single meteorological forcing dataset and its climatology, and an ensemble of forcing datasets.The AGRMET data are used as the single meteorological forcing data.In the forcing ensemble-based runs, in addition to AGRMET, three other forcing datasets are used, which include the GDAS, NLDAS-2 and MERRA- We focus first on the GL region by comparing the snow evolution from various model and data assimilation integrations.Figure 5  mation (especially during the early snow season), but fails to capture the late season snow events.The AMSR2 retrievals at this location are primarily available in the late snow season and help in improving the snow depth simulation through DA.Overall, the limitations of the OL_FSNGL prevents DA from making a significant impact in the DA_FSNGL simulation.The availability of the improved background in DA_FCLIM and DA_FENS enables them to provide a better match to the relatively large snow events in March and April, compared to other simulations.Table 1 shows a summary of the cumulative RMSE from various simulations at these locations.The cumulative RMSE from the OL_FSNGL is 381 mm, which reduces to 275 and 169 mm with the OL_FCLIM and OL_FENS, respectively.The cumulative RMSE in the DA integrations is 266 mm for DA_FSNGL, 262 mm in DA_FCLIM and 244 mm in DA_FENS.Note that the cumulative RMSE does not reflect the obvious improvement during the late season snow periods in DA_FENS (over OL_FENS), as the early season underestimation dominates these statistics.
Figure 6b shows a similar time series comparison at point B with larger snow evolution.Similar to point A, OL_FSNGL underestimates the snow evolution throughout the season (RMSE of 252 mm) and is improved by the use of the hybrid forcing ensembles.During the snow accumulation time periods (up to early February 2013), the OL_FCLIM (RMSE of 201 mm) and OL_FENS (RMSE of 167 mm) estimates show better agreement with the GHCN measurements.The AMSR2 retrievals show significant underestimation relative to GHCN during the peak snow season, though they are helpful in improving the snow depth simulations in  the late snow season (March-May).The impact of the improved model background can be noted in the DA_FCLIM and DA_FENS simulations in their ability to provide a better match with the GHCN observations in the late snow season.The single-forcing-based DA estimate (DA_FSNGL), on the other hand, does a poor job in this time period despite the availability of AMSR2 retrievals that are consistent with GHCN.The cumulative RMSE of the DA_FSNGL integration at this location is 206 mm and it improves to 156 and 162 mm in the DA_FCLIM and DA_FENS integrations.
A similar set of evaluations are conducted over the CH domain, an area with deeper seasonal snow accumulation compared to the GL region.Figure 7 presents the RMSE improvement map for the CH domain (similar to Fig. 5).Compared to the improvements observed in the GL domain, the patterns of improvements and degradations are more mixed in the CH domain.In addition, larger improvements and degradations are observed in the DA_FCLIM and DA_FENS integrations relative to DA_FSNGL.To examine these patterns, the time series of snow evolution from various integrations is compared at two locations in the CH domain (point C at 40.375, 106.875 and point D at 45.125, 109.875) and are shown in Figure 8. OL_FSNGL underestimates the snow evolution in both locations (RMSE of 424 and 276 mm at C and D, respectively, as shown in Table 1).The added use of the climatology (OL_FCLIM) marginally improves the snow simulation at location C (RMSE of 402 mm) and provides more significant improvements at location D (RMSE of 142 mm).The use of the forcing ensemble (OL_FENS) provides a better match to the observations at location C (RMSE of 179 mm), but overestimates the snow accumulation at location D (RMSE of 215 mm).At location C, the assimilation of AMSR2 improves the snow depth estimates in DA_FSNGL (RMSE of 316 mm) and DA_FCLIM (RMSE of 309 mm) integrations relative to their respective OL, whereas DA leads to degradations in the forcing ensemble configuration (RMSE of 285 mm), compared to OL_FENS.At location D, the assimilation of AMSR2 retrievals leads to increased RMSE in the DA integrations (RMSE of 327, 312 and 309 mm for DA_FSNGL, DA_FCLIM and DA_FENS, respectively).These trends are reflective of the fact that the AMSR2 observations underestimate the snow evolution in the peak winter months (January-March) and overestimates snow estimates in the spring melt-time periods (April-May), at location C. At location, D, however, the AMSR2 snow observations are generally underestimated.The underestimation of snow at both these locations is likely due to the fact that passive microwave-based retrievals saturate for thick snowpacks (Dong et al., 2005).
In general, the DA integrations (DA_FSNGL, DA_FCLIM and DA_FENS) have comparable performance at both these locations and they mostly follow the snow evolution patterns in the AMSR2 data.Note that though AMSR2 observations capture the seasonality of snow observations, they show significant underestimation compared to in situ observations of snow depth.The influence of undersampling the background model error can be observed in the early part of the snow season at location C and during late season at location D, where the DA_FSNGL integrations fail to match the snow events captured by AMSR2.During the peak snow time periods, however, the undersampling of background model error in OL_FSNGL is less of a problem over this domain, as the non-zero model snow states provide an adequate background for subsequent data assimilation updates.Thus, the evaluation of the snow DA integrations at these two regions provide valuable insights on the importance of accurately characterizing the background model error.The use of the hybrid forcing ensemble and improved model background is more helpful over the GL domain, where snow evolution is ephemeral.Over regions with large snowpacks such as the CH region, the representation of the model background is more impor- tant during the early accumulation and spring melt-time periods.

Summary
Accurate specification of input model and observations error covariances in data assimilation systems is challenging though these error specifications are critical in the development of a skillful data assimilation system.In offline ensemble land data assimilation systems, the model ensemble and background model error representation are typically generated by applying small perturbations to the model prognostic states and input meteorological forcing fields.Most Land DA studies are reliant on the use of a single forcing dataset to derive their driving meteorology.
In this article, the limitations of using a single forcing dataset as the basis for developing background model error  is examined in the context of snow data assimilation.When significant errors are present in the forcing fields (e.g., precipitation), the resulting model and ensemble estimates will have significant errors.In such instances, the use of an ensemble of forcing datasets, either based on climatology or a suite of independent datasets, is likely to provide a better representation of the forcing uncertainty and the background model error.This article demonstrates these issues through both idealized and real data assimilation experiments.
The idealized experiment presents a case where the snow depth estimates are significantly underestimated due to the presence of precipitation biases.The application of stochastic perturbations using this biased precipitation input is inadequate in providing a realistic background model error in the assimilation system.As a result, the snow depth fields in the DA system remain biased, especially during the snow evolution and spring melt periods.In contrast, when an ensemble of forcing datasets is used to drive the model, the representation of the background model error is more realistic.As a result, the assimilation system performs better in incorporating the impact of observations during the snow evolution and ablation periods.
The impact of using a forcing ensemble for developing the background model error is examined for the assimilation of snow depth retrievals from the AMSR2 instrument, over two domains in the Continental USA with different snow evolution characteristics.Over the region near the Great Lakes, the snow evolution tends to be shallow, with transitions between snow and no-snow conditions during each snow season.In this region, the added use of the forcing climatology to drive the ensemble leads to improved DA performance, when compared to the in situ ground observations of snow depth.The DA performance is further enhanced with the use of an ensemble of forcing inputs, partly aided by the enhanced skill of the precipitation inputs.Over the Colorado headwaters, an area with large seasonal snowpacks, the impact of precipitation biases on the simulation of snow states is largely limited to the snow evolution and ablation time periods.As the occurrences of transitions between snow and no-snow states are less common during the peak winter months in this region, the underestimation of the background model error is less problematic in the DA integrations during these time periods.As a result, the positive impact of the use of forcing ensemble is mostly prominent during the accumulation and ablation time periods.
As noted above, the evaluation of snow depth estimates over CH region shows mixed results, with several locations indicating a worse performance with the use of the forcing ensemble compared to the use of a single forcing dataset.In regions with large snow accumulation (such as the CH region), passive microwave retrievals such as those from AMSR2 are known to have low skill due to issues such as saturation in deep snowpacks, signal loss in wet snow and overestimation in the presence of large snow grains (Dong et al., 2005;Foster et al., 2005;Durand et al., 2011).Such limitations contribute to the mixed results seen in these results, especially in the CH domain.In such instances, the poorer performance from the use of the forcing ensemble is a result of the poor skill of the retrievals.To improve the skill of the retrievals themselves, prior studies (Kumar et al., 2014;Liu et al., 2015) have successfully employed objective analysis techniques such as optimal interpolation to blend in situ measurements with satellite retrievals prior to assimilation.These prior studies and the results of this article suggest that a strategy that combines the use of hybrid forcing inputs (to improve background model error) and in situ data-based correction of observations to be assimilated (to enhance the satellite retrievals) is likely to provide a robust configuration for optimal DA performance.
It must be stressed that in the experiments presented in the article, the OL_FSNGL configurations purposely employ an inferior forcing dataset so that the differences between the OL_FSNGL, OL_FCLIM and OL_FENS simulations are more magnified.If the single forcing dataset being used is of high skill, then the added benefit of using the forcing ensemble is likely to be less, consistent with the results of more recent studies to employ an ensemble of forcing data for generating an ensemble of internally consistent model uncertainty representation for applications such as DA (Newman et al., 2015;Huang et al., 2017).Overall, the results in this article indicate that use of a forcing ensemble is helpful in providing better representations of background model error and more positive and consistent improvements in data assimilation.Note also that the use of an ensemble of forcing products www.hydrol-earth-syst-sci.net/21/2637/2017/ Hydrol.Earth Syst.Sci., 21, 2637-2647, 2017 may be practical in operational assimilation environments for centers with ensemble prediction systems.Where not available, the combined use of the forcing climatology along with the single, operational forcing input may be an appropriate strategy to improve the skill of the data assimilation system, as validated by the results in this paper.
Data availability.The citations to the datasets are included in the manuscript, which also provide information about their access and distribution.The NLDAS-2 forcing data used in this effort were acquired as part of the activities of NASA's Science Mission Directorate, and are archived and distributed by the Goddard Earth Sciences (GES) Data and Information Services Center (DISC).
Competing interests.The authors declare that they have no conflict of interest.

Figure 1 .
Figure 1.Schematic of the three strategies used to specify forcing uncertainty in the data assimilation integrations: (a) a single forcing dataset, (b) a single forcing dataset and its climatology and (c) an ensemble of forcing products.In all three cases, perturbations are applied to the forcing inputs to generate the ensemble.

Figure 2 .
Figure 2. Snow depth time series for the water year of 2012-2013 from the open loop (OL_FSNGL) and data assimilation (DA_FSNGL) integrations using a single forcing dataset, for the synthetic snow data assimilation experiment.The control simulation and the simulated observations are also shown.

Figure 3 .
Figure 3. Similar to Fig. 2, with the time series of model simulations from EXP-FCLIM and EXP-FENS included (a).The FCLIM experiments employ the use of a single forcing dataset and its climatology to force the ensemble and the FENS experiments employ the use of an ensemble of forcing datasets.The time series in (b) of the top and bottom figures compares the ensemble spread from the DA_FCLIM and DA_FENS integrations to the ensemble spread of DA_FSNGL integration, respectively.Panel (c) shows comparison of the analysis increments from DA integrations.

Figure 4 .
Figure 4. Two study domains with the 1 km terrain elevation (m) as the background: (a) GL domain and (b) CH domain.The yellow circles indicate the locations of the grid cells used for time series comparisons.
presents a "RMSE improvement" map (RMSE of DA with the single forcing (DA_FSNGL) minus the RMSE of DA with the hybrid forcing ensemble (DA_FCLIM or DA_FENS)) by comparing to the in situ snow depth measurements at the Global Historical Climate Network (GHCN; Menne et al., 2012) sites.The available station observations are aggregated up to the model resolution through simple averaging in these comparisons.The warm colors indicate locations where the DA_FCLIM or DA_FENS has a reduced RMSE compared to DA_FSNGL and the cool colors indicate locations where DA_FSNGL has an increased skill relative to DA_FCLIM or DA_FENS.As the figure indicates, the DA integrations employing hybrid forcing inputs are systematically better than the DA_FSNGL simulation in most parts of the domain.Comparatively, the RMSE improvements are larger in the DA_FENS integration than the DA_FCLIM simulation.Note that the improved skill of DA_FENS in particular, is benefited by both the improved model background and the skill of the precipitation data sources that constitute the forcing ensemble, though it is hard to separate their contributions.This is demonstrated by comparing the time series of model and DA simulations at two locations: point A, at 45.875 • N, 89.375 • W and point B, at 48.875 • N, 97.625 • W. As shown in Fig. 6a, the OL_FSNGL simulations significantly underestimate the snow evolution throughout the winter period of 2012-2013.The added use of the forcing climatology (OL_FCLIM) leads to overestimating the peak season snow (Feb-Mar) and marginally improves the late season snow.Similarly, the use of the forcing ensemble (OL_FENS) marginally improves the OL_FSNGL underesti-Hydrol.Earth Syst.Sci., 21, 2637-2647, 2017 www.hydrol-earth-syst-sci.net/21/2637/2017/ RMSE (DA_FSNGL) -RMSE (DA_FCLIM) RMSE (DA_FSNGL) -RMSE (DA_FENS) °°°° °°°°°°°°F igure 5. RMSE (mm) differences of snow depth fields from DA integrations using hybrid ensemble forcing strategies (DA_FCLIM and DA_FENS) relative to the DA integration using a single forcing (DA_FSNGL) over the Great Lakes domain, using GHCN data as the reference, for the time period of 2012 to 2015.Warm colors indicate locations where DA_FCLIM or DA_FENS provides a lower RMSE than DA_FSNGL and cool colors indicate locations where DA_FSNGL has a lower RMSE than DA_FCLIM or DA_FENS.
Same as Fig.5, but for the Colorado headwaters domain.

Table 1 .
Cumulative RMSE (mm) from various model and DA integrations at the four locations in the Great Lakes and Colorado headwaters domains used in the Figs.6 and 8.