Combining satellite data and appropriate objective functions for improved spatial pattern performance of a distributed hydrologic model

. Satellite-based earth observations offer great op-portunities to improve spatial model predictions by means of spatial-pattern-oriented model evaluations. In this study, observed spatial patterns of actual evapotranspiration (AET) are utilised for spatial model calibration tailored to target the pattern performance of the model. The proposed calibration framework combines temporally aggregated observed spatial patterns with a new spatial performance metric and a ﬂexible spatial parameterisation scheme. The mesoscale hydrologic model (mHM) is used to simulate streamﬂow and AET and has been selected due to its soil parameter distribution approach based on pedo-transfer functions and the build in multi-scale parameter regionalisation. In addition two new spatial parameter distribution options have been incorporated in the model in order

Abstract. Satellite-based earth observations offer great opportunities to improve spatial model predictions by means of spatial-pattern-oriented model evaluations. In this study, observed spatial patterns of actual evapotranspiration (AET) are utilised for spatial model calibration tailored to target the pattern performance of the model. The proposed calibration framework combines temporally aggregated observed spatial patterns with a new spatial performance metric and a flexible spatial parameterisation scheme. The mesoscale hydrologic model (mHM) is used to simulate streamflow and AET and has been selected due to its soil parameter distribution approach based on pedo-transfer functions and the build in multi-scale parameter regionalisation. In addition two new spatial parameter distribution options have been incorporated in the model in order to increase the flexibility of root fraction coefficient and potential evapotranspiration correction parameterisations, based on soil type and vegetation density. These parameterisations are utilised as they are most relevant for simulated AET patterns from the hydrologic model. Due to the fundamental challenges encountered when evaluating spatial pattern performance using standard metrics, we developed a simple but highly discriminative spatial metric, i.e. one comprised of three easily interpretable components measuring co-location, variation and distribution of the spatial data.
The study shows that with flexible spatial model parameterisation used in combination with the appropriate objective functions, the simulated spatial patterns of actual evapotranspiration become substantially more similar to the satellitebased estimates. Overall 26 parameters are identified for calibration through a sequential screening approach based on a combination of streamflow and spatial pattern metrics. The robustness of the calibrations is tested using an ensemble of nine calibrations based on different seed numbers using the shuffled complex evolution optimiser. The calibration results reveal a limited trade-off between streamflow dynamics and spatial patterns illustrating the benefit of combining separate observation types and objective functions. At the same time, the simulated spatial patterns of AET significantly improved when an objective function based on observed AET patterns and a novel spatial performance metric compared to traditional streamflow-only calibration were included. Since the overall water balance is usually a crucial goal in hydrologic modelling, spatial-pattern-oriented optimisation should always be accompanied by traditional discharge measurements. In such a multi-objective framework, the current study promotes the use of a novel bias-insensitive spatial pattern metric, which exploits the key information contained in the observed patterns while allowing the water balance to be informed by discharge observations.

Introduction
Reliable estimations of spatially distributed actual evapotranspiration (AET) are useful for various sustainable water resources management practices such as irrigation planning, agricultural drought monitoring and water demand forecasting in large cultivated areas (Wei et al., 2017). Distributed hydrologic models can potentially provide this insight since evapotranspiration (ET) is a major part of the water cycle. In spite of their ability to simulate detailed spatial patterns of a range of hydrological state variables and fluxes, distributed model evaluation remains focused on temporal aspects of the aggregated streamflow variable (Demirel et al., 2013;Schumann et al., 2013). We are interested in including spatial AET patterns in the model calibration using spatial parameterisations and complementary objective functions. Different methods exist that utilise satellite-based land surface temperature data to derive spatially detailed estimates of latent heat fluxes from the land surface and canopy on a scale relevant for catchment modelling (Kalma et al., 2008). Since AET cannot be measured directly by satellite, surface energy balance models are developed to estimate AET based on data from a range of spectral and thermal bands (Guzinski et al., 2013;Norman et al., 1995;Su, 2002). While these satellite-based estimates are usually employed as a tool to understand and improve the model parameterisations (Conradt et al., 2013;Hunink et al., 2017;Schuurmans et al., 2011), they can also be used to calibrate models (Crow et al., 2003;Immerzeel and Droogers, 2008;Zhang et al., 2009). Therefore, adding satellite-based observations to model calibration is not novel; however, specifically evaluating spatial patterns in the calibration has rarely been done (Stisen et al., 2011b). Interesting examples exist where model calibration could benefit from the spatial pattern information of actual evapotranspiration (Githui et al., 2016;Li et al., 2009;Zhang et al., 2009) and satellite-based recharge patterns (Hendricks Franssen et al., 2008). This paper utilises monthly patterns of AET first to understand and organize ET-related model spatial parameterisations and then to pursue a calibration. This is because adding only temporal aspects of the spatial observations to the objective function is not sufficient for achieving significant improvements in simulated spatial patterns if model parameterisation is not flexible enough to physically adjust to the observed pattern. Besides, the model structure, parameterisations and calibration schemes have usually been designed for streamflow optimisations (Vazquez et al., 2011;Velázquez et al., 2010). In order to ensure compatibility between the spatial pattern calibration target and model parameterisation, the flexibility of the spatial model parameterisation needs to be reconsidered. Recently, inadequate representation of spatial variability and hydrologic connectivity of a well-known distributed model (VIC) has been reported by Melsen et al. (2016). The mesoscale hydrologic model (mHM) has the flexibility to alter the spatial patterns via pedo-transfer function (PTF) pa-rameters and by including a multi-scale parameter regionalisation (MPR) scheme (Kumar et al., 2013;Samaniego et al., 2010). Mizukami et al. (2017) incorporated this MPR approach with VIC to estimate parameters for large domains based on geophysical data for 531 basins. The multi-basin calibration results using MPR revealed physically meaningful parameter fields without patchiness (discontinuities). The study by Loosvelt et al. (2013) is one of few other examples that incorporate PTFs for soil texture and moisture components of a hydrologic model.
All calibration strategies rely on the selection of performance metrics indicating the goodness of fit of the model to be optimized. Choosing an appropriate set of objective functions is crucial to build a robust calibration strategy, since there will be trade-offs between different objective functions or redundant information. In the hydrology literature, there are a range of different temporal metrics for hydrograph matching while metrics designed for spatial pattern matching are less common Rees, 2008). For distributed models, spatial metrics usually evaluate cell-to-cell correlation and deviations (e.g. Pearson's R and bias). The use of multi-component metrics as described for discharge by Gupta et al. (2009) is, however, rare for spatial pattern evaluation. An essential feature of our study is introducing a new spatial efficiency (SPAEF) metric that contains three components, i.e. correlation, variance and histogram intersection, providing reliable bias-insensitive pattern information unlike other traditional metrics focusing on only one aspect like correlation, mean squared error or bias.
Prior to model calibration, sensitivity analysis is usually conducted to attribute response of the model outputs to the changes in model parameters (Shin et al., 2013), which can enhance our understanding of both temporal and spatial model behaviour (Berezowski et al., 2015). In the context of spatial model calibration, the sensitivity analysis should not only identify the parameters that affect the water balance and hydrograph dynamics but also the parameters that shape the spatial patterns of the simulated states and fluxes. To achieve this, we have to design objective functions that reflect the spatial pattern of the models and utilise these in model parameter sensitivity analysis.
In light of the well-known equifinality problems in model calibration (Beven and Freer, 2001) spatial pattern evaluation can be useful for selecting the most appropriate parameter set from a group of sets leading to both reasonable streamflow performance and physically meaningful AET pattern. Immerzeel and Droogers (2008) showed how a semi-distributed model of a basin in southern India could be constrained by using spatially distributed observations with a monthly temporal resolution. Cornelissen et al. (2016) highlighted the need to identify which model parameters influence the simulated spatial pattern and showed that spatial patterns of simulated evapotranspiration were most sensitive to the landuse parameterisation, whereas precipitation was the most sensitive input data with respect to temporal dynamics of The main objectives of this study are to incorporate spatial patterns of satellite-based actual evapotranspiration data in the model calibration and validation. In order to improve AET simulations, we use transfer functions in the spatial model parameterisation that combine a priori maps of soil and vegetation properties with few global calibration parameters in order to enhance the spatial parameterisation flexibility and allow the parameter field to adjust to an observed spatial patterns of AET from the catchment. We also design a new multi-component metric specifically suited for comparing spatial patterns of two continuous variables. Here, we prioritise three main data properties, which are co-location, variation and distribution. The calibration is conducted using three strategies for objective function selection. First, streamflow metrics and spatial pattern metrics are used in isolation during calibration and subsequently they are combined in a more balanced model optimisation. In this way we can investigate the trade-offs and robustness of the different approaches by evaluating the performances regarding both streamflow and spatial patterns during calibration and validation.
2 Study area and data

Study area
The Skjern river basin is one of the most popular research basins in Denmark as it is highly instrumented for hydrological monitoring, including eddy-flux towers, a dense soil moisture network and other state-of-the-art monitoring of hydrological variables (Jensen and Illangasekare, 2011). The basin area is approximately 2500 km 2 , containing mostly sandy soils (Fig. 1). The river is the largest in Denmark by flow volume and located in the western part of the Jutland peninsula, a region dominated by agriculture and forests together covering ∼ 80 % of the domain (Larsen et al., 2016). The basin is mostly flat with a maximum altitude of 130 m and it receives a mean annual precipitation of around 1000 mm (Stisen et al., 2011a). The mean annual streamflow is around 475 mm and monthly mean temperatures vary from 2 up to 17 • C (Jensen and Illangasekare, 2011).

Satellite-based data
The Moderate Resolution Imaging Spectroradiometer (MODIS) polar orbiting platforms, Terra and Aqua, observe mid-latitude regions 4 times per day at a spatial resolution of approximately 1 km × 1 km. The two-source energy balance (TSEB) model proposed by Norman et al. (1995) based on the Priestley-Taylor approximation (Priestley and Taylor, 1972) is used in this study to calculate AET based on MODIS data under cloud-free conditions. The model inputs are land surface temperature (LST), solar zenith angle (SZA), and albedo and height of canopy, all derived from MODIS observations . Additional inputs such as climate variables of air temperature and incoming radiation are obtained from ERA-Interim reanalysis data (Dee et al., 2011). The main motivation of preparing a new AET dataset based on land surface temperature is that most other available products are  Greve et al. (2007) based mainly on vegetation index data which may not be sufficient to assess the complicated interplay among climate, soil and vegetation dynamics on the AET patterns, especially during the growing season. For more details on our newly produced AET data for Denmark, including equations, parameterisation, calibration and validation, please refer to the recent study by Mendiguren et al. (2017).
In this study, all remote-sensing-based AET data were averaged for each month during the growing season across all years for the model calibration period (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008), resulting in six monthly mean maps from April to September representing AET under cloud-free conditions. This ensures that in spite of uncertainty in the individual instantaneous midday estimates of AET, the monthly maps represent the general spatial pattern for each month under cloud-free conditions. The individual daily AET patterns are evaluated for temporal consistency by calculating the Pearson correlation between each daily pattern and the monthly mean pattern for the given month. This analysis showed that the overall average correlation between an individual day and the monthly mean was 0.82. The satellite-based monthly AET maps are validated against eddy-covariance measurements for three different land cover types (forest, cropland and wetland) within the Skjern catchment and display good agreement on the monthly timescale . Despite not being pure observations but rather estimates from an energy balance model based on satellite observations, we will refer to these AET maps as reference observations. Based on the sensitivity analysis in Mendiguren et al. (2017), which showed that the TSEB is largely controlled by the satellite input of LST, which can be considered an observation, it is assumed that the TSEB AET estimates represent spatial patterns of AET that are suitable for pattern evaluation of the hydrological model.

Hydrologic model
The mesoscale hydrologic model is a distributed model providing various simulated spatial outputs, fluxes and states at different spatio-temporal model resolutions (Samaniego et al., 2010. The model includes pedo-transfer functions for soil parameterisation and originally contains 53 global parameters that can be adjusted during calibration. In this study, some parameters are fixed at a default value and others have been added from the new spatial model parameterisations resulting in a total of 48 global parameters for further analysis. The model simulates major components of the hydrologic cycle, i.e. interception, infiltration, snow accumulation and melting, evapotranspiration, groundwater storage, seepage, and runoff generation. The readers are referred to the study by Samaniego et al. (2010) for full model description, assumptions, limitations and process formulations. Table 1 provides a summary of the modelling data used in this study. As shown in the table, meteorological data can be on a different spatial scale than both morphological data and the model scale. This flexibility arises from the fact that mHM incorporates a multi-parameter regionalisation technique to swap between different scales while calculating all fluxes and routing streamflow on a preferred model scale. We run the model on 1 km × 1 km spatial scale and at daily time step. Some processes like ET are calculated at an hourly time step then the final results are aggregated to daily values. All morphological data are prepared on 250 m × 250 m scale. All three meteorological datasets, i.e. P , ET ref and T avg , were originally at 10-20 km resolution. We re-sampled them to 1 km × 1 km using cubic interpolation. This interpolation method is used to avoid patchiness in model simulations due to coarse grids on the native scale of the metrological data. We use 12 monthly leaf area index (LAI) maps to represent the climatology for both interception and PET correction for the entire period (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014) and the model warm-up period (1997)(1998)(1999)(2000).

Spatial model parameterisation
In order to facilitate a meaningful spatial-pattern-oriented calibration of a distributed model, we need to compromise between comprehensive (each cell in the basin) and lumped (one cell -one basin) parameterisations, as the first approach may require immense computer resources during calibration and the latter approach usually results in a uniform pattern. For instance, in a detailed calibration study by Corbari et al. (2013), each pixel in the catchment is represented by a parameter whereas, in a coarse parameterisation, a uniform parameter represents the entire catchment .
In this study, we follow an intermediate level of parameterisation comprised of several flexible spatial parameters and nonlinear equations, allowing us to stretch the spatial contrast of simulated actual evapotranspiration based on soil and vegetation properties. This level of parameterisation is still physically meaningful as the parameters are tied to the land surface characteristics of the basin via transfer functions.

Distributed root fraction coefficient
Root distribution with depth is generally perceived as being a function of vegetation type (Jackson et al., 1996), and our spatial parameterisation of root fraction distribution is initially separated based on land covers of forest and agricultural crops. However, following the site-specific soil and plant physical literature (Jensen et al., 2001;Madsen and Platou, 1983), we subdivide the root fraction coefficient for agricultural crops as a function of field capacity (FC). Here, spatial model parameterisation is implemented to the root fraction calculation in the original mHM structure which follows the asymptotic equation for vertical root distribution (Eq. 1) proposed by Jackson et al. (1996).
where Y is the cumulative root fraction from soil surface to depth d (cm), and β c is the root fraction coefficient. We substituted the root fraction coefficient for agricultural crops (non-forest) with two new root fraction parameters, i.e. one root fraction for maximum FC (clay) and one for minimum FC (sand), which allow for full spatial distribution of root fraction with varying FC. This relation between soil characteristics and effective rooting depth is based on a sitespecific database with more than 100 soil and root profiles collected in Denmark (Table 19.4 in Jensen et al., 2001) and the literature focusing on soil texture and effective rooting depths in Denmark (Madsen, 1985(Madsen, , 1986Madsen and Platou, 1983). The approach is not necessarily globally valid, but designed for the specific region of western Denmark where very sandy soils ( Fig. 1) are cultivated for agricultural purposes even though the soil properties influence root development. These parameters are used to form the root fraction coefficient for soil with agriculture (β agriculture ) based on field-capacity-dependent root fraction in Eqs. (2) and (3).
where FC norm is the normalised field capacity ranging from 0 to 1.
where β agriculture is the new root fraction for soil with agriculture comprised of root fraction for clay (β max ) and root fraction for sand (β min ).

Dynamic ET ref scaling function
As a second spatial parameterisation step, we incorporated remotely sensed vegetation information, to downscale coarse climatological reference evapotranspiration (ET ref ) to the model scale. This was done to emphasise the effect of vegetation on the simulated spatial patterns of AET. The original scaling factor in mHM is based on a lumped minimum correction and an aspect-driven additional term. Using aspect ratio for ET ref correction makes sense in mountainous areas; however, this is found to be irrelevant for the Skjern basin which is characterized by a low topographical variation. The dynamic scaling function introduced here allows the modeller to superimpose the imprint of LAI on the simulated AET patterns via a downscaling of the ET ref . The concept of a dynamic scaling function (DSF) is similar to the concept of a crop coefficient used to convert ET ref to a potential evapotranspiration (ET pot ) for a given vegetation that differs from the reference crop. Our implementation follows the equation for estimating the crop coefficient for natural vegetation originally proposed by Allen et al. (1998). Similarly, Hunink et al. (2017) compared different applications of crop coefficients based on remotely sensed vegetation indices in hydrologic modelling. They found that the effect of crop coefficient parameterisations on the water balance is trivial and constant throughout the year; however, it has a major effect on seasonal evapotranspiration and soil moisture fluxes, showing the role of crop coefficients in spatial calibration. The DSF, shown in Eq. (65), is simply a time-space variable implementation of the crop coefficient for natural vegetation, parameterized through a spatio-temporal LAI (no unit) component accounting for the effects of characteristics that separate the actual vegetation from a reference grass (well-watered 10 cm height and albedo of 0.23). These characteristics include specific land cover, albedo and aerodynamic resistance (Allen et al., 1998;Liu et al., 2017). This ensures a physically meaningful downscaling from a coarse (here 20 km) ET ref grid to the model resolution (here 1 km).

Methods
In this study, we applied a recently developed sequential screening method (Cuntz et al., 2015) to select important parameters for calibration. Since different parameters can be sensitive to different hydrologic processes, we tested three different performance metrics to evaluate process-parameter relationships. Two of these metrics are derived from the hydrograph, i.e. Kling-Gupta efficiency (KGE, Gupta et al., 2009) and KGE of only below-average streamflow (KGE low ), whereas the spatial efficiency metric focuses on the spatial pattern of actual evapotranspiration.

Objective functions
As an objective function for streamflow performance, we chose the Kling-Gupta efficiency, shown in Eq. (6) , and applied it to both the entire time series and to the low-flow part of the hydrograph (below mean discharge).
where α Q is the Pearson correlation coefficient between observed and simulated discharge time series, β Q is the relative variability based on the fraction of standard deviation in simulated and in observed values, and γ Q is the bias term normalised by the standard deviation in the observed data. Since comparison of two spatial pattern maps is of obvious importance, a bias-insensitive spatial performance metric is developed and used in this study. In this context, we adopted the structure of the Kling-Gupta efficiency while substituting the standard deviation term by a term based on the coefficient of variation σ O /σ S and replacing the bias term with a histogram comparison index to compare the intersection percentage of two histograms of observed and simulated spatial maps. The histogram intersect is performed after normalisation of the observed and simulated maps to a mean of 0 and standard deviation of 1 (z score). This ensures that the histogram comparison is unaffected by any bias or variance differences and solely reflects the agreement in distribution of the variable in space. The main utility of the histogram comparison is that it distinguishes between different soil and vegetation groups reflected in the spatial pattern results. This unique feature of being sensitive to clusters in the data compliments the other two components in the equation, in particular the correlation coefficient (α in Eq. 7) since α is highly vulnerable to very distinct clusters of points aligned on a diagonal axis. This can result in high correlation coefficient values in spite of low correlation inside the individual clusters inevitably misleading the model calibration. The separated clusters often occur in environmental models where different land-use classes and soil classes etc. can produce patchy spatial patterns. The new spatial efficiency metric (optimal value equals to 1) is defined as follows: where α is the Pearson correlation coefficient between the observed AET map (A) and simulated AET map (B) for a particular month, β is the fraction of coefficient of variations representing spatial variability, and γ is the percentage of histogram intersection (Swain and Ballard, 1991). The gamma (γ ) is calculated for a given histogram K of the observed AET map (A) and the histogram L of the model simulated AET map (B), each one containing n bins, i.e. herein 100 bins. The maps are standardised to a mean of 0 and a standard deviation equal to 1 (z score) to avoid the effect of different units. In this study, we compare AET from TSEB (in W m −2 ) based on instantaneous satellite data with daily averaged AET (mm day −1 ) simulated by the model and regard the satellite-based AET maps as the "observation" even though they are more accurately AET "estimates" based on satellite observations. Attempts to use numerous other spatial metrics including Mapcurves, fractions skill score (FSS), Goodman and Kruskal's lambda, Theil's Uncertainty, empirical orthogonal functions (EOFs) and Cramér's V (Cramér, 1946;Koch et al., 2015;Rees, 2008) did not distinguish the general AET patterns or the spatial efficiency metric. The strength of the spatial efficiency metric is that each component contains different and non-overlapping information. Moreover, the components are straightforward compared to the aforementioned metrics. While the correlation term (α) expresses only the spatial correlation of AET values, the coefficient of variation term (β) expresses only the range or contrast in the image while the histogram term (γ ) only expresses the agreement on histogram shape without considering either variation or correlation. Since all three terms are bias-insensitive, the spatial efficiency only constrains the model simulations with the pattern information in the satellite data while leaving the water balance (bias) to be constrained by streamflow metrics.

Sequential screening of the model parameters
We applied the variance-based sequential screening (SS) method introduced by Cuntz et al. (2015)  For this approach the parameters are sampled in trajectories as initially described by Morris (1991) and improved by Campolongo et al. (2007). Each trajectory consists of (N + 1) parameter sets, assuming that N is the total number of model parameters. The first parameter set in each trajectory is sampled randomly while all the subsequent sets i (i > 1) differ to the prior set (i − 1) in exactly one parameter value. Therefore, the whole trajectory is a path through the parameter space. Trajectories allow us to sample the whole parameter space efficiently and consider parameter interactions to certain extents. In the approach of Cuntz et al. (2015), only a small number (M 1 ) of such trajectories are sampled to lower the computational burden. The resulting (M 1 × (N + 1)) model outputs are derived and the elementary effects (EEs) are computed for each parameter. The EEs are then used to identify the most informative parameters by deriving a threshold splitting the parameters into a set N u of uninformative and a set N i of informative ones. In the following, the first parameter set is again sampled randomly but then only the uninformative parameters are perturbed meaning that the new trajectory only consists of (N u + 1) parameter sets. The derivation of model output and calculation of EEs is repeated. The major step is to determine whether one of the previously uninformative parameters is now above the threshold and if so it is added to the set of informative parameters N i . These steps are repeated until no further parameter is added to the set N i . At the end M 2 trajectories are sampled to confirm that the set of uninformative parameters N u is stable and no further parameter would be found to be informative.

Model calibration and validation
We calibrated the 1 km daily mHM for the Skjern basin in Denmark using the well-known global search algorithm Shuffled Complex Evolution method from the University of Arizona (SCE-UA) (Duan et al., 1992). The SCE-UA algorithm is configured with two complexes running in parallel with 53 (2n + 1) parameter sets in each complex and 27 (n + 1) parameter sets per sub-complex. Moreover, the maximum relative objective function change is set to 1 % over five iterations as the model convergence criterion. This criterion was usually reached after 3500 runs; in rare cases up to 8000 runs were necessary. We evaluated the differences between monthly AET estimates from the TSEB reference data and simulated AET from the hydrologic model for the calibration period (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008) and validation period (2009)(2010)(2011)(2012)(2013)(2014).
The two streamflow stations are defined separately to follow the improvements in each metric throughout the calibrations. After testing different combinations of streamflow and spatial metrics, we chose two streamflow metrics (KGE and KGE low ) and one spatial efficiency metric, given by Eqs. (6) and (7), respectively. These objective functions are used individually or combined in three model calibration cases based on (i) only streamflow using equally weighted KGE and KGE low , (ii) only spatial patterns of AET using spatial efficiency, (iii) both equally weighted streamflow and spatial pattern matching using all three metrics. It should be noted that the case 2 calibration is designed as a benchmark to explore how good the pattern match can get when not considering streamflow performance, even though the solution might not be interesting from a hydrological perspective, since the bias insensitive spatial pattern metric does not secure a reasonable water balance. To test the overall robustness of the calibration framework we use an ensemble of nine calibrations for case 1 and nine calibrations for case 3, each started from a different seed number. In order to fairly weigh the objective functions, we retrieve the residuals (ε) from the three objective functions based on a random initial model run (Eqs. 8-10). We calculate the new weights which will result in equal contribution to the total error ( total ), i.e. 50 % from spatial metric and 50 % from the two streamflow metrics. Ideally, if it exists the optimiser searches a parameter set resulting in zero total otherwise the closest point to zero will be considered as the optimum solution.
where Q is the total for streamflow of the two streamflow gauges and Spatial is the total for spatial performances of six summer months. For Q-only calibration, the weight for SPAEF (ω SPAEF ) becomes zero whereas for spatial-only calibration the weights for KGE and KGE low become zero. Table 2 shows the sequential screening results based on KGE, KGE low and SPAEF. Each objective function reflects on different spatio-temporal dynamics of the catchment. While KGE and KGE low evaluate high and low streamflow dynamics and biases, the bias-insensitive SPAEF focuses on only spatial patterns of AET. From the results it is clear that some of the highly sensitive parameters for streamflow dynamics, especially interflow-related parameters, groundwater-related geology parameters and single routing parameters, have minor to zero influence on the spatial patterns of AET. The new ET parameters, ET ref -a (non-forest), -af (forest), -b andc are identified to be informative based on all objective functions. The root fraction coefficient for forest (rotfrcoffore) appeared to be not very important for streamflow metrics whereas it is crucial for SPAEF. Similarly, the two newly introduced parameters, i.e. root fraction coefficient for sand and clay (i.e. rotfrcofs and rotfrcofclay) soil, are informative based on all three objective functions. Organic matter for forest (orgmatforest) is especially important for low flows whereas organic matter for impervious areas (orgmatimper) has zero influence on spatial patterns of AET. The exponent slow interflow (expslwintflw) parameter is found to be most informative for low flows while recharge coefficient (rechargcoef) is most informative for streamflow and ET refaf is most informative for calibrating spatial patterns of AET. On average 475 model evaluations are required to split the total number of 48 parameters into informative and uninformative ones. However, the number of iterations is dependent on objective function; therefore, 449 model runs were required for KGE, 431 model runs for KGE low and 544 model runs for SPAEF. This is in close agreement with the computational budget of 10N model evaluations already reported by Cuntz et al. (2015). This also makes the sequential screening method computationally very attractive compared to other global search methods. However, the computational advantage is at the cost of exploring a larger part of the parameter space, and hence the sequential screening is mostly valuable for identifying informative and non-informative parameters prior to calibration or further assessment of the parameter be-haviour. Overall, these results show that there are 26 parameters above the threshold of 1 % of at least one case (Table 2). In principal the parameters with zero sensitivity (SPAEF column) can be fixed at some value during calibration, which may lead to faster convergence, with a lower degree of freedom. However, we include the same set of 26 parameters in all three calibration cases for consistency.

Model calibration and validation
The mHM model is calibrated using streamflow records (gauges A and B in Fig. 1) from an 8-year period (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008) and validated for a recent period (2009)(2010)(2011)(2012)(2013)(2014). A preceding 4-year period (1997-2000) is used for model warmup. We prepared remotely sensed monthly averaged AET pattern maps calculated for these years considering only cloud-free days from summer months. AET patterns of winter months are not considered since it is mostly cloudy and ET is very low and uniform (energy limited) in winter.
The 26 selected parameters from SS are used in the following three calibration strategies: (1) only streamfloworiented (Q-only) calibration using equally weighted KGE and KGE low , (2) only spatial-pattern-oriented calibration using SPAEF, and (3) streamflow and spatial patterns of AET together using all three objective functions with equal weights of 50 % on spatial metric and 50 % for the two Hydrol. Earth Syst. Sci., 22, 1299-1315, 2018 www.hydrol-earth-syst-sci.net/22/1299/2018/  Table 3 provides the overall picture of the three different calibration strategies where two of these strategies are based on an ensemble of nine calibrations. Therefore, the basic descriptive statistics are also given as robustness indicators. The results show that the combined calibration (Q and Spatial) produces similar results to both Q-only and Spatial-only calibrations focusing on streamflow and spatial patterns of AET respectively. whereas the singlemetric calibrations gave very different results for the opposite objective functions, e.g. SPAEF versus streamflow metrics. It is interesting that when comparing the calibration ensemble with the median performance there is very limited trade-off between the Q-only and the combined Q and Spatial calibrations, which have very similar average KGE values. When looking specifically at the best-performing ensemble member with lowest total , there is a more pronounced trade-off between the Q-only and Q and Spatial together calibrations, as the streamflow performance is poorer when SPAEF is included in the group of objective functions. The differences in the streamflow metrics indicate that each objective function carries relevant but slightly conflicting information. Moreover, the results show that the hydrologic model simulates the best AET patterns in different months for different ensemble calibrations. In other words, while one ensemble member has the best performance for April, other calibrations may have the best performance for May and June. This is a secondary trade-off which illustrates that the calibration might benefit from temporal variability in the parameters controlling the spatial parameterisation scheme. It should be noted that ranking of the calibrations within the two ensembles is based on the overall that is comprised of all objective functions for the corresponding calibration. For that reason, the best member of Q and Spatial calibration holds the lowest total comprised of the highest possible KGE, KGE low and SPAEF at the same time but not necessarily the highest SPAEF alone.
This resulted in a slightly lower SPAEF mean of 0.395 for the best member compared to the median member with a SPAEF mean of 0.396 (Table 3).
The results of the Q-only model calibration using only KGE and KGE low reveal very poor simulated patterns of AET, with negative SPAEF for all months. This is not surprising since this calibration is not constrained regarding the spatial patterns, but also illustrates that discharge observations alone contain no spatial pattern information of AET. In contrast, the spatial-only calibration using only SPAEF shows a very poor water balance, with negative KGE and a large bias. We are aware that spatial-only calibration is not applicable or meaningful for hydrologic studies.
The model performance development through the calibrations (9 + 9) and optimum points are shown using scatter plots in Fig. 2, which displays all model runs with values inside the specified plot ranges. The scatter plots illustrate trade-offs between objective functions and consistency among calibration ensemble members. The performance regarding spatial patterns ( Spatial ) displays a high degree of trade-off with all combined calibrations achieving Spatial values around 0.8 whereas the Q-only calibrations achieve Spatial values ranging from 2.8 to 4.4. There are two main clusters in the Q-only calibrations: one around 0.11 Q and the other around 0.25 Q , whereas all nine Q and spatial calibrations follow a similar level on the y axis ( Spatial ). It is surprising to see that SCE-UA did not always find the same optimum solution with varying seed number, which is the case mainly for the Q-only calibration. Perhaps more consistent optimum solutions for the Q-only calibrations could have been achieved with tighter stopping rules and the same initial parameter sets.
Similarly, the grey shades in Fig. 3 show the ensemble range of simulated hydrographs for the Q-only and Q and Spatial calibrations. From the hydrographs it is clear that the ensemble range for station A is generally larger than that for station B, indicating larger uncertainty for sub-basin A. Interestingly, Fig. 3 also illustrates that the Q and Spatial calibration constrains the solution better, not only in AET simulations, but also in streamflow simulations, as indicated by the slightly narrower range in simulated streamflow for the Q and Spatial calibrations. However, even though the range of hydrographs is slightly narrower the simulations are also further from the observed measurements during summer months.
The corresponding simulated AET maps for the results presented in Table 3 are shown in Fig. 4. This figure illustrates the monthly mean maps across all years of actual evapotranspiration for the cloud-free days available for the remote sensing estimates. Only the best-performing members from the two ensembles are presented in this figure. The maps are normalised with their mean value to use one represen-tative colour bar in the legend. As indicated in Table 3, the resultant maps from Spatial-only (third row in Fig. 4) and Q and Spatial calibrations (fourth row in Fig. 4) are obviously more similar to the reference monthly maps (first row in Fig. 4) than the maps of Q-only calibration (second row in Fig. 4). The results clearly show that the model can simulate month-to-month variations in AET patterns reasonably well. The poor AET performance in the Q-only maps is obvious in the second row of Fig. 4, where we see only a uniform simulated AET pattern except for the forest areas revealing very little information about variability in AET and the influence of soil and vegetation. This is due to the fact that the KGE and KGE low objective functions contain no information on the patterns of AET resulting in an unconstrained optimisation regarding spatial pattern and variability. Therefore, the optimiser randomly moves in the SPAEF solution space and picks the best streamflow performance with no regard to Hydrol. Earth Syst. Sci., 22, 1299Sci., 22, -1315Sci., 22, , 2018 www.hydrol-earth-syst-sci.net/22/1299/2018/ AET patterns. Although not perfect (average SPAEF = 0.46 and 0.40), the simulated pattern match in the last two rows of Fig. 4 is quite good compared to the remote-sensing-based estimate since the simulation is able to represent the general pattern influenced by soil, vegetation and land cover while maintaining a similar variance and smoothness. Table 4 shows the same results as Table 3 but for the validation period spanning from 2009 until 2014. Obviously, the results are somewhat poorer than those for the calibration period. A drop in performance for spatial-only and combined metrics is mainly seen for KGE low and the total bias, whereas the SPAEF for Spatial-only and Q and Spatial remains similar to the calibration periods with average SPAEF around 0.4. Interestingly, there is no real trade-off for streamflow metrics between Q-only and Q and Spatial calibrations for the validation period, even for the best-performing ensemble member. Although a better streamflow performance could be achieved by Q-only calibration during calibration, this cannot be sustained during validation, indicating some overfitting when using streamflow metric only. In contrast, the SPAEF performance does not drop during validation for the combined Q and Spatial optimisation, indicating less overfitting and a more robust model parameterisation.

Discussion
In the initial phase of the study numerous flawed calibrations were carried out in an attempt to produce simulated spatial patterns of AET similar to the satellite-based reference patterns. However, the inability to produce similar patterns was found to be caused by limitations in spatial model parameterisation and spatial performance metric choice. Regarding the spatial parameterisation, the initial model was based on a spatially uniform parameterisation of root fraction coefficient and PET correction factor, two parameters with major control on the simulated AET. Therefore, more flexible yet physically meaningful parameterisations were implemented where full spatial variability was enabled by combining 2-3 calibration parameters to initial spatial distributions of soil type and LAI. Regarding the use of appropriate spatial performance metrics, the initial attempts using standard metrics of correlation coefficient, Mapcurves (Hargrove et al., 2006), coef-  Calibrations are evaluated for monthly averages from April to September using cloud-free days. Note that these maps are normalised with their mean to use one representative colour bar and highlight the pattern information.
ficient of variation, Goodman and Kruskal's lambda (Goodman and Kruskal, 1954), agreement coefficient (Ji and Gallo, 2006), Theil's uncertainty, EOF, and Cramér's V (Cramér, 1946;Koch et al., 2015;Rees, 2008) proved to be inadequate in a calibration framework, since undesired visual patterns were achieved, e.g. with high correlation, but toolow standard deviation or highly separate clusters. Therefore, we developed the SPAEF metric which proved to be very efficient for calibrating the model to a satisfying spatial pattern by combining correlation coefficient, coefficient of variation ratio and histogram overlap in a robust metric that guides the model calibration well. It is our experience and recommendation that incorporating the spatial dimension in all aspects of the distributed hydrological model development from model structure, parameterisation, metric selection, sensitivity analysis and calibration is essential in order to achieve significant improvement in the spatial pattern performance of a model. We believe that traditional downstream discharge measurements contain much more accurate and robust information on the overall water balance compared to the non-continuous remotely sensed estimates, and therefore, the model constraint on biases should only originate from these streamflow observations. Conversely, it is well-known that aggregated streamflow measurements contain no information on spatial patterns upstream of the measurement (Stisen et al., 2011b). Therefore, the combination of satellite-derived patterns and aggregated streamflow measurements are an ideal way of constraining distributed hydrological models. In fact, spatial patterns should always be considered when evaluating distributed models. Even if detailed satellite estimates are not available, expert judgments and land cover information should be used to select the most appropriate parameter set (producing the most likely spatial patterns) among equally likely solutions obtained through discharge-only calibration. When a distributed model is applied, ideally it should not only produce satisfying discharge simulations, but at the same time it should also produce realistic spatial patterns of states and fluxes such as AET and soil moisture. White et al. (2017) also highlighted the importance of getting the spatial patterns right in their study since constraining the model against streamflow alone did not secure robust land cover change scenario modelling. The monthly spatial maps are built based on the AET patterns from cloud-free days. Here, we ignore the temporal aspect and focus only on the consistent spatial patterns for each month of the growing season. The advantage of this approach is that only the main information content of the satellite data, their spatial patterns, are utilised while the uncertainty associated with the absolute values of the AET estimates are not influencing the calibrations. In addition, the simulated monthly mean AET maps reflect mainly the model parameterisation and to a lesser degree the day-to-day variation in climate forcing. This is desirable since the aim of the model calibration is to optimise the model parameterisation with a given climate forcing dataset. The current calibration frame-work builds on the assumption that the satellite-based estimate of AET patterns approximate an observed pattern that is suitable for model optimisation. In general, the calibration approach is deterministic by nature and does not consider error or uncertainties in either observed discharge or AET patterns. Future work could add this component to the approach. However, assessment of the uncertainties in the observed spatial patterns are far from straightforward, since the uncertainties of interest with the proposed approach are solely related to the uncertainties related to the spatial patterns and not to biases. Therefore, quantification of pattern uncertainties would require a very dense network of actual evapotranspiration measurements.
The calibration results obtained in the current study, where three strategies were tested with varying combinations of objective functions, showed that with an appropriate metric design, limited trade-offs can be achieved when combining streamflow and spatial pattern metrics in a joint calibration framework. This is largely attributed to the nature of the metric, as the spatial performance metric is bias-insensitive whereas the streamflow metrics have very little sensitivity to spatial redistribution of AET patterns as long as the spatial averages remain unchanged. Bias and temporal variability of satellite-derived AET estimates could also be useful for model optimisation; however, in this study, we deliberately limited the information content of the satellite data to address the spatial patterns. This was done because even though the satellite-based AET estimate is validated against eddycovariance stations  they only represent specific cloud-free days, limiting their value to assess the long term water balance of the catchment. The calibration results using only streamflow metrics revealed that this traditional calibration target cannot guarantee satisfying spatial pattern performance even though the model structure and parameterisation framework enables this without much compromise, as illustrated by the performance of the combined Q and Spatial calibrations which resulted in very similar performance of both streamflow and spatial patterns as the single objective calibrations individually.
The spatial model parameterisation applied in Skjern catchment can be site-specific due to the uniform land use (agricultural cropland) across soils ranging from very coarse sandy soil to more loamy soils whereas the calibration framework and SPAEF metric can be applied to any other river basin in the world. Regarding the dynamic scaling function, developed for incorporating remotely sensed LAI in ET ref scaling, it should be noted that the use of LAI to describe the deviation of each grid cell from the assumed reference grass is a simplification. Albedo could also have been included in the dynamic scaling function; however, one could argue that albedo and LAI are somewhat correlated and including one of them is already contributing the information about the other (Chen et al., 2005;Liu et al., 2017;Stisen et al., 2008). Moreover, we limit this study to temporally averaged spatial patterns of AET and deliberately choose to ignore the day-to-day dynamics of AET. In this study, spatially varying but temporally constant field-capacity-dependent root fraction is utilised; however, it would be more elegant and physically more sound to represent the seasonality in root-growth dynamics more realistically by implementing a seasonally varying root fraction coefficient (beta) that is similar to the concept of LAI-based PET correction using the DSF module.

Conclusions
Our study aimed at parameterising a distributed hydrologic model for simulating distributed actual evapotranspiration patterns before an ensemble calibration using satellite-based data. This order is crucial for progressive hydrologic modelling with flexible model structure based on open-source philosophy. All these steps should be suitable for the catchment to give the model enough flexibility to adjust to pattern observations. The calibration efforts will have a limited effect on spatial patterns if the model parameterisation has not been investigated with pattern performance in mind. Ideally, the models should offer different parameterisation schemes or at least have room for development based on open-source philosophy so that we can test different spatial parameterisations for a particular calibration goal. Here, we implemented a field-capacity-dependent root fraction coefficient determining the root profile over depth for different soil and vegetation types. We introduced a dynamic scaling function which imprints the leaf area index in the potential evapotranspiration. After organising the spatial parameterisation of the model in a parsimonious manner, we also reduced the number of parameters using sequential screening. Only the informative parameters from the sequential screening are used in the subsequent ensemble calibration exercise. We then assessed the effect of different calibration strategies including monthly spatial patterns of actual evapotranspiration in combination with traditional streamflow observations. In the spatial calibration, the agreement between observed and simulated spatial patterns is added as a part of the objective function used for model optimisation. For that a multi-component bias-insensitive spatial efficiency metric is used to evaluate the simulated AET maps. The following conclusions can be drawn from our results: -Preparing the model parameterisation for spatial calibration is a key element for achieving the calibration objectives. More specifically, the model parameterisation needs to be designed to allow the spatial parameter distribution to be optimized through calibration.
-The newly proposed spatial efficiency metric (SPAEF) has proven to be robust and easy to interpret due to its three distinct and complementary components of correlation, variance and histogram matching.
-Based on the multi-component calibration results, including spatial pattern information in calibration sig-nificantly improves the spatial model simulations while maintaining similar streamflow performance. For the combined calibration, there is a limited trade-off between streamflow and spatial patterns for the bestperforming calibration ensemble compared to the Qonly calibration. However, this trade-off disappears in the validation test, indicating that a more robust parameter set is achieved during the combined Q and Spatial calibration.
Overall, the hydrological modelling community can benefit from building familiarity with several aspects of spatial model evaluation, including spatial parameterisation and multi-component spatial performance metrics.