HESS Opinions: The need for process-based evaluation of large-domain hyper-resolution models

A meta-analysis on 192 peer-reviewed articles reporting on applications of the variable infiltration capacity (VIC) model in a distributed way reveals that the spatial resolution at which the model is applied has increased over the years, while the calibration and validation time interval has remained unchanged. We argue that the calibration and validation time interval should keep pace with the increase in spatial resolution in order to resolve the processes that are relevant at the applied spatial resolution. We identified six time concepts in hydrological models, which all impact the model results and conclusions. Process-based model evaluation is particularly relevant when models are applied at hyper-resolution, where stakeholders expect credible results both at a high spatial and temporal resolution.


Introduction
One of the famous paradoxes of the Greek philosopher Zeno of Elea (∼ 450 BC) concerns a shot arrow (Fearn, 2001): "If one shoots an arrow, and cuts its motion into such small time steps that at every step the arrow is standing still, the arrow is motionless, because a concatenation of non-moving pieces cannot create motion." Only ages later could this reasoning be refuted by the invention of integral and differential calculus by Newton and Leibniz (Stillwell, 1989), accepting infinitely small rates of change. Motion is a change of location over time; thus motion links time and space.
In hydrology, it is essential to understand and predict the motion of water within the Earth system, which implies that both space and time have to be considered. In hydrological models space can be accounted for by using distributed (spatially explicit) models, where space is "cut in small pieces", to paraphrase Zeno. Different types of distributed hydrological models exist; Todini (1988) distinguished roughly two different classes. The first class consists of distributed differential models. These models explicitly simulate lateral fluxes by means of differential equations. The second class comprises the distributed integral models, which consist of one-dimensional columns and ignore lateral fluxes between the columns (lateral fluxes can be accounted for with an extra routing scheme, although this does not allow for lateral redistribution). These models have a wide application in land surface modelling (Clark et al., 2015). In this discussion we focus on the latter.
The constant development in computational power, the increased understanding of physical processes, and the increased availability of high spatial resolution hydrological information stimulated the development of increasingly complex and distributed hydrological models (Boyle et al., 2001; Liu and Gupta, 2007). Increasing the spatial resolution of global hydrological models (GHMs) has been labelled as one of the current "grand challenges" in hydrology by Wood et al. (2011) and Bierkens et al. (2014), who call for global modelling at the so-called spatial hyper-resolution (∼ 1 km and smaller). Arguably, there is a growing societal need for hydrological information at the (sub-)kilometre scale. Whereas model products at the 1 or 0.5° resolution may provide relevant information for policy makers at the (inter)national level, hyper-resolution results will become relevant for local water managers or even individual farmers (see e.g. Bastiaanssen et al., 2007). The scientific challenge is not to simply provide information based on a model with default parameters, but to provide credible information that matches the actual situation in the field at a temporal resolution that is consistent with the spatial resolution of the model. The temporal and spatial scales are linked through the characteristic speed (including both velocity and celerity; see McDonnell and Beven, 2014) of the involved hydrological processes (Blöschl and Sivapalan, 1995), the so-called process scale; see Fig. 1. The figure shows that there is a general tendency for the temporal process scale to decrease with the spatial process scale, although there is quite a broad bandwidth and local changes might occur stepwise. Policy makers might be able to deal with model products at a monthly resolution, whereas resource managers and farmers expect, at the spatial hyper-resolution, credible model products with a daily or hourly resolution.
Although increasing the spatial resolution of hydrological models is claimed to provide the opportunity to improve physical process representation (Bierkens et al., 2014; Bierkens, 2015), almost every hydrological model requires calibration of the model parameters (Beven, 2012). Models can contain conceptual parameters, which have no directly measurable physical meaning and thus need calibration. In addition, the measurement scale of parameters which do have a physical meaning often differs from the model scale, making calibration necessary to determine the effective parameter values to account for sub-grid variability (Kim and Stricker, 1996). Beven and Cloke (2012) responded to the hyper-resolution challenge by emphasizing that the focus of hydrologic modelling should be on determining and accounting for epistemic uncertainty and appropriate parameterizations at different spatial resolutions, rather than on maximizing the spatial resolution. Increasing the spatial resolution of the model (towards hyper-resolution) is not a solution to sub-grid variability, since many of the relevant processes take place on even smaller scales (Wood et al., 1992; Kim and Stricker, 1996; Arora et al., 2001; Montaldo and Albertson, 2003; Beven and Cloke, 2012; Clark et al., 2015). Hence, despite their increasing spatial resolution, GHMs also require calibration in order to obtain effective parameters, and validation to determine model credibility. Even if a correct physical representation of hydrological processes is impossible, the goal of the model should be to mimic realism and hydrological processes as closely as possible (Wagener and Gupta, 2005; Kirchner, 2006; McDonnell et al., 2007). This implies that the models should be subject to a process-based calibration and validation procedure (Gupta et al., 1998, 2008; Clark et al., 2011). Since different hydrological processes dominate at different scales (Fig. 1), the temporal and spatial scales are linked. Because the spatial resolution of GHMs is currently being increased to meet societal needs (Wood et al., 2011), the temporal resolution should become finer accordingly (i.e. the time interval should decrease) to meet these needs. This should be reflected in the calibration and validation time interval of the model, in order to guarantee model credibility at the required temporal and spatial resolution.

Timescales
A short review of scientific literature about scaling issues provides the impression that the focus has mostly been on the spatial scale and/or resolution rather than on its temporal counterpart (Klemeš, 1983; Dooge, 1986; Gupta et al., 1986; Dooge, 1988; Feddes, 1995; Kalma and Sivapalan, 1995; Sposito, 1998; Beven, 1995; Bierkens et al., 2000; Gentine et al., 2012). Many concepts have been developed to describe representative areas and volumes (Gray et al., 1993). In soil physics, the representative elementary volume (REV) is an often used concept, which describes the volume for which a measurement can be considered representative (Whitaker, 1999). Wood et al. (1988) explored a similar concept with applications in hydrology, namely the representative elementary area (REA), the critical area at which the pattern of small-scale heterogeneity becomes unimportant. Reggiani et al. (1998) proposed the representative elementary watershed (REW), allowing for closure of the balance equations averaged over time and space. Similar concepts, which statistically integrate temporal variations, have not been reported in the literature. The lack of attention to the temporal scale, however, is remarkable because hydrological states and fluxes are mostly studied as a function of time. As an illustration of this lack of attention to the aspects of temporal scale, it should be noted that in the recent papers by Wood et al. (2011) and Bierkens et al. (2014) on spatial hyper-resolution modelling, the temporal resolution of these models is referred to only once. One of the reasons why the development of a representative elementary time step (RET) is more complex is that several different time concepts play a role in hydrological modelling.
As a guideline and first step for the discussion on time dimensions in hydrological models, we identify six time concepts, which in practice are often mixed up and misinterpreted. A distinction is made between scale, which is defined as a continuous variable; resolution, defined as a discrete variable that is a model property; and time interval, which is a discrete variable independent of the model used. The six concepts are
1. the process timescale
2. the input resolution
3. the numerical resolution
4. the output resolution
5. the calibration and validation time interval
6. the interpretation time interval.
First, the process timescale is defined as the characteristic timescale of the hydrological process considered. This is the typical time period over which the process takes place. Infiltration excess overland flow, for instance, has a relatively short timescale, whereas regional groundwater flow has a longer timescale. The end user determines which process is most relevant in the modelling procedure.
Second, the temporal resolution of the input data or input resolution is relevant for the modelled process. The input resolution of the forcing data can differ from the output resolution of the model, and this can impact the results of the model. An example is given in the upper panels of Fig. 2, showing an application of the Green-Ampt (Green and Ampt, 1911) infiltration model.
The numerical resolution (or the time step) of the model is the time interval over which the model calculates the states and the fluxes internally. A model can only deterministically resolve a process if the numerical time step is shorter than the characteristic timescale of the process. The panels in the second row of Fig. 2 show how the numerical resolution impacts the model output for the process of ponding, and thereby the conclusions that are drawn about ponding.
The output resolution (often referred to as simply temporal resolution) is the time interval at which the model output yields the states and fluxes. This time interval can be equal to the numerical resolution of the model, or aggregated from the numerical resolution. The modelled process can only be identified if the output time interval is shorter than the characteristic timescale of the process, which is shown in the lower panels of Fig. 2.
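The effect of the time step on ponding can be illustrated with a minimal sketch of the Green-Ampt set-up described for Fig. 2. This is not the authors' code: it uses a simple explicit scheme, it aggregates the forcing to the same step as the numerical step, and it assumes that 7 mm falls in each 5 min step between 10 and 20 min so that the event totals the stated 32 mm. The function and variable names are hypothetical; the parameter values are those given in the caption of Fig. 2.

```python
import math

# Rain event on 5 min steps [mm]. ASSUMPTION: 7 mm in EACH 5 min step
# between 10 and 20 min, so that the total equals the stated 32 mm.
EVENT_5MIN_MM = [4.0, 5.0, 7.0, 7.0, 5.0, 4.0]


def rain_series(dt_min):
    """Redistribute the event uniformly onto steps of dt_min minutes [mm/step]."""
    per_min = [mm / 5.0 for mm in EVENT_5MIN_MM for _ in range(5)]
    return [sum(per_min[i:i + dt_min]) for i in range(0, len(per_min), dt_min)]


def green_ampt_excess(dt_min, ks=0.044, psi=22.4, theta_i=0.1, theta_s=0.5):
    """Explicit Green-Ampt sketch; returns (infiltration, excess) in cm.

    ks: saturated hydraulic conductivity [cm/h]
    psi: matric pressure at the wetting front [cm]
    """
    dt_h = dt_min / 60.0
    suction = psi * (theta_s - theta_i)
    infiltrated = 0.0  # cumulative infiltration F [cm]
    excess = 0.0       # cumulative infiltration excess (ponding) [cm]
    for rain_mm in rain_series(dt_min):
        intensity = rain_mm / 10.0 / dt_h  # rainfall intensity [cm/h]
        # Infiltration capacity f = Ks * (1 + psi * dtheta / F); unlimited at F = 0.
        if infiltrated == 0.0:
            capacity = math.inf
        else:
            capacity = ks * (1.0 + suction / infiltrated)
        rate = min(intensity, capacity)
        infiltrated += rate * dt_h
        excess += (intensity - rate) * dt_h
    return infiltrated, excess


for dt in (30, 5, 1):
    f_cm, e_cm = green_ampt_excess(dt)
    print(f"dt = {dt:2d} min: infiltration = {f_cm:.2f} cm, excess = {e_cm:.2f} cm")
```

In this sketch a single 30 min step produces no infiltration excess at all, because the whole event is absorbed before the capacity is ever evaluated, whereas 5 min and 1 min steps do produce excess. The same event and parameters thus lead to different conclusions about ponding depending on the time step.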
The calibration and validation time interval of the model is defined here as the time interval at which the model output is being confronted with observations. Calibration and validation of the model output can be conducted at another time interval than the output resolution, by aggregating the model output. Calibration and validation should be performed at a time interval smaller than or equal to the timescale of the process that is relevant for the end user.
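Evaluating aggregated model output against observations can be sketched as follows. The example is illustrative only: the series are synthetic, the Nash-Sutcliffe efficiency (NSE) is our choice of score and is not prescribed in the text, and the helper names `nse` and `aggregate` are hypothetical. The simulated series reproduces the slow (seasonal) signal but misses a fast fluctuation with a 6-day period.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of sim with respect to obs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def aggregate(series, window):
    """Average a series over consecutive, non-overlapping windows."""
    series = np.asarray(series, dtype=float)
    return series.reshape(-1, window).mean(axis=1)


# Synthetic "daily" data for 12 periods of 30 days (hypothetical values).
t = np.arange(360)
slow = 5.0 + 3.0 * np.sin(2 * np.pi * t / 360)  # seasonal signal
fast = 2.0 * np.sin(2 * np.pi * t / 6)          # short-timescale process
obs_daily = slow + fast
sim_daily = slow                                # model misses the fast process

nse_daily = nse(obs_daily, sim_daily)
nse_monthly = nse(aggregate(obs_daily, 30), aggregate(sim_daily, 30))
print(f"NSE daily:   {nse_daily:.3f}")
print(f"NSE monthly: {nse_monthly:.3f}")
```

Because the fast fluctuation averages out within each 30-day window, the monthly score is close to perfect while the daily score is clearly lower. A model evaluated only at the monthly interval would therefore give no indication that the short-timescale process is missing.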
Finally, the interpretation time interval is defined as the time interval at which the model output is eventually analysed or interpreted. This can be equal to the calibration time interval, or the model output can be further aggregated, resulting in a larger interpretation time interval (e.g. from daily to monthly). Since the model has not been validated or calibrated on time intervals smaller than the calibration time interval, the credibility of the results will be unknown for time intervals smaller than the calibration time interval.
It is critical to note that some of these time concepts are necessarily equal to or larger than related time concepts, sometimes for logical reasons (the output resolution cannot be higher than the numerical resolution) and sometimes for model credibility reasons (the interpretation time interval should not be smaller than the calibration time interval). It is also important to note that the first time concept, the process scale, explicitly links the temporal and the spatial scale (Stommel, 1963; Blöschl and Sivapalan, 1995; Brutsaert, 2005). Conversely, the spatial resolution of a model will set a minimum temporal resolution determining which processes need to be resolved.
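The relations between the time concepts stated above can be collected in a small consistency check. This is a sketch under our own naming: the class `TimeConcepts`, its fields, and the example values are hypothetical, and all intervals are expressed in hours. No constraint is checked for the input resolution, since none is stated explicitly in the text.

```python
from dataclasses import dataclass


@dataclass
class TimeConcepts:
    """The six time concepts, all expressed as intervals in hours."""
    process_timescale: float
    input_resolution: float
    numerical_resolution: float
    output_resolution: float
    calibration_interval: float
    interpretation_interval: float

    def violations(self):
        """Return the stated relations that this configuration violates."""
        issues = []
        if self.output_resolution < self.numerical_resolution:
            issues.append("output interval is shorter than the numerical time step")
        if self.interpretation_interval < self.calibration_interval:
            issues.append("interpretation interval is shorter than the calibration interval")
        if self.numerical_resolution >= self.process_timescale:
            issues.append("numerical time step does not resolve the process timescale")
        if self.output_resolution >= self.process_timescale:
            issues.append("output interval does not resolve the process timescale")
        if self.calibration_interval > self.process_timescale:
            issues.append("calibration interval exceeds the process timescale")
        return issues


# Hypothetical set-up: hourly model, evaluated and interpreted daily,
# for a process with a timescale of one day.
consistent = TimeConcepts(24.0, 1.0, 1.0, 1.0, 24.0, 24.0)

# Hypothetical set-up: daily model calibrated on monthly data (720 h) but
# interpreted daily, for a process with a timescale of one hour.
coarse = TimeConcepts(1.0, 24.0, 24.0, 24.0, 720.0, 24.0)

print(consistent.violations())
for issue in coarse.violations():
    print("-", issue)
```

The first configuration passes all checks; the second one fails both the credibility condition (interpretation at a shorter interval than calibration) and the conditions needed to resolve the short-timescale process.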

Example for VIC model studies
To illustrate the development of calibration/validation time interval and spatial resolution in large-domain hydrological modelling, we carried out a meta-analysis on the use of GHMs. The variable infiltration capacity (VIC) model (Liang et al., 1994) was chosen for this analysis because it is widely used and therefore enough studies were available for a meta-analysis. The VIC model is mentioned explicitly in Bierkens et al. (2014) as a type of model being run at the spatial hyper-resolution. Sub-grid variability is parameterized as a distribution of responses without explicit treatment of the pattern. We believe this model is representative of the much larger class of global hydrological models.
The VIC model was initially constructed to couple climate model output to hydrological processes: it is capable of solving both the energy and the water balance. Lohmann et al. (1996) developed a horizontal routing model to couple the individual grid cells of the VIC model. This facilitated the distributed application of VIC for rainfall-runoff processes at large domains. No explicit definition of a spatial derivative or scale appears in the equations of the VIC model; the spatial resolution of the model only appears in the routing scheme through the horizontal flow velocity (see Kampf and Burges, 2007, for a description of space-time representation in other distributed hydrologic models).
In our analysis we assembled 242 peer-reviewed studies that used the VIC model. Of these, 192 studies used the model in a distributed way and performed a calibration or validation on the model output (see Table A1 in Appendix A). Figure 3 presents a space-time perspective on the application of the VIC model during the past 2 decades. As expected, the spatial resolution at which the model is applied has increased steadily over the years (Fig. 3a). While the model was initially constructed for spatial resolutions of the order of 0.5 to 2°, it is now mostly applied at 1/8° and smaller. The main driver for the increase in spatial resolution is the availability of high-resolution spatial data sets, such as that presented by Maurer et al. (2002). The increase in resolution, however, does not apply to the employed calibration and validation time interval. Figure 3b shows that the time interval at which the model has been calibrated and validated has remained steady over the years. Therefore, while the spatial resolution of the model has increased, the model output is still calibrated and validated at the original coarse time interval. Processes with a short timescale, which become more important when the spatial resolution increases, will likely be overlooked during the calibration and validation of the model if the time interval is too coarse. Several studies have already shown that calibration on a coarser time interval does not guarantee credible results for shorter time intervals (Melsen et […]). Figure 1 indicates the initial development scale of the VIC model (A), the scale where it is heading to right now (B), and the direction where it should go in order to resolve relevant hydrometeorological processes (C). Therefore, the VIC model with a high spatial resolution should be calibrated and/or validated at a time interval short enough to catch the processes relevant at those particular spatial scales.
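The processing steps listed in the caption of Fig. 3 (conversion with 1° = 100 km, grid cell count from catchment size and cell size, yearly statistics on log-transformed data, and a linear regression) can be sketched as below. This is one possible reading of the caption, not the authors' script: the function names are hypothetical, the example records are invented for illustration, and fitting separate lines through the yearly mean and the yearly standard deviation is an assumption.

```python
import numpy as np

KM_PER_DEGREE = 100.0  # simplification stated in the caption of Fig. 3


def resolution_in_degrees(value, unit):
    """Express a reported spatial resolution in degrees."""
    if unit == "deg":
        return float(value)
    if unit == "km":
        return value / KM_PER_DEGREE
    raise ValueError(f"unknown unit: {unit}")


def n_grid_cells(catchment_km2, resolution_deg):
    """Estimate the number of grid cells as catchment size over cell size."""
    cell_km2 = (resolution_deg * KM_PER_DEGREE) ** 2
    return catchment_km2 / cell_km2


def yearly_log_trend(years, values):
    """Yearly mean and std of log10(values), plus a linear fit through each."""
    years = np.asarray(years)
    logs = np.log10(np.asarray(values, dtype=float))
    uniq = np.unique(years)
    means = np.array([logs[years == y].mean() for y in uniq])
    stds = np.array([logs[years == y].std() for y in uniq])
    mean_fit = np.polyfit(uniq, means, 1)  # [slope, intercept]
    std_fit = np.polyfit(uniq, stds, 1)    # [slope, intercept]
    return uniq, means, stds, mean_fit, std_fit


# Invented example records: (publication year, spatial resolution in degrees).
demo_years = [2000, 2000, 2005, 2005, 2010, 2010]
demo_resolution = [1.0, 1.0, 0.1, 0.1, 0.01, 0.01]

uniq, means, stds, mean_fit, std_fit = yearly_log_trend(demo_years, demo_resolution)
print("slope of yearly mean log10(resolution):", mean_fit[0])
print("grid cells for 10 000 km2 at 12.5 km:",
      n_grid_cells(10_000.0, resolution_in_degrees(12.5, "km")))
```

Working on log-transformed values keeps studies at very coarse and very fine resolutions on a comparable footing, which is why a trend appears as a straight line.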
Two causes for the discrepancy in the joint development of spatial resolutions and calibration time intervals come to mind: lack of computational power, or a lack of (using) observations with a high temporal frequency. Figure 3c shows that the total number of grid cells that was used in the studies has on average increased over time. This is as expected: computational power has increased significantly over the years. According to Moore's law (Moore, 1965), computational power roughly doubles every 2 years. The grey lines in Fig. 3c indicate the corresponding slope in computational power on a log-log scale. The largest numbers of grid cells per year likely indicate the limit of technical capability. Overall, the trend in the studies, even in the higher quantiles, is much lower than the computational limit, suggesting that computational power is not a constraint for most studies. This implies that, presently, the main constraint for calibration and validation of distributed hydrological models at a certain time interval (Fig. 3b) is not the computational power, but the lack of (using) observations with a high temporal frequency. A possible explanation for this may be that many (global) studies rely on data from the Global Runoff Data Centre (GRDC), which are often available only at the monthly time interval. Also important is that for large basins, the typical application scale of VIC and other GHMs, flow is often regulated by dams for hydropower and flood control. Naturalized flows for these basins are often estimated at the monthly time interval. Our results reinforce the conclusion of Kirchner (2006) that field observations should account for the spatial and temporal heterogeneity of hydrometeorological processes, and the statement from Kavetski et al. (2011) that in most cases, temporal resolution is fixed by the data collection procedure.
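As a worked illustration of the reference slope implied by Moore's law (the symbols P, P_0, t_0 and T are introduced here for illustration only, and a logarithmic axis for computational power against time in years is assumed): if computational power P doubles every T = 2 years, its logarithm grows linearly in time.

```latex
P(t) = P_0 \, 2^{(t - t_0)/T}, \qquad T = 2\ \mathrm{yr}
\quad\Longrightarrow\quad
\log_{10} P(t) = \log_{10} P_0 + \frac{\log_{10} 2}{T}\,(t - t_0)
\approx \log_{10} P_0 + 0.15\,(t - t_0)
```

Under this assumption the reference slope is about 0.15 orders of magnitude per year, i.e. roughly a factor of 10 every 6.6 years. A trend in the number of grid cells that is flatter than this indicates that studies are not growing as fast as the available computational power would allow.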

Problem statement and outlook
The meta-analysis on VIC studies showed that the spatial resolution at which the model is applied has increased over the years, while the calibration time interval has remained steady (Fig. 3). The examples are shown for the VIC model only, but we have the impression that the obtained trends apply to all GHMs. There is a general tendency to move towards higher spatial resolution in large-domain hydrological models (induced by e.g. Wood et al., 2011; Bierkens et al., 2014), whereas the available data for calibration and validation are model independent.
Although coarse temporal resolution data can be used to constrain model uncertainty, the ambition to move towards spatial hyper-resolution hydrological models with predictive capabilities should keep pace with the data that are required to run, calibrate, and validate the models. Increasing the spatial resolution of the model implies modelling different relevant hydrometeorological processes (there are some interesting developments concerning parameter transferability over spatial resolutions; see e.g. Samaniego et al., 2010; Kumar et al., 2013; and Rakovec et al., 2015), which in turn requires calibration and validation to be performed on a smaller time interval. It requires a community effort to increase the availability of high temporal resolution data for calibration and validation of large-domain hydrological models. Especially for large-domain studies, where data collection from all the separate basins at different institutes and countries is very time consuming (explaining the success of the GRDC), the data need to be gathered at and accessible from one point. It should also be recognized that discharge data only, especially at a monthly timescale, do not provide sufficient information for a process-based model evaluation at the spatial hyper-resolution scale. Possible paths forward are the use of tracer data to identify different flow paths (Tetzlaff et al., 2015), the use of multiple objectives (Gupta et al., 1998), and the use of satellite and remote sensing data (Pan et al., 2008), all at a representative spatial and temporal resolution.
We acknowledge that calibration and validation at the appropriate time interval is only one of the many challenges of spatial hyper-resolution hydrological modelling. Even with enough observations available for calibration and validation, disinformative data (Beven and Westerberg, 2011), correct subgrid parameterizations (Beven et al., 2015), and model structural uncertainty (Clark et al., 2015) remain outstanding challenges. However, we believe that all these challenges can only be tackled if the models are subject to critical and process-based evaluation and validation (Gupta et al., 2008; Clark et al., 2011). In the end, the goal is to model hydrological processes in an appropriate way (Beven, 2006; McDonnell et al., 2007).
Along with an increased spatial resolution of the model products, there will be a shift in users' expectations of those products. Whereas coarse-scale (0.5 to 1°) products may provide relevant information for policy makers at the national or state level, products at the spatial hyper-resolution (0.1 to 1 km) are potentially of interest to a much wider range of users, including for instance farmers who want to schedule their irrigation. At the sub-kilometre scale, new processes such as infiltration excess overland flow and ponding can (and should) be resolved, but at the same time these processes cannot be explicitly resolved at a daily or monthly time interval. Thus, the recent call for increasing the spatial resolution of distributed hydrological models (Wood et al., 2011; Bierkens et al., 2014) should not focus solely on the spatial resolution, but should aim to shorten the evaluation time interval simultaneously, at a balanced rate consistent with the characteristic timescales and space scales of the relevant hydrological processes (Fig. 1). We believe that such a balanced approach will serve societal needs best.

Figure 1. The timescales and space scales of several hydrometeorological processes. Adapted from Brutsaert (2005) and Blöschl and Sivapalan (1995), who based it on Orlanski (1975), Dunne (1978), Fortak (1982), and Anderson and Burt (1990). The blue areas indicate the temporal and spatial resolution at which the VIC model has been applied, when it was initially developed (A) and presently (B). The dashed arrow pointing downwards shows the ambitions of spatial hyper-resolution modelling, whereas the dashed arrow pointing towards (C) shows the temporal and spatial resolution of hyper-resolution modelling if it follows the direction of characteristic velocity of hydrometeorological processes.

Figure 2. Application of the Green-Ampt infiltration scheme for different input resolutions (upper row), different numerical resolutions (middle row), and different output resolutions (lower row). For each set-up, the model was fed with the same extreme precipitation event of 32 mm of rain in 30 min (4 mm in first 5 min, 5 mm in 5-10 min, 7 mm in 10-20 min, 5 mm in 20-25 min and 4 mm in 25-30 min). The model parameters have been kept constant: saturated hydraulic conductivity K_s = 0.044 cm h⁻¹, initial soil moisture θ_i = 0.1, saturated soil moisture θ_s = 0.5, matric pressure at wetting front = 22.4 cm. Each of the three time concepts impacts the conclusions that are drawn from the model results, which shows that calibration and validation at the appropriate time interval is essential to resolve the processes taking place.

Figure 3. The year of publication versus the highest spatial resolution of the VIC model that was used in the study (a), the smallest time interval on which the calibration and/or validation of the VIC model was performed (b), and the total number of grid cells in the study (c), based on 192 peer-reviewed studies. The grey lines in (c) show the slope of computational power increase according to Moore's law (Moore, 1965). The point size is proportional to the number of studies that were published in a certain year with a certain spatial or temporal resolution. If the spatial resolution was given in kilometres, it was assumed that 1° = 100 km. For the total number of grid cells, catchment size was divided by cell size, assuming that 1° = 100 km, unless the number of grid cells was explicitly given. Statistics (the mean and the standard deviation) have been obtained per year on logarithmically transformed data. With linear regression a line was fitted through the mean and the standard deviation.