Diagnosing hydrological limitations of a Land Surface Model: Application of JULES to a deep-groundwater chalk basin

Land Surface Models (LSMs) are prospective starting points to develop a global hyper-resolution model of the terrestrial water, energy and biogeochemical cycles. However, there are some fundamental limitations of LSMs related to how meaningfully hydrological fluxes and stores 5 are represented. A diagnostic approach to model evaluation is taken here that exploits hydrological expert knowledge to detect LSM inadequacies through consideration of the major behavioural functions of a hydrological system: overall water balance, vertical water redistribution in the unsaturated 10 zone, temporal water redistribution and spatial water redistribution over the catchment’s groundwater and surface water systems. Three types of information are utilised to improve the model’s hydrology: a) observations, b) information about expected response from regionalised data, and c) infor15 mation from an independent physics-based model. The study considers the JULES (Joint UK Land Environmental Simulator) LSM applied to a deep-groundwater chalk catchment in the UK. The diagnosed hydrological limitations and the proposed ways to address them are indicative of the challenges 20 faced while transitioning to a global high resolution model of the water cycle.


Introduction
Guidance to support adaptation to the changing water cycle is urgently required, yet the ability of water cycle models to represent the hydrological impacts of climate change is limited in several important respects. Climate models are an essential tool in scenario development, but suffer from fun-damental weaknesses in the simulation of hydrology. Hydrology (as well as other soil-vegetation-atmosphere interactions) in climate models is represented via land surface models (LSMs) that partition water between evapotranspiration, surface runoff, drainage, and soil moisture storage. The deficiencies in hydrological processes representation lead to incorrect energy and water partitioning at the land surface (Oleson et al., 2008) that propagates into precipitation and near-surface air temperature biases in climate model predictions (Lawrence and Chase, 2008). Furthermore, improving the representation of hydrology is a step towards the development of a global hyper-resolution model for monitoring the terrestrial water, energy, and biogeochemical cycles that is considered as one of the grand challenges to the community (Wood et al., 2011).
The most recent third generation LSMs operate in a continuous time and distributed space mode, and simulate exchanges of energy, water, and carbon between the land surface and the atmosphere using physics-based process descriptions (Pitman, 2003). The physics-based nature of third generation LSMs allows widely available global data sets, such as soil properties, land use, weather states, etc., to be used as model parameters and inputs, thus making predictive modelling with LSMs very appealing.
A significant body of literature exists on LSM hydrology assessment and inter-comparison, including comparison with observed point scale evapotranspiration fluxes, soil moisture, observed river flow rates and depths to groundwater (Balsamo et al., 2009;Blyth et al., 2011;Boone et al., 2004;Lohmann et al., 2004;Maxwell and Miller, 2005). Blyth et al. (2011)  10 FLUXNET observation sites covering the major global biomes as well as river flows from seven large rivers to assess the performance of the JULES (Joint UK Land Environmental Simulator) model. The evaluation used monthly average fluxes, over a period of 10 years, and demonstrated a number of model weaknesses in energy partitioning as well as in water partitioning and routing, thus providing a direction for further model improvements. Balsamo et al. (2009) revised the soil representation in the TESSEL LSM (used by the European Centre for Medium-Range Weather Forecasts) and showed better agreement of the new H-TESSEL model with soil moisture point observations from the Global Soil Moisture Bank. Lohmann et al. (2004) evaluated four LSMs coupled to a surface runoff routing model over 1145 small and medium size basins in the USA, and found that "the modeled mean values of the water balance terms are of the same magnitude as the spread of the models around them". The authors name both parameter selection and model structure improvements as the key factors to achieve better model performance for hydrological predictions.
LSMs focus on modelling processes in the near-surface layer (typically, the top three metres). Typically, a unit gradient (free drainage) or other simple lower boundary condition is generally assumed in place of explicitly representing the groundwater boundary (e.g. Best et al., 2011;Kriner et al., 2005;Yang and Niu, 2003). However, in permeable basins the depth to the water table is often much deeper, for example in the Kennet case study, introduced below, this can be as much as 100 m (Jackson et al., 2008), calling into question the adequacy of a relatively shallow lower boundary condition. This can result in unrealistically dry lower soil layers (e.g. Li et al., 2008). To address this problem, the NASA Seasonal-to-Interannual Prediction Project Catchment Model (NSIPP) model relies on an approximate relationship (derived from detailed simulations) to estimate the soil moisture transfer rate between the root zone and water table at a catchment scale (Koster et al., 2000). In contrast, Community Land Model (CLM) uses the hydraulic gradient between the bottom of the soil column and the water table to approximate the drainage rate from the soil column (Oleson et al., 2008). Another approach is to use the location of the water table as a lower boundary condition. The soil, water, atmosphere and plant (SWAP) model uses a variable depth soil column, whose base is located at the water table (Gusev and Nasonova, 2003). Maxwell and Miller (2005) developed this further by coupling CLM to a physics-based 3-D groundwater model ParFlow (Kollet and Maxwell, 2008) at the land surface, replacing the soil column/root-zone soil moisture formulation in CLM with the ParFlow formulation. They concluded that the resulting model provided reasonable predictions for runoff rates and shallow groundwater levels on monthly time steps. However, the explicit inclusion of the deep unsaturated zone requires the estimation of hydraulic properties that are generally not included in existing soil databases.
The tendency for LSMs to use relatively shallow soil column depths and a simplistic or non-existent representation of groundwater also questions their applicability to catchments with deep-groundwater systems (where an average water table is tens of metres deep). Such systems represent a major storage of water and their interaction with the unsaturated zone can influence river flows, soil moisture, and evapotranspiration rates (Maxwell and Miller, 2005). Consequently, the addition of a groundwater modelling capability into LSMs will not only address these issues, but also be a step forward for multi-purpose modelling (e.g. representing groundwater levels for water resources) (Wood et al., 2011).
Most LSMs assume a 1-D vertical flow in a soil column neglecting lateral flow (e.g. Kriner et al., 2005;Gusev and Nasonova, 2003). Although this assumption is sufficiently accurate only for soils that are relatively homogeneous in horizontal and vertical directions (Protopapas and Bras, 1991), it is a fairly common feature for LSMs that employ a gridded surface representation. A further complicating factor is that 1-D flow is usually described in physicsbased LSMs using Richards' equation, which was derived at the point scale and used to represent single permeability, single porosity soils. The validity of this is questionable for a wide variety of soils, particularly at larger scales (Beven and Germann, 2013). Chalk is an example of a soil-rock system that consists of both matrix and fractures, whose properties are significantly distinct from each other, forming a dual porosity, dual-permeability system . Therefore, a traditional single domain soil water representation is unsuitable to adequately characterize its properties (Ireson et al., 2009). To the best of our knowledge, there is no currently operational LSM that is capable of realistically representing such dual porosity, dual porosity-dual permeability.
Another important challenge in improving hydrological fluxes in LSMs is the representation of surface and nearsurface heterogeneity, in particular how it affects partitioning between surface runoff, evaporation and infiltration. For example, 15 LSMs and a two-layer conceptual hydrological model were used to represent river discharge in the Rhone, one of Europe's major basins, in the Rhone-Aggregation Land Surface Scheme Inter-comparison Project (Boone et al., 2004); it was concluded that an LSM's ability to provide a good performance for daily discharge simulation is linked to their ability to generate sub-grid runoff, that is, to the representation of top-soil heterogeneity.
In light of these concerns, the scope of the study is to assess the hydrological behaviour of a typical third-generation LSM, the JULES, in a comprehensive and consistent way and adapt the model accordingly. For this, an evaluation strategy focusses on the primary functions of a hydrological system in a hierarchical way. While other alternatives exist (Black, 1997;Wagener et al., 2007), the following four hydrological functions are considered (Yilmaz et al., 2008): (1) to maintain an overall water balance (i.e. water partitioning between different water cycle components), (2) to redistribute water vertically through the soil, (3) to redistribute water in time, and (4) to redistribute water spatially over the catchment's groundwater and surface-water systems. The hierarchical evaluation strategy (or diagnosis) allows for inferences to be made about the specific aspects of the model structure that are causing the problems via targeted evaluations of the model response. The diagnostic evaluation makes use of multiple measures of model performance that are relevant for each of the four functions evaluated. When model performance is poor in a particular hydrological aspect, model modifications are based on hydrological expert knowledge that, whilst subjective, is the only currently available way to adjust the model. The Kennet catchment in southern England is chosen as a complex case study that represents a number of the modelling challenges; however, the methodology and the results are of interest beyond this study due to the similarities across the hydrological modules of different third-generation LSMs, and also the broad importance of chalk aquifers and deep-groundwater systems (Brouyere et al., 2004;Downing et al., 1993;Kloppmann et al., 1998;Pinault et al., 2005;Dahan et al., 1998Dahan et al., , 1999Nativ and Nissim, 1992;Nativ et al., 1995).

The Joint UK Land Environmental Simulator
JULES is a community land surface model, based upon the established UK Met Office Surface Exchange Scheme (MOSES) (Cox et al., 1999). In addition to representing the exchange of fluxes of heat and moisture between the land surface and the atmosphere, the model also represents fluxes of carbon and some other gases, such as ozone and methane . It includes linked processes of photosynthesis and evaporation, soil and snow physics as well as plant growth and soil microbial activity. These processes are all linked through a series of equations that quantify how soil moisture and temperature govern evapotranspiration, energy balance, respiration, photosynthesis, and carbon assimilation Clark et al., 2011). JULES includes multi-layer, finite-difference models of subsurface heat and water fluxes, as described in Cox et al. (1999). There are options for the specification of the hydraulic and thermal characteristics, the representation of soil moisture and the subsurface heterogeneity of soil properties (for more details see Best et al., 2011). JULES can be used as a stand alone land surface model driven by observed forcing data, or can be coupled to an atmospheric model (for example, the UK Met Office Unified Model). The model runs at a sub-daily time step, using meteorological drivers of rainfall, incoming radiation, temperature, humidity, and wind speed. When meteorological data have coarser temporal resolution than required by the model, the standard model (version 2.2) disaggregates the data as constant values. JULES is typically employed with a 3 m fixed depth of soil, a unit hydraulic head gradient lower boundary condition, and no groundwater component. Shallow groundwater can be optionally represented via the (topography-based model) TOPMODEL approach (Beven and Kirkby, 1979;Clark and Gedney, 2008). Further, top-soil heterogeneity can be optionally represented via a probability distributed model (PDM) (Moore, 2007;Clark and Gedney, 2008). Both options require specification of parameters that are conceptual in nature and are not directly related to the existing data on soil/vegetation properties. JULES is able to generate an infiltration-excess (when PDM is used, or when rainfall intensity exceeds the near-surface infiltration rate) as well as saturation-excess (when the TOPMODEL is used, or through the upward movement of water from saturated soil layers) surface runoff.
The study uses and implements modifications to JULES version 2.2, termed the standard JULES. The standard set-up is used with a 3 m depth of soil, four soil layers: 0.1, 0.25, 0.65, and 2 m deep, starting from the surface. The model is spun-up over 3 years, repeating the weather inputs for the first year of available data 3 times (one of the model warming-up options provided), and initialising soils with saturated conditions.

Case study catchment
The Kennet is a groundwater-dominated catchment in southern England (Fig. 1). The topographic catchment has an area of 1030 km 2 with an annual average rainfall of 759 mm . It is predominantly a permeable catchment (Upper Cretaceous Chalk). The western and northern parts of the catchment have exposed bedrock with only a thin, permeable soil. However, in the southern and eastern parts of the catchment there is significant drift cover, and, in its lowest quarter, it is largely impermeable due to overlying Palaeogene deposits (Fig. 1). It is a primarily rural catchment with scattered settlements. The flow regime is domi- nated by the slow response of the groundwater held within the chalk aquifer (the base flow index, that is, the proportion of total flow as base flow, is 0.87). Where the chalk outcrops, there is generally little surface runoff. At the Chalk-Palaeogene boundary, surface runoff from low permeability deposits gives rise to focussed recharge into the chalk. As a consequence, there are a number of swallow holes in the area (West and Dumbleton, 1972) that serve as surface-water sinks. The flow at the catchment outlet at Theale is monitored using a crump profile weir, where bypassing of the weir occurs above 29 m 3 s −1 . The unsaturated zone of the chalk has two characteristic behaviours: slow drainage over summer, and bypass flow during rainfall events (Ireson and Butler, 2013). Both behaviours are important under extreme conditions (i.e. droughts or extreme rainfall) for sustaining river flows and rapid water table response.

Case study data sets
A number of gridded data types are required for JULES parameterization and forcing (Table 1), including land cover and soil profile data, and meteorological drivers. Using a 50 m resolution topographic map, the Kennet catchment is discretized into 1 km 2 grids, which matches the resolution of the soil and meteorological data. Soil property data are provided by the National Soil Resources Institute (NSRI). Most soil profiles from the NSRI database extend as deep as 1.5 m for the basin (about 70 % of the profiles) and are provided with vertically variable Brooks and Corey soil moisture retention parameters. At the surface, the NSRI database differentiates between soil hydraulic parameters depending on land use (arable, permanent grassland, ley grassland, and other). Land use cover is provided from data collected by the International Geosphere-Biosphere Programme (IGBP). The IGBP 2007 data are utilized to determine land cover types from 17 IGBP classes. These are re-classified to the 9 JULES land use types (Smith et al., 2006). The outcome is that cropland and mosaic/natural land use are the dominant land use types in the area (97 %). Meteorological inputs to JULES were provided by the data from the Climate, Hydrology and Ecology research Support System (CHESS) project. The data set, produced by the Centre of Ecology and Hydrology (CEH), UK (E. Blyth, personal communication, 2012), includes 1 km gridded daily rainfall amounts derived from the UK rain gauge network measurements for the period 1971-2008 (Keller et al., 2006). In addition, air temperature, vapour pressure, long and short wave downward radiation, and wind speed, derived by downscaling the observed meteorology from the Meteorological Office Rainfall and Evaporation Calculation System (MORECS) 40 km × 40 km data set (Hough and Jones, 1997) accounting for the effects of topography, are also included. Daily observations of river flow at a number of gauging stations, along with groundwater levels at various observation frequencies (daily to monthly) from boreholes in the catchment, are used to evaluate model performance (Fig. 1). Groundwater levels at the same observational boreholes were previously examined by Jackson (2012), who used a conceptual model to estimate recharge rates to groundwater.
Chalk hydraulic properties are not available from standard national/global soil data sets (in the NSRI data set it is classified as a rock). Instead, these properties are estimated using soil moisture and matric potential observations at Warren Farm in the Kennet along with data from an on-site automatic weather station (Ireson et al., 2006) (Fig. 1). Soil moisture was measured between May 2003 and February 2006 us-ing neutron probe measurements at different depths between 0.1 and 4.1 m taken fortnightly, on average. Either pressure transducer tensiometer (wet conditions) or equitensiometer (dry conditions) readings were taken for the same period of time to measure soil matric potential at 1 m depth every 15 min (Ireson et al., 2006;Ireson, 2008). Weather data include hourly observations of rainfall, downward short-wave solar and downward long-wave radiation, air temperature, specific humidity, and wind speed for the period between October 2002 and January 2009. The sub-daily weather data are used to account for any soil moisture sensitivity to the rainfall timing and intensity.

Method
The hydrological process representation in JULES is assessed with respect to the four primary functions of a hydrological system (Yilmaz et al., 2008): (1) overall water balance, (2) vertical redistribution, (3) temporal redistribution, and (4) horizontal spatial redistribution. Table 2 lists the assessment metrics for each of the four functions: the examined model assumptions/simplifications, the implemented model modifications, and the information sources used to inform the model modifications. Each of these information sources is described in the following sub-sections. The implemented model modifications considered below consist of a sub-daily weather generator, representation of sub-grid scale heterogeneity, dual Brooks and Corey curve representation of chalk hydraulic properties, change of the lower boundary condition, and coupling to a groundwater model.

Sub-daily weather generator
The daily CHESS weather data are downscaled in time (15 min) by a weather generator (WG) (D. Clark, personal communication, 2013). The code provided by CEH uses a cosine variation for sub-daily temperature defined by the average daily temperature and temperature variation range (defined as 7 • C based on the local automatic weather station -AWS). Sub-daily incoming long-wave radiation is calculated using the same phase of the cosine function as that used for the temperature disaggregation. Sub-daily downward shortwave radiation is calculated as a product of the daily average downward shortwave radiation and a normalized fraction of a daily total solar radiation defined by a geographical location, time of year and day. Sub-daily specific humidity is assumed to be equal to the minimum of the saturated specific humidity (for a given sub-daily temperature) and the average daily specific humidity. Wind speed and air pressure are assumed to be constant throughout the day. Sub-daily precipitation is divided into large-scale rainfall, convective rainfall and largescale snow. This differentiation is based on the mean daily temperature. Precipitation is defined as snow if the temperature is below 0 • C; convective if the temperature is above 20 • C; and large-scale rainfall, otherwise. It is set to start at a random time during a day and to continue for a specified number of hours over the entire corresponding model grid: 2 h for a convective storm, and 5 h for large-scale precipitation. The model configuration that includes the weather generator is referred to as JULES + WG (see Table 2).

Representation of sub-grid scale heterogeneity of near-surface soil hydraulic properties
A statistical approach is chosen to represent sub-grid scale heterogeneity of soil hydraulic properties; therefore, the upper soil layer storage capacity is assumed to be heterogeneous and to have a Pareto probability distribution with shape parameter b and upper soil layer depth dz (Bell et al., 2009). This representation is available in the standard version of JULES, but there is no guidance on selection of the two parameters. This approach limits the amount of water available for infiltration according to the soil moisture state, with the rest of the rainwater becoming surface runoff. The infiltrated water is then routed vertically through the soil using Richards' equation.
Since there is limited information to constrain both parameters, the effective upper layer soil depth dz is fixed to the JULES default value of 1 m. A regionalized base flow index (BFI) from the HOST soil classification (BFIHOST) (Boorman et al., 1995;Bulygina et al., 2009) is used to specify the Pareto distribution shape parameter b for each soil type in the catchment. The parameter is calibrated using water partitioning between surface runoff and drainage by JULES. The parameter value that results in the drainage-tototal-runoff ratio closest to the expected BFIHOST for that soil classification is chosen to be representative of the soil heterogeneity. Due to the high computational requirement of JULES, only 21 regularly spaced values between 0 and 2 are considered. The considered parameter b range is found to provide suitable drainage-to-total-runoff ratios for the catchment soils and meteorological conditions. The model configuration that includes both the weather generator and the PDM model is referred to as JULES + WG + PDM (Table 2).

Chalk hydraulic properties estimation
Modelling vertical soil water flow in JULES using Richards' equation requires the following descriptors: air entry pressure head, Brooks and Corey exponent (Brooks and Corey, 1964), saturated hydraulic conductivity, soil moisture at saturation, residual soil moisture, soil moisture at the critical point when transpiration starts to decrease, and soil moisture at wilting point. Due to the two distinct flow domains in chalk-matrix and fractures, two intersecting Brooks and Corey curves are employed when fitting a chalk soil moisture retention curve. The effective soil moisture at the curves' intersection is estimated using available observations. This leads to a double curve representation of hydraulic conduc- Relative bias in subsurface runoff 2 : Observed flows and groundwa-ter level hydrographs at internal catchment points tivity dependence on soil moisture. The JULES soil module is modified accordingly to allow for a dual curve soil moisture retention representation. Although preferential bypass flow can occur in chalk (Ireson et al., 2012), it is considered to be relatively rare in the Kennet (Ireson and Butler, 2011). Consequently, it is not a major component in groundwater recharge and JULES has not been modified to include this effect.
The residual soil moisture content cannot be readily observed in the field, as chalk never dries out sufficiently to reach this state (Ireson, 2008). Therefore, the residual soil moisture content is estimated as a difference between maximum observed soil moisture and the effective porosity. The effective porosity (which includes matrix and fractures) is fixed at 0.36 (i.e. matrix porosity of 0.35 and fracture porosity of 0.01) (Bloomfield, 1997;Price et al., 1993). While fracture porosity tends to be higher at the soil surface due to the chalk weathering process (Ireson, 2008), this is not represented here due to the lack of comprehensive observations of soil moisture dynamics at multiple vertical levels; and the effects of the assumption are discussed in Sect. 4. Two sets of Brooks and Corey parameters are estimated by fitting the dual curve soil moisture retention representation to measurements of soil moisture and matric potential at 1 m depth obtained from field data collected at Warren Farm, Berkshire (Lowland Catchment Research -LOCAR -experiment data described in Ireson, 2008). Mean Square Error (MSE) is used to measure goodness of fit. Then, using the derived soil moisture retention curve, soil moisture at the critical point is calculated using the wet end curve at −40 kPa matric potential, while soil moisture at wilting point is calculated using the dry end curve at −1500 kPa.
Chalk saturated hydraulic conductivity is estimated by fitting the simulated soil moisture profiles to the available soil moisture neutron probes at multiple depths down to 4 m. For calibration purposes, 100 random values of saturated hydraulic conductivity are sampled logarithmically between 0.001 and 10 m day −1 . Mean square relative error (MSRelE; see definition in Table 2) for soil moisture between modelled and observed soil moisture for all observation depths is used as an objective function. The objective function increases error weights for the deeper layers that have less variable soil moisture, which is deemed to be important for drainage evaluation purposes. The model configuration that includes the weather generator, the PDM model and the chalk representation is referred to as JULES + WG + PDM + CHALK (Table 2).

A detailed physics-based model of a chalk hillslope
A physics-based model for 2-D flow in chalk (Ireson and Butler, 2013) represents a hillslope transect through unconfined chalk in the Pang catchment, located in close proximity to the river Kennet. While this model is an approximation itself and can only represent one set of hillslope properties, it is built upon the best current knowledge of the hydrology of chalk hillslopes and is the best available test bed for simpler approximations. Flows in the 2-D model are governed by Richards' equation in both the saturated and unsaturated zones; and the properties of the chalk matrix and fractures are represented using an equivalent continuum approach (Peters and Klavetter, 1988;Doughty, 1999;Ireson et al., 2009). The Richards' equation is solved using a finite volume method.
Fluxes and states of the chalk hillslope model for the period 1970-2000 are examined to assess the following two assumptions underlying the JULES hydrology: (a) there is no hydrological interaction between neighbouring vertical soil columns, and (b) a unit gradient flow is a satisfactory approximation of the lower boundary condition at the 3 m base of the soil column on a hillslope location with a typically deep unsaturated zone. Further, the hillslope model is used to evaluate the nature of coupling between the unsaturated zone and groundwater, as well as the nature of water transport in the deep unsaturated zone located between the base of the JULES soil column and the water table. For these purposes, lateral fluxes in the unsaturated and saturated zones, hydraulic gradients and drainage rates at the soil column base, transpiration volumes extracted from the saturated and unsaturated zones by plants, and recharge rates at the groundwater table are extracted from the model. To reduce boundary condition effects at the upper and lower ends of the hillslope, the above variables are considered in the middle of the hillslope.

ZOOMQ3D distributed groundwater model
Groundwater flow in the Kennet is simulated using the ZOOMQ3D finite difference code (Jackson and Spink, 2004). The groundwater model is set up to simulate fluctuations in groundwater level, river baseflow, and spring discharge on a daily time step. The model uses gridded catchment representations at two scales; a 2 km base grid is locally refined to 500 m over the central part of the catchment. Rivers are simulated using an interconnected set of river reaches that exchange water with the aquifer according to a Darcian type flux equation. The vertical variations in rock hydraulic properties are represented using a three-layer model based on geological models of the hydrostratigraphy within the London Basin. The model is assessed to be a relatively good representation of the processes in the region in comparison with other chalk modelling examples (Jackson et al., 2011;Power and Soley, 2004).
ZOOMQ3D requires a significant number of parameters including horizontally and vertically distributed hydraulic conductivity and storage coefficient values. The parameters were zonally regularized and calibrated to approximate regional water table elevations (Jackson et al., 2011). For parameter estimation purposes, recharge has been modelled using a distributed recharge model ZOODRM (Mansour and Hughes, 2004) based on a conceptual Penman-Grindley soil moisture deficit model (Penman, 1948;Gridley, 1967). As a result, it needs to be understood that the calibrated groundwater parameters are only representative for the ZOODRM recharge field and are not, therefore, adjusted for recharge fluxes obtained using JULES.
The model configuration JULES + WG + PDM + CHALK is coupled to ZOOMQ3D based on the findings from the detailed 2-D model (Sect. 3.4) and is referred to as JULES + WG + PDM + CHALK + GW (Table 2). When groundwater model parameters are adjusted to examine sensitivity of the model response, the configuration is referred to as JULES + WG + PDM + CHALK + GWadj.

Surface runoff routing
The standard JULES configuration (version 2.2) does not have a surface-water routing option. Therefore, given the catchment size, flows are averaged over 10-day intervals to reduce the impact of routing effects. For the chosen flow averaging interval, any inaccuracy in the estimated river discharge due to the lack of surface routing is believed to be minor when compared to the total flow magnitude (groundwater contributes 87 % of the flow, on average), and inaccuracies in both actual baseflow index estimation (when BFIHOST is used) and in groundwater routing representation. Further, swallow holes in the catchment (West and Dumbleton, 1972) and river-soil water exchange for surface runoff (i.e. infiltration of surface runoff into the river bed) are not represented in the model, and possible consequences of this are discussed later in the Results and discussion, Sect. 4.

Other JULES parameters
The remaining JULES parameters are assigned as follows. Land use fractions are taken from the IGBP 2007 data set, and re-classified into the nine land use types commonly used for JULES applications (Smith et al., 2006). Soil hydraulic parameters are taken from the NSRI soil database with the exception of soil layers that are classified as chalk. Soil hydraulic properties below the deepest NSRI horizon, typically at 1.5 m, are assigned the deepest horizon properties. Chalk hydraulic properties derived in this study are assigned to soil horizons that are classified as chalk in the NSRI database. The dominant agricultural crop for the area is spring barley (Limbrick et al., 2000). The root depth for the crop was chosen as 1 m (average value based on Breuer et al., 2003) and canopy height was chosen as 0.8 m (Hough and Jones, 1997;Mauser and Schadlich, 1998). Leaf area index (LAI) changes seasonally, with maximum of LAI = 3 (Mauser and Schadlich, 1998;Petr et al., 2002). The maximum interception capacity per unit leaf area is fixed at 0.2 mm, so that the upper limit to interception is 0.2 × LAI (Hough and Jones, 1997). Other vegetation parameters are set at their recommended default value for JULES (Cox et al., 1999).

Results and discussion
Observations of water fluxes, soil moisture and groundwater levels in the Kennet catchment are compared with the simulated values derived using the sequentially modified JULES model structure to represent the four hydrological functions of a catchment.

Water balance
The long-term water balance is calculated for the period 1972-2007 from observations and various model configurations, and three metrics are calculated -relative bias for total runoff (RBias Q ) and surface runoff (RBias SR ), and MSRelE ( Table 3). The unmodified JULES (version 2.2) is found to overestimate the total runoff by 24 % and, correspondingly, underestimate the evapotranspiration (ET) by 15 %. This is attributed to the constant temporal disaggregation of weather variables that is hard-coded into the model. When weather variables are temporally disaggregated using the WG, described in Sect. 3.1, the total runoff is only 2 % lower than the observed value. However, neither configuration is capable of producing any surface runoff. This is because the hydraulic conductivities of the catchment soils (derived from the NSRI parameter database), even for relatively clayey soils, are sufficiently high to enable virtually all the instantaneous rainfall rates obtained using temporal disaggregation to infiltrate into the soil.
The parameter b of the PDM model is selected based on regionalized information from BFIHOST (Sect. 3.2) and ranges, mainly, between 0 and 0.4, except for two grids where b is set as 0.7 and 1. Further, the parameter b is assigned 0 value over approximately 60 % of the catchment for the locations with permeable (chalky) top-soils. Addition of the PDM model (JULES + WG + PDM configuration) with parameters selected based on regionalized information from BFIHOST (Sect. 3.2) generated, on average, 70 mm yr −1 of surface runoff (compared to 39 mm yr −1 derived by baseflow separation at the catchment outlet). This is likely to originate from the regionalization error -the catchment average regionalized BFIHOST value (0.78) is lower than the BFI value calculated from observed flow at the catchment outlet (0.87). This difference may arise from a number of locally relevant soil properties and processes that are not represented in the regionalized BFIHOST, for example there is a focussed recharge into sink or swallow holes of runoff from the Palaeogene deposits in the lower reaches of the Kennet catchment (West and Dumbleton, 1972). Such localized processes could, in principle, be explicitly represented in the land surface model, but this would be difficult in practice due to the scales involved; for example, representing the sink holes would require fine-scale data (at 0.1 to 1 m resolution) describing the land surface features.
It is to be noted that the proposed model modification with PDM and its parameterization is not the only possible model modification. An alternative, which potentially leads to increased surface runoff production, includes spatial and, perhaps, further temporal downscaling of rainfall to produce more intense events over parts of the 1 km discretization grids. Table 3 shows that the model modifications used to improve the representation of the additional processes observed in the catchment (and outlined in Table 2) do not compromise the simulated water balance.

Vertical redistribution through the soil
Both JULES + WG + PDM and JULES + WG + PDM + CHALK configurations use 4.5 m long soil columns with 0.1 m thick soil layers to facilitate the comparison with the observed soil moisture. The JULES + WG + PDM configuration results in overly dry soils between 1 and 4.1 m depth when compared to the observations ( Fig. 2; a representative subset of the soil moisture time series is shown); and the corresponding MSRelE metric equals 3.64. This soil dryness is attributed to incorrect representation of chalk soil hydraulic properties. Figure 3 shows two Brooks and Corey soil moisture retention curves fitted to the pairs of soil moisture and matric potential observations at 1 m depth in chalk; the curves intersect at an effective soil moisture of 0.31 (effective soil moisture equals soil moisture with subtracted residual soil moisture). The figure illustrates a threshold change in the chalk soil moisture retention curve and consequently, through the Brooks-Corey-Mualem model, the unsaturated hydraulic conductivity relationship. This change in properties is related to the dual porosity-dual permeability nature of the chalk soil (Ireson et al., 2009). Estimated chalk hydraulic properties are given in Table 4. Further, the time varying vertical distribution of soil moisture estimated by the JULES + WG + PDM + CHALK configuration is shown in Fig. 2; this corresponds to an MSRelE metric of 1.12. This value stays approximately the same throughout further model modifications to include additional functions of the hydrological system. The inclusion of chalk properties into the model produces a better simulation of soil moisture content at the Warren Farm site than that from the JULES + WG + PDM configuration. This corresponds well with the observed soil moisture below ∼1 m depth. However, the upper soil tends to be wetter than the observed moisture levels. This is attributed to the chalk's vertical heterogeneity; fractures appear more frequently and are larger in the upper chalk. Depth-variable soil hydraulic properties are required to capture the phenomenon. This is not attempted here due to the lack of soil moisture-matric potential observational pairs at multiple vertical levels to define entire soil moisture retention curves. Effective soil moisture at the two curves intersection 0.31 Calibration to soil moisture and matric potential at 1 m * K sat is fitted using JULES + WG + PDM + CHALK as well as JULES + WG + PDM + CHALK + GW configurations. The value for the latter is shown in the parenthesis.

Temporal redistribution
The original as well as the modified model configurations are only capable of partitioning water fluxes at the point/grid scale, and do not have a mechanism for further routing to provide temporal water redistribution at the catchment outlet. Here, some assumptions about the nature of such water redistribution (Sect. III in Table 2) are assessed, and lateral routing through the saturated zone is achieved through coupling the model to a distributed groundwater model.
Fluxes extracted from the physics-based 2-D model of a chalk hillslope imply that there are two simplifications that can be made with regards to the 2-D nature of hillslope hydrological processes. First, lateral fluxes in the chalk unsaturated zone are found to be insignificant when compared to the net (vertical and lateral) fluxes in the unsaturated zone. Hence, the simplifying assumption about inter-soil-columnindependent hydrological behaviour is a reasonable and sufficiently accurate approximation for the area. Second, evapotranspiration losses from the chalk saturated zone are found to be negligible compared to those from the unsaturated zone. It is therefore assumed that evapotranspiration processes can be restricted to the unsaturated zone when coupling the unsaturated zone to groundwater for the study area investigated herein.
Extracted vertical hydraulic gradients from the 2-D hillslope model are compared to the unit gradient lower boundary condition along with a number of alternative lower boundary conditions (using mean absolute difference as an objective function). Of these, it is found that a persistent gradient condition is the most consistent and accurate approximation of the lower boundary condition for the area. The persistent gradient condition assumes that hydraulic gradient is time varying but almost constant with depth at the soil column base. The condition can be approximated using the hydraulic gradient between soil column nodes just above the column base, requiring a relatively fine node mesh at the column base. The persistent gradient condition can be seen as a general conse-quence of the following lower boundary condition ∂ 2 h ∂z 2 = 0, where the unit gradient lower boundary condition ∂h ∂z = 1 is a special case. Note that only the hydraulic gradient at the soil column base is approximated herein. This gradient is used to substitute the unit gradient in the formula for the drainage flux in JULES. This implies that hydraulic conductivity at the base of the soil column is based on the nearest to the bottom node state.
Further, the persistent gradient approximation is evaluated for multiple soil column depths to optimize its applicability. The mean absolute difference between the persistent hydraulic gradient at the lower boundary and hydraulic gradients extracted from the hillslope model at a number of depths is used as an objective function. It is found that the objective function improves with increasing depth of soil column but less significantly after 6 m. As a trade-off between the soil column depth and the lower boundary approximation accuracy, an optimal depth to apply the persistent gradient lower boundary condition is chosen to be 6 m. Figure 4 compares hydraulic gradients at a 6 m column base extracted from the 2-D model to the unit gradient as well as to the gradient just above 6 m (approximately at 5.5 m depth, based on the model mesh), representing the persistent boundary condition approximation.
Lastly, to draw a connection between the modelled potential recharge at 6 m depth and the modelled actual recharge at the water table, temporally averaged vertical fluxes extracted from the 2-D hillslope model are considered for 6 hourly (the model step), daily, weekly and 30-day periods. The correlations between the time series of actual and potential recharges for the averaging periods are 0.75, 0.8, 0.89, and 0.94, respectively. Total actual and total potential recharges for the 1970-2000 period are found to be less than 1 % different. Average daily (the regional groundwater time step) potential simulated recharge at 6 m and actual simulated recharge at the water table extracted from the 2-D model are shown in Fig. 5. It can be seen that the potential recharge is widely spread at low actual recharge rates (below about 2 mm day −1 ). However, the potential recharge becomes quite a consistent predictor for the actual recharge at mid-to high actual recharge rates.
Based on the above findings, JULES + WG + PDM + CHALK is used with a 6 m deep soil column and 0.1 m thick soil layers, and is coupled via a weak two-way coupling to the groundwater model ZOOMQ3D implemented through the lower boundary condition (persistent gradient). The weak coupling assumes that the drainage flux from JULES is used as an upper boundary condition by ZOOMQ3D, and any upward water fluxes from the saturated zone to the upper unsaturated zone are calculated based on the (persistent gradient) lower boundary condition. Note, the saturated hydraulic conductivity for chalk soil is re-calibrated following the procedure given in Sect. 3.3 (Table 4) as the new persistent gradient lower boundary condition impacts the soil moisture dynamics.
The resulting 10-day-averaged river flow at the catchment outlet (Theale) for the period 1994-2006 is shown on Fig. 6. The period includes two droughts in the region (1995-1998 and 2003-2006) as well as substantially wet 1999-2001 period that led to groundwater flooding. Figure 6 also shows model performance measures for the total simulation period of 1972-2007, with a Nash-Sutcliffe efficiency for simulated flow (NS = 0.82) and log-transformed simulated flow (NS log = 0.81), as well as a relative bias for the total flow (RBias Q = 0.01). Note, the relative bias is calculated using the flow at the catchment outlet, not the sum of surface and drainage fluxes produced by the land surface model component of the configuration. This explains the slight difference between the RBias Q values in Table 3 and Fig. 6 for the model configuration.

Spatial redistribution over the groundwater and surface-water systems
Due to the distributed nature of the coupled model configuration, flows (Fig. 6) and groundwater levels (Fig. 7) can be examined at the internal catchment points shown in Fig. 1. It can be seen that total flow tends to be underestimated in the smaller sub-catchments such as the Kennet at Marlborough and the Lambourn at Shaw. Inspection of water movement patterns inside the groundwater model ZOOMQ3D offers a possible explanation. The Lambourn groundwater catchment area is found to be underestimated by ZOOMQ3D when compared to the groundwater catchment area extracted from local observational boreholes and spring head data (Parker, 2011;Parker et al., 2015). Further, the model tends to direct some water from the Lambourn to the middle part of the Kennet (Parker, 2011, S. Parker, personal communication, 2013, which helps to explain the total flow overestimation  at the Marlborough, Newbury, and Knighton gauges. Further, during wet years, peak flows appear to be underestimated at all gauging stations; meanwhile, low flows are slightly overestimated for the Kennet at Theale and the Kennet at Newbury, and underestimated for the Lambourn at Shaw. Treating potential recharge from the land surface model as actual recharge to ZOOMQ3D might partly explain the low flow overestimation. Because of the mismatch of scales between an observation borehole (order of 1 m) and JULES and ZOOMQ3D grid scales (1 km and 500 m, respectively), only a visual assess-ment of the predicted water levels is attempted. Figure 7 illustrates simulated water levels at four selected boreholes for September 2000-August 2001 representing an unusually wet year leading to a groundwater flooding in the area. Similar to the results from Jackson (2012), who considered the same period and boreholes, water levels are mainly overestimated at the Marsh Benham and Bradley Wood boreholes. Moreover, the modelled response at Marsh Benham and Bradley Wood is more attenuated than the observed response. At the model scale (1 km), the estimated groundwater levels are indicative of the boreholes partly due to soil heterogeneity. For exam- ple, the PDM model parameter b (as well as soil hydraulic properties) is chosen based on a dominant soil type; therefore, the recharge to total runoff ratios are 0.52 and 0.17 for Bradley Wood and Marsh Benham, correspondingly. However, other soils present in the model grids have very different recharge to total runoff ratios, for example, 0.98 and 0.88 for Bradley Wood and Marsh Benham, respectively. Incorporating the hydrology of these soils can potentially lead to more responsive water level behaviour at the boreholes as more water will infiltrate.
As indicated in Sect. 3.5, the parameters of the ZOOMQ3D groundwater model are derived using recharge from a different near-surface model, and thus are likely to be sub-optimal when recharge produced by JULES is used. A manual sensitivity analysis of model parameters showed that tuning values of specific yield or hydraulic conductivity leads to better agreement with the observed data. For example, Fig. 8 shows flows generated by the coupled model when ZOOMQ3D specific yield parameters are halved over the whole Kennet area. This results in a better representation of high flows, but mixed outcomes for low flows (according to the NS and NS log performance measures). Groundwater levels at the selected boreholes become slightly more responsive, but do not change significantly (Fig. 7). As the primary research objective is to diagnose the hydrological limitations of a land surface model, a formal recalibration of an auxiliary groundwater model is not pursued here.

Conclusions
The paper is motivated by the goals of using land surface models as a basis for global hyper-resolution modelling of the terrestrial water, energy, and biogeochemical cycles, including application to a range of complex hydrological prediction problems. This comes alongside the recognition that there are significant limitations in the accuracy with which hydrological fluxes and storages are represented in general in LSMs due to their focus on supporting large-scale climate modelling problems (Oleson et al., 2008;Wood et al., 2011). The paper uses a case study of the JULES LSM model applied to the Kennet catchment in southern England, which represents the challenging problem of hydrological modelling in a chalk dominated catchment with a predominantly deep unsaturated zone. A diagnostic approach is taken to identify the model inadequacies with respect to the four functions of a hydrological system: overall water balance, vertical redistribution of water through the soil, temporal redistribution of water, and spatial redistribution over the catchment's surface-water and groundwater systems. The approach facilitates a sequential model improvement using hydrological expert knowledge about model assumptions and simplifications relevant for each hydrological aspect considered. The following model modifications are presented and assessed in the paper: -overall water balance: introduction of a weather generator and statistical description of top-soil heterogeneity via regionalized information; -vertical redistribution through the soil: approximation of the dual permeability -dual porosity hydraulic soil behaviour; -temporal redistribution: change of the lower boundary condition and approximation of coupling to a groundwater model; -spatial redistribution over the catchment: alteration of groundwater model parameters.
It is noted that improving the model physics in sequence preserves model performance quality with respect to the other previously considered functions. For example, improving vertical distribution does not corrupt the water balance achieved at a previous model modifications stage. This might be explained by the physical basis of both the model and reasoning for model modifications. The improvements are illustrative of the potential outcomes of a diagnosis approach, and alternative or additional improvements are possible. These include: the representation of the temporal and spatial distribution of precipitation; inclusions of point/small-scale features such as sink holes; and more physics-based inclusion of the vertical and horizontal distribution of soil hydraulic properties. As a procedural improvement, uncertainty analysis could be used to indicate if output errors can be explained by estimates of particular input uncertainties.
For some applications, the intermediate model configurations might be sufficient. For example, while JULES + WG + PDM configuration cannot provide flow/groundwater level hydrographs as it lacks surface and subsurface water routing, the configuration can still be used to represent the water balance over an area. This is useful for regions where no groundwater model and/or detailed geology data are available. It is to be noted that the findings are catchment specific and result from a weak surface-groundwater coupling, and as such cannot be readily generalized to other environments with shallower water tables.
Diverse sources of information were used to guide the model assessment and include remotely sensed data (topography, land use), spatially extrapolated point data (soils, weather conditions), point measurements (soil moisture and matric potential, flow, groundwater level), regionalized hydrological information (BFIHOST), and states/fluxes extracted from an auxiliary physics based hillslope model (Ireson and Butler, 2013). Fewer data might result in a less detailed representation of the water cycle depending on the specifics the hydrological system being investigated.
Whilst this application of JULES to the Kennet catchment is highly specific, it conveniently illustrates the type of challenges and parameterization of complex and distributed hydrological processes, model coupling using simplified boundary conditions, and assimilation of different sources of information to model identification, which will be encountered in almost any attempt to improve the utility of LSMs for catchment-scale water cycle modelling, arising due to the uniqueness of place problem. The paper has demonstrated the considerable accuracy gains that can be achieved using a sequential model error diagnosis strategy and expertlead model adjustments. These can be taken forward to develop a general comprehensive guidance for transitioning to high resolution land surface modelling.