A critical assessment of the JULES land surface model hydrology for humid tropical environments

Global land surface models (LSMs) such as the Joint UK Land Environment Simulator (JULES) are originally developed to provide surface boundary conditions for climate models. They are increasingly used for hydrological simulation, for instance to simulate the impacts of land use changes and other perturbations on the water cycle. This study investigates how well such models represent the major hydrological fluxes at the relevant spatial and temporal scales – an important question for reliable model applications in poorly understood, data-scarce environments. The JULES-LSM is implemented in a 360 000 km 2 humid tropical mountain basin of the Peruvian Andes–Amazon at 12-km grid resolution, forced with daily satellite and climate reanalysis data. The simulations are evaluated using conventional discharge-based evaluation methods, and by further comparing the magnitude and internal variability of the basin surface fluxes such as evapotranspiration, throughfall, and surface and subsurface runoff of the model with those observed in similar environments elsewhere. We find reasonably positive model efficiencies and high correlations between the simulated and observed streamflows, but high root-mean-square errors affecting the performance in smaller, upper sub-basins. We attribute this to errors in the water balance and JULESLSM’s inability to model baseflow. We also found a tendency to under-represent the high evapotranspiration rates of the region. We conclude that strategies to improve the representation of tropical systems to be (1) addressing errors in the forcing and (2) incorporating local wetland and regional floodplain in the subsurface representation.


Introduction
The humid tropics host extremely biodiverse ecosystems, which are subject to changing climate and land use patterns and potentially changing water cycles. With an area of approximately 6 million km 2 (Latrubesse et al., 2005), Amazonia hosts a significant part of the world's remaining rainforest and is an important supplier of atmospheric moisture (Salati and Vose, 1984). Many hydrological studies have continued to focus on the continental and lower Amazon (e.g. Vörösmarty et al., 1989;D'Almeida et al., 2006;Paiva et al., 2011;Guimberteau et al., 2012), while the upper Andes-Amazon system receives far less attention, despite being subject to increasing human impacts such as deforestation, oil exploitation, mining, and hydropower production. The potential impact of a changing climate and land use on the hydrological regime of the Amazon headwaters is a serious concern, not only because of its influence on the downstream basin, but also because of its link to local ecosystem services.
Hydrological models are a common approach to understanding and predicting the impact of change, but modellers of tropical environments face issues with process understanding and data availability for parameterization and validation of models (Giertz et al., 2006). Models for tropical basins in the literature tend to be conceptual models requiring few inputs (e.g. Darko, 2002;Campling et al., 2002;Bormann and Diekkrüger, 2004), but depending strongly on local calibration. This can be complicated by the low availability and high degree of uncertainty introduced by the streamflow data. This is especially true for the Amazon where the river morphology Z. Zulkafli et al.: JULES-LSM hydrology for humid tropical environments continuously changes due to active erosion in the uplands (see Aalto et al., 2006). Another disadvantage of conceptual models is the unidentifiability of their parameters (Ebel and Loague, 2006), which complicates scenario analysis.
Distributed physics-based modelling has been attempted in smaller catchments with some success (Vertessy and Elsenbeer, 1999;Legesse et al., 2003;Bekoe, 2005). More recently, land surface models (LSMs) have been used for physics-based hydrological modelling at the global scale (e.g. Arora and Boer, 2003;Alkama et al., 2011) and the continental-scale Amazon (Decharme and Douville, 2006;Guimberteau et al., 2012). LSMs, also referred to in the literature as land surface schemes (LSS) and land surface parameterizations (LSP), were originally developed by the climate modelling community to provide the land-atmospheric boundary condition in operational weather forecasting and global climate simulations. An LSM operates in continuous time and fully distributed mode, as it simulates the exchanges of energy, water and carbon between the land surface and the atmosphere by accounting for processes in the ground and vegetation canopy. A particular strength of LSMs is the biophysical consistency between various modelled processes such as photosynthesis, carbon and nutrient cycles, irrigation, and crop growth. This provides an opportunity for hydrologists to study the impact of change on hydrology in interaction with the other land surface processes.
However, despite their sophistication, LSMs are newcomers to hydrological modelling, and while there have been point scale validation exercises in flux tower sites (Baldocchi et al., 2001) representing various environments (e.g. Blyth et al., 2011), none has thoroughly assessed their performance in tropical upland basins. First, there is the issue with model structure that stems from universalization of locally observed processes. Such a simplifying assumption may not hold in topographically complex environments where there may be multiple interacting factors controlling the hydrological response. Additionally, there are established weaknesses in the LSMs even for temperate climates that may be more problematic over the humid tropics. For instance, LSMs are often criticized for an absence of a groundwater model and therefore unsuitable for basins with a shallow water table as it assumes free gravity drainage from the soil column (Yeh and Eltahir, 2005). Moreover, lateral runoff processes reported in the humid tropical literature (Dunne, 1978;Campling et al., 2002;Giertz et al., 2006;Chappell, 2010) such as saturation excess surface runoff, interflows in the organic layer, and natural pipes are seldom modelled explicitly due to the LSM vertical structure.
Parameter uncertainty is a second issue. In comparison with the temperate regions, the tropics have lagged in soil characterization with respect to runoff producing mechanisms (Giertz and Diekkrüger, 2003). In this context, an LSM offers an advantage in that its worldwide application means global datasets of soil and vegetation data are available. While these exist, they may not be direct field-measurements.
For instance, soil hydraulic parameters are derived from soil texture maps using pedotransfer functions (PTF). As these functions were constructed through analyses of water retention data from temperate soils (e.g. Cosby et al., 1984), they may not reliably reproduce the hydraulic behaviour exhibited by humid tropical soils, for example, low water retention even at high clay contents (Hodnett and Tomasella, 2002). As such, the PTFs derived using solely tropical soils (Tomasella and Hodnett, 1998;Tomasella et al., 2000;Hodnett and Tomasella, 2002) may be more applicable for tropical soils, yet remains limited in their representativeness for the younger volcanic soils in the uplands (Hodnett and Tomasella, 2002).
The number of parameters required per LSM pixel further introduces issues of data collection and scaling. Consequently, the majority are effective parameters that are not well constrained and applied homogeneously in space. This may not be a serious issue in land surface modelling, which is mainly concerned with representing fluxes at large spatial scales. However, it becomes problematic for hydrological modelling, where local heterogeneity and non-linear responses to perturbations need to be simulated, and especially in mountainous environments where a high degree of heterogeneity may be expected over a small scale.
In spite of the limitations, LSMs are increasingly used to simulate hydrological fluxes such as discharge at basin and global scales. This paper studies the implications for modelling humid tropical upland basins. The case study is the upper Amazon River (Marañón River) basin in Peru, which hosts half of the Pacaya Samiria National Reserve, the largest floodable forest reserve in the Peruvian Amazonia, where the hydrological system provides important ecosystem services for unique species of fish and freshwater turtles that are vulnerable to extinction, as well as approximately 10 000 km 2 aguaje (Mauritia flexuosa) palm forests of high economic importance (Kahn, 1988).
Our approach moves beyond the traditional model evaluation using strictly observational data to decide whether a model is acceptable. Instead, our aim is to evaluate a "fit for purpose" simulation system that "shadows" the natural system as closely as possible (Beven et al., 2012). This does not necessarily mean that internally all fluxes are correct at the pixel scale and at each time step, which is infeasible given the lack of observations. Rather, we intend to present a simulation that most closely resembles the reality in terms of the main statistical properties (especially the mean and variation of the basin's hydrological fluxes), under the assumption that such a system will be the most robust in representing the impact of perturbations at a basin level.  Best et al. (2011), while a brief description is provided for completeness. Land surfaces are modelled as tiles consisting of vegetated and nonvegetated surfaces with distinct parameters for radiation balance, resistance to heat and momentum transfer, canopy interception, plant photosynthesis, respiration and growth, etc. Land-atmospheric heat and moisture exchanges for each grid are calculated by area-weighted averaging of the tile fluxes, and these are exchanged with a shared soil column. The LSM requires time series of meteorological data, i.e. incoming short-wave and long-wave radiation, temperature, specific humidity, wind speed, and surface pressure. These are used in a full energy balance equation that includes components of radiation, sensible heat, latent heat, canopy heat, and ground surface heat. The potential evaporation estimation is based on the Penman-Monteith (Penman, 1948) approach. Canopy evaporation is assumed to occur at the potential rate, while plant transpiration and bare soil evaporation are restricted by canopy resistance and the soil moisture state, respectively.
In JULES the local throughfall rate is proportional to the local precipitation rate by the fraction of occupied canopy storage, C/C max , where C max is a vegetation parameter and is a linear function of the leaf area index (LAI). On the ground surface, throughfall is partitioned into surface runoff and infiltration into the soil moisture pool based on the Hortonian infiltration excess mechanism, enhanced by a vegetationspecific factor to account for macroporosity in the soil.
In the subsurface, an instantaneous redistribution of moisture is assumed and water is exchanged between the soil layers using a finite difference approximation to the Darcy-Richards diffusion equation, with infiltration and gravity drainage as the upper and lower boundaries respectively, and root uptake as a sink. The soil water retention characteristics follow the model of Brooks and Corey (1964) or the alternative van Genuchten (1980) formulation. In the alternate soil hydrology model of JULES (Clark and Gedney, 2008), a grid-based implementation of TOPMODEL calculates the local saturation excess runoff based on a time-moving surface partial contributing area. In this configuration the model applies an exponential decay to the soil hydraulic conductivity with depth, assumes a null-flux lower boundary, and applies an anisotropic factor to generate lateral flows for the subsurface.
For the study basin, we evaluated both the basic JULES (JULES-BASE) and the JULES-TOPMODEL parameterizations in distributed mode. The study basin is divided into pixels of 0.125 • latitude-longitude (∼ 14 km) resolution. Each pixel is assigned a set of soil parameters, the distribution of the land cover types, and time series of meteorological variables from global datasets. The model goes through a warming up period to initialize the internal states.

Runoff routing
The runoff generated by JULES consists of local surface and subsurface runoff that needs to be routed for a meaningful assessment against streamflow measurement data. Our model is a simple delay function, with the delay for each pixel being the distance between the pixel and the outlet divided by the flood wave velocity (C). The flood wave velocity for the surface and subsurface runoff are the two parameters of the model that are optimized through a Monte Carlo simulation.
The lag time t to the outlet from pixel i will vary by its distance d to the outlet along the stream network: where the subscripts 1 and 2 represent surface and subsurface components respectively. Finally, the simulated flow at the outlet (Q sim ) is the sum of all contributing local hydrographs in the basin, lagged in time.

Study area
The study area is the Marañón (Peruvian Amazon) River basin upstream of the confluence with the Ucayali River that forms the proper Amazon River (Fig. 1). On average, the yearly discharge in the Marañón River is 14 900 m 3 s −1 , as measured at San Regis station (74 • W, 4.4 • S) (Espinoza-Villar et al., 2009a). The climate of the region has been discussed extensively by various works, see for example Kvist and Nebel (2001), Garreaud et al. (2009) and Espinoza-Villar et al. (2009b). The average yearly temperature in the Marañón River Basin is about 23-27 • C, and the austral winter (JJA) is warmer than austral summer (DJF), during which cloud formation and rainfall effectively reduces the temperature (Kvist and Nebel, 2001;Garreaud et al., 2009). Precipitation characteristics vary across latitudes as influenced by synoptic meteorological phenomena operating at varying time scales (see Table 1). Precipitation is also orographically controlled, with the wettest band found at an altitude of 1.3 m a.s.l. on the eastern face of the Andes (Bookhagen and Strecker, 2008). The Andean ranges, with elevations exceeding 4000 m, are effective hydrometeorological controls separating their west and east and connecting regions at lower and higher latitudes (Garreaud et al., 2009). The basin is underlain by silty Gleysols in the river floodplains, and young unstructured Cambisols in the Andean foothills and Leptosols and Regosols further upland (FAO, 2009). The tropical wet climate supports expansive lowland, montane, and floodable forests as well as wet grassland (locally known as Páramos) on the Andes. In the southwest, shielding by the eastern Andean range generates a drier climate and grassland above the tree line.

Land surface data
The watershed boundary for the study basin is delineated based on HydroSHEDS 90-m resolution hydrographic dataset (Lehner et al., 2008). A gridded map of mean and standard deviation of the topographic indices for TOP-MODEL was also generated from this dataset.
Two sets of land surface parameters governing the soil hydraulics were created using the pedotransfer functions developed by Tomasella and Hodnett (1998) and Cosby et al. (1984), the latter for use in a sensitivity analysis. The functions take values of silt and clay fraction in the soil, which are available through the Harmonized World Soil Database (FAO, 2009) at 1-km resolution (FAO, 2009). The HWSD sources the ISRIC World Soil Information Soil and Terrain Database (SOTER), which is considered highly reliable in South America (FAO, 2009), although a more recent mapping study by Quesada et al. (2011) is more likely to provide more accurate information across the Amazon Basin. The global soil textural map provides information for the topsoil (top 30 cm) and subsoil that was assigned to the top 2 (total depth of 35 cm) and bottom 2 (total depth of 265 cm) layers in JULES. The land cover was parameterized based on multiple land cover maps. The primary source is the Digital Ecological Systems Map of the Amazon Basin of Peru and Bolivia (Josse et al., 2007b). This is a 90-m resolution field-verified mapping based on satellite imagery. In areas outside of this map's coverage, the 1-km IGBP-DIS land cover classification map of Loveland et al. (2000) was used.
The model default vegetation parameters (see Tables 5  and 6 in Best et al., 2011) are applied uniformly in space for each of the 5 plant functional types, with the exception of canopy height and LAI, whose spatial distribution were obtained from the UK Met Office version 7.7 Central Ancillary Program and are based on satellite-derived normalized difference vegetation indices (NDVI). Additionally, canopy heights of broadtree leaves were mapped from the local digital ecological systems database (Josse et al., 2007a).

Meteorological data
The global land surface model driving data developed by Sheffield et al. (2006) were used and included meteorological variables such as long-and short-wave radiation, temperature, pressure, specific humidity, wind, and precipitation. The dataset is the first generation NCEP (US National Center of Environmental Predictions Kalnay et al., 1996) climate reanalysis product merged with ground data (the data will be henceforth referred to as NCEP). The dataset at 1.0 • latitude-longitude (111 km) grids were further disaggregated to 0.125 • latitude-longitude (14 km) grids using the lapse rate interpolation method described in the same paper. The nearest neighbour interpolation method was used to disaggregate precipitation. An alternate precipitation dataset from the TRMM 3B42 (version 6, 0.25 • resolution, Huffman et al., 2007) remote sensing product was bias corrected with TRMM 2A25 climatology (0.1 • resolution) of Nesbitt and Anders (2009) (the data will be henceforth referred to as TRMM). The simulation of JULES was performed over the entire basin for a period of 11 yr between 1998 and 2008 to coincide with the periods of available data from NCEP and the TRMM precipitation.

Streamflow data
Daily streamflow data for four hydrological stations at San Regis, Borja, Santiago, and Chazuta were obtained through HYBAM (geodynamical, HYdrological and Biogeochemical control of erosion alteration and material transport in the AMazon Basin) from the Servicio Nacional de Meteorología e Hidrología, Peru (SENAMHI) and the Nacional de Meteorología e Hidrología, Ecuador (INAMHI) monitoring networks (for station locations and synthesis, refer to Fig. 1 and Table 2). Missing days from the time series were excluded from the analysis, and the 95th percentile lower and upper uncertainty bounds (L.U.B. and U.U.B.) were calculated adapting the method in Daren-Harmel and Smith (2007).
For this study, the probable error ranges (PER) were estimated using the standard deviation of the errors between the gauged and field-measured discharges during the calibration campaigns. The water balance closure was assessed and 4 additional stations, i.e. Paute, Nueva Loja, Nuevo Rocafuerte, and San Sebastian from INAMHI were included for comparative analysis. The last three stations are located in the Napo River basin in Ecuador, which is tributary to the Peruvian Amazon further downstream of San Regis.

Model evaluation
Streamflow simulations were assessed with the Nash-Sutcliffe model efficiency (NSE), the root-mean-square error (RMSE), the relative bias and the Pearson correlation. The calculated deviations were modified using approach 1 described by Daren-Harmel and Smith (2007) to account for streamflow data uncertainty. The entire time series available from the modelling period of 1998-2008 were used. The internal model fluxes (i.e. evapotranspiration, canopy throughfall, surface runoff, and subsurface runoff) are evaluated by calculating the statistics of the spatial variability over each major biomes -lowland forest and flood forest (below 1200 m a.s.l.), montane forest (between 1200 and 3500 m a.s.l.), and upland (above 3500 m a.s.l.) systems as shown in Fig. 2 -within the entire basin. In the absence of a dense network of local observations, the distribution of observations from the literature (Table 4) are taken as substitute for observations from a "real" system. The assumption is that the best simulation for the basin will produce similar natural variability, assessed in terms of the mean and spread of the distributions.

Uncertainties in the observations
Potential errors in the observations of precipitation and streamflow were assessed by analysing the runoff ratio, i.e. the total observed streamflow to the total precipitation input (Table 3). Ratios typical for humid tropical environments are in the range of 0.6-0.7 (Campling et al., 2002;Buytaert et al., 2006a;Rollenbeck and Anhuf, 2007), while the values found in the study basins exceed 0.80, with values up to 1.88. This suggests severe errors in either the streamflow measurements or precipitation products, or unaccounted sources of water. For the latter, cloud water input may be responsible, as this can be significant in montane forests, ranging between 22-1990 mm yr −1 (see a comprehensive study by Bruijnzeel et al., 2011). However, a seasonality analysis of the water balance (results not shown) highlights that the largest overestimations of the runoff ratio occur during the austral winter. This is incompatible with cloud water input, which would occur during periods of persistent cloud cover (Zadroga, 1981) which in the case of the Marañón Basin is the austral summer.
Previous studies that highlighted difficulties with water balance closure in similar magnitudes have attributed these to errors in precipitation because of the scarcity of gauge data for the upper basin (i.e. upstream of Brazil) (Guimberteau et al., 2012;Coe, 2002). Indeed, large errors have been reported in the literature with the TRMM and NCEP precipitation over the Amazon Basin. Ward et al. (2011) observed in Paute Basin both datasets underestimating precipitation during the dry season by 50 mm month −1 on average. In the Brazilian Amazonia, Clarke et al. (2010) observed a negative bias in the maximum annual daily rainfall of up to 80 mm day −1 in the NCEP reanalysis data, and an extension of the dry season when compared to gauged data. Both studies also found poor correlation to the gauged time series. The most unrealistic runoff ratios were found at the northern Andean basins (Paute, Santiago, Nueva Loja, and San Sebastian, Table 3). Large, lowland basins such as Borja, Chazuta, San Regis, and Nuevo Rocafuerte have lower and more reasonable runoff ratios. This is compatible with Ward et al. (2011), who found comparable, unrealistically high runoff ratio values over the Paute Basin in Ecuador and the Baker Basin in Chilean/Argentinean Andes, even with interpolated rain gauge data. This may highlight that the uncertainty of precipitation by global precipitation products in the study area is the most problematic when applied at small scale and over the mountainous regions, particularly in south-east Ecuador. This has been previously demonstrated in studies of rain gauge data by Buytaert et al. (2006b) and radar data by Rollenbeck and Bendix (2011). Both attribute the difficulties of capturing precipitation to the highly variable topography. According to Buytaert et al. (2006b), due to relief-induced micro-climates, the extent of precipitation events may be as small as 4 km. Rollenbeck and Bendix (2011) revealed multiple interactive processes such as convective and orographic rainfall at local and regional scales. These are unlikely to be fully resolved even using the highest resolution that most regional climate models are currently capable of. The TRMM data, on the basis of the model performance, provide a reasonable starting point for estimates of precipitation and are superior to precipitation data from NCEP reanalysis data. Pinpointing the exact weakness of the TRMM data is a challenging task, as the TRMM 3B42 precipitation is derived from multiple observational datasets, converted from measurements of infrared (IR) temperatures, passive microwave radiation, and radar reflectivities from multiple sources of satellites that work at different temporal and spatial scales and domains (Huffman et al., 2007). Dinku et al. (2010) attributes the tendency of TRMM to underpredict precipitation in mountain areas to several reasons. Firstly, the temperature measured above orographic clouds greatly     exceeds the temperature threshold above which the clouds are considered precipitating. The warm clouds also tend not to contain ice particles, which would provide more accurate estimates of rainfall from passive microwave observations. The recent release of version 7 (Huffman and Bolvin, 2012) of the product may bring some improvements, but a full analysis is outside the intended scope of this paper. Several noteworthy trends can also be observed in the interannual variation in the runoff ratio (Fig. 3). The ratios calculated with the TRMM product are higher and increasing between 1998 and 2004, but show a sharp decreasing trend after 2004. In contrast, the runoff ratios with NCEP precipitation are generally lower prior to 2004, but deteriorate after 2004, yielding values above 1. The significance of the year 2004 as a turning point for both datasets is not clear -in the case of TRMM data, it may be possible that this is linked to changes in the estimation algorithm for one of the contributing satellites in mid-2003 (Huffman et al., 2007), but in the case of NCEP, the reanalysis model has been held static throughout the time series. Nevertheless, there is the general tendency of a drier climate during the wet season (based on the trend of maximum annual flows at Borja, not shown) and the fact that calibration campaigns for the streamflow stations started in [2003][2004]. Therefore, despite the strong case for precipitation uncertainty, the possibility of a high streamflow data uncertainty cannot be discounted. Figure 4 illustrates several spatial and temporal trends in the model performance indicators that are in line with the observed trends in the water balance in Sect. 3.1. The relative bias is increasingly negative from the largest to the smallest basin and diverges between the simulations with TRMM and NCEP. Moreover, with the TRMM simulations, the bias starts to decrease in 2004, which coincides with the improvement in the runoff ratio, and the opposite is true with the NCEP simulations. The correlation between modelled and observed time series are relatively stable throughout the entire modelling period; this suggests that the model is reasonably capable of capturing the majority of the fluctuations in the hydrograph, provided that the water balance is accurate.

Simulation of streamflow
Hydrol. Earth Syst. Sci., 17, 1113-1132 This point is further evidenced by the RMSE, which takes a general decreasing trend after the year 2004 and simultaneously results in positive Nash-Sutcliffe efficiencies. The observed hydrographs (Fig. 5) show that, with the exception of San Regis, the river can be extremely flashy with discharge decreasing by more than 5000 m 3 s over the course of several days. The model seems capable of reproducing this response where there is dominant orographic control on the basin hydrology, i.e. in Chazuta. Here the shapes of the rising limbs and recession are sufficiently modelled, despite several missed peaks and under/overshooting of the time to peak. These errors are to be expected at the fine temporal scale of the model and the additional uncertainty from the runoff routing scheme.
On the other hand, the Santiago Basin and to a lesser extent the Borja Basin show a much less seasonally variable response, in which a flashy regime overlays a larger baseflow component. The baseflow is likely to be sustained by an extensive system of Andean wetland and lakes that form a major part of these upper mountain basins (i.e. Páramos and Jalcas, Buytaert and Beven, 2011). JULES' poor estimation of this baseflow may be attributed to the incomplete representation of lateral fluxes and the natural stores provided by these local topographic depressions. This limitation prevails at the full basin scale, where the model fails to replicate the extremely regulated flow regime observed at San Regis. The role of the floodplain at this scale cannot be ignored, as the Ucayali-Maranon depression (Rasanen et al., 1992) is a prominent feature and is capable of attenuating a large volume of the flows. Figure 6 is a comparison between the modelled and observed flow duration curves, which provide a better insight into the model performance over the entire flow regime. The slopes of the curves are reasonably well simulated, particularly with the TRMM precipitation as the driving variable. This is likely due to better estimates of precipitation intensities from the observation dataset, which also comes at a finer spatial resolution than the NCEP, and therefore may be better at capturing local convective rainfall. However, there is an overall underestimation of the discharge in the low-to midflow region and overestimation of peak flows, further confirming a missing flow attenuation component in the model.
For similar reasons, JULES-TOPMODEL is underperforming when compared to JULES-BASE. The flashiness of the response simulated by JULES-TOPMODEL is exaggerated, and this is suspected to be due to errors in the parameter scale. Indeed, the mean topographic indices calculated over the coarse model grids were negatively skewed, and the timeseries average of the grid-saturated fractions (Fig. 7) clearly shows that the partial areas contributing to saturation excess runoff were unreasonably overpredicted. In the lower basin, 50 % of the gridbox is saturated on average, which is high even for the flood forests that go through seasonal flooding. Figure 9 provides some insight into the model's sensitivity to variations in the soil parameters. It shows barely discernible differences in the outcome using the soil parameters   (4) stratified. This is in spite of the large range of values between the two estimation methods (Fig. 10) and between the top and subsoil layers (not shown). This could be due to the large modelling domain, in which the local effects from the parameter perturbations offset each other at the regional scale. However, it is more likely that the model is more responsive to the model forcing, or to some other parameters that were not included in the sensitivity analysis, e.g. saturated hydraulic conductivity and soil depth.

Simulation of internal variability of surface hydrological fluxes
We further compared JULES' simulated internal variability of surface hydrological fluxes to the values published in the literature (Fig. 8) to assess its robustness in representing the overall hydrological balance of a humid tropical basin. We identified a general negative bias in the simulation of evapotranspiration (ET). The simulated mean for flood forest ET is even lower than that for the lowland forest, which is counterintuitive given the higher water availability in floodplains. One possibility for this low bias is an underestimation of canopy interception, but the evidence for this is weak. Despite positive skews in the distribution, the simulated canopy throughfall (TF) largely resides within the literature ranges.
A similar bias was observed by Blyth et al. (2011) (with JULES) and Guimberteau et al. (2012) (with ORCHIDEE-LSM) in Amazonia. Blyth et al. (2011) also underestimated primary production, which may be correlated as the canopy conductance is a function of photosynthesis rate within the model (Cox et al., 1999). Both these processes are dependent on the soil moisture, as is also the bare surface evaporation, and it is likely that the low bias with ET is due to the model overestimating soil water stress. Studies focusing on simulations of the soil moisture state have indeed shown negative biases, particularly in the lowest layer in the soil profile, suggesting a weakness of the free gravity drainage assumption (Bakopoulou et al., 2012;Finch and Haria, 2006). This assumption prevents drawing up of soil moisture during dry periods when the water table dips below the maximum soil depth. The JULES-TOPMODEL implementation is an attempt to address this limitation with an underlying unconfined aquifer; however, the model is persistently saturated and loses the effective rainfall to the routing (as evident in the larger contribution to surface runoff with JULES-TOPMODEL), resulting in no observed improvement to the ET.
This reconfirms the need to simulate better the movement of lateral fluxes within the basin. Dadson et al. (2010) implemented a 2-D routing scheme based on the kinematic wave assumption that continuously estimates flood extents based on the simulated water level of each grid cell. The land cover tiles in JULES are updated in the subsequent timestep by converting the flooded fraction into an open water surface. The subsurface and consequently plant roots, however, do not gain access to this available moisture, as open water tiles in JULES do not infiltrate. Therefore, the improvement to ET estimates is solely due to the increase in the open water ET, which may not be sufficient for the flood forest system of the Marañón River basin. A more optimal model may be the floodplain implementation with the ORCHIDEE land surface model by d 'Orgeval et al. (2008), who model the surface area and volume of swamps and floodplain in order to calculate the water retention time, allowing re-infiltration into the subsurface during this period. However, their model splits each coarse LSM grid into smaller subbasins, likely requiring an instantaneous redistribution of soil moisture over the entire grid in the subsequent timestep. This assumption may undermine the soil moisture accounting in flooded versus dry sections of the grid, although it may be less problematic over finer scale grids.
A final nonetheless important observation is that in the upland biome, ET is better simulated and because of errors in the water balance, this corresponds to poor runoff generation. This is a clear contrast to the simulations in the lowland forests, where the poor estimation of ET instead makes up for the water balance errors, resulting in well-estimated runoff and, consequently, streamflow. In the latter case, data errors compensated for model and parameter errors and created a false impression of good modelling. Our study has shown that hydrological predictions with the JULES-LSM can be unreliable due to a large uncertainty in the driving data and the poor simulation of the baseflow component in the upper Andean basins. In the peneplain, the model is unable to reproduce the well-regulated regime as it neglected the hydrological functions of the flood forest. Nevertheless, for a global model that is not purpose-built for hydrological modelling, JULES is capable of producing reasonable simulations of the flow regime at fine temporal scale.
In constructing a robust model for impact analysis of a resilient system such as the Amazon, it is important to represent the hydrological system holistically in terms of the internal states and fluxes, perhaps more than it is to score a near-perfect Nash-Sutcliffe efficiency. We further assessed whether the model is capable of behaving as a mirror image of real systems elsewhere in terms of the basin's internal of hydrological fluxes. We have identified the model weakness in the estimation of ET and suspect this to be due to errors in predicting soil water availability due to misrepresentation of the inundated areas of the Andean wetlands and in the Amazon floodplain. Future research will explore adaptation of the model structure to better represent the hydrological functions of these natural features.