The effect of GCM biases on global runoff simulations of a land surface model

Global Climate Model (GCM) outputs feature systematic biases that render them unsuitable for direct use by impact models, especially for hydrological studies. To deal with this issue many bias correction techniques 10 have been developed to adjust the modelled variables against observations, focusing mainly on precipitation and temperature. However most state-of-art hydrological models require more forcing variables, additionally to precipitation and temperature, such as radiation, humidity, air pressure and wind speed. The biases in these additional variables can hinder hydrological simulations, but the effect of the bias of each variable is unexplored. Here we examine the effect of GCM biases on historical runoff simulations for each forcing 15 variable individually, using the land surface model JULES set up at the global scale. Based on the quantified effect, we assess which variables should be included in bias correction procedures. To this end, a partial correction bias assessment experiment is conducted, to test the effect of the biases of six climate variables from a set of three GCMs. The effect of the bias of each climate variable individually is quantified by comparing the changes in simulated runoff that correspond to the bias of each tested variable. A methodology 20 for the classification of the effect of biases in four Effect Categories (ECs), based on the magnitude and sensitivity of runoff changes, is developed and applied. Our results show that, while globally the largest changes in modelled runoff are caused by precipitation and temperature biases, there are regions where runoff is substantially affected by and/or more sensitive to radiation and humidity. Global maps of bias ECs reveal the regions mostly affected by the bias of each variable. Based on our findings, for global scale applications, 25 bias correction of radiation and humidity, in addition to that of precipitation and temperature, is advised. Finer spatial scale information is also provided, to suggest bias correction of variables beyond precipitation and temperature for regional studies.


Introduction
In recent years, there is a strong consensus on the changes in climate caused by increased concentrations of anthropogenic green-house gas emissions (King et al., 2015;O'Neill et al., 2017;IPCC, 2013). Under the pressing circumstances of a warming world, scientific research has focused on estimating the range of changes in the future climate and the effectiveness of different adaptation strategies. The main tool for the 5 investigation of future climate is the utilization of Global Climate Models (GCMs). GCMs are based on physical principles that describe the components of the climate system, such as cloud formation and water and energy flux exchanges.
Although each generation of GCMs shows improvements compared to its predecessor , climate model outputs still contain substantial biases that express as deviations of the modelled climate 10 variables from respective historical observations. These inherent biases can emanate from misrepresentations of physical atmospheric processes (Maraun, 2012), from uncertainties regarding the boundary and initial model conditions (Bromwich et al., 2013), and from the relatively coarse resolution employed by the GCMs (Katzav and Parker, 2015). As a result, outcomes of hydrological climate change impact studies have been reported to become unrealistic without a prior adjustment of climate forcing biases (Ehret et al., 2012;Hansen 15 et al., 2006;Harding et al., 2014;Sharma et al., 2007). To overcome this limitation, various bias correction techniques have been developed to post-process climate model data to statistically match observations. Bias correction methods are calibrated based on a historical time period for which observations are available. The adjustment is then applied to both modelled historical period and to the period beyond the time-frame of the observations. 20 Bias correction procedures have mainly focused on adjusting the biases of precipitation and/or temperature (Christensen et al., 2008;Li et al., 2010;Miao et al., 2016;Photiadou et al., 2016;Piani et al., 2010). These variables have traditionally been prioritized for bias correction as they are considered as the most important driving variables of hydrological processes in modelling applications -even though from a physical perspective radiation is the driving force of the hydrological cycle. However, many state-of-the-art regional 25 and Global Hydrological Models (GHMs) and Land Surface Models (LSMs) require -apart from precipitation and temperature-additional meteorological forcing, such as solar radiation, air humidity, surface air pressure and wind speed (a summary of the input variables needed by various hydrological models can be found in the Supplement of Hattermann et al. (2016)). For this reason, biases in variables like radiation, humidity and wind speed can hinder the representation of hydrological fluxes such as runoff, evapotranspiration (ET), 30 snow accumulation and snowmelt by the impact models (Hagemann et al. 2011;Haddeland et al. 2012), indicating that bias correction should be extended to include more input variables.
Bias correction itself also has limitations, as it is a demanding process, both in terms of computational cost and of the involved methodological development. Moreover, the use of bias correction is challenged by conceptual pitfalls such as the disruption of the physical consistency of climate variables, the mass/energy balance and the omission of correction feedback mechanisms to other climate variables (Ehret et al., 2012).
For these reasons, it is worth examining whether the effect of biases of input variables on hydrological outputs 5 justifies the use of bias correction. Even though this information would be key for making informed decisions on the variables that should be bias corrected for a specific model application, few relevant studies can be found in literature. Some insight is given by Haddeland et al. (2012), who investigates the combined effect of bias correcting radiation, humidity and wind speed in addition to precipitation and temperature on hydrological simulations. However, the extent to which individual forcing variable biases affect hydrological 10 simulations and the way that this effect varies spatially are important research questions that remain open.
Here we investigate the effect of the biases in GCM climate variables on the historical runoff output of a large scale LSM. To this end, we firstly quantify the improvements in the representation of historical modelled runoff when bias corrected variables are used as forcing. Secondly, we examine the individual effect that the bias of each climate variable can have on runoff simulations. This way we can provide an 15 assessment of the variables beyond precipitation and temperature that may be considered as "priority" variables for bias correction, due to their possible pronounced effect on hydrological simulations.

The JULES land surface model
Hydrological simulations were performed with the Joint UK Land Environment Simulator (JULES) model 20 (Best et al., 2011). JULES is a physically based model that calculates water, energy and carbon exchanges between the land surface and the atmosphere. The science modules that comprise the model are: surface energy fluxes, snow cover and surface hydrology, soil moisture and temperature, soil carbon, vegetation dynamics and plant physiology. The model requires seven climate variables as forcing, namely: precipitation, temperature, longwave and shortwave radiation, specific humidity, surface pressure and wind speed. Runoff 25 production in JULES has two components. The first one is surface runoff, produced by the infiltration excess mechanism. The second one is subsurface runoff (or drainage from the bottom of the soil column), which is calculated as a Darcian flux under the assumption of zero gradient of matric potential. Calculation of potential evaporation follows the Penman-Monteith approach (Monteith, 1965). Water held at the plant canopy evaporates at the potential rate while restrictions of canopy resistance and soil moisture are applied for the 30 simulation of evaporation from soil and plant transpiration from potential evaporation (Best et al., 2011). For a detailed description of JULES the reader can refer to the model description papers of Best et al. (2011) and Clark et al. (2011). Examples of recent model applications on climate change impact assessments can be found in the studies of Papadimitriou et al. (2016), where JULES is used to investigate future water availability in Europe, and Grillakis et al. (2016), who estimated the climate induced changes in soil temperature regimes.

Model setup and outputs 5
JULES was run at the global scale, with a spatial resolution of 0.5 degrees. A daily time step was employed for all the model runs. To warm up the model, 10 spin-up cycles from 1973 to 1978 were performed before each main run. The main runs span from 1978 to 2010 but only the time period of 1981 to 2010 is used for the analysis. The model outputs are produced with a daily time resolution.

Hydrological evaluation 10
This study focuses on the runoff production output of JULES, hereafter denoted as RF. For the assessment of model performance, RF is aggregated at the basin level to allow for comparison with discharge observations. To this end, RF is converted to discharge at the basin outlet (noted as Q) through a delay algorithm proposed by Zulkafli et al. (2013) and the use of TRIP river routing scheme (Oki and Sud, 1998) to determine the grid boxes upstream of the basin's outlet. 15 For the evaluation of JULES' hydrological performance, three metrics are used: Nash-Sutcliffe efficiency (NSE), Percent bias (PBIAS) and the coefficient of determination (R 2 ). The formulas for the calculation of NSE and PBIAS are given in equations 1 and 2: where is simulated discharge, is observed discharge and is the mean of observed discharge data. Discharge observations were obtained from the Global Runoff Data Centre (GRDC) database for 9 large-scale basins shown in Figure 1. Information on the basin stations which for model evaluation is presented in Table S1 in the Supplement of this paper. 25 The evaluation metrics are calculated from monthly discharge data. These are the monthly averages of daily discharge for simulations while observations were obtained in monthly time-steps. Model evaluation was based on the historical period from 1981 to 2010. The months missing from the observed discharge time series were neglected from the calculation of the evaluation metrics.

Climate data
The climate dataset used for bias correction of the GCM data and as a baseline for comparison of the results is the WATCH Forcing Data methodology applied to ERA-Interim data (WFDEI; Weedon et al. 2014).
WFDEI data span from 1979 to 2012, but here only the time period from 1981 to 2010 was used. The WFDEI dataset is based on its predecessor WFD (WATCH Forcing Data; Weedon et al. 2010), which was derived 5 from the ERA-40 reanalysis product (Uppala et al., 2005). For detailed information on the derivation of the WFDEI dataset the reader is referred to Weedon et al. (2014).
Data from three GCMs participating in the fifth phase of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012) were used as forcing. Information on the ensemble members can be found in Table 1. Climate model outputs were interpolated to the 0.5 o spatial resolution of the WFDEI dataset, using 10 the nearest-neighbor method.

Bias correction method
The bias correction methodology presented by Grillakis et al. (2013), namely Multi-segment Statistical Bias Correction (MSBC), is used to adjust the biases in precipitation. MSBC follows the principles of quantile mapping correction techniques and was originally designed and tested for GCM precipitation adjustment. 15 According to the method, the Cumulative Distribution Function (CDF) space is split into discrete segments and then the individual quantile mapping correction is applied on each segment, achieving better fit of the parametric equations on the data and thus better correction, especially on the CDF edges. The optimal number of segments is estimated by the Schwarz Bayesian information criterion to balance between complexity and performance. A modification of the methodology is used for bias adjustment of the rest of the variables that 20 were used. The modified methodology uses linear functions instead of the gamma that were used in the original methodology. This change allows for the facilitation of negative variable values that the gamma functions cannot simulate. Hence, the methodology becomes more universal to be used in different variable types and distributions. An additional methodological change is performed to the highest and lowest segments correction, which are explicitly corrected using only the difference between the historical period model data 25 and the observations. This provides rigidity to the correction, avoiding unrealistic temperature values at the edges of the corrected data CDF. A detailed description and technical details of the modification can be found in (Grillakis et al., 2017). As MSBC methodology belongs to the parametric quantile mapping techniques, it shares their advantages and drawbacks. A comprehensive analysis of advantages and disadvantages of the methods that follow the quantile mapping comparing to others can be found in Maraun et al. (2010) and 30 Themeßl et al. (2012). The methodology has already been used in in the framework of ECLISE FP7 (265240) and HELIX FP7 (603864) projects and in a number of climate change impact studies Papadimitriou et al., 2016). Additionally MSBC has participated in the Bias Correction Intercomparison Project (BCIP) (Nikulin et al., 2015), where it was found to compare well to the other methodologies and was ranked high in performance.
As the bias adjustment involves only the reference period of the GCM data using the same period's observations, its effect is simply limited to the equalization of the cumulative density functions of the raw 5 GCM data towards the WFDEI data. A number of parameter checks were performed to the corrected data, such prevention of unrealistic values (e.g. negative values to positively constrained variables) and the avoidance of extreme values beyond or below the historical record of WFDEI. The correction was performed separately for each calendar month, keeping physical coherence of the bias adjusted variables, as they are adjusts for their seasonality in a coherent way according to the observational dataset that is used. 10

Experimental design
In order to examine the effect of each forcing variable's bias on runoff we designed and implemented an experiment comprised of two parts (bias assessment and partial correction bias assessment) and nine sets of JULES' runs in total. A graphical description of the performed experiment is shown in Figure 2. Climate data 15 from 3 GCMs and the WFDEI dataset are used as JULES' forcing. The sets of runs forced with GCM data, include three model runs -one per GCM. Then the analysis progresses using the ensemble mean. The time span of this analysis is the historical period 1981-2010. This is also the time span of the period used for bias correction of the GCM output.

Bias assessment 20
The first part of the experiment is to assess initial and remaining biases in the forcing data and in simulated runoff. Initial bias refers to the difference between raw GCM variables and the respective WFDEI variables.
Remaining bias is the bias in the forcing variables after the bias correction, i.e. the difference between bias corrected GCM variables and the respective WFDEI variables. Referring to runoff, "initial" and "remaining" biases are defined as the difference between runoff simulations forced with raw and bias corrected forcing 25 respectively from simulations forced with the WFDEI dataset. This definition is employed to shorten and simplify the expressions used in this paper (i.e. "initial bias in runoff" instead of "the difference between runoff forced with raw GCM data and WFDEI data"). In this part of the experiment, three sets of JULES' runs were conducted: i) forced with WFDEI (WFDEI) 30 ii) forced with uncorrected climate data (Raw) iii) forced with bias corrected climate data (BC).

Partial correction bias assessment
For the second part of the experiment -the partial correction bias assessment-six more sets of JULES' runs were performed. In each of these runs, one of the six forcing variables (precipitation, temperature, radiation, humidity, surface pressure and wind speed) is used in its raw form while the rest of the input forcing is bias 5 corrected. The partial correction assessment runs are symbolized as NobcV (NOt Bias Corrected variable V), where V is one of the six forcing variables: precipitation (P), temperature (T), radiation (R), specific humidity (H), surface pressure (Ps) and wind (W). It has to be noted here that -downward longwave (Rl) and downward shortwave (Rs) were examined together, hence in respective NobcR run, both downward shortwave and downward longwave radiation were forced in uncorrected form. Partial correction assessment is composed 10 as a tool to quantify the individual effect of each forcing variable on runoff but is not designed to suggest and assess run formats.
The simulated runoff of each partially corrected input is compared to the respective simulation in which all input variables are bias corrected (denoted as BC). This comparison allows us to assess the "loss" of the performance of simulations when a variable is neglected from the bias correction procedure. It must be noted 15 however that the "loss of performance" concept bears the assumption that the BC simulation is closer to the WFDEI simulation comparing to a partially corrected set.

Categorization of individual variable bias effects
A new framework for the classification of the effects of forcing variables' biases on modelled runoff is developed and implemented. The classification employs the comparison of the bias in each forcing variable 20 (ΔV) and the corresponding relative effect in simulated runoff (ΔRF), discretizing four different categories ( Figure 3). To facilitate the comparison among the different forcing variables, ΔV and ΔRF are expressed as percentages. More specifically, ΔV and ΔRF are defined as follows.
ΔV is the difference between the raw and the bias corrected variable value, divided by the bias corrected variable value. ΔV is estimated by equation Eq. 3. 25 ΔV= - * 100% (3) As an exception, for temperature ΔV refers to the absolute difference between raw and bias corrected temperature (in K).
ΔRF expresses the effect of a variable's bias on runoff and is calculated from the difference between runoff forced with all bias corrected variables except for the examined variable V (NobcV) and runoff forced with 8 all bias corrected variables (BC), divided by the runoff of all bias corrected variables (BC Sensitivity of runoff to changes in forcing variables (S) is the fraction of runoff change over the forcing variable change and serves as a measure to assess the relative magnitude of ΔRF compared to ΔV. When 5 ΔRF is sensitive to ΔV, relatively smaller changes in the variable should cause relatively larger changes in runoff and vice versa. Sensitivity is in general dimensionless, but for temperature has units of K -1 . S is estimated by: In total, there are 6 sets of ΔVs and 6 sets of ΔRFs, one for each examined variable and experiment 10 respectively, and 6 sets of sensitivities (S). The absolute values of ΔV, ΔRF and S denoted as |ΔV|, |ΔRF| and |S| are used to avoid dealing with the sign of the changes and rather focus on their magnitude. Figure 3, the effect of each variable's bias (|ΔV|) on runoff (|ΔRF|) is separated into four different categories according to two rules. The first rule is the characterization of |ΔRF| among all the experiments as "low" or "high" relatively to its median value, shaping the ordinate y=median ( 2.9 Regional scale bias assessment Regional focus is given at 24 regions and 9 hydrological basins. The regions were selected from the 26 regions presented in Giorgi & Bi (2005) (in our study Alaska and Greenland are excluded from the analysis).

As shown in
The hydrological basins were selected to cover different hydro-climatic regimes, in conjunction with GRDC data availability. The selected regions and basins are shown in Figure 1. The abbreviations of the regions' names can be found in Table 2.

Long-term annual biases in forcing variables at the global scale 5
Global maps of the initial and remaining annual biases of the forcing variables are shown in Figure 4.
Respective information on the seasonal biases is presented in Figures S1 and S2 of the Supplement of this paper. In general terms the remaining annual biases are smaller than the initial ones by one to two orders of magnitude. For precipitation (Figure 4a), the largest initial wet biases are observed for regions with high mountain ranges (the Andes in South America, the Alaska Range and the Rocky Mountains in North America 10 and the Himalayas in Asia) and for the tropical African and Indonesian regions.. Only a very small percentage (0.75%) of the land surface has small biases (-0.01 to 0.01 mm/day) while the largest biases (>5 mm/day or <-5 mm/day) occupy 31.18% of the land surface. The remaining biases in precipitation are small (up to 0.01 mm/day in absolute terms, for 80.32% of the land surface) and located in the tropics. The initial biases in temperature are cold biases for 57.82% of the land surface while warm biases (mainly found in the Alaskan, 15 Greenland, north and central Asia regions as well as in the Mediterranean and the Andes) occupy 42.12% of the land surface ( Figure 4b). Initial biases greater than 2 K in absolute terms cover approximately one third of the land surface (34.74%). After bias adjustment, the remaining temperature bias is less than 0.1 K for the vast majority of the land surface (97.27%).
The initial biases of longwave and shortwave radiation ( Figure 4c and Figure 4d respectively) exhibit similar 20 spatial variations but have different signs. Shortwave radiation shows a greater extent of large biases (>50 W/m 2 in absolute terms) compared to longwave radiation (8.16% as opposed to 2.95% of the land surface).
Initial biases in specific humidity are greater than 10 -3 kg/kg (1g/kg), in absolute terms, for one quarter of the land surface (23.65%) ( Figure 4e). The largest biases in surface pressure (>50 or <-50 HPa) occupy 10.01% of the land surface and are found in the areas where high mountain ranges are located (Rocky Mountains, 25 Andes, Himalayas) ( Figure 4f). The remaining bias in surface pressure is less than 0.1 HPa (in absolute terms) for most of the land surface (96.50%). For more than half of the land surface (55.79%) wind's initial biases are larger than 0.5 m/s or smaller than -0.5 m/s (Figure 4g). The remaining biases of the wind variable range between -0.01 and 0.01 m/s for the majority of the land surface (87.71%).
Generally, the initial GCM biases in precipitation and temperature are more pronounced over high 30 mountainous regions and the tropics. Recent studies argue towards a dependency between biases and altitude.
According to the study of Haslinger et al. (2013), both temperature and precipitation biases of a GCM tested over the Alpine Region, show increasing trends with height. Regarding the tropics, various studies show increased GCM biases in these regions compared to model performance in other climate zones Randall et al., 2007;Solman et al., 2013). The initial surface pressure biases are also linked to altitude, as surface pressure heavily depends on elevation. Initial biases in surface pressure have an elevationsimilar pattern and could be a result of the different spatial resolution of the elevation model in the GCMs 5 and WFDEI. The WFDEI dataset resolution is 0.5 degrees while the original GCM spatial resolution is considerably lower (around 2.5 degrees). GCM surface pressure is simulated taking into account a relatively low resolution elevation model. Although GCM surface pressure is interpolated to the WFDEI resolution, this does not correct the elevation induced error in the GCM simulations.
The remaining biases in precipitation at the tropical regions were also identified and discussed extensively 10 by Grillakis et al. (2013) and are related to the error in the CDF approximation during bias correction. For the rest of the variables, the remaining bias although not actually zero is very close to zero (well below the smallest positive and above the smallest negative rank in the legend, e.g. below -0.1 K and below 0.1 K for temperature). The color scale in Figure 4 was selected with the intention of showing the remaining biases, but this does not mean that their values are accountable. They are rather trace errors occurring due to 15 truncation numerical errors during the bias correction process. Hence the remaining biases (except for precipitation) could not be attributed to a specific mechanism.  Table S2 of the Supplement of this paper. Table S2 provides the values of raw input variables for each ensemble member, the ensemble mean value and the respective WFDEI value, 25 averaged for the 24 study regions.

Regional and seasonal biases in forcing variables
Precipitation biases are less pronounced in Europe (NEU, MED, NEE) and in central and north Asian regions (CAS, NAS). The wettest precipitation biases are encountered in the equatorial and Southern Africa (EQF, SQF and SAF) and concern DJF precipitation ( Figure 5). The driest biases are found for the CAM, AMZ and SAS regions, for JJA precipitation. Temperature displays cold biases in most regions. A notable exception is 30 the warm bias in DJF temperature in the NAS region, which is the most pronounced temperature bias found.
Generally the DJF temperature biases are the largest, followed by ANN, while the JJA season has the smallest temperature biases.
The two radiation components, long-wave (Rl) and short-wave (Rs) radiation, show an inverse behavior in their biases ( Figure 5). That is to say, in regions where Rl has negative biases Rs exhibits positive biases and vice versa. According to Demory et al. (2014), overestimation of shortwave radiation is a common issue

Model evaluation
In order to assess JULES' performance, we compare discharge modelled with WFDEI and with the raw GCM dataset to discharge observations for nine study basins. Figure 6 shows the seasonality of observed and modelled discharge and the evaluation metrics of the two sets of simulations (WFDEI and raw GCM) are 20 presented in Table 3Error! Reference source not found..
For seven out of the nine basins (Amazon, Congo, Volga, Ganges, Danube, Elbe and Kemijoki) seasonality is well captured by the WFDEI simulation ( Figure 6). In contrast, the raw GCM simulation exhibits Τhe shown persistent departure from the mean climatology of discharge could include three types of errors.
The first is the error stemming from the insufficient description of the runoff processes by the land surface model and from the routing algorithm (Blyth et al., 2011). The second type of error is a result of errors in the 5 forcing datasets (either observational or GCM output) with regards to depicting the real climatic drivers (Elsner et al., 2014;Mizukami et al., 2014). A third possible error comes from the comparison of naturalized discharge of the simulations with measured discharge due to influences like abstractions and dams regulating the natural river flow (Müller Schmied et al., 2014). An extra error component, which is not considered here, could result from the uncertainty in discharge measurements (Coxon et al., 2015). 10 The model evaluation has revealed two basins (Mississippi and Lena) for which raw GCM forced discharge simulations outperform the WFDEI simulations. For Mississippi, the WFDEI run gives higher discharge than the observations throughout the year, revealing a deficiency of the model to capture the water balance of this basin. Most of the Mississippi extent is in the CNA region, where negative precipitation biases have been documented ( Figure 5). Thus, the raw GCM run is forced with less precipitation compared to WFDEI and 15 less discharge is produced, masking the model deficiency in this basin and improving the metrics of model performance. It is also important to note that the range of the raw GCM simulations is quite broad especially for a three member ensemble. The upper range of the GCM ensemble exceeds the WFDEI simulated runoff during almost half the seasonal cycle. This indicates that the individual ensemble members would not necessarily outperform the WFDEI run and that, for this specific basin, the ensemble averaging has possibly 20 produced a "false-positive" in model performance. In this particular basin, model performance may also be hindered due to the comparison of naturalized with actual discharge, as Mississippi is a heavily regulated river. For Lena, the WFDEI run underestimates measured discharge by about 40%. The Lena basin falls into the extent of the NAS region, for which positive precipitation biases have been documented ( Figure 5). The extra water in the raw GCM run counteracts the tendency of the model to underestimate discharge in the 25 Lena basin, resulting in an improved model performance. In the context of the present study we are not able to identify the exact reasons why model performance is hindered in some basins. It is unrealistic for a global LSM to achieve top performance around the world (Hattermann et al., 2017), as, due to its global nature, some fixes in some regions could result in deteriorations in performance in other parts of the land surface.
Thus, the interpretation of the following analysis of the present study should consider the model deficiencies 30 revealed at this section. Figure 7 shows the initial and remaining biases in runoff, derived from ANN, DJF and JJA long term means.

Long-term biases in runoff at the global scale
As with the biases in the input forcing variables, the remaining bias in runoff is one to two orders of magnitude smaller than the initial bias. Hence, the use of bias corrected data led to an improved representation of runoff by the model, compared to the baseline of the WFDEI run. Accordingly, the studies of Teutschbein & Seibert (2012) and Rojas et al. (2011) found that hydrological simulations are substantially improved with 5 the use of bias corrected forcing.
Regarding the raw GCM run, the largest runoff underestimation biases (<-5 mm/day) are encountered in central-north America, the central-east part of South America and East Asia. The most pronounced runoff overestimation biases are found in the west part of North and South America, in equatorial, south Africa, northern Europe, the Tibetan region and Indonesia. Initial runoff biases are larger than 1 mm/day in absolute 10 terms for 16.26%, 14.85% and 20.18% of the land surface respectively for ANN, DJF and JJA. The differences between the seasonal means (DJF, JJA) and the annual mean (ANN) are in general subtle. Yet, the increases in runoff overestimation biases in DJF in south equatorial Africa and in JJA in the Tibetan plateau are worth noting. Large initial biases (>5 mm/day in absolute terms) in seasonal means occupy a greater percentage of the land surface compared to annual mean (0.70% for ANN, compared to 1.25% and 15 1.97% for DJF and JJA respectively).
The remaining biases in runoff range from -0.1 to 0.1 mm/day for the majority of the land surface (95.19%, 87.40% and 80.30% for ANN, DJF and JJA respectively). Negligible biases (smaller than 0.01 mm/day in absolute terms) are found for more than one third of the land surface (specifically for 38.06% of the land area for ANN, 37.60% for DJF and 34.42% for JJA). The (negative) remaining bias in ANN runoff is more 20 pronounced in the west Amazonian region. This probably corresponds to the remaining bias in precipitation identified for the Amazon region ( Figure 4). In addition to the significant reduction of the biases in runoff forced with bias corrected data, it can be observed that the remaining biases have switched signs compared to the initial biases. This means that in regions where the initial bias in runoff is positive (negative), thus the raw GCM forced runoff is larger (smaller) than runoff forced with WFDEI, the use of bias corrected forcing 25 results in runoff slightly lower (higher) than WFDEI runoff. A respective behavior was not observed in the initial and remaining biases of the most impacting forcing variables (P and T) but it was, to an extent, present for other variables (Rl, Rs and H). Thus, the "overcorrection" manifested for bias corrected runoff compared to WFDEI runoff cannot be attributed to remaining biases in precipitation and temperature. Instead, it could plausibly be associated with the compound effect of the remaining biases in part of (or in all other) forcing 30 variables.

Effect of each forcing variable's bias on runoff
The effect that the bias of each forcing variable can have on runoff is investigated here, by comparing runoff from the bias corrected run to the partial correction assessment runs. The results are shown in Figure 8, for ANN, DJF and JJA averages.
First, we discuss the runoff differences calculated from the ANN period. Precipitation and temperature are the only two variables that cause runoff differences larger than 5 mm/day (in absolute terms) when neglected 5 from bias correction. However, these differences regard a very small percentage of the land surface: 0.61% for precipitation and only 0.02% for temperature. Moreover, precipitation bias causes changes in runoff greater than 1 mm/day (in absolute terms) for 14.28% of the land area. Such changes for the other variables occupy a significantly smaller fraction of the land area (ranging from 1.21% for temperature to 0.05% for wind). Based on the above it can be stated that precipitation is the variable that mostly affects runoff response. 10 Precipitation bias causes both wet and dry biases in different regions of the land surface, with a pattern that closely resembles the effect of the initial GCMs' biases on runoff (Figure 7). A similar pattern between precipitation and runoff biases was also observed by Teng et al. (2015), who noted that precipitation errors are magnified in modelled runoff. Temperature biases result in runoff overestimation for around 60% of the land surface (e.g. over west-and east-North America, the Amazon region, equatorial Africa, northern Europe 15 and parts of Asia) and runoff underestimation for around 40% (example regions: parts of central-south America and of central Asia). Temperature biases correspond with small changes in runoff (up to 0.01 mm/day in absolute terms) over about one third of the land area. Excepting the radiation components from the bias correction procedure produces negative runoff changes for the majority of the land surface (67.60%), while for around 80% of the land surface the differences in runoff range between -0.1 and 0.1 mm/day. The 20 bias in the specific humidity variable corresponds to runoff overestimations for 64% of the land area. The areas of runoff overestimation are mainly located at the higher latitudes (northern part of north America, Europe, north Asia). For 36.43% of the land surface, changes in runoff due to specific humidity biases span between 0.1 and 0.5 in absolute terms. Surface pressure and wind are the variables that show the smaller effect on the hydrological output, as their exclusion from bias correction corresponds to small changes in 25 runoff (less than 0.1 mm/day in absolute terms) for the vast majority of the land surface (around 94% and 92% of the land surface respectively for surface pressure and wind speed). The most pronounced differences in runoff due to surface pressure biases are negative and are encountered over the high mountain ranges' regions of South America and Asia (Andes and Himalayas respectively).
The patterns of runoff changes due to the biases of the forcing variables derived from annual (ANN) and 30 seasonal (DJF, JJA) averages show only subtle variations. In general the above analysis on the ANN runoff differences applies also to the seasonal values, with small variations on the land fractions that show a specific response to forcing biases.
From this analysis it can be deduced that apart from the main hydrological cycle drivers (precipitation and temperature), radiation and specific humidity can also pose a substantial effect on runoff, especially for specific regions. These findings will be further investigated and discussed in the following sections. Other 5 studies also advocate towards the considerable effect that biases in radiation (Mizukami et al., 2014) and humidity (Masaki et al., 2015) can have on hydrological fluxes.

Runoff sensitivities to forcing variables
Sensitivity of runoff changes to the biases of the forcing variables is examined by exploring the relationship between the input forcing biases (ΔV) and the corresponding changes in runoff (ΔRF). The regional variation 10 of this relationship is also investigated. Figure 9 shows scatterplots of ΔRF versus ΔV for each examined variable, for 10 selected regions. The dots in each scatterplot correspond to the land grid boxes of each region.
The presented regions are selected as representative of different parts of the land surface, as the number of the regions shown in the manuscript had to be reduced for clarity of the results. Scatterplots of the 24 examined regions can be found in the Supplement of this paper ( Figure S3). The median values of ΔV, ΔRF 15 and S of the land grid boxes of each region, for the 24 examined regions, are shown in Table 4.
The correlation between the six ΔVs and respective ΔRFs differs substantially between the examined regions.
Generally, the correlations show a non-uniform behavior, identified by the highly scattered data clouds. This implies a high spatial variability of runoff sensitivity to the examined variables.
For precipitation, the ΔRF over ΔP relationship exhibits a nonlinear behavior, indicating that the relative 20 change is runoff is not proportional to precipitation bias, but also depends on the magnitude of precipitation bias. Renner et al. (2012) also identified nonlinearities in the relationship between relative changes in streamflow and changes in precipitation and argued that nonlinear behavior is a result of the combined effects of water and energy balances. Temperature biases have an inversely proportional and highly nonlinear relationship with changes in runoff. The ΔRF over ΔT relationship is also variant for different regions. For 25 example, the scatterplots for NEU and WNA indicate that small temperature biases may correspond with large changes in runoff. In contrast, the scatterplot for CAM indicates that larger temperature biases correspond with smaller changes in runoff compared to the other regions. Radiation biases are small but can correspond with high changes in runoff for some regions (WNA, SAS, WAF, AMZ). For specific humidity it can be observed that small positive biases correspond to high changes in runoff for some region (NEU, 30 axis (i.e. changes in runoff are smaller). Surface pressure has smaller biases compared to the other forcing variables and its effect on runoff also appears reduced. Wind has a wide range of both positive and negative biases which, however, do not seem to affect runoff accordingly.
The variation of the ΔRF over ΔV relationships across the different regions can be attributed to a number of factors. First, it depends on the magnitude and signal of the biases in the forcing variables. As previously 5 shown, these can have significant spatial variations (Figure 4). For example, according to the median values of relative changes in Table 3 Additionally, we should consider that although we assess the effect of long-term annual biases on long-term annual runoff, the results are still depended on the seasonal cycles of the variables and/or runoff, especially if the seasonality of precipitation in the region is strong. For example, the same annual bias in temperature would translate differently to runoff changes in a region with precipitation evenly dispersed throughout the 15 year and in another region where most of annual precipitation happens during the summer months. Finally, as this is a model-based experiment, we should consider whether high sensitivities of some variables for specific regions are a result of over-sensitivity of the model. Vano et al. (2012) documented considerable differences in the spatial distribution of sensitivities to precipitation modelled by five LSMs. Figure 10 shows global maps of bias effect categories (ECs) for each forcing variable, derived according to the methodology described in Section 2.8. The land area fraction corresponding to each EC is tabulated in Table 5.

Spatial distribution of bias effect categories 20
Precipitation is the variable whose biases have the largest effect on runoff, with the vast majority of the land surface (92%) corresponding to the high change categories ECI (67.80%) and ECII (24.20%). Radiation has 25 the second largest land fraction in ECI but temperature has the second largest land fraction in the high change categories (ECI and ECII). Radiation also has the largest land fraction in the high sensitivity categories (ECI and ECIII). As discussed in Section 3.6, this is possibly a result of combining shortwave and longwave radiation for the calculation of the radiation biases. For specific humidity, the most affected areas (ECI) show a significant spatial coherence and are clustered in the higher latitudes of the globe. Surface pressure biases 30 belong to ECI for around one tenth of the land surface. The highly affected areas mainly correspond to regions with high mountain ranges. For wind the majority of the land surface corresponds to ECIV. Still, around one quarter of the land surface belongs to the high change categories (ECI and ECII).

Discussion of runoff sensitivities
Here we compare our findings to the respective literature to assess the realism of JULES' sensitivity. We use the median sensitivity value of the grid boxes of each region (Table 4) as the representative sensitivity S for 5 each region. Moreover, we discuss issues of possible model over-sensitivity in particular regions and the caveats of this study.

Sensitivity of runoff to precipitation
Most studies have examined the sensitivity (also reported as elasticity) of runoff (or discharge) to precipitation. A number of studies have examined sensitivity to precipitation for regions or basins in the 10 United States. Values of runoff sensitivity (S) to precipitation between 1.5 and 2.5 were reported by Sankarasubramanian and Vogel (2003) for the US (WNA, CNA and ENA). Fu et al. (2007) reported values of 1.5 to 1.67 for the Spokane River basin (located in WNA). Vano et al. (2012) found that S to precipitation ranged from 2.2 to 3.3 for different LSMs for the Colorado River basin (also located in WNA). For the Mississippi River basin (mainly located in CNA), Renner et al. (2012) found that S of streamflow to 15 precipitation is 2.38 and 2.55 using two different methods for sensitivity estimation. For another basin located in CNA, Brikowski (2015) reported runoff S to precipitation to be 2.64. For the US region, the S values found in this study compare very well with the literature values. Runoff S to precipitation is 2.12 for WNA, 2.54 for CNA and 1.69 for ENA. Many studies report S to precipitation for regions or basins of China. Reported values of runoff S to precipitation in the Yellow River basin (located in EAS) are 1.4 to 1.69 (Fu et al., 2007), 20 1.6 to 3.9 for 89 catchments of the EAS region (Yang and Yang, 2011), 1.71 and 1.74 (estimates of two different methods) for the headwaters of the Yellow River (Renner et al., 2012). Again, the value found in our study is in good agreement with the literature (S to precipitation for EAS is 1.70).

Sensitivity of runoff to temperature and other variables
A number of studies have examined runoff sensitivity to temperature changes. Vano et al. (2012) reported S 25 to temperature values ranging from -2 to -9 C -1 between 5 LSMs for the Colorado River basin (WNA) and Brikowski (2015) reported a value of -0.41 C -1 for S to temperature in a basin in CNA. Our values for these regions are substantially lower (-0.13 K -1 for WNA and -0.07 K -1 for CNA). This divergence could be attributed to two factors. First, to an extent it could be connected to possible non-sensitivities of our model to temperature changes for these regions. Second, the differences could arise from the inclusion (or not) of 30 the physical link between temperature and other variables in the analysis. Vano et al. (2012) use different LSMs to calculate sensitivities by perturbing daily temperature maxima and minima. These changes also affect the downward longwave radiation and humidity, which are then used by the evapotranspiration routines of the LSMs. In our case, the change in temperature does not interact with radiation and humidity, as those are read as input variables by the model. When temperature is allowed to interact with humidity, increased temperature will increase the water vapour capacity of the air, and more water will be evaporated. The lack of this physical link in our simulations could, to an extent, explain the decreased sensitivity of runoff to 5 temperature changes compared to Vano et al. (2012). In the analysis of Brikowski (2015), sensitivities of runoff to precipitation and temperature are derived from respective historical data. Thus, sensitivity to temperature will also include the changes caused by the interaction of temperature with other meteorological variables. In a study with a different approach, Yang and Yang (2011) separated the effect of precipitation, temperature, net radiation, relative humidity and wind speed on runoff and calculated sensitivities for each 10 variable. They reported values of S to temperature ranging from -0.11 to -0.02 C -1 between 89 catchments of the EAS region. For the same region, we have computed S to temperature as -0.06 K -1 , which is included into the literature stated range. Moreover, our S values for radiation, humidity and wind speed are also in good agreement with Yang and Yang (2011). According to Yang and Yang (2011), S to radiation ranges from -1.9 to -0.3, S to humidity from 0.2 to 1.9 and S to wind speed from -0.8 to -0.1. The range refers to values 15 computed for 89 catchments in the EAS region. Our respective values for this region are -1.53 for radiation, 0.82 for humidity and -0.09 for wind speed. This supports the argument that the large deviations of the sensitivity to temperature between our study and the studies of Vano et al. (2012) and Brikowski (2015), result from interactions in the forcing variables included in the referenced studies.

20
The reported S to radiation values are higher in absolute terms than S to precipitation values for many of the examined regions and also globally (Table 4). However, according to the findings presented in Section 3.5, precipitation and temperature correspond to higher changes in runoff compared to radiation. That is because high S to radiation results from relatively low ΔV values, rather than from relatively high ΔRF values (compared e.g. to precipitation). Small ΔV for radiation is possibly the consequence of combining shortwave 25 and longwave radiation to calculate the total bias in radiation, as the two radiation components have inverse signs for most regions ( Figure 5).

Sensitivity of runoff to specific humidity at high-latitude regions
Although S to humidity for EAS compares well with literature, unexpectedly high values of S to humidity are found for other regions (5.24 for NEU,9.58 for NEE,7.58 for NAS). We performed an extra analysis to 30 investigate this issue and the basic findings are included in Figure 11 and the Supplement of this paper. Figure  resulting runoff. Very high sensitivity of runoff to H is observed for a specific area, the zone between 70N and 40N latitudes. In that zone, a difference of about 10% in H corresponds to an increase of 40% to 60% in runoff. Investigation of the different fluxes related to runoff production in the model revealed two mechanisms that explain this behavior. First, due to higher humidity, the water vapour deficit of the air is reduced and evapotranspiration is decreased, thus allowing more of the precipitated water available as runoff. 5 This mechanism explains around one third of the magnitude of reported changes in runoff ( Figure S4 of the Supplement). The second mechanism happens due to super-saturation of the air, especially during the colder months of the year when the dew point is lower, and includes the condensation and deposition of water vapour (direct transition from vapour to ice). Depositioned water accumulates as snowmass. Snowmass is higher for the raw H run (H has positive biases), which results in increased snowmelt and thus increased runoff ( Figure  10 S5 of the Supplement).
A comparison of super-saturated air conditions for the different sets of data (WFDEI, Raw, BC and NobcH) can help us identify the origin of the aforementioned behavior. From the input specific humidity H, we  Figure 6 of the Supplement of this paper. The analysis reveals that the higher latitude regions -that display high sensitivity of runoff to H-, are under super-saturated conditions for more than 10% 20 of the time ( Figure S6 of the Supplement). The length of supersaturated conditions estimated for the WFDEI, Raw or BC data do not exhibit a respective spatial pattern, although super-saturation is found in all three datasets ( Figure S6 of the Supplement). Thus, the high runoff sensitivity over the high latitude regions is not a result of supersaturated conditions in the raw GCM H and it rather stems from: 1) raw GCM H being higher than BC H and 2) the calculation of relative humidity within JULES, done by combining raw GCM H with 25 bias corrected T and Ps. This inconsistency strengthens the argument for the need of bias correction of more forcing variables -in addition to P and T. Specific humidity is a variable that is often left uncorrected, a practice that could possibly result to runoff overestimations in the northern latitudes based on our findings, in cases that hydrological models which account for deposition and condensation are used.
Since this experiment was performed with a single LSM, it cannot be concluded whether this behavior is 30 common between the LSMs or is an over-sensitivity of the JULES model. However, it highlights the importance of bias correction for specific humidity for specific regions, where runoff would have been highly overestimated using raw specific humidity as forcing.

Study caveats
An issue that must be considered for the interpretation of the results of this study is that they have been based on a single impact model. As the uncertainty stemming from the selection of the impact model is large 5 (Gudmundsson et al. 2012;Hagemann et al. 2013), it is preferable to use multiple models in order to capture a wide range of possible results. The effect of the meteorological forcing on a hydrological output is heavily model dependent, as different models employ different concepts and/or equations for the representation of key hydrological processes. This concern has been also discussed by other single model studies on meteorological variables' effects on hydrological outputs (Mizukami et al. 2014;Masaki et al. 2015). 10 Nonetheless, the results of single model studies are useful in giving indicative answers on the issues they examine and set a basis for the methodology needed for respective multi-model applications.

Summary and conclusions
The present study examined the effect of the biases in GCM output variables on historical runoff simulations, using the JULES LSM. The effects of biases were studied for each forcing variable separately, for a total of 15 six meteorological variables (precipitation, temperature, radiation, specific humidity, surface pressure and wind speed). Biases of each variable and the respective effect of runoff were quantified at the global and regional scale. A framework for the categorization of the effects of biases of the different variables was developed and implemented, leading to global maps of bias ECs.
We found that bias correction of GCM outputs results to substantially improved representation of historical 20 runoff. For this reason, our study adds to the numerous studies that advocate on the use of some kind of bias correction of GCM data prior to their use as impact model forcing. Precipitation and temperature biases were identified to cause the largest changes in runoff. Radiation and specific humidity can also pose a substantial effect on runoff, especially for specific regions. The sensitivity of runoff to the different forcing variables exhibits a high spatial variability. Depending on the region, runoff can be more sensitive to radiation or 25 humidity compared to precipitation or temperature. The produced EC maps show that all variables can potentially affect runoff to a high extent depending on the region. The fraction of the land surface occupied by the high effect category ECI (high changes in runoff and high sensitivity of runoff to the variable's changes) ranges between the variables from 67.80% for precipitation to 6.09% for wind.
The produced maps of ECs aid the identification of the regions mostly affected by the bias of each variable. 30 Thus, they could serve as a decision tool in cases when an informed decision needs to be made on the variables that would need to be bias corrected or could be neglected from bias correction, according the planned model application. Moreover, when raw forcing is used in model applications, EC maps could provide a guidance towards the areas where the results would need a more careful interpretation.
Based on the findings of this study we suggest that the widely used concept of bias correcting precipitation and temperature should be extended to include more input variables. Radiation and specific humidity should 5 be added to the priority variables for bias correction in hydrological applications, along with precipitation and temperature.
Due to the heavily model dependent nature of runoff sensitivity to forcing variables, generalized conclusions for the behavior of other impact models to GCM biases cannot be drawn from the present single model assessment. Nevertheless, this study aims to initiate a discussion on the effect of GCM biases on hydrological 10 output, as the consideration of these sensitivities is crucial to understand the uncertainty spectrum of hydrologically relevant climate change assessments.