Large-scale Hydrological Modelling by Using Modified Pub Recommendations: the India-hype Case

The scientific initiative Prediction in Ungauged Basins (PUB) (2003–2012 by the IAHS) put considerable effort into improving the reliability of hydrological models to predict flow response in ungauged rivers. PUB's collective experience advanced hydrologic science and defined guidelines to make predictions in catchments without observed runoff data. At present, there is a raised interest in applying catchment models to large domains and large data samples in a multi-basin manner, to explore emerging spatial patterns or learn from comparative hydrology. However, such modelling involves additional sources of uncertainties caused by the inconsistency between input data sets, i.e. particularly regional and global databases. This may lead to inaccurate model parameterisation and erroneous process understanding. In order to bridge the gap between the best practices for flow predictions in single catchments and multi-basins at the large scale, we present a further developed and slightly modified version of the recommended best practices for PUB by Takeuchi et al. (2013). By using examples from a recent HYPE (Hydrological Predictions for the Environment) hydrological model setup across 6000 subbasins for the Indian subcontinent, named India-HYPE v1.0, we explore the PUB recommendations, identify challenges and recommend ways to overcome them. We describe the work process related to (a) errors and inconsistencies in global databases, unknown human impacts, and poor data quality; (b) robust approaches to identify model parameters using a stepwise calibration approach , remote sensing data, expert knowledge, and catchment similarities; and (c) evaluation based on flow signatures and performance metrics, using both multiple criteria and multiple variables, and independent gauges for " blind tests ". The results show that despite the strong physiographical gradient over the subcontinent, a single model can describe the spatial variability in dominant hydrological processes at the catchment scale. In addition, spatial model deficiencies are used to identify potential improvements of the model concept. Eventually, through simultaneous calibration using numerous gauges, the median Kling–Gupta efficiency for river flow increased from 0.14 to 0.64. We finally demonstrate the potential of multi-basin modelling for comparative hydrol-ogy using PUB, by grouping the 6000 subbasins based on similarities in flow signatures to gain insights into the spatial patterns of flow generating processes at the large scale.


Introduction
Numerical hydrological models have been used worldwide for operational needs and scientific research since the early 1970s (e.g.Hrachowitz et al., 2013;Pechlivanidis et al., 2011;Refsgaard et al., 2010;Singh, 1995).The Prediction in Ungauged Basins (PUB) initiative of the International Association of Hydrological Sciences (IAHS) was launched in 2003 to improve the reliability of models at ungauged regions, overcome the fragmentation in catchment hydrology, and advance the collective understanding (Sivapalan et al., 2003).PUB highlighted the need to (1) characterise the data and model information content, (2) examine the extent to which a model can be reconciled with observations, and (3) point towards model structural improvements (Gupta et al., 2008).In this regard, several approaches (e.g.multiobjectives, signature measures, information-based metrics, subperiod evaluation) have been applied to reveal significant information about the hydrological systems and indicate perceived model structural errors (Hrachowitz et al., 2013).The use of parameter constraints has also been a significant ad-Published by Copernicus Publications on behalf of the European Geosciences Union.I. G. Pechlivanidis and B. Arheimer: Large-scale hydrological modelling vancement since such an approach can increase model consistency and reliability (Bulygina et al., 2009;Hrachowitz et al., 2014).Constraints are generated by independent information via either additional data, i.e. remote sensing, tracers, quality, multiple-variables, etc. (Arheimer et al., 2011;Finger et al., 2011;McDonnell et al., 2010;McMillan et al., 2012;Samaniego et al., 2011), and/or expert knowledge (Bulygina et al., 2012;Fenicia et al., 2008;Gao et al., 2014).
It is apparent that the PUB community made significant progress towards these scientific objectives; however, the investigations were normally conducted at only one or a limited number of catchments (Hrachowitz et al., 2013).Such an approach is indeed focused on detailed process investigation but is limited when it comes to generalisation of the underlying hydrological hypotheses; to advance science in hydrology, much can be gained by comparative hydrology to search for robustness in hypotheses (Blöschl et al., 2013;Falkenmark and Chapman, 1989).The need to improve process understanding via large sample hydrology has also been highlighted in the new 2013-2022 IAHS scientific initiative named Panta Rhei -Everything Flows (Montanari et al., 2013).
Multi-basin modelling at the large scale complements the "deep" knowledge from single catchment modelling, whilst the current release of open and global data sets has given new opportunities for catchment hydrologists to contribute (Andreassian et al., 2006;Arheimer and Brandt, 1998;Gupta et al., 2014;Johnston and Smakhtin, 2014).Hydrological modelling at the large scale has the potential to encompass many river basins, cross-regional and international boundaries and represent a number of different physiographic and climatic zones (Alcamo et al., 2003;Raje et al., 2013;Widén-Nilsson et al., 2007).Application of multi-basin modelling at the large scale can be used to predict the hydrological response at interior ungauged basins (Arheimer and Lindström, 2013;Donnelly et al., 2015;Samaniego et al., 2011;Strömqvist et al., 2012).The use of a large sample of gauges can also allow for exploration of emerging patterns (e.g.climate change impacts) and facilitate comparative hydrology allowing to test hypotheses for many catchments with a wide range of environmental conditions (Blöschl et al., 2013;Donnelly et al., 2015;Falkenmark and Chapman, 1989).
Modelling at the large scale, however, includes additional model uncertainties.Physical properties (e.g.topography, vegetation, and soil type) in large systems generally show higher spatial variability and thus larger heterogeneity in system behaviour (Coron et al., 2012;Sawicz et al., 2011), which in turn affects model parameters (Kumar et al., 2013).In addition, large river basins are often strongly influenced by human activities, such as irrigation, hydropower production, and groundwater use, for which information is rarely available at high resolution in global databases.This introduces additional uncertainty regarding process understanding and description at the large scale.Moreover, the topographic and forcing data of global data sets (i.e.water divides, weather and climatic data) are more likely to be inconsistent, erroneous, and/or only available at a coarse resolution (Donnelly et al., 2013;Kauffeldt et al., 2013).
Applying catchment models at the continental scale in a multi-basin manner is a way to introduce catchment modelling approaches to the existing global hydrological models, i.e. land-surface schemes and global water-allocation concepts.In this paper, we pose the following scientific questions: (1) to what extent are the PUB recommendations for catchment scale also relevant for hydrological modelling at the large scale?(2) How have the scientific advancements during the PUB decade improved the potential for processbased hydrological modelling at the large scale?To address these questions, we (a) identify specific challenges at the large scale (uncertain/erroneous basin delineation and routing, errors in global data sets, human impact; i.e. reservoirs/dams) and exemplify how to overcome them, (b) further develop and modify the PUB best practices to be applicable at the large scale, (c) illustrate the improvement on parameter identification by using remote sensing data and expert knowledge, (d) cluster catchments based on physiographic similarity and their hydrological functioning, (e) ensure model reliability using flow signatures and temporal variability of multiple modelled variables, (f) detect links between model performance and physiographical characteristics to understand model inadequacies along the gradient, and finally (g) discuss how process understanding can benefit from multi-basin modelling and what hydrological insights can be gained by analysing spatial patterns from largescale predictions in ungauged basins.We use examples from the recent HYPE (Hydrological Predictions for the Environment) model set-up of the Indian subcontinent, which experiences unique and strong hydroclimatic and physiographic characteristics and poses extraordinary scientific challenges to understand, quantify, and predict hydrological responses.
2 Best practices from PUB when modelling multi-basins at the large scale Takeuchi et al. (2013) recommend a six-step procedure for predicting runoff at locations where no observed runoff data are available (Fig. 1a).This best practice recommendation is intended for single catchments and requires modification when applied to multi-basins at the large scale (Fig. 1b).Big data sets are subject to uncertainty and identification of errors is usually time-consuming.Analysing each data set or catchment may be impractical and incurs the risk focusing on details instead of the most crucial overall hydrological functioning across the model domain.We therefore recommend starting with a top-down approach in which the model is set up directly before proceeding with the PUB recommendations (circle of steps in Fig. 1).The hydrological model needs to include the description of most water fluxes, storage, and anthropogenic influences that can be relevant and satisfy the modelling objectives.In addition, we recommend using a model that is familiar to the modeller and open for changes, allowing coherent hydrological interpretations and code adjustments to cope with the region's spatial heterogeneity and hydrological features.Setting up the model system includes (i) acquiring readily available data sets that cover the entire geographical domain or merge data sets to get a full coverage; (ii) defining calculation points and river network, by taking into account the location of gauges, major landscape features, user requests, catchment borders and routing; and (iii) making a first set of model input-data files and making the first model run for the model domain with a multi-basin resolution.The analysis of results from the first model run will indicate major obstacles, such as systematic errors in input data or model structural limitations.Moreover, by having the technical system in place immediately facilitates an incremental and agile approach to model set-up, with direct feedback on model performance at many gauges.We then recommend starting to improve the performance according to the six steps of best practices for predictions in ungauged basins, using a bottom-up approach to refine input data, model structure, and parameter values.

Read the landscape
Go out to your catchment, look around. . .! (Blöschl et al., 2013, p. 385) It is practically impossible to visit all of the basins in a large-scale domain, so instead we recommend to (i) navigate on hard-copy maps, digitised maps, and on the Web (e.g.Google Earth) to check landscape characteristics; (ii) review the literature for dominant processes and well-known features or hydrological challenges in the region; (iii) proceed with quality checks and cross-validations with other data sources (i.e.sources that are limited in space but contain local information); (iv) validate the basin delineation and routing using archived metadata from other available data sets; (v) check the quality of observed discharge data to assure coherence of time series; and, finally, (vi) check the spatiotemporal information of meteorological data sets after transformation from the grid to the subbasin scale.It is important to get an understanding of the entire domain and ensure that the data sets correspond to this understanding, hence tackling systematic errors in the data.

Runoff signatures and processes
Analyse all runoff signatures in nearby catchments to get an understanding. . .! (Blöschl et al. 2013, p. 385) Detailed inspection of flow signatures for each gauging station, from large data sets (often in the range a thousand stations; see http://hypeweb.smhi.se/), is best done by using clustering techniques to discover spatial similarities (Sawicz et al., 2011).It is then important to use many flow signatures for each site to fully capture the characteristics of the hydrographs.We also recommend searching for statistical relationships between the observed flow signatures and basin characteristics (both physiography and human alteration) across the model domain.This will increase our understanding of the dominant processes and fitness of the model structure (Donnelly et al., 2015).

Process similarity and grouping
. . .find similar gauged catchments to assist in predicting runoff in the ungauged basin!(Blöschl et al. 2013, p. 385) In most process-based models, the modeller has some freedom to define the characteristics of the smallest calculation units, which are normally linked to physiography to account for spatial distribution of for instance soil properties or land use.When producing these calculation units both technical (e.g.computational efficiency) and conceptual (e.g.restrictions with the number of classes) concerns must be taken into account.However, lakes, wetlands, glacier, and urban areas should be respected since even small proportions can significantly alter the flow regime.When calculation units are defined, we recommend clustering the basins/gauges with similar upstream characteristics and/or system behaviour to isolate key processes for regionalisation of parameter values during calibration.We finally suggest checking the spatial distribution by plotting the catchment characteristics of subbasins on maps and comparing them to the original or other data sources.

Quality checks
This is an additional step in the procedure accounting for repetition of steps 1-3 in an iterative way to ensure quality in the required input data and files of the model prior to parameter tuning (Fig. 1); it is easy to make mistakes and introduce errors when handling large data sets with automatic scripts (the generalisation of scripts is not always straightforward and some manual adjustment is usually required) and/or by human error (particularly when many modellers collaborate), which can lead to erroneous assumptions on hydrological processes during calibration.We recommend analysing flow time series as follows: (i) compare modelled to observed time series and signatures, (ii) check water-volume errors and their distribution in space, (iii) inspect the spatial distribution of model dynamics to correct spatial patterns from systematic errors, and (iv) search for errors in the model set-up (routing, meteorological input, etc.).catchments. . .more information than the hydrograph. . .! (Blöschl et al., 2013, p. 385) Here, it is crucial that the model structure represents the modeller's perception of how the hydrological system is organised and how the various processes are interconnected.For the model set-up to be "right for the right reasons", we recommend to: (i) constrain relevant parameters to alternative data rather than just to time series of river discharge (e.g.snowmelt parameters to snow depths, evapotranspiration parameters to data from flux towers and satellites) or select a subset of gauges representing different flow generating processes; (ii) apply expert knowledge when analysing internal variables to ensure that the model structure reflects the understanding of flow paths and their interconnections; (iii) change the model algorithms or structure if tuning of parameters is not enough to reflect the perception of the hydrological system; (iv) include specific rating curves of lakes and reservoirs wherever available, and tune parameters for irrigation and dam regulation to fit the flow dynamics at downstream gauges; and (v) if possible assimilate observed data, e.g.snow, upstream discharge, and regulation rules in reservoirs.

Hydrological interpretation
Interpret the parameters. . .and justify their values against what was learnt during field trips and other data. . .! (Blöschl et al., 2013, p. 385) Although, hydrological interpretation has been present in every step of the model set-up procedure described here, this step includes the overall synthesis and analysis of results both at the large scale and for single catchments in the multi-basin approach.For spatial interpretation, we recommend plotting maps with multi-basin outputs for several variables, performance criteria, and signatures across the model domain.This allows checking the model's coherency at various landscape features, e.g.spatial patterns of vegetation, geology, climate, population density, and human alterations.The objective is to understand the drivers that influence flow, find rational reasons behind the hydrological heterogeneity, and identify knowledge gaps or model limitations.For temporal interpretation, we recommend plotting time series for some basins in each group of similar landscape units and catchment response.This is to make sure that the model reflects our perception and assists to better understand the dominant drivers of the flow generation processes and water dynamics in the region.

Uncertainty -local and regional
. . .by combining error propagation methods, regional cross-validation and hydrological interpretation. . .! (Blöschl et al., 2013, p. 385) Multi-basin models are more computationally demanding than single-basin models and it is therefore not always feasible to explicitly address all uncertainties from all sources.To explore the model performance in ungauged basins, we recommend dividing the set of gauging stations into those used in calibration and independent "blind-tests", respectively.Cross-validation, e.g. using the jackknife procedure (Good, 2006), is practically difficult in process-based modelling of multi-basins.To examine uncertainties we recommend to (i) use several performance (diagnostic) criteria and many flow signatures, (ii) relate the spatial distribution of model performance to physiographical variables, and (iii) check model performance for independent gauging sites and new data sets.
The major spatiotemporal deviations found between modelled and observed data should be the focus for the next round in the cycle of steps for better predictions.We recommend reading the landscape and searching for local knowledge, again, to elaborate new hypotheses of hydrological functioning and data sources.We also recommend documenting and version-managing each model set-up before looping into step 1, to ensure knowledge accumulation for a broader audience and to make the set-up process transparent.This sets a baseline for the next round of improvements.

Study area and data description
India is considered the seventh largest country by area and the second-most populous country with over 1.2 billion people.The country covers an area of about 3.3 million km 2 and some of its river basins expand over several countries in the area (i.e.China, Nepal, Pakistan, and Bangladesh; see Fig. 2).The spatiotemporal variation in climate is perhaps greater than in any other area of similar size in the world.The climate is generally strongly influenced by the Himalayas and the Thar Desert in the northwest, both of which contribute to drive the summer and winter monsoons (Attri and Tyagi, 2010).Four seasons can be distinguished: winter (January-February), pre-monsoon (March-May), monsoon (June-September), and post-monsoon (October-December).The temperature varies between seasons ranging from mean temperatures of about 10 • C in winter to about 32 • C in the pre-monsoon season.In terms of spatial variability, the rainfall pattern roughly reflects the different climate regimes of the country, which vary from humid in the northeast (rainfall occurs about 180 days year −1 ), to arid in Rajasthan (20 days year −1 ).Accordingly, river flow shows large spatial and seasonal variability across the subcontinent (Fig. 2b), e.g. the Ganges River has an intra-annual amplitude in monthly river discharge of 50 000 m 3 s −1 .
For the hydrological model set-up, we use global data sets to extract the input data (see Table 1).APHRODITE (Asian Precipitation -Highly Resolved Observational Data Integration Towards Evaluation of the Water Resources; Yatagai et al., 2009Yatagai et al., , 2012) ) and AphroTEMP (Yasutomi et al., 2011) are the only long-term continental-scale data sets that contain a dense network of daily data (here only daily precipitation and mean temperature are required) for Asia including the Himalayas.Data of land use and soil type were aggregated into fewer classes than in the original databases.Discharge data are available from the Global Runoff Data Centre (GRDC) at 42 sites, limited to monthly values in the period 1971-1979.More discharge data are held in the Indian government agencies but are not accessible to the public.Consequently, in this application, flow information (Table 2) is available only for a small fraction of the subcontinent, which makes the region a great example for PUB.Monthly potential evapotranspiration (pot.E) data were obtained for the period 2000-2008 from the Moderate Resolution Imaging Spectroradiometer (MODIS) global data set (Mu et al., 2007(Mu et al., , 2011)).The data set covers the domain in a spatial resolution of 1 km and is derived based on the Penman-Monteith (Penman, 1948) approach.
Water divides and catchment characteristics were appointed for each subbasin by using the World Hydrological model Input Set-up Tool (WHIST; http://hype.sourceforge.net/WHIST/).This is a spatial information tool from SMHI (Swedish Meteorological and Hydrological Institute) to transform data and create input files for hydrological models, from different types of databases.From the information of topographic databases, for example, WHIST can delineate the subbasins and the linking (routing) between them.This is also the tool for allocating information of soil, vegetation, surface water, regulation and irrigation to each calculation unit.For the Indian subcontinent, we chose to work with some 6000 points for calculations of runoff in the river network (i.e.6000 subbasins).

A multi-basin hydrological model for large-scale applications -the HYPE model
The HYPE model is a dynamic rainfall-runoff model which describes the hydrological processes at the catchment scale (Lindström et al., 2010).The model represents processes for snow/ice, evapotranspiration, soil moisture and flow paths, groundwater fluctuations, aquifers, human alterations (reservoirs, regulation, irrigation, abstractions), and routing through rivers and lakes.The HYPE source code is continuously developed and released in new versions for open access at http://hype.sourceforge.net/,where also model descriptions, manuals, and file descriptions can be downloaded.HYPE is most often run at a daily time step and simulates the water flow paths in soil for hydrological response units (HRUs), which are defined by gridded soil and landuse classes and can be divided in up to three layers with a fluctuating groundwater table.The HRUs are further aggregated into subbasins based on topography.Elevation is also used to get temperature variations within a subbasin to influence the snowmelt and storage as well as evapotranspiration.Glaciers have a variable surface and volume, while lakes are defined as classes with specified areas and variable volume.Lakes receive runoff from the local catchment and, if located in the subbasin outlet, also the river flow from upstream subbasins.On glaciers and lakes, precipitation falls directly on the surface and water evaporates at the potential rate.Each lake has a defined depth below an outflow threshold.The outflow from lakes is determined by a general rating curve unless a specific one is given or if the lake is regulated.Regulated lakes and man-made reservoirs are treated equally but a simple regulation rule can be used, in which the outflow is constant or follows a seasonal function (as it is often the case with hydropower) for water levels above the threshold.A rating curve for the spillways can be used when the reservoir is full.
Irrigation is simulated based on crop water demands calculated either with the FAO-56 crop coefficient method (Allen et al., 1998) or relative to a reference flooding level for submerged crops (e.g.rice).The demands are withdrawn from  rivers, lakes, reservoirs, and/or groundwater within and/or external to the subbasin where the demands originated and are constrained by the water available at these sources.After subtraction of conveyance losses, the withdrawn water is applied as additional infiltration to the irrigated soils.The agriculture and irrigation data sets (see Table 1) are used to define irrigated area, crop types, growing seasons, crop coefficients, irrigation methods and efficiencies, and irrigation sources.The irrigation parameters regulating water demand and abstraction are usually manually calibrated using discharge stations in irrigation-dominated areas.
River discharge is routed between the subbasins along the river network and may also pass subbasins, flow laterally in the soil between subbasins or interact with a deeper groundwater aquifer in the model.For the study in this paper, the HYPE model version 4.5.0 was set up for the entire Indian subcontinent (4.9 million km 2 ) with a resolution of 6010 subbasins, i.e. on average 810 km 2 , and is referred to as India-HYPE version 1.0.

Model calibration and regionalisation
The calibration objective was to derive a reliable model adequately representing the temporal dynamics of flow (high flows, timing, variability, and volume) across the Indian river systems.With such a model set-up, we can identify spatial patterns of hydrologic similarity across the subcontinent and also analyse impacts of environmental change on water resources.The HYPE model has many rate coefficients, constants, and parameters which in theory can be adjusted, but in practice about 20 of them are tuned during calibration.Many of the parameters are linked to physiographic characteristics in the landscape, such as soil type and depth (soil depen- dent parameters) or vegetation (land-use-dependent parameters), while others are assumed to be general to the entire domain (general parameters) or specific to a defined region or river (regional parameters).Parameters for each HRU are calibrated for representative gauged basins and then transferred to similar HRUs, which are gridded at a higher resolution than the subbasins across the whole domain to account for spatial variability in soil and land use.Using the distributed HRU approach in the multi-basin concept is thus one part of the regionalisation method for parameter values.Some other parameters, however, are either estimated from literature values and from previous modelling experiences (a priori values) or identified in the (automatic or manual) calibration procedure.Slightly different methods for regionalisation of parameter values have been used when setting up the different HYPE model applications, depending on access to gauging stations, additional data sources, and expert knowledge.
The following procedure was used for India-HYPE v.1.0.

Stepwise, iterative calibration of parameter groups
To tackle, to a certain extent, the equifinality problem in this processed-based model, the parameters (general, soiland land-use-dependent, specific or regional) are calibrated in a progressive way, i.e. stepwise calibration (Arheimer and Lindström, 2013) using different subsets of the gauging station in each step.In this way, errors induced by inappropriate parameter values in some model processes are not compensated for by introducing errors in other parts of the model.Hence, groups of parameters responsible for certain flow paths or processes (e.g.soil water holding capacity) are calibrated first and then kept constant when the second group of parameters (e.g.river routing) is calibrated.However, stepping downstream along the model code includes some reconsideration about chosen parameter values in an iterative procedure.For each step and group of parameters, a subset of representative gauging stations is used in simultaneous calibration, which means that no gauging station is calibrated individually.This is to get parameters that are robust also for ungauged basins.Model performance in specific sites is thus traded against average performance across the full model domain or regions.
For the Indian subcontinent, the following groups of HYPE parameters were calibrated stepwise: (i) general parameters (e.g.precipitation and temperature correction factors with elevation), which significantly affect the water balance in the system, snowpack and distribution, and regional discharge; (ii) soil-and land-use-dependent parameters (e.g.field capacity and rate of potential evapotranspiration), which can influence the dynamics of the flow signal, groundwater levels, and transit time; (iii) regional parameters, which are applied as multipliers to some of the general soil and landuse parameters and may be seen as downscaling parameters as they compensate for the scaling effects and/or other types of uncertainty.The multipliers are either specific for a region or a river basin.

Expert knowledge for parameter constraints
During this progressive stepwise calibration approach, constraints based on expert knowledge and basin similarity are introduced.As an example, we apply a constraint imposed on the mactrsm soil dependent parameter (mactrsm is the threshold soil water for macropore flow and surface runoff).In the first run, during the calibration procedure the parameter is allowed to vary freely within the parameter range and all distributions for the soil types are acceptable (unconstrained sets).We then apply expert knowledge on the parameter distribution and agree that a model will only be retained as feasible if it can satisfy the following constraint: mactrsm Coarse >mactrsm Medium >mactrsm Fine .
The mactrsm values for the remaining two soil types in the India-HYPE model domain, i.e. organic and shallow, are expected to be close to the corresponding values for the coarse soil, although the value for shallow soil is constrained to be less than mactrsm for organic soils.

Spatial clustering based on catchment similarities
We assume hydrologic similarity across the region on the basis of similarity in physiographic characteristics.We applied a k-means clustering approach within the 17-dimensional space, consisting of five soil types, seven land-use types, mean annual precipitation, mean temperature, mean slope, mean elevation, and basin area.This separated the subbasins into homogeneous classes.A silhouette analysis was used to overcome the subjectivity on the determination of the number of clusters.The catchment similarity approach significantly reduces the number of parameters, while it allows for regionalisation of parameters, which are assumed to be robust enough also for ungauged basins.

Spatiotemporal calibration and evaluation
India-HYPE was calibrated and evaluated in a multi-basin approach by considering the median performance in all selected stations.Thirty stations were selected for model calibration and 12 "blind" stations for spatial validation.The years 1969-1970 are used as a model warm-up period, the next 5 years for model calibration (1971)(1972)(1973)(1974)(1975) and the final 4 years for temporal performance evaluation (1976)(1977)(1978)(1979).
The differential-evolution Markov-chain (DE-MC; Ter Braak, 2006) optimisation algorithm is used to explore the feasible parameter space and to investigate parameter sensitivity.DE-MC was applied at each step of the iterative calibration procedure (to optimise the general, soil-and landuse-dependent, and regional parameters) with 200 generations of 100 parallel chains each being explored.The Kling-Gupta efficiency, KGE (Gupta et al., 2009), was used to define the performance of the model towards the observed discharge.KGE allows for a multi-objective perspective by focusing on separately minimising the correlation (timing) error, variability error, and bias (volume) error.We also investigated the relative influence of timing, variability and volume error on the KGE value.To do this, we transformed the three components to result into a consistent range of possible values (the metrics are named cc, alpha and beta corresponding to timing, variability and volume errors, respectively; see Appendix A).

Evaluation based on flow signatures
The model was further evaluated on its ability to capture spatial and temporal variability in discharge by comparing modelled flow signatures and monthly simulations with observed data.Here, three flow signatures are calculated for each gauging station to illustrate the different aspects of the flow variability and the hydrograph characteristics (Appendix A): the mean annual specific runoff (Qm, mm yr −1 ), the normalised high flow statistic (q05, -), and the slope of the flow duration curve (mFDC, -).

Multi-variable evaluation
To judge model credibility, observed variables other than river discharge are used, for instance from satellite products.For India-HYPE, these included evaluations against the estimated areal extent of snow and the snow water equivalent from the GlobSnow system and potential evapotranspi-ration (pot.E) from the MODIS system.The assumption is that MODIS pot.E can be used as reference to calibrate the HYPE parameters that control pot.E; this refers only to the cevp land-use-dependent parameter, which is a coefficient of potential evapotranspiration (mm d −1 • C −1 ) (Lindström et al., 2010).The cevp parameter was optimised for each land-use type so that the HYPE modelled annual pot.E matches the MODIS annual pot.E at the entire model domain.A Monte Carlo uniform random search was used to explore the feasible cevp parameter space (constant for each land-use type; 0.15-0.30)and to investigate parameter identifiability and interdependence (10 000 samples).The root mean square error (RMSE) and absolute bias (Bias) were used as objective functions in this analysis; 0 values indicate a perfect model with no errors for both criteria.Note that the analysis was conducted in the 2000-2008 period during which MODIS data were available.We therefore assume that the cevp parameter is static in time and representative also for the 1971-1979 period.

Linking performance to physiographical characteristics
To better understand the model performance and identify potential for model improvements, we apply classification and regression trees (CART; Breiman et al., 1984).CART is a recursive-partitioning algorithm that classifies the space defined by the input variables (i.e.physiographic-climatic characteristics) based on the output variable (i.e.KGE model performance).The tree consists of a series of nodes, where each node is a logical expression based on a similarity metric in the input space (physiographic-climatic characteristics).In this case, we divided the KGE performance into three groups, bad (KGE < 0.4), medium (0.4 < KGE < 0.7), and good (KGE > 0.7), which were termed C0, C1, and C2, respectively.A terminal leaf exists at the end of each branch of the tree, where the probability of belonging to any of the three output groups can be inspected.Here we summarised the physiographic-climatic characteristics of the basin into five soil types (coarse, medium, fine, organic, and shallow), seven land-use types (crops, forest, open land with vegetation, urban, bare/desert, glacier, and water), mean annual precipitation, and mean temperature.

Catchment functioning across gradients
We finally explored the spatial runoff patterns across the entire subcontinent by analysing the flow characteristics in all 6000 catchments modelled.Here, we used the modelled discharge and calculated 12 flow signatures for each subbasin (see Appendix A): mean annual specific discharge (mm yr −1 ), range of Pardé coefficient (-), slope of FDC (-), normalised low flow (-), normalised high flow (-), coefficient of variation (-), flashiness defined as 1−autocorrelation (-), normalised peak distribution (-), rising limb den-Hydrol.Earth Syst.Sci., 19, 4559-4579, 2015 www.hydrol-earth-syst-sci.net/19/4559/2015/ sity (-), declining limb density (-), long-term mean discharge (m 3 s −1 ), and normalised relatively low flow (-).We then applied a k-means clustering approach within the 12dimensional space (consisting of the 12 calculated flow signatures) to categorise the subbasins based on their combined similarity in flow signatures.Through the mapping of the spatial pattern we gained insight into the similarities of catchment functioning and could identify the dominant flow generating processes for specific regions.To further highlight the hydrological insights gained during model identification, we conducted the clustering analysis on two different steps of the model calibration and explored the sensitivity of calibration on the spatial patterns of flow signatures.

Results and discussion
The very first model set-up to establish a technical model infrastructure of the Indian subcontinent showed very poor model performance, with an average and median KGE for all stations of −0.02 and 0.0, respectively.This performance was expected while it set the baseline for further improvements following the six steps of the modified PUB best practices.

Read the landscape
Background knowledge was firstly acquired via visual and/or numerical analysis of available maps that describe the spatial patterns of land use, soil, and climate, as well as via the study of the scientific literature on regional hydrological investigations, which enabled identification of dominant physical processes and flow paths.Such soft information was useful for turning processes on/off and selecting relevant algorithms, e.g. for management and snow melting.Communication with local scientists (i.e.governmental hydrological institutes), managers (i.e.regional water authorities) and endusers (i.e.agricultural sector) enabled knowledge exchange and justified the model approach.Three extensive field trips provided important soft information about system behaviour in the semi-arid northwest and humid subtropical northeastern parts of the country (i.e.identification of irrigation water sources for agricultural needs and estimation of water losses due to faults in the irrigation systems).Analyses of the topographic data were of major importance since they affected the subbasin delineation and routing.Although HydroSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales) is based on high-resolution elevation layers, which are hydrologically conditioned and corrected, there are still many errors.Merging HydroSHEDS with GRDC (hence forcing the delineation at subbasins where GRDC stations are available) involved some mismatches in terms of the size of upstream areas between the subbasin delineations and the GRDC metadata.As an example, the location of the Dandeli station in the Kali Nadi River basin (asterisk 1 in Fig. 2) was adjusted to match the underlying topography and drainage accumulation data based on published and computed upstream areas, respectively (see Fig. 3a).The consequent change in the routing resulted in a considerable improvement in the model performance (KGE improved from −0.51 to 0.30; see Fig. 3b).Many similar corrections had to be made.
To make corrections also for ungauged basins and major rivers, the delineated basins were additionally evaluated using a shapefile of basin areas reported by Gosain et al. (2011).Some minor corrections had to be done in the routing to achieve similarly delineated basins, particularly in the northwestern region, where mean elevation at the subbasin scale does not show much variability.

Runoff signatures and processes
As recommended, several flow signatures were extracted from the gauging stations across India to be compared to physiographical patterns.Flow signatures were also used for model evaluation to find potential for improvements.The analysis was done at different stages in the model set-up and, finally, there was a relatively good agreement of the observed and modelled flow signatures (Fig. 4).In general, poor agreement was found in mountains and in semi-arid regions, which are characterised by local, convective rainfall events during the monsoon season.No clear pattern is found between signature agreement and basin scale for calibrated river gauges.
We also explored how flow signatures can be affected by human impacts by analysing modelled responses considering and omitting the human influence.Figure 5 highlights the significant effect reservoirs have to dampen hydrographs and control discharge variability, hence various flow signatures.The model can represent fairly well the reservoir routing and KGE improved from 0.37 to 0.48 after introducing a regulation scheme.The model improved on capturing the seasonality of regulation; however, at this modelling state it was not able to represent the monthly peaks.Note that model results are subject to the general rating curve generalised to all reservoirs; there were no downstream data available to calibrate the parameters specifically for a given reservoir/dam.

Process similarity and grouping
After having identified relevant HRUs, reclassified them into suitable calculation units, and inserted major features as lakes and dams, we identified basin similarities to drive the identification of the model's regional parameters.The cluster analysis was applied to all 6 010 subbasins of the domain within the 17-dimensional space (see Sect. 3.3).We identified 13 different classes of varying size (Fig. 6) out of 42 values, which is the number of gauged river basins in the domain, yet with relatively high class strength (i.e. the variability of characteristics within each cluster is relatively low).It is important to note that the physiographic (soil and land use)  characteristics had more influence on the clustering as opposed to the climatic properties; the clustering was repeated without climatic information but the spatial pattern of the clusters remained.In the last stage of the stepwise calibration procedure, the regional model parameters were estimated for each cluster region.Using the clustering for regional calibration (Sect.5.4), however, could not significantly improve the overall model performance but, nevertheless, the model consistency at all stations was improved.Overall, we found a high number of potential catchment similarity concepts to drive parameter identification in the ungauged basins.

Quality checks
Steps 1-3 of our best practices were performed in an iterative procedure including checking against independent data sources that resulted in the reconsideration of assumptions and corrections of input data.For instance, the proportion of each land-use type driven by GLC2000 was calculated and compared to soft information from official governmental reports.According to GLC2000 11 % of the country is forest, which contradicts the estimated 22 % based on reports from the Ministry of Water Resources (India-WRIS, 2012; River Basin Atlas of India, RRSC-West, NRSC, ISRO, Jodpur, India).To address this, forest information from the Global Irrigated Area Mapping (GIAM; Thenkabail et al., 2009) was merged with GLC2000.Although the proportion of forest areas was corrected, this merging consequently changed the proportion of open land with vegetation and crops from 14 and 68 % to 12 and 59 %, respectively.
In addition, several modelled and observed flow signatures were compared repetitively at every stage of model refinement.We found it valuable to adjust as much as possible before starting to work on parameter values and model algorithms.For instance, the analysis of flow time series and signatures during the first model runs showed consistent underestimation of runoff in the Himalayan-fed basins.A comparison of the mean annual precipitation between APHRODITE and national precipitation gridded data provided by the In-  dian Meteorological Department showed an underestimation of the APHRODITE precipitation in the mountainous regions: the APHRODITE precipitation network is sparse over the Himalayas (Yatagai et al., 2012).To overcome this underestimation, a correction factor was applied to precipitation (in HYPE, this was a multiplier of 4 % per 100 m) at regions with elevation greater than 400 m.By allowing such modification in the data, we expected that the calibration of model parameters could further compensate for precipitation uncertainty.

Model -right for the right reasons
When setting up India-HYPE we considered realism in the process calculations by using parameter constraints.We did not have to adjust the model structure and we did not assimilate data or rating curves as we did not have access to such observations.

Additional data sources
The calibration of the pot.E model routine against the MODIS pot.E data resulted in a well-identified cevp value for most land-use types.Analysis of the Monte Carlo results presents an initial screening of parameter sensitivities (Fig. 7).Results show that the different objective functions extract different information from the pot.E spatial pattern.As expected, cevp values for crops, forest, and open land with vegetation types are the most sensitive to both objective functions, since these land-use types dominate the region (60, 23, and 11 % of India, respectively) and hence significantly affect pot.E. Overall, India-HYPE was lower in pot.E at the arid regions and over the Himalayas (on average by 15 %), whereas it was higher in pot.E along the western and eastern coastlines (on average by 12 %).Although the two estimates do not fully match, the use of additional information to constrain parameters (hence constraining the model's results for specific processes) is promising.However, the uncertainty of MODIS results was not examined and more data sources should be included.

Expert knowledge
Expert knowledge was applied to filter out unrealistic relationships of the mactrsm parameter for different soil types (see Sect. 3.3).Both the constrained and unconstrained mod-0" 0.5" 1" Coarse" Medium" Fine" Organic" Shallow" mactrsm" Constrained" Unconstrained" els resulted in a comparable calibration performance: median KGE was 0.48 and 0.49 for the constrained and unconstrained models, respectively.The optimum set for the unconstrained model gave an unrealistic distribution of the parameter values for the coarse and medium soil types (Fig. 8).However, the optimum values are within the parameter range defined in the constrained calibration approach.The slight increase is due to the free calibration parameters whose values and/or distributions are allowed to compensate for errors/uncertainties at other processes.In such cases it is important to select the model which performs well and respects the theoretical understanding of the system.This illustrates the value of the recommendations to constrain parameters based on expert knowledge -the right model for the right reason.

Stepwise calibration procedure
The predictability of the model with prior parameter values was very poor (Fig. 9), highlighting the limitations when parameters are regionalised from a donor system of strongly different hydroclimatic characteristics (e.g.Sweden).A significant improvement in the performance is achieved in both the calibration and the evaluation period after the calibration of the general parameters due to a better representation of the water volume in the rivers (beta in KGE improved from 0.51 to 0.78).Calibration of the soil and land-use parameters further improved the overall performance; however, KGE slightly decreased at the basins in which the model performed poorly during the previous calibration step.Using the clusters based on catchment similarities for regional calibration did not significantly improve the overall model performance; however, the model consistency at all stations was improved in both calibration and evaluation periods.

Hydrological interpretation
The temporal interpretation was done by analysing interacting dynamics of internal model variables, i.e. precipitation (P , mm), snow depth (SD, mm), temperature (T , • C), evapotranspiration (E, mm), soil moisture deficit (SMDF, mm), and discharge (Q, m 3 s −1 ).These are checked visually in a set of validation basins, to avoid unrealistic model behaviour due to parameter setting.Results from this point onwards correspond to the calibrated India-HYPE model (after step 3 in Fig. 9).Results in the Chenab River at the Akhnoor station (branch river of the Indus system; asterisk 3 in Fig. 2) show that the snowmelt characterises the monthly hydrograph (Fig. 10).Snow accumulation/melting processes occur at the headwaters of the basin which experience T below 0 • C during the winter and pre-monsoon period and above 0 • C during the rest of the months ("Up" black-dashed T series in Fig. 10).P also varies in space while it exhibits strong seasonal variability according to the location ("Up" blackline and "Down" blue envelope in the P series).Spatiotemporal analysis of P allows for a better understanding of the snow depth temporal distribution; in the model, snow depth increases when precipitation occurs and temperature is below 0 • C. Given the model's evapotranspiration module, potential E varies depending on mean temperature.However the distribution of actual E is dependent on the water availability in the soil, which further justifies the strong (negative) correlation between actual E and SMDF.
For spatial interpretation of flow predictions, we investigated potential relationships between model performance and physiographic-climatic characteristics; hence identify the controls of poor model performance.sightful and show that poor performance (KGE < 0.4) is generally achieved at basins with shallow soil type greater than 13 %.The probability of obtaining poor performance is also highest for basins with medium soil type greater than 34 % and precipitation less than 1038 mm.Consequently, emphasis should be given to parameters for medium and shallow soils in a future effort to improve the model performance.

Uncertainty -local and regional
The India-HYPE model was calibrated and validated in space and time, and the overall model performance (at the end of the stepwise approach) in terms of KGE (Gupta et al., 2009) and its decomposed terms is presented in Table 3. India-HYPE achieved an acceptable performance and is therefore considered adequate to describe the dominant hydrological processes in the subcontinent.However, the performance decreased (from KGE = 0.64 to KGE = 0.44) when the model was evaluated for gauges, which are independent both in space and time.This shows that the model still needs improvements to be equally reliable for predictions in ungauged basins at independent time periods.The decomposed KGE terms show that the model, during the validation period and for the validation stations, cannot fully capture the variability of the observed data (described by the alpha term).Alpha decreases during the validation period at the validation stations from 0.78 to 0.58, which consequently affects the KGE values.However, other flow characteristics, i.e. timing and volume, are well represented also during the validation period.
To search for major uncertainties and potential for improvements, we finally analyse the model performance in both the calibration and validation stations across the domain.The ability of the model to reproduce the monthly variability of discharge varies regionally as shown by the KGE (Fig. 12).Performance is generally poor in the mountainous and semi-arid regions (western and eastern Himalayas and northwest India, respectively).The Indian River basins are also regulated, limiting the model's predictive power; regulation strategies are irregular and difficult to reproduce.The  (1976)(1977)(1978)(1979) 0.44 0.84 0.58 0.75 KGE's decomposed terms (cc, alpha and beta) can reveal the causes for the model errors.For example, the poor performance at the Indus river system (northern India) is due to the poor representation of the observed variability of discharge, which is probably related to parameterisation in the model's snow accumulation/melting component.In addition, a mass volume error seems to be the main cause of the poor KGE performance in the southwestern rivers.This seems to be due to the underestimation of precipitation and/or overestimation of actual evapotranspiration: a comparison of APHRODITE data against precipitation data from the Indian Meteorological Department showed underestimation of precipitation in this region.Conclusions are similar for the stations used in the calibration and validation analysis, hence justifying the model's spatial consistency in the region.

Spatial flow pattern across the subcontinent and dominant processes
Although the India-HYPE model has limitations, we identified potential for further improvements during the set-up procedure.The present version has demonstrated the usefulness of multi-basin modelling for comparative hydrology and how to gain insights in spatial patterns of flow generating processes at the large scale.The final clustering analysis of the 12 flow signatures from India-HYPE version 1 resulted in six different classes of varying size (Fig. 13) with different distributions in the signatures (Fig. 14).Similarity in catchment behaviour for each class was interpreted and dominant flow generating processes could be distinguished as follows.
Catchments in cluster 3 are located in the Himalayan region and in the western Indian coast (Western Ghats) and are characterised by high ranges of annual specific runoff (Qm) due to high precipitation occurring in these regions, and variable flow regime (high mFDC).Variability is dependent on snow/ice processes which are important in controlling the flow regime, at least in the Himalayan region (cf.annual cycle in the Indus River in Fig. 2).Flow is also characterised by high rising and declining limb densities (RLD and DLD).The climate in catchments of cluster 3 is humid subtropical and tropical with high evapotranspiration.Catchments in the northwestern part of India (cluster 4; arid regions including the Thar Desert) are characterised by high intra-annual variability (DPar) and low values of flow (q95).Ephemeral rivers exist in this region due to high evaporation rates (e.g. the Luni River), and generate runoff mainly during the monsoon period.The high variability in the flow regime is also shown by the high values of CV (coefficient of variation), flash and RLD signatures.Similar flow characteristics are observed for the catchments located in the semi-arid regions (cluster 1), yet not at the same range of signature values as for cluster 4. The catchments in cluster 1 are also fast, responsive and their flow shows strong dynamics, in terms of RLD and DLD.Catchments in cluster 2 are located in the tropical climate and their runoff response is mainly driven by rainfall.Although these catchments receive less precipitation compared to other regions, their normalised high flow statistic (q05) is the highest of any cluster group.Moreover, catchments in cluster 5 are located at the downstream areas of the Indus River, distinguished for their high values of low flows.Finally, catchments in cluster 6 are characterised for their high mean annual discharge values and are located at the downstream areas of the large river systems (Indus, Ganges and Brahmaputra).Note also that only few catchments belong to these cluster groups: 112 and 57 catchments in clusters 5 and 6, respectively.
Repeating the clustering analysis at two different steps of the calibration procedure helps us to assess spatial changes in the hydrological response in the region and hence improve our process understanding.Figure 13 shows that parameterisation can affect the spatial pattern of clusters in terms of catchment functioning.In particular, clusters after calibration (regional step) seem to have a consistent spatial structure; this also justifies the validity of parameter regionalisation approaches based a spatial proximity between catchments.Results from clustering based on physiography show spatial consistency in the arid region (Thar Desert) and the western coast (Western Ghats), respectively.This affected identification of the regional parameters (multipliers of precipitation and evapotranspiration) applied at the subbasin scale, which consequently led to a more consistent spatial structure in the mapping (cf.Fig. 13a and b).Finally, calibration of the soil and land-use parameters led to a better representation of snow processes and hence affected the flow signatures in the Himalayan region (cluster 3).

Performance in India-HYPE v1.0 and future model refinements
Many other catchment-scale and multi-basin hydrological models have been applied in (parts of) the Indian subcontinent.However, it is generally common that only results from success stories are reported, which limits the potential for comparative analyses and hence improving process understanding.Here, we presented results from all 42 Indian GRDC stations including both failures and successes.We completed the setup of the first model version and documented the India-HYPE version 1.0 providing also guide- lines on how to start working on the next version, looping back to step 1.Overall, India-HYPE performed well for most river systems, with the performance being comparable to other studies in which a model was applied at the large scale.Application of the VIC (Variable Infiltration Capacity) hydrological model resulted in a similar performance for the large systems of the Ganges, Krishna, and Narmada (Raje et al., 2013) with the Nash-Sutcliffe efficiency, NSE (Nash and Sutcliffe, 1970), varying between 0.44 and 0.94 (at the same stations India-HYPE achieved an NSE between 0.45 and 0.94).In contrast to previous studies, our contribution lies in the fact that anthropogenic influences (i.e.reservoirs and irrigation) are simulated, as those have been shown to be very important in controlling the amplitude, phase, and shape of the hydrograph.Other models, e.g.SWAT, have also been applied in India to assess the impacts of climate change; however, the parameters have been estimated empirically from the literature, whilst the performance was not reported (Gosain et al., 2006(Gosain et al., , 2011)).Catchment-scale hydrological models from India have generally been achieving a high performance (Arora, 2010;Patil et al., 2008), mainly due to the local gauged data used: usually the data are governmental and confidential with high spatiotemporal resolution and less uncertainty/error.In addition, model parameters in single catchments are normally transferred along a smoother hydroclimatic gradient and are calibrated for individual gauging stations.Nevertheless, catchment-scale studies set a benchmark of performance and provide deeper knowledge of process description, which leads to further refinements in multi-basin modelling.Of particular interest are the investigations about the western Himalayas, in which India-HYPE performed poorly.Studies by Singh and Bengtsson (2004); Singh and Jain (2003) and Singh et al. (2006) highlight the importance of accumulation/melting processes in the snow-/glacier-fed parts of the region, each accounting for 17 % of the total discharge; however for other regions of the Indus system higher contributions from snow and ice are reported (Immerzeel et al., 2009).The poor model performance in terms of alpha (variability) and beta (volume) highlights the need to refine the current snow/glacier algorithms and/or improve the parameters by using this soft information in model evaluation.Similar model needs can be concluded when assessing the India-HYPE performances at the Ganges and Brahmaputra basins based on previous literature (Arora, 2010;Nepal et al., 2014).Finally results for the arid northwest and mountainous regions highlighted the need to refine the pot.E algorithm.Most regional hydrological studies considered relationships including extraterrestrial radiation and relative humidity, e.g.Hargreaves-Samani or Penman-Monteith, which are expected to improve the magnitude and variability of evapotranspiration losses (Samaniego et al., 2011).Therefore, the pot.E model component will be further investigated and refined in the next version of India-HYPE.

Conclusions
By investigating the modified recommendations for predictions in ungauged basins across the Indian subcontinent, we found the following.
-Each step in the best-practice procedure was relevant and we could find methods that also work at the large scale using the knowledge derived for catchments during the PUB decade.We argue for adapting an incremental and agile approach to model set-up, which requires frequent testing to get feedback on introduced changes.The large-scale modelling is more prone to technical problems and data inconsistencies that become apparent when running the model and therefore should be resolved early in the model set-up process.
-Multi-basin modelling of ungauged rivers at the large scale reveals insights into spatial patterns and dominating flow processes.Indian catchments can be categorised into six clusters based on their flow similarity.River flow varies spatially in terms of flow means, variability, extremes, and seasonality.Catchments in the Himalayan region and the Western Ghats seem to respond similarly and are characterised by high mean annual specific runoff values and variable flow regime.The response of the catchments in the tropical zone is characterised by high peaks, while catchments in the dry regions show very strong flow variability and respond quickly to rainfall.
-Overall, the model showed high potential for representing the hydrological response across the region despite the strong hydroclimatic gradient.However, the India-HYPE v.1.0still needs to be improved to be equally reliable for predictions in ungauged basins as for gauged rivers.The model set-up procedure, according to the PUB recommendations, leads to insights into where the single-model structure did not perform well.Based on this, future model improvements will mainly focus on the western Himalayas and arid regions by refining the hypothesis of snow/glacier processes and the evapotranspiration algorithm.

Figure 1 .
Figure 1.Best practices for predictions in ungauged basins: (a) according to Fig. 13.1 by Takeuchi et al. (2013) in Blöschl et al. (2013); (b) modified version for multi-basin applications at the large scale.

Figure 2 .
Figure 2. (Left) Map of the Indian subcontinent (model domain).Results will be shown from investigation areas with a star in the order of their numbering.(Right) Annual cycles (1971-1979) at four river systems (a-d) of various climate (P -observed precipitation, Act.Emodelled actual evapotranspiration, Q -observed discharge).

Figure 3 .
Figure 3. Example of the impact of basin delineation and routing on model behaviour: (a) correction in the location (red x and green circle is prior and after the correction, respectively) of the Dandeli discharge station (Kali Nadi River basin) and (b) the corresponding modelled discharge before and after the correction.In (a) the subbasins and flow accumulation are also depicted.

Figure 4 .
Figure 4. Signature analysis in the spatiotemporal model evaluation: (a) the mean annual specific runoff, (b) the normalised high flow statistic, and (c) the slope of the flow duration curve.Blue and red circles are used for the calibration and evaluation stations, respectively.

Figure 5 .
Figure 5. Impact of model parameterisation of reservoir regulation on discharge for (a) monthly streamflow, and (b) annual hydrograph,showing naturalised (without) and regulated (with) conditions at the basin outlet (located at asterisk 2 in Fig.2).

Figure 6 .
Figure 6.Subbasin clusters using a k-means clustering approach based on physiographical characteristics.

Figure 7 .
Figure7.Coefficient of potential evapotranspiration (cevp) parameter as identified (the range is derived from the 100 parameter sets that perform best, and the optimum set) for different objective functions (RMSE and Bias) and land-use type.Lines with markers present the optimum parameter values for different objective functions.

Figure 8 .
Figure 8. Constraints (grey dashed lines) and optimum (solid lines) values of the mactrsm soil-dependent model parameter based on process understanding.

Figure 9 .
Figure9.Improvements in model performance (average KGE for 30 stations) during the stepwise calibration approach (steps 1-3 correspond to general, soil-land use, and regional calibration as described in Sect.3.3)."1st run" corresponds to model performance of the very first model set-up to establish a technical model infrastructure."Prior" corresponds to model performance before parameter calibration and after overcoming routing errors.The evaluation is conducted at the calibration (blue) and the validation (red shaded) period.

Figure 10 .
Figure10.Analysis of model variables at the Akhnoor station (Chenab River; asterisk 3 in Fig.2): P , SD, T , E, SMDF and Q. E corresponds to potential (Pot.) and actual (Act.)evapotranspiration, and Q corresponds to modelled (Mod.) and observed (Obs.)discharge).Note that P and T series are plotted at the outlet of the basin (Down) and the most upstream subbasin (Up).

Figure 11 .
Figure 11.Classification trees relating regions of different KGE performance with physical and climatic characteristics.The bars represent the probability of a performance resulting in any of the three performance classes (C0, C1 or C2).

Figure 12 .
Figure 12.Spatial variability of KGE (and its decomposed terms) model performance for the calibration (circle) and validation (triangle) stations.

Figure 13 .
Figure 13.Subbasin clusters based on flow signatures at different stages of the model set-up: (a) prior, and (b) regional.

Figure 14 .
Figure 14.Distribution of signature values for each cluster (at the regional step).The flow signatures are described in Appendix A.

Table 2 .
Statistics for the 42 gauging stations of river discharge used in the model evaluation.
* Values of inter-annual variability correspond to coefficients of variation calculated on 9-year periods.

Table 3 .
Median model performance for calibration and validation stations and periods.