A Hydrological Prediction System Based on the SVS Land-Surface Scheme: Implementation and Evaluation of the GEM-Hydro platform on the watershed of Lake Ontario

This work describes the implementation of the distributed GEM-Hydro runoff modeling platform, developed at Environment and Climate Change Canada (ECCC) over the last decade. The latest version of GEM-Hydro combines the SVS (Soil, Vegetation and Snow) land-surface scheme and the WATROUTE routing scheme in order to provide streamflow predictions on a gridded river network. SVS is designed to be two-way coupled to the GEM (Global Environmental Multiscale) atmospheric model used by ECCC for operational weather and environmental forecasting. Although SVS has been shown to accurately track soil moisture during the warm season, it has never been evaluated before for hydrological prediction. This paper presents a first evaluation of its ability to simulate streamflow for all major rivers flowing into Lake Ontario. The skill level of GEM-Hydro is assessed by comparing the quality of simulated flows to that of two established hydrological models, MESH and WATFLOOD, which share the same routing scheme (WATROUTE) but rely on different land-surface schemes. All models are calibrated using the same meteorological forcings, objective function, calibration algorithm, and watershed delineation. Results show that GEM-Hydro performs well and is competitive with MESH and WATFLOOD. A computationally efficient strategy is proposed to calibrate the land-surface model of GEM-Hydro: a simple unit hydrograph is used for routing instead of its standard distributed routing component. The distributed routing part of the model can then be run in a second step to estimate streamflow everywhere inside the domain. Global and local calibration strategies are compared in order to estimate runoff for ungauged portions of the Lake Ontario watershed. Overall, streamflow predictions obtained using a global calibration strategy, in which a single parameter set is identified for the whole watershed of Lake Ontario, show skill comparable to the predictions based on local calibration.
Hence, global calibration provides spatially consistent parameter values and robust performance at gauged locations, and reduces the complexity and computational burden of the calibration procedure. This work contributes to the Great Lakes Runoff Intercomparison Project (GRIP).
Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-508, 2016. Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: 1 November 2016. © Author(s) 2016. CC-BY 3.0 License.

One example is section 1.4: I agree with reviewer 2 that the calibration procedure presented here is very hard to follow and requires revision (and may well benefit from a diagram). For instance, when you mention that calibration was performed on "the GRIP-O gauged area", do you mean that the model was calibrated on each of the gauged subbasins individually, or on some kind of aggregate? If the former, how were the parameters then transferred to the ungauged area; if the latter, how was the aggregate obtained? It is important to keep in mind that reproducibility is a cornerstone of scientific publication. I do not think that this is already achieved in the version that you attached in your reply to reviewer 2, especially if one considers that a considerable number of readers of HESS may not be thoroughly familiar with the GEM-Hydro setup.
These considerations lead me to request major revisions. The manuscript will be sent back to the reviewers, with the specific request to evaluate whether their concerns about presentation and transparency of the model implementation and evaluation have been addressed adequately, in addition to any scientific concerns.
Such revisions would also give you the opportunity, as reviewer 1 suggests, to increase the scientific relevance and perhaps even to streamline some of the content.
Depending on the availability of the original reviewers, I may also seek the opinion of a third reviewer.
At this point, I cannot guarantee that your manuscript will eventually be published in HESS; this will entirely depend on the thoroughness with which you will be able to address the reviews, which I believe are very thorough and constructive.

Kind regards, Wouter Buytaert, handling editor
Answer to Editor: Section 1.4 was reformulated and a diagram was added. A reference was added for the Unit Hydrograph (UH) in section 1.2. The following sentence was added in section 1.1: "The basin averages are computed as a weighted average of the SVS grid cells located in the considered basin." The following sentence was added in section 2.1 to give another detail related to the interpretation of Fig. 4 (formerly Fig. 3): "It can also be noticed on Fig. 4 that calibration sometimes inverts the sign of the PBIAS criteria (switching from over- to under-estimation or vice-versa)." A sentence was also extended at the beginning of section 2.3 (extended part in red) in order to add some information about the parameter value differences between local and global calibration, as suggested by reviewer 1: "Moreover, it was noticed (not shown here) that parameter values were very different between local and global calibration procedures, even for catchments displaying very similar performances between the two strategies (such as subbasins 3, 5 and 8, see Fig. 7), highlighting the fact that local calibration is more prone to over-calibration (i.e., equifinality)." In order to mitigate the fact that we derive some hypotheses based on a few events only (as suggested by reviewer 2), the following sentence (in section 2.2): "Peak flow events associated to the spring freshet are generally better represented by MESH, which may be due to a better representation of the soil freezing and melting processes occurring in CLASS (MESH LSS)." was replaced with this one: "Peak flow events (even for other subbasins) associated to the spring freshet are generally better represented by MESH, which may be due to a better representation by CLASS of various cold regions hydrological processes, such as snow accumulation and melt, snow interception by vegetation, as well as soil freezing and thawing." Finally, the manuscript was reread in order to streamline the content where possible.
We hence believe that reviewer 2's comments have been mostly addressed, except regarding the intercomparison section, which we would prefer to keep in the paper for the reasons mentioned in the answer to reviewer 2's comments.

Introduction
Given the continuous increase in precipitation forecast skill of Numerical Weather Prediction (NWP) systems, as documented for example over the United States (US) by Sukovich et al. (2014), it is becoming possible to obtain skillful runoff forecasts directly from NWP model outputs, and streamflow forecasts by routing these gridded runoff fields. Indeed, modern NWP models all simulate to some extent the snow, vegetation, and soil processes that contribute to the generation of runoff and streamflow. However, many limitations are still associated with the representation of such processes in NWP systems, as documented in Clark et al. (2015) and Davison et al. (2016).
Hydrological processes simulated by land-surface schemes (LSS) used for NWP are improving quickly (Balsamo et al., 2009; Masson et al., 2013; Alavi et al., 2016; Wagner et al., 2016), as soil water content and snow water equivalent are recognized as key state variables for streamflow forecasting (Koster et al., 2004; Entekhabi et al., 2010). Environment and Climate Change Canada (ECCC), which provides operational weather and environmental forecasts within its boundary, is in the process of implementing a major upgrade to the LSS of the Global Environmental Multi-scale model (GEM), the national model. This new scheme, named SVS for Soil, Vegetation and Snow, has been devised to assimilate space-based soil moisture retrievals as well as surface data, and has proven efficient at simulating soil moisture and brightness temperature (Alavi et al., 2016; Husain et al., 2016). SVS will replace the Canadian version of the ISBA scheme (Interaction Sol-Biosphère-Atmosphère) that has been used operationally since 2001 (Bélair et al., 2003). One of this paper's objectives is to present the first evaluation of the capabilities of the new SVS scheme for hydrological prediction in Canada.
GEM's LSSs can be run either two-way coupled to the atmospheric model or offline, using GEM or other observed atmospheric forcing. The platform for running GEM offline is known as GEM-Surf (Bernier et al., 2011). Runoff obtained from the LSS can then be routed to the outlet of the watershed using the WATROUTE routing scheme (Kouwen, 2010). This configuration is known as GEM-Hydro.
Our current evaluation of GEM-Hydro focuses on the Lake Ontario watershed for several reasons, including (1) the socio-economic impacts that improvements to streamflow and lake level prediction skill can have on a region of Canada that is quite populated and industrialized; (2) the large amount of data available for model set up, calibration, and validation, compared to other regions of Canada; and (3) the fact that this is a Canada-USA transboundary watershed which is co-managed by ECCC and US Army Corps of Engineers (USACE) staff, in accordance with water level management rules set by the International Joint Commission (IJC) for each control structure, including the Moses-Saunders power dam at Cornwall, the outlet of Lake Ontario (Fig. 1). Different cascades of interconnected models have been developed over the years to simulate the Great Lakes water levels and thermodynamics, as reported by Wiley et al. (2010), Deacu et al. (2012), and Gronewold et al. (2011), the latter describing the Advanced Hydrologic Prediction System (AHPS), a seasonal water supply and water level forecasting system developed by the National Oceanic and Atmospheric Administration (NOAA) Great Lakes Environmental Research Laboratory (GLERL) in the mid-1990s that has since been employed operationally (with few changes in methodology) by the USACE and regional hydropower authorities. Recently, ECCC has implemented a short-term (84-h) operational water cycle prediction system for the Great Lakes and St. Lawrence River (WCPS-GLS) that uses coupled atmospheric, hydrologic, and hydrodynamic models (Durnford et al., submitted). This system makes use of the same platform used in this study, GEM-Hydro, but relies on the simpler ISBA LSS.
To our knowledge, the AHPS and WCPS systems are the only two systems that provide inflow forecasts for each of the Great Lakes on both sides of the Canada-US border, and neither relies on very sophisticated hydrological models. The need for improving simulations and forecasts of runoff to the Great Lakes is recognized by both agencies (Gronewold and Fortin, 2012). Multiple additional hydrologic models are indeed available (Coon et al., 2011); however, their spatial domains are typically constrained to either the US or Canada. Before embarking on an upgrade of operational systems, GLERL and ECCC agreed to perform a number of intercomparison studies under the umbrella of the Great Lakes Runoff Intercomparison Project (GRIP), in order to better understand the status of existing systems, and to set a benchmark for model performance against which future models could be compared. The first study was conducted on the Lake Michigan (GRIP-M) watershed by Fry et al. (2014), who compared historical runoff simulations from dissimilar hydrologic models using different calibration frameworks and input data. Amongst the models compared were GLERL's Large Basin Runoff Model (LBRM; Croley and He, 2002) that is part of the AHPS, the NOAA National Weather Service model (NWS; Burnash, 1995), and ECCC's MESH distributed model (Modélisation Environnementale - Surface and Hydrology; Pietroniro et al., 2007; Haghnegahdar et al., 2014). A second configuration of MESH was also included, based on Deacu et al. (2012), from which evolved the configuration of GEM-Hydro used by Durnford et al. (submitted) for the operational WCPS-GLS system.
The NWS model performed best in terms of Nash-Sutcliffe skill, but was positively biased, perhaps because of its typical use as a flood forecasting tool. Overall, it was difficult to attribute any difference in model results to the model structure, given that different forcing data and calibration procedures had been used by each contributor to the project.
The GRIP project was extended next to Lake Ontario (GRIP-O) by Gaborit et al. (2016a), who compared two lumped models, namely LBRM and GR4J (modèle du Génie Rural à 4 paramètres Journalier; Perrin et al., 2003), with the exact same forcing data and calibration framework. Two precipitation datasets were used as input: the Canadian Precipitation Analysis (CaPA; Lespinas et al., 2015), and a Thiessen polygon interpolation of the Global Historical Climatology Network - Daily (GHCND; Menne et al., 2012). CaPA is a near real-time quantitative precipitation estimate product from ECCC that is available on a 10-km grid for all of North America (http://collaboration.cmc.ec.gc.ca/cmc/cmoi/product_guide/submenus/capa_e.html).
The main finding of the first GRIP-O study is that the performance of the models was very satisfactory (average Nash √ in validation of 0.86 over all subbasins and configurations), whatever the precipitation database used, for all tributaries of Lake Ontario, despite the fact that most tributaries have a regulated flow regime. This satisfactory performance justifies the use of CaPA as a precipitation forcing dataset in later studies, especially for distributed models which require gridded precipitation as input. The performance of lumped models also provides a reference level of performance when evaluating distributed hydrological models (see for example Fig. 4, where we can see that GEM-Hydro and GR4J performances are very similar).
The present work is an extension of the first GRIP-O study but focused on distributed hydrological models.
Compared to the first GRIP-O study, which mainly aimed at identifying the performance that can be reached for the area and the local relevance of the CaPA analysis for hydrological modeling, this study mainly aims at finding a methodology to implement the distributed GEM-Hydro model over the whole Lake Ontario watershed, including its ungauged parts, in an efficient manner. Distributed models are more complicated to implement and more computationally intensive than lumped ones, but have a broader range of applications. Moreover, GEM-Hydro can estimate the Lake Ontario Net Basin Supplies (or NBS, the sum of lake tributary runoff, overlake precipitation, and overlake evaporation; Brinkmann, 1983). A second objective is to compare GEM-Hydro with two other distributed models (which is this study's contribution to GRIP-O) in order to identify avenues to further improve GEM-Hydro. In our view, this study's main outcome is the efficient and reliable methodology proposed to implement a sophisticated distributed model made of an LSS and a routing scheme over a large area with ungauged parts.

Models
Three different platforms are compared in this study: MESH, WATFLOOD, and GEM-Hydro. They have in common a distributed representation of most hydrological processes occurring in a watershed and a structure organized around two main components: an LSS for the representation of surface processes (evapotranspiration, infiltration, snow processes, water circulation in the soils), and a river routing scheme for simulating water transport in the streams, which consists of WATROUTE for all models. WATROUTE is a 1-D hydraulic model relying mainly on flow directions and elevation data (Kouwen, 2010). It routes to the catchment outlet the surface runoff and recharge produced by the surface schemes. In WATROUTE, runoff directly feeds the streams while recharge can be provided to an optional Lower Zone Storage (LZS) compartment, representing superficial aquifers, which releases water to the streams. WATFLOOD and GEM-Hydro make use of the LZS, whereas recharge from MESH feeds directly into the stream.
The version of MESH used in this study relies on version 3.6 of the Canadian LAnd Surface Scheme (CLASS).
Each grid cell is subdivided into a number of tiles, and each tile is classified as belonging to one of the five grouped response units (GRUs), based on its land-use/soil type combination. In this paper, we follow the local calibration strategy advocated by Haghnegahdar et al. (2014) for MESH (see section on calibration strategy). GEM-Hydro is very similar to MESH, but is tied to the LSSs available in GEM: ISBA and SVS. A previous study on the same watershed demonstrated the clear superiority of SVS over ISBA, especially in regard to the baseflow component of the streamflow (see Gaborit et al., 2016b). We thus only use SVS with GEM-Hydro in this paper.
WATFLOOD (Kouwen, 2010) is a distributed model of intermediate complexity that only needs precipitation and temperature as forcing, as opposed to MESH and GEM-Hydro which need additional atmospheric variables (Table 1). It relies on the GRU concept and on many empirical equations. WATFLOOD has been employed by Pietroniro et al. (2007) over the Great Lakes watershed.
In this project, WATFLOOD and MESH are implemented with a 10 arcmin (≈ 20 km) spatial resolution (both for their LSS and routing schemes), while GEM-Hydro is implemented with a 10 arcmin resolution for the LSS and 0.5 arcmin (≈ 1 km) for the routing. Sensitivity tests (Gaborit et al., 2016b) revealed that 2 and 10 arcmin resolutions for SVS lead to quite similar performance in terms of streamflow at the outlet, while a substantial amount of computational time is saved when running the coarser resolution (almost proportionally if using the same number of nodes). The same was shown for WATROUTE, which produces outputs of similar quality whether it is implemented at a low (10 arcmin for MESH and WATFLOOD) or high (0.5 arcmin with GEM-Hydro) resolution, as long as results are evaluated for large enough catchments (i.e., catchments which spread over at least a few grid cells). However, the high-resolution WATROUTE version is preferred in GEM-Hydro for consistency with the WCPS-GLS (Durnford et al., submitted) recently developed at ECCC. Hence, the higher resolution of GEM-Hydro's routing scheme is not expected to give GEM-Hydro any advantage in comparison to MESH and WATFLOOD.
The internal time-step used for GEM-Hydro is 10 minutes, which slightly improves streamflow simulations in comparison to a 30-min time-step (see Gaborit et al., 2016b). Further reducing it does not improve the results. The internal time-steps used for MESH and WATFLOOD are respectively equal to 30 and 60 minutes. The internal time-step of a model is generally maximized up to the desired output interval, provided that it satisfies numerical stability. In the GEM-Hydro version used in this study, a 10-min time-step was required to achieve numerical stability, but a newer version now allows it to be increased. Table 1 summarizes the main specificities of the models and the required forcing data. Table 2 shows the datasets used for physiographic information.
As the GEM-Hydro suite (including WATROUTE) is quite demanding in terms of computational time, it was decided to test a stand-alone configuration of GEM-Hydro relying on text files only and in which WATROUTE is replaced by a Unit Hydrograph (UH). This version is henceforth referred to as GEM-Hydro-UH. Indeed, the computational time for the experiment setup described here, when splitting the domain in four on an ECCC supercomputer, is about 1.5 min per simulated day for the LSS part of GEM-Hydro (SVS), provided that the pre-processing of the atmospheric variables was already done (which is the case in calibration: the pre-processing is done only once). The WATROUTE code is not yet parallelized, each grid point being processed from upstream to downstream, but requires only 25 s per day for the setup described here when running on a local machine. However, the WATROUTE pre-processing (i.e., preparation of the WATROUTE input files from the SVS outputs) takes about 30 s per day. Therefore, WATROUTE computational time was still lower than the SVS one for this setup. One simulation run over the GRIP-O period (4.5 years) therefore requires about 2 days with GEM-Hydro, which precludes any automatic calibration (which requires at least 400 runs, see below). Instead of using GEM-Hydro to run SVS, a stand-alone SVS version was used. This executable saves a tremendous amount of computation time compared to GEM-Hydro, mainly because of the Input/Output processing time: the stand-alone version makes use of text files which are kept open during the simulation and requires only 34.5 s per day on a local machine for this setup (1.2 h for the 4.5-year GRIP-O period or 230 days of calibration with 400 runs if running the whole domain). However, the computational time required by WATROUTE still had to be bypassed to perform automatic calibrations, which was done with the UH concept. The UH (see for example Sherman 1932) allows the estimation of the streamflow at the basin outlet by partitioning the basin averages of runoff and recharge in time. The same WATROUTE LZS formulation is used in GEM-Hydro-UH in order to estimate stream recharge. The basin averages are computed as a weighted average of the SVS grid cells located in the considered basin. The UH only requires a decay parameter corresponding to the lag or response time of the considered catchment, which controls the delay between the rainfall event and the resulting streamflow peak. It is estimated with the Epsey method (Almeida et al. 2014), which requires the catchment area, perimeter, and the maximum and minimum elevations along the catchment main river. The UH lag-time is also used as a free parameter during calibration (Table 3). It is inspired by the UH applied to the routing storage of GR4J (Perrin et al., 2003), but is employed here at an hourly time-step. This framework allows a considerable reduction of computational time dedicated to calibration.
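The unit-hydrograph shortcut described above can be sketched in a few lines. This is a minimal illustration and not the GEM-Hydro-UH code: the exponential-decay kernel shape, the function names, and the example numbers are all assumptions made for this sketch. Only two ideas are taken from the text: basin values are a weighted average of grid cells, and each runoff pulse is spread in time according to a single lag parameter.

```python
import math

def basin_average(cell_values, cell_weights):
    """Weighted average of grid-cell values over one basin.

    The weights are assumed here to be the fraction of each cell
    that lies inside the basin (hypothetical choice).
    """
    total_weight = sum(cell_weights)
    return sum(v * w for v, w in zip(cell_values, cell_weights)) / total_weight

def unit_hydrograph(lag_hours, n_steps):
    """Hypothetical exponential-decay kernel at an hourly time-step.

    The ordinates are normalised to sum to 1 so that volume is conserved.
    """
    raw = [math.exp(-t / lag_hours) for t in range(n_steps)]
    total = sum(raw)
    return [r / total for r in raw]

def route(runoff, kernel):
    """Discrete convolution of basin-average runoff with the kernel."""
    flow = [0.0] * (len(runoff) + len(kernel) - 1)
    for i, r in enumerate(runoff):
        for j, k in enumerate(kernel):
            flow[i + j] += r * k
    return flow

# Illustrative numbers only: three cells partly inside a basin.
avg = basin_average([2.0, 4.0, 6.0], [1.0, 0.5, 0.5])
kernel = unit_hydrograph(lag_hours=6.0, n_steps=48)
flow = route([avg, 0.0, 0.0], kernel)  # response to a single runoff pulse
```

A larger `lag_hours` flattens and delays the response, which is the role the text assigns to the lag-time parameter during calibration.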
Hydrographs resulting from GEM-Hydro and GEM-Hydro-UH can be very similar (Fig. 23). Finally, the SVS parameters identified by calibrating GEM-Hydro-UH are transferred to the full version of GEM-Hydro, which then only needs WATROUTE Manning coefficients to be adjusted (if needed) in order to mimic the optimal hydrographs obtained with GEM-Hydro-UH. This last adjustment can be done manually with a few offline WATROUTE runs.
The version of WATROUTE used in this work with GEM-Hydro relies on spatially-varying Manning values derived from physiographic information (i.e., land use), and on spatially-constant values (i.e., the same everywhere inside a given watershed) for the two LZS coefficients. These values were manually adjusted in order to be suitable to the whole GRIP-O area (Fig. 1), and are hereafter referred to as the standard values for WATROUTE. In contrast, WATFLOOD relies on spatially-constant values for the Manning and LZS coefficients, which are adjusted during the automatic calibrations (see Table 4). In MESH, 5 river classes are defined based on spatial attributes, and each class possesses its own Manning coefficients which are adjusted during calibration (Table 5). MESH does not include the LZS representation. This configuration difference between the distributed models is not expected to give GEM-Hydro any advantage, as comparisons were made between using fixed or spatially-varying Manning values with GEM-Hydro, leading to the conclusion that performances could be the same in both cases after a few manual adjustments (see Gaborit et al., 2016b).

Study area and data
The GRIP-O spatial framework is defined in Fig. 1. A more detailed description of the area is available in Gaborit et al. (2016a).
The Lake Ontario basin (Fig. 1) covers 83 000 km², of which 19 000 km² is the lake surface. All upstream water arriving through the Niagara River is excluded to focus only on the lateral runoff component of Lake Ontario NBS (see Introduction). The US/Canada border follows the Niagara River, the middle of Lake Ontario, and the St. Lawrence River down to the Cornwall regulation dam, the lake outlet. Apart from some major cities (e.g. Toronto), the catchment is mostly rural (agriculture, pasture, forest), as shown in Danz et al. (2007).
Streamflow time series were selected based on their duration and proximity to the lake shoreline. Of the 30 selected sites (Fig. 1), 27 have no missing data, 2 are complete at 94%, and one at 80% over the GRIP-O period. Nearly 70% of the total Lake Ontario watershed is gauged by the selected sites. Most of the rivers are regulated in some way, mainly for hydropower and flood mitigation, but regulation generally consists of reservoirs with a simple weir at their outlet (i.e., static control). Therefore, this did not prevent lumped models from reaching good performances in the former GRIP-O study of Gaborit et al. (2016a). As a consequence, no effort was made to represent in a detailed manner the artificial structures of the region in WATROUTE. Moreover, the small diversions occurring to fill some canals in the region, or even the aquifers which can contribute significantly to baseflow (Singer et al., 2003; Kassenaar and Wexler, 2006), do not prevent lumped models from reaching good performances. This is helpful to this study, yet the flow values involved in the diversions would a priori still have to be taken into account when estimating Lake Ontario's NBS.
The physiographic data required by the distributed models under study consist of soil texture, land use / land cover, Digital Elevation Model (DEM), and flow direction grids. Table 2 lists the datasets used to provide the physiographic and atmospheric inputs required by the models. Twenty-six land cover classes are defined in GEM-Hydro, while WATFLOOD and MESH rely only on 7 of them, which are aggregations of GEM-Hydro classes. Soil textures are from the Global Soil Dataset for Earth system modeling (GSDE; Shangguan et al., 2014), which contains information down to 2.8 m. However, soil texture is calibrated for MESH (Table 5). Soil texture was not calibrated for GEM-Hydro-UH, but some hydraulic parameters, which are derived from soil texture, were calibrated (Table 3). WATFLOOD does not need soil texture information (Table 2). By default, the maximum soil depth was set to 1.4 m in GEM-Hydro (for the area under study), 4.1 m in MESH, and is not defined in WATFLOOD. The maximum soil depth is calibrated in GEM-Hydro and MESH (Table 3 to Table 5). However, GEM-Hydro relies on a constant soil depth for a given model implementation, while MESH uses a different soil depth value for each of its five GRUs. Sensitivity tests performed with GEM-Hydro (Gaborit et al., 2016b) indicated that its outputs have a limited sensitivity to the maximum soil depth value, provided that it is greater than 1 m.
Precipitation forcing consists of 24-hourly accumulations from the Canadian Precipitation Analysis (CaPA version 2.4b8). Over the period of interest, CaPA consists of precipitation fields modeled by the Canadian Regional Deterministic Prediction System (RDPS, ≈15 km resolution), corrected by local rain gauge observations (Lespinas et al., 2015). CaPA provides both 6-h and 24-h accumulations. The 24-h accumulations were preferred to the 6-h CaPA data because fewer observations (about half as many) are used in the 6-h product to correct the model fields of precipitation, especially over the US part of the domain. The daily CaPA accumulations were disaggregated to an hourly time-step by following the temporal pattern of hourly precipitation from the RDPS (Carrera et al., 2010). The remaining atmospheric forcings (Table 1) are taken from RDPS outputs, using short-term forecasts with lead times of 6 to 18 h.
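The disaggregation step can be illustrated with a short sketch for a single grid cell and a single day. It assumes a simple proportional scheme: the daily total is shared among hours in proportion to the modeled hourly amounts. The function name and the uniform fallback for days on which the model pattern is entirely dry are assumptions of this sketch, not details given in the text or in Carrera et al. (2010).

```python
def disaggregate_daily(daily_total, hourly_pattern):
    """Split a 24-h accumulation into hourly amounts.

    daily_total    : analysed 24-h precipitation (e.g. mm)
    hourly_pattern : modeled hourly precipitation for the same day,
                     used only for its temporal shape
    """
    n_hours = len(hourly_pattern)
    pattern_sum = sum(hourly_pattern)
    if pattern_sum == 0:
        # Assumed fallback: the model is dry all day, so spread uniformly.
        return [daily_total / n_hours] * n_hours
    return [daily_total * p / pattern_sum for p in hourly_pattern]

# Illustrative numbers only: the model puts all rain in hours 20 to 22.
pattern = [0.0] * 20 + [1.0, 2.0, 3.0, 0.0]
hourly = disaggregate_daily(12.0, pattern)
```

By construction the hourly values add up to the daily analysed amount, so the water budget of the 24-h product is preserved.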

Calibration strategy
The GRIP-O experiment extends from June 1st, 2004 to September 26th, 2011. Calibrating a hydrologic model over a period of four to five years is generally deemed sufficient to achieve reasonable model robustness (e.g. Refsgaard et al., 1996). The calibration period thus ranges from June 1st, 2007 to September 26th, 2011 (4.5 years). The validation period runs from June 1st, 2005 to June 1st, 2007 (2 years, the last one also being used as spin-up for calibration), and the spin-up period from June 1st, 2004 to June 1st, 2005 (1 year). Note that during the automatic calibrations, the spin-up year was simulated only once and reused for all subsequent runs. The objective function is the Nash-Sutcliffe criterion (Nash and Sutcliffe, 1970) computed on the square root of the observed and simulated time series, in order to avoid over-emphasizing peak-flow events. It is henceforth referred to as "NSE √". These decisions are consistent with the lumped modelling decisions made for GRIP-O in Gaborit et al. (2016a).
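As an illustration of this objective function, the sketch below computes the standard Nash-Sutcliffe efficiency and its square-root variant on plain Python lists. The function names and the four-value series are illustrative assumptions; the example only shows how the transform reduces the weight of an error on a peak flow.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared-error
    sum to the variance sum of the observations."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def nse_sqrt(obs, sim):
    """NSE computed on square-root-transformed flows ("NSE √")."""
    return nse([math.sqrt(o) for o in obs], [math.sqrt(s) for s in sim])

# Illustrative series: the simulation is perfect except on the peak.
obs = [1.0, 4.0, 9.0, 16.0]
sim = [1.0, 4.0, 9.0, 25.0]
score_raw = nse(obs, sim)        # about 0.37
score_sqrt = nse_sqrt(obs, sim)  # 0.8
```

The same peak error costs far less under the square-root transform, which is why this criterion does not over-emphasize peak-flow events.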
Other evaluation criteria used in this study are the common Nash-Sutcliffe criterion (NSE), the Nash-Sutcliffe criterion calculated over the log of the flows ("NSE Ln"), and a Percent Bias criterion (PBIAS, equation 1) assessing the simulation's overall water budget fit: a positive value denotes a general tendency to underestimate flows, and vice-versa.
\[ \mathrm{PBIAS} = \frac{\sum_{t=1}^{N} \left( Q_{\mathrm{obs},t} - Q_{\mathrm{sim},t} \right)}{\sum_{t=1}^{N} Q_{\mathrm{obs},t}} \times 100 \qquad (1) \]
where Q_{obs,t} and Q_{sim,t} are the observed and simulated flows at time step t, and N is the number of time steps.
All metrics are evaluated at the daily time-step. Calibration relies, for all models, on the Dynamically Dimensioned Search (DDS) algorithm (Tolson and Shoemaker, 2007). Calibration cost did not allow models to be calibrated locally for all GRIP-O subbasins (Fig. 1), but only for those shown in Fig. 4. One local calibration takes between 2 and 5 days of computation (400 model runs, see below). Table 3 to Table 5 list the free parameters of the models. Different paradigms were used to calibrate them. GEM-Hydro-UH was calibrated using multiplicative coefficients that adjust the spatially-varying values of a given parameter, leading to a reasonable number of free parameters (16) while preserving spatial variability. MESH was implemented by calibrating the 12 free parameters of its 5 different GRUs independently, thus resulting in 60 free parameters. WATFLOOD had the lowest number of free parameters during calibration, and involved calibrating parameter values which are valid for the entire subbasin (no spatial variability) or for one of the three main land cover types considered inside the model, i.e. bare ground, snow covered ground, or other grounds (Table 4).
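The PBIAS and NSE Ln criteria introduced above can be sketched as follows. The PBIAS expression is written to match the sign convention stated in the text (positive values indicate underestimation); the function names and sample series are assumptions of this sketch, and the log-based criterion assumes strictly positive flows.

```python
import math

def pbias(obs, sim):
    """Percent bias; positive when the simulation underestimates flows."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)

def nse(obs, sim):
    """Standard Nash-Sutcliffe efficiency."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def nse_ln(obs, sim):
    """NSE on log-transformed flows ("NSE Ln").

    Assumes all flows are strictly positive; zero flows would need an
    offset, which is not covered here.
    """
    return nse([math.log(o) for o in obs], [math.log(s) for s in sim])

# Illustrative series: the simulation is 10% too low everywhere.
obs = [10.0, 20.0, 30.0]
sim = [9.0, 18.0, 27.0]
bias = pbias(obs, sim)       # positive: underestimation
log_score = nse_ln(obs, sim)
```

Swapping `obs` and `sim` in the example would give a negative PBIAS, which corresponds to an overestimation of flows.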
It is important to emphasize that the approach used to calibrate GEM-Hydro may result in unrealistic values for some parameters, as the multiplicative coefficients could bring them beyond the range of physical coherence. More precisely, soil water content thresholds and albedo (Table 3) cannot be higher than 1. Therefore, after being adjusted by the calibration algorithm, these values were constrained to realistic ranges by imposing a minimum value of 0 and a maximum of 1.
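A minimal sketch of this mechanism, assuming a flat list of per-cell values: one multiplicative coefficient scales the whole field, which preserves its spatial pattern, and the result is then clipped to the physical range. The function name and the sample albedo values are hypothetical.

```python
def apply_multiplier(field, coeff, lower=0.0, upper=1.0):
    """Scale a spatially-varying parameter field by one calibration
    coefficient, then clip each value to [lower, upper]."""
    return [min(upper, max(lower, value * coeff)) for value in field]

# Illustrative albedo values for three grid cells.
albedo = [0.15, 0.30, 0.60]
adjusted = apply_multiplier(albedo, coeff=2.0)  # last cell is capped at 1.0
```

Note that clipping alters the spatial pattern wherever the bound is reached, so very large coefficients no longer act as a pure scaling.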
The initial parameter values were either set to default ones that generally provide satisfactory results for the model (GEM-Hydro-UH, Table 3) or to random values (WATFLOOD, MESH). The maximum number of model runs allowed depends on the model being used. For example, 400 runs proved sufficient for GEM-Hydro-UH (Sect. 2.2) in the sense that no significant performance improvement was achieved beyond that number. This is because the number of GEM-Hydro-UH free parameters is relatively low (16, Table 3). The DDS algorithm is very efficient in the sense that it adjusts the search behavior to the maximum number of objective function evaluations (model runs) in order to converge to good quality solutions (Tolson and Shoemaker, 2007). The similarity of the performances obtained with GR4J and GEM-Hydro-UH (Fig. 4) supports the choice of the methodology used here, as GR4J was implemented with a maximum of 2000 model runs, three distinct calibration trials, and had an even lower number of free parameters (6, see Gaborit et al., 2016a).
A maximum of 1000 model runs was used to calibrate MESH and of 1500 for WATFLOOD. Finally, the calibration strategy used for MESH consists of an improved and reliable strategy based on the work of Haghnegahdar et al. (2014). Despite the random initial values used for MESH and WATFLOOD, only one calibration trial was performed for each of the models on a given subbasin. Even though the three models studied here were not calibrated using the same number of free parameters and the same maximum allowed model runs, it is assumed that the calibration strategies employed allow each model to come very close to its optimal performance for a given subbasin and the time period considered. Indeed, the strategy used for each of the three models is the result of expert knowledge and always involves parameters affecting the whole range of the main hydrological processes, i.e. evaporation, snowmelt, infiltration, soil transfer, and time to peak (channel friction). It is thus logical to use different strategies for each of the models as these do not involve the same parameters, land use classification, or even physical processes. The most important methodological consistencies for achieving a fair comparison between models include, in our view, a common calibration algorithm and objective function, along with common physiographic and forcing data.
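The way DDS scales its search to the evaluation budget can be illustrated with a compact sketch that follows the general description in Tolson and Shoemaker (2007): the probability of perturbing each parameter decreases with the iteration count relative to the budget, so the search moves from global to local. This is a simplified sketch and not the calibration code used for this study; the function signature, the toy objective, and the choice to maximise (as for an NSE-type score) are assumptions.

```python
import math
import random

def dds(objective, lower, upper, x0, max_evals, r=0.2, seed=0):
    """DDS-style greedy search that maximises `objective`.

    max_evals is the total evaluation budget (must be at least 2);
    r scales the perturbation relative to each parameter range.
    """
    rng = random.Random(seed)
    n = len(x0)
    best_x = list(x0)
    best_f = objective(best_x)  # evaluation number 1
    for i in range(1, max_evals):
        # Probability of perturbing each parameter shrinks with i.
        p = 1.0 - math.log(i) / math.log(max_evals)
        dims = [d for d in range(n) if rng.random() < p]
        if not dims:
            dims = [rng.randrange(n)]  # always perturb at least one
        cand = list(best_x)
        for d in dims:
            span = upper[d] - lower[d]
            value = cand[d] + r * span * rng.gauss(0.0, 1.0)
            # Reflect at the bounds; fall back to the bound if needed.
            if value < lower[d]:
                value = lower[d] + (lower[d] - value)
                if value > upper[d]:
                    value = lower[d]
            elif value > upper[d]:
                value = upper[d] - (value - upper[d])
                if value < lower[d]:
                    value = upper[d]
            cand[d] = value
        f = objective(cand)
        if f >= best_f:  # greedy: keep the candidate only if not worse
            best_x, best_f = cand, f
    return best_x, best_f

# Toy objective standing in for a model run scored by an NSE-type metric.
def toy_score(x):
    return -sum((v - 0.3) ** 2 for v in x)

best_x, best_f = dds(toy_score, [0.0] * 3, [1.0] * 3, [0.9] * 3, max_evals=400)
```

Because the perturbation probability depends on the ratio of the current iteration to `max_evals`, the same algorithm behaves differently under a budget of 400 or 2000 runs, which is the property referred to in the text.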
Finally, some subbasins in Fig. 1 have more than one major tributary flowing into Lake Ontario. In this case, the most-downstream observed flows on independent tributaries are summed and then extrapolated to the whole subbasin using the Area Ratio Method (ARM; Fry et al., 2014). The resulting "synthetic" flows were considered as observations for GEM-Hydro-UH calibration over the whole subbasin, including its ungauged parts. This methodology was applied to all subbasins with more than one most-downstream gauge (identified with the "N/A" mention for the station attribute in Table 6) for consistency with the calibration experiments performed in the first GRIP-O study (see Gaborit et al., 2016a), and because lumped models (and GEM-Hydro-UH) can only estimate streamflow at one location. For these subbasins, the true gauged fraction is specified in Table 6.
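The construction of these synthetic flows can be sketched as follows, assuming the simplest form of the ARM, in which the summed gauged flow is scaled by the ratio of the total subbasin area to the total gauged area. The function name, units, and example values are hypothetical; the implementation described by Fry et al. (2014) may differ in its details.

```python
def area_ratio_flow(gauged_flows, gauged_areas, total_area):
    """Extrapolate summed gauged flows to a whole subbasin (simple ARM sketch).

    gauged_flows: list of flow series (m3/s), one per most-downstream gauge,
                  all covering the same time steps.
    gauged_areas: drainage area (km2) of each gauge, in the same order.
    total_area:   area (km2) of the whole subbasin, ungauged parts included.
    """
    ratio = total_area / sum(gauged_areas)
    # Sum the independent tributaries time step by time step.
    summed = [sum(values) for values in zip(*gauged_flows)]
    return [q * ratio for q in summed]


# Hypothetical subbasin: two gauged tributaries covering 400 of 500 km2.
synthetic = area_ratio_flow([[10.0, 20.0], [5.0, 10.0]], [300.0, 100.0], 500.0)
```

In this hypothetical case the gauged fraction is 80%, so each summed flow is multiplied by 1.25.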

Strategy for ungauged areas
The ultimate objective of the GRIP-O project consists in improving simulated Lake Ontario NBS, which calls for estimating runoff from all ungauged areas. To do so, calibration was performed using the GR4J and GEM-Hydro-UH models over the GRIP-O gauged area (which includes all GRIP-O gauged subbasins, see Fig. 1), and the resulting parameter sets were transferred to the same models implemented over the whole Lake Ontario watershed, including its ungauged parts (Fig. 1). The "GRIP-O gauged area" is actually gauged at 88.5% due to the strategy used for subbasins with several major tributaries (see end of previous section).
For GR4J, a single (unique) model was used over each of these two areas, requiring a unique calibration and a straightforward parameter transfer. Therefore, the GRIP-O gauged area is represented in GR4J as if it had a unique main river. It was demonstrated in the first GRIP-O paper (see Gaborit et al., 2016a) that a single GR4J model calibrated over a large area could lead to runoff estimates of similar quality to those obtained with multiple models implemented over local subbasins, the former strategy being more efficient. Hence for GR4J, local calibration was used but with a unique model for the GRIP-O gauged area.
GEM-Hydro-UH was however implemented locally for each of the gauged GRIP-O subbasins, but a global calibration strategy (see further down) led to a unique calibrated parameter set, which was then transferred to a GEM-Hydro model implemented over the whole Lake Ontario watershed. The GRIP-O gauged area consists of the true gauged area (Fig. 1), plus the ungauged areas of the gauged subbasins that include multiple gauge stations. This is because with local models (as with GEM-Hydro-UH), and in the case of subbasins with several most-downstream gauges, the implementation was performed over the whole subbasin, including its ungauged part (see above). This is why the "GRIP-O gauged area" considered in this section is actually gauged at 88.5%.
The approach based on calibration for the GRIP-O gauged area and parameter transfer to the whole Lake Ontario watershed was preferred to other possible alternatives for two main reasons: it allows calibrating the models using close approximations of observed flows (the area used for calibration is gauged at 88.5%, see above) instead of less reliable flow estimates for the whole watershed (gauged at 70%), and it takes into account rainfall over the ungauged areas as well as over the gauged areas or, in other words, uses the best approximation of rainfall. However, this methodology involves two implementations of each model: one for the gauged part of the watershed and one for the whole area (Fig. 1).
The equivalence noted above between a single GR4J model calibrated over a large area and multiple models implemented over local subbasins (Gaborit et al., 2016a) was also demonstrated by Croley (1983) with the LBRM. A single (global) model has the advantage of requiring only one implementation and calibration, whereas local models require multiple implementations and possibly multiple calibrations. Therefore, a unique GR4J model was implemented twice: once over the GRIP-O gauged area (see above) for calibration, and once over the whole Lake Ontario watershed.
With GR4J, the parameter transfer protocol is straightforward, as we end up with a unique parameter set for the unique model. However, GEM-Hydro-UH was implemented here in a local manner, i.e., for each of the gauged GRIP-O subbasins. When it is calibrated locally for each of the gauged subbasins, we end up with a specific parameter set for each subbasin, making the reliability of any parameter transfer very low (Sect. 2.1). Therefore, another strategy was chosen to calibrate GEM-Hydro-UH: global calibration.
Global calibration of GEM-Hydro-UH consists in finding a unique trade-off parameter set that simultaneously improves performances for all subbasins (Ajami et al., 2004; Haghnegahdar et al., 2014; Gaborit et al., 2015), whereas local calibration consists in finding each subbasin's optimal parameter set. Local calibration logically leads to the optimal performance for a given subbasin, but global calibration may lead to temporal robustness (Gaborit et al., 2015) and spatial consistency of the parameter values, because they are either fixed or adjusted in the same way over the whole area under study. Local calibration, on the other hand, because of equifinality and experiment imperfections (model processes, forcing data, observed flows, etc.), may compensate for simulation errors and lead to parameter sets that do not work well when transferred to other (even neighboring) subbasins, as suggested by the very different parameter sets obtained here with the local calibrations of GEM-Hydro-UH (Sect. 2.1 and Table 7). Although global calibration may not be totally exempt from equifinality, the attention paid to the parameter ranges used (Table 3) gives confidence in the physical relevance of the final parameter values. For GR4J, local calibration was used but for a unique implementation on the complete GRIP-O gauged area (see above for the justification).
The objective function associated with the global calibration of GEM-Hydro-UH (Eq. 2) compares, for each subbasin $i$, $\mathrm{NSE}^{\mathrm{loc}}_{i}$, the NSE √ value calculated from the local calibration on subbasin $i$, with $\mathrm{NSE}^{\mathrm{glob}}_{i}$, the NSE √ value calculated from the global calibration on subbasin $i$. This objective function aims at minimizing the differences between the performances obtained from the global and local parameter sets. It relies on the hypothesis that global performance cannot be higher than local performance; even if this were to happen, the objective function would still make sense, and the gain achieved with global over local performance would simply compensate for errors obtained on other catchments, possibly allowing a perfect objective function value to be reached even with catchments having poorer performances with global calibration than with local calibration. However, as GEM-Hydro-UH was not locally calibrated for all of the 14 GRIP-O subbasins (only those of Fig. 3, because of the computational cost), performances obtained with local GR4J calibrations (Gaborit et al., 2016a) were used for the remaining ones to set the reference performance to be used in Eq. (2), justifying the use of that model in this study. This substitution makes sense considering, firstly, that GR4J and GEM-Hydro-UH local performances are similar (Fig. 3); secondly, that GR4J local performances were always very satisfactory (see Gaborit et al., 2016a); and finally, that the objective function still makes sense if global performance is higher than the local one (see above).
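The behavior described above, including the compensation between subbasins, can be illustrated with a small sketch. The exact form of the objective function is not reproduced here: the sum of signed differences between local and global scores used below is only one plausible reading of the description, and the function names, subbasin labels, and score values are all hypothetical.

```python
def reference_scores(local_gemhydro_uh, local_gr4j):
    """Reference NSE sqrt per subbasin: GEM-Hydro-UH local score if available,
    otherwise the local GR4J score (both given as dicts keyed by subbasin)."""
    reference = dict(local_gr4j)
    reference.update(local_gemhydro_uh)  # GEM-Hydro-UH takes precedence
    return reference


def global_objective(global_scores, reference):
    """Assumed objective: sum over subbasins of (local - global) NSE sqrt.

    Lower is better; zero means global calibration matches the local
    reference overall, possibly through compensation between subbasins.
    """
    return sum(reference[k] - global_scores[k] for k in reference)


# Hypothetical scores for two subbasins labelled "A" and "B".
ref = reference_scores({"A": 0.80}, {"A": 0.75, "B": 0.70})
value = global_objective({"A": 0.70, "B": 0.75}, ref)
```

In this hypothetical case the loss of 0.10 on subbasin A is partly offset by the gain of 0.05 on subbasin B, which is the compensation effect mentioned in the text.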
Moreover, a supplemental free parameter was used for GEM-Hydro-UH during global calibration (in addition to those in Table 3), namely the percentage of completely impervious urban areas. This value was fixed to 0.33 during local calibrations, implying that 33% of liquid precipitation or snowmelt over urban covers was automatically considered as runoff with no chance to infiltrate. This value comes from a former study calibrating the SWMM 5 model over urban subbasins in Québec City, Canada (Gaborit et al., 2013). With local calibration, good performances could be reached using this fixed value, even for "urban" subbasins (such as subbasins 14 and 15 in Sect. 2.1), as the effect of urban surfaces could be accounted for by the other free parameters in Table 3. Moreover, this additional parameter helps to distinguish between natural and urban surfaces for global calibration. The calibrated value of the completely impervious urban cover fraction is equal to 0.69 after global calibration (Table 7). This makes sense, as the urban areas around the shore of Lake Ontario generally correspond to high-density areas, such as the city of Toronto. Note also that with global calibration, the response time parameter controlling the UH duration (Table 3) was replaced with a multiplicative factor adjusting the default response times of all local subbasins.
Models were finally implemented over the whole Lake Ontario watershed (Fig. 1), and runoff simulations were performed with the parameter sets calibrated over the GRIP-O gauged area. GEM-Hydro was selected for this task instead of GEM-Hydro-UH, since it was more straightforward and a priori more realistic (see further) to use WATROUTE instead of the simple UH for the ungauged areas of the Lake Ontario watershed. In GEM-Hydro, standard Manning coefficients were used in WATROUTE, while the lag time of GEM-Hydro-UH was adjusted during calibration. However, it was verified that simulations with GEM-Hydro (calibrated SVS and LZS parameters and standard Manning values) were very close, both in terms of hydrographs and performances at the gauged sites, to those from the calibrated GEM-Hydro-UH. Performances are generally even slightly better with GEM-Hydro (despite the standard Manning values) than with GEM-Hydro-UH for individual subbasins (not shown), although the opposite is true when looking at the total GRIP-O gauged area as a whole (see Table 8).
Figure 3 summarizes the methodology described here for estimating runoff from the ungauged areas of the Lake Ontario watershed with GEM-Hydro.

Results and discussion
The comparison between GEM-Hydro and GEM-Hydro-UH is first presented to demonstrate the relevance of the UH approach for saving the computation time associated with running the routing model of GEM-Hydro. Score improvements obtained by calibrating GEM-Hydro-UH for several subbasins of the Lake Ontario watershed are then presented, followed by a performance comparison for all models. Finally, the methodology proposed with GEM-Hydro and the lumped GR4J model to simulate streamflows for the ungauged parts of the Lake Ontario watershed is evaluated.
Figure 2 presents the hydrographs simulated for the Moira River (subbasin 11 in Fig. 1), with SVS default parameters, standard WATROUTE parameter values in the case of GEM-Hydro, and a UH lag time estimated with the Epsey method in the case of GEM-Hydro-UH. As can be seen from this figure, GEM-Hydro-UH is able to produce streamflow simulations that are very close to those obtained with GEM-Hydro, underlining the relevance of such an approach to save computational time. Over the different GRIP-O subbasins, the average absolute difference in Nash √ between the uncalibrated GEM-Hydro and GEM-Hydro-UH performances was 8%, with the worst difference being 21%. See also Table 8 for a comparison between the calibrated GEM-Hydro and GEM-Hydro-UH models when looking at performances for the total GRIP-O gauged area. A complete GEM-Hydro run over the GRIP-O calibration period (4.5 years) takes about 48 hours, while the GEM-Hydro-UH version requires only 1.2 hours over the same period.
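The principle of replacing distributed routing with a unit hydrograph can be sketched as a discrete convolution of subbasin-averaged runoff with UH ordinates. The triangular shape, the function names, and the example values below are assumptions made for illustration; the actual UH formulation of GEM-Hydro-UH is not reproduced here.

```python
def triangular_uh(n_steps):
    """Symmetric triangular unit hydrograph ordinates that sum to 1."""
    peak = (n_steps + 1) / 2.0
    weights = [peak - abs(k + 1 - peak) for k in range(n_steps)]
    total = sum(weights)
    return [w / total for w in weights]


def route_with_uh(runoff, uh):
    """Convolve a runoff series with UH ordinates to obtain outlet flow."""
    flow = [0.0] * (len(runoff) + len(uh) - 1)
    for t, r in enumerate(runoff):
        for k, u in enumerate(uh):
            # Runoff generated at step t reaches the outlet spread over the
            # following steps, according to the UH ordinates.
            flow[t + k] += r * u
    return flow


# Hypothetical runoff pulse routed through a three-step UH.
outlet = route_with_uh([4.0, 0.0, 0.0], triangular_uh(3))
```

Because the ordinates sum to one, the routed series conserves the runoff volume; only its timing changes. This is far cheaper than a gridded routing run, in line with the run times reported above (about 48 hours versus 1.2 hours, i.e., a factor of roughly 40).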

GEM-Hydro-UH local calibrations
This section presents GEM-Hydro-UH performances (Fig. 3) either with its default parameter values or after its local calibration on Lake Ontario subbasins, whose characteristics are given in Table 6.
As can be seen from Fig. 3, calibration provides substantial improvements in NSE √ values. Similar results were obtained for NSE and NSE Ln (although these results are not shown), and a lower improvement for PBIAS. Interestingly, all uncalibrated NSE √ values are above zero (Fig. 3), and even satisfactory for subbasins 10 and 11. This is encouraging for applications to ungauged subbasins. It can also be noticed in Fig. 4 that calibration sometimes inverts the sign of the PBIAS criterion (switching from over- to under-estimation or vice versa).
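For reference, the four criteria mentioned above can be computed as in the following sketch, which assumes the standard Nash-Sutcliffe definition applied to raw, square-root-transformed, and log-transformed flows. The small offset `eps` used before taking logarithms and the PBIAS sign convention (positive for overestimation) are assumptions, since conventions vary between studies.

```python
import math


def nse(sim, obs):
    """Nash-Sutcliffe efficiency of simulated versus observed flows."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def nse_sqrt(sim, obs):
    """NSE on square-root flows, which reduces the weight of peak flows."""
    return nse([math.sqrt(s) for s in sim], [math.sqrt(o) for o in obs])


def nse_ln(sim, obs, eps=1e-6):
    """NSE on log flows, which emphasises low flows (eps avoids log(0))."""
    return nse([math.log(s + eps) for s in sim], [math.log(o + eps) for o in obs])


def pbias(sim, obs):
    """Percent bias; positive values mean overestimation (assumed convention)."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)
```

A perfect simulation gives a value of 1 for the three NSE variants and 0 for PBIAS.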
Calibration also improves GEM-Hydro-UH Snow Water Equivalent (SWE) simulations, but to a lesser degree than for streamflow. For example, the NSE values for SWE simulations over the 4 consecutive winters of the GRIP-O period improved from -0.12 to 0.42 for the Genesee subbasin, and from 0.49 to 0.68 for the Black River subbasin, respectively before and after calibration (the SWE variable was not used in the computation of the objective function). SWE observations come from the SNow Data Assimilation System (SNODAS, see NOHRSC 2004). Calibration does influence evapotranspiration, but no observations are available to evaluate this model output. For example, for the Moira River, the mean subbasin annual evapotranspiration (over the calibration period) is equal to 527 mm before calibration and to 647 mm after calibration. The robustness of the model is also deemed very good, since performances do not substantially deteriorate between calibration and validation (Table 8).
Calibrated parameter values are quite different from one subbasin to the other (even for neighboring subbasins), which may be due to equifinality (different parameter sets can lead to similar simulations) but also to anthropogenic streamflow regulation. Table 7 presents the ranges of the final parameter values obtained with local calibration. This variability strongly limits the potential for parameter transferability to ungauged subbasins (Razavi and Coulibaly, 2012; Parajka et al., 2013). As explained in Sect. 1.4, global calibration can help overcome this limitation by leading to a spatially coherent parameter set. Results of such an approach are presented in Sect. 2.3.
Calibrated GEM-Hydro-UH performance values are generally very close to those obtained with GR4J and CaPA precipitation (Fig. 3): the mean absolute difference in Nash √ values is 6.1%, with the maximum being 15%. This is very encouraging, as the performance benchmark set by GR4J simulations is most of the time quite high and hard to attain for other models. Therefore, GRIP-O made it possible to improve streamflow simulations for the Lake Ontario basin in comparison to the studies of Croley (1983) and Haghnegahdar et al. (2014), which are the main former studies that proposed the implementation of hydrologic models over this area. Moreover, as new improvements are in progress for SVS (see below), it is probable that GEM-Hydro-UH and GEM-Hydro will even be able to surpass GR4J in terms of performance in the near future.

Inter-comparison of all models
This section aims at comparing MESH, WATFLOOD, and GEM-Hydro-UH performance values. The calibration strategy used for each of them is described in Sect. 1.3. Note that MESH was only calibrated on the Moira and Black Rivers, and WATFLOOD on the Moira, Black, and Salmon Rivers. Calibration and validation performances are presented in Fig. 4, and calibrated hydrographs in Fig. 5.
It was deemed uninformative to present the calibrated parameter values, since they are highly location dependent and subject to the equifinality issue (see previous section). Table 7 however highlights the final parameter ranges for GEM-Hydro-UH. Overall, GEM-Hydro-UH outperforms MESH and WATFLOOD, both in calibration and validation (Fig. 4).
The robustness of the models is generally quite good, but less so for MESH on the Black River (subbasin 7 in Fig. 4).
When looking closely at the Moira River hydrographs (Fig. 5), important differences arise between the models.
For instance, WATFLOOD has a flashier behavior and tends to overestimate peak flow events, MESH generally underestimates flows, and GEM-Hydro-UH lies somewhere in between. Peak flow events (even for other subbasins) associated with the spring freshet are generally better represented by MESH, which may be due to a better representation by CLASS (the MESH LSS) of various cold-region hydrological processes, such as snow accumulation and melt, snow interception by vegetation, and soil freezing and thawing.
It is possible that the differences in model performance may be explained by the different calibration strategies used for each model, and that better performances could be obtained with MESH and WATFLOOD for these watersheds, although the calibration details were in each case determined by an expert user of each model. The optimal calibration strategy, as well as the number of free parameters, could be revisited for each model in order to see if this explains the above differences, but this is beyond the scope of the paper.
Even if the intercomparison is obviously limited in the number of available test cases, it highlights the need to calibrate hydrologic models, shows that models have unique behaviors that translate into substantial differences in hydrographs, and suggests that each of the models could benefit from some strengths of its competitors. For example, SVS would likely benefit from the implementation of the soil freezing and melting processes that are present in CLASS.
Results however strongly indicate that SVS can compete with more established Canadian models for simulating streamflow. In the coming years, after SVS becomes operationally implemented within ECCC's GEM-based NWP systems, it will be possible to obtain useful streamflow predictions by simply post-processing the runoff output from GEM using a unit hydrograph, or by routing these time series using a more sophisticated routing scheme.

Runoff estimation for the whole Lake Ontario basin
The parameter values identified from the global calibration are presented in Table 7, along with the ranges resulting from local calibrations. See Sect. 1.4 for more information about the methodology related to global calibration. It can be seen from Table 7 that the final global parameters generally lie within the intervals obtained from local calibration, highlighting the trade-off found by global calibration. Moreover, it was noticed (not shown here) that parameter values were very different between local and global calibration procedures, even for catchments displaying very similar performances between the two strategies (such as subbasins 3, 5 and 8, see Fig. 6), highlighting the fact that local calibration is more prone to overcalibration (i.e., equifinality). GEM-Hydro-UH results are given first for each gauged subbasin, in order to compare global calibration, local calibration and default parameters (Fig. 6), followed by GR4J and GEM-Hydro results for the GRIP-O gauged area and the whole Lake Ontario watershed (Table 8 and Fig. 8).
GEM-Hydro-UH performances are lower with global calibration than with local calibration, as expected, and for some subbasins are even lower after global calibration than with the default parameters (notably subbasins 10 and 11, Fig. 6). However, performances are satisfactory for most of the 14 GRIP-O subbasins with a single parameter set, which confirms that global calibration fulfilled expectations. Given that it takes about 7 days to achieve a local calibration, global calibration, which was completed in 10 days, saves a substantial amount of computational time.
Furthermore, and as previously stated, global calibration favors the spatial consistency of parameters and facilitates parameter transfer to ungauged areas, whereas there is no a priori best manner to transfer parameter values obtained from local calibration (Razavi and Coulibaly, 2012; Parajka et al., 2013). In this study, the strategy related to parameter transfer to the ungauged subbasins is based on spatial proximity, which was already identified as among the best parameter transfer methods for this type of climate in Canada (Razavi and Coulibaly, 2012). Although a comprehensive assessment of the reliability of the methodology used here for parameter transfer would require the "leave-one-out" framework (see Razavi and Coulibaly, 2012), the satisfying performances and temporal robustness obtained for all GRIP-O subbasins with global calibration, along with the spatial consistency of the unique final parameter set, the homogeneity of the area under study, and the spatial proximity of ungauged catchments, together justify the relevance and a priori reliability of the methodology employed in this study. This statement is moreover supported by the evaluation performed further down for the whole watershed.
Performance evaluation for the total GRIP-O gauged area (Table 8) shows that GR4J is better than GEM-Hydro-UH in calibration, but worse in validation. GEM-Hydro-UH leads to a very satisfactory performance, but most importantly to a better streamflow simulation than GR4J in terms of dynamics (see Fig. 7). Note that the smoother GR4J behavior is not due to the single-model approach for the whole area, as a similar behavior occurred when aggregating simulations from local GR4J models (Gaborit et al., 2016a). This smooth behavior seems inherent to the lumped nature and concepts of GR4J.
As depicted in Table 8, performances for the GRIP-O gauged area obtained with GEM-Hydro are close to those obtained with GEM-Hydro-UH, although lower for the former. This comes from the standard (uncalibrated) Manning coefficients used with GEM-Hydro, whereas the UH lag time was adjusted during the calibration of GEM-Hydro-UH.
Runoff simulations for the whole Lake Ontario watershed, including its ungauged areas, are very promising (Table 8). Even though runoff observations consist, in this case, of estimates based on the ARM, computed performances are a priori reliable given that the true gauged fraction of the total area is equal to about 70%. GEM-Hydro (and GEM-Hydro-UH) tends to overestimate total streamflow volumes (Table 8, PBIAS), while GR4J achieves a better estimation of the total runoff volumes. The fact that GR4J is better than GEM-Hydro-UH in terms of PBIAS is attributed to the fact that GR4J consists of a single (global) model for the whole area considered. PBIAS values obtained with local GR4J models were poorer (Gaborit et al., 2016a).
It is important to emphasize that for the whole watershed including its ungauged parts, runoff was estimated with GEM-Hydro instead of GEM-Hydro-UH, which means that streamflow simulations are available at all points inside the domain, whereas GR4J only delivers estimates at the outlet. Moreover, even if the scores are slightly better for GR4J, the streamflow dynamics are generally better represented by GEM-Hydro, as is the case for example for the summer of 2006 in Figure 7: GR4J represents a smooth streamflow recession, while GEM-Hydro-UH better follows the small peaks and drops occurring during the recession.
It is therefore argued that the methodology proposed here (global calibration of GEM-Hydro-UH and parameter transfer to GEM-Hydro) is relevant, efficient, and reliable, provided that a large enough fraction of the total area is gauged. It could moreover be applied in different climatic contexts and regions, and with different models.
Simply extrapolating GEM-Hydro-UH simulated flows from the GRIP-O gauged area to the whole Lake Ontario watershed with the ARM leads to exactly the same performances as those of the GRIP-O gauged area, because in that case both the simulated and observed flows are extrapolated in the same way (i.e., with the ARM), which does not change the scores at all. Based on these scores, it could be tempting to conclude that extrapolating the GEM-Hydro-UH flows to the whole Lake Ontario watershed leads to better results than transferring the calibrated parameters to GEM-Hydro over the whole Lake Ontario watershed, but it must be remembered that for the whole watershed, observed flows are estimated with the ARM, which means that the scores obtained cannot be completely trusted (Table 8). No test was performed by implementing GEM-Hydro-UH over the whole basin.
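The invariance of the scores under a common ARM extrapolation follows from the fact that NSE and PBIAS are unchanged when simulated and observed flows are multiplied by the same positive factor. The sketch below checks this numerically; the flow values and the area ratio are hypothetical, and the PBIAS sign convention is an assumption.

```python
def nse(sim, obs):
    """Nash-Sutcliffe efficiency of simulated versus observed flows."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pbias(sim, obs):
    """Percent bias; positive values mean overestimation (assumed convention)."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


# Hypothetical flows (m3/s) for a gauged area and a hypothetical area ratio.
obs = [120.0, 340.0, 280.0, 150.0, 90.0]
sim = [100.0, 360.0, 300.0, 140.0, 95.0]
ratio = 1.13

obs_ext = [ratio * q for q in obs]  # ARM applied to the observations
sim_ext = [ratio * q for q in sim]  # ARM applied to the simulation

# The factor cancels in both criteria: ratio**2 in NSE, ratio in PBIAS.
same_nse = abs(nse(sim, obs) - nse(sim_ext, obs_ext)) < 1e-9
same_pbias = abs(pbias(sim, obs) - pbias(sim_ext, obs_ext)) < 1e-9
```

Both flags evaluate to true for any positive ratio, which is why extrapolating the simulated flows with the ARM cannot, by itself, provide an independent evaluation over the ungauged areas.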
Finally, Lake Ontario monthly NBS were estimated with the globally calibrated GEM-Hydro model, and results were compared to both the GLERL residual and component NBS estimates (Fig. 8). Residual NBS rely on the observed change in lake storage and on streamflows for the Niagara and St. Lawrence rivers (DeMarchi et al., 2009). Component NBS used here are based on the GLERL Monthly Hydrometeorological Database (GLM-HMD; Hunter et al., 2015), which relies on observed data extrapolated with the ARM for runoff, on observed data interpolated with the Thiessen polygon method for overlake precipitation, and on the Large Lake Thermodynamics lumped Model (LLTM) for overlake evaporation. Component NBS estimates are updated on a regular basis. Data used in this work were updated on August 2nd, 2016. It is still unknown which of these two NBS estimation methods (i.e., residual or component method) is the most accurate (DeMarchi et al., 2009).
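The two estimation methods can be summarized with the following sketch, which assumes the usual simplified water balance: the component method sums overlake precipitation and runoff and subtracts overlake evaporation, while the residual method adds the change in lake storage to the difference between outflow and inflow. All terms are assumed to be expressed in the same units (for example, mm over the lake surface per month); the function names and example values are hypothetical, and secondary terms such as diversions or groundwater exchange are neglected.

```python
def component_nbs(overlake_precip, runoff, overlake_evap):
    """Component method: NBS = P + R - E."""
    return overlake_precip + runoff - overlake_evap


def residual_nbs(storage_change, outflow, inflow):
    """Residual method: NBS = change in storage + outflow - inflow.

    For Lake Ontario, the inflow is the Niagara River and the outflow is
    the St. Lawrence River.
    """
    return storage_change + outflow - inflow


# Hypothetical monthly values, in mm over the lake surface.
nbs_component = component_nbs(overlake_precip=80.0, runoff=150.0, overlake_evap=60.0)
nbs_residual = residual_nbs(storage_change=30.0, outflow=1000.0, inflow=860.0)
```

With perfect data the two methods would give the same value (170 mm in this hypothetical month); in practice they differ because each term carries its own uncertainty.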
It can be seen that the cumulative NBS estimates derived from the calibrated GEM-Hydro model (using global calibration) lie between the component and residual NBS estimates, but are closer to the latter. It is however difficult to draw any conclusion regarding the bias of these estimation methods given the uncertainty associated with NBS estimates (DeMarchi et al., 2009). When comparing the GLM-HMD component NBS method to the calibrated GEM-Hydro simulation on a component-by-component basis, the main difference between the two occurs for overlake evaporation, with evaporation from the component method being significantly lower than GEM-Hydro evaporation (not shown). This mainly explains why the NBS estimates from the component method are higher than the other estimates in Figure 8. Again, however, it is not possible to accurately evaluate overlake evaporation estimates given the lack of observations for this variable. The uncalibrated GEM-Hydro model results in cumulative NBS estimates that are below all other NBS estimates, which suggests that they are underestimated. Therefore, the methodology proposed to calibrate GEM-Hydro seems to improve Lake Ontario NBS simulations.

Conclusion
Our results indicate that the SVS LSS, as embedded in GEM-Hydro and GEM-Hydro-UH, provides a reasonable simulation of runoff to Lake Ontario. This result is encouraging because SVS is expected to replace ISBA in ECCC operational models in the near future. However, there is still room to further improve SVS. For example, as illustrated while comparing GEM-Hydro-UH, WATFLOOD and MESH, SVS may benefit from the implementation of soil freeze-thaw processes, the current absence of which is assumed to be partly responsible for SVS missing some of the runoff peaks in spring. A new snow module (ISBA-ES) is also being implemented into SVS, which currently relies on a simple force-restore approach. Finally, work is under way to represent a variable area of ponded water in each SVS grid cell, in order to represent subgrid-scale lakes and wetlands and to better account for the delay associated with surface runoff transfer into the streams.
According to the intercomparison experiment conducted on three subbasins, GEM-Hydro-UH and GEM-Hydro are competitive with MESH and WATFLOOD. However, as a limited number of subbasins were used for the inter-comparison due to computational time limitations, no general model ranking can be derived from this study. Calibration has of course proven mandatory to optimize model performances. The calibrated GEM-Hydro-UH performances are close to those of GR4J (Gaborit et al., 2016a).
The potential benefits of global calibration have been demonstrated here. It achieves satisfactory performances for a large area with a unique calibration and favors temporal robustness, spatial consistency, and parameter transferability. Global calibration of SVS is envisioned in future versions of the WCPS and has already proven interesting for other modeling platforms, such as Hydrotel (Gaborit et al., 2015).
It is also envisioned to assess the benefits of SVS global calibration in improving weather forecasts, as a calibrated SVS could be coupled to the RDPS atmospheric model, and because a calibrated SVS version should improve the representation of surface fluxes. Calibrating an LSS based on streamflow and then using it in an atmospheric model to improve weather forecasts has not been reported in the literature so far, to our knowledge.
Finally, an efficient and transferable methodology has been proposed to estimate runoff for ungauged parts of a watershed. However, the method is not applicable if the area is completely ungauged. In that case, GEM-Hydro has nevertheless proven able to produce decent, generally satisfactory runoff simulations with default parameter values, except for areas with a high urban cover fraction, which need further investigation.
In order to calibrate the GEM-Hydro model, its routing part was replaced by a simple UH during calibration, which saves a tremendous amount of computational time. The routing part of GEM-Hydro can be run afterwards, potentially adjusting the standard Manning values if needed (which can be done manually with a few runs). Lumped models have limited applications, while distributed ones can be useful to a number of environmental studies. Many distributed models exist worldwide, each one possessing its own advantages and drawbacks, but also its own optimal implementation and calibration methodology, which makes a perfectly fair inter-comparison quite challenging, if not unrealistic.
This work successfully led to the implementation of an efficient distributed hydrological modeling platform for the land portion of the Lake Ontario watershed, which has therefore become a ready testing ground for distributed models, for example for upcoming SVS improvements that are currently being implemented at ECCC and whose resulting benefits for streamflow simulations are left to future work.

Figure 3: Diagram summarizing the methodology employed to simulate Lake Ontario runoff with GEM-Hydro.

Figure 3: Uncalibrated and calibrated GEM-Hydro-UH performances over the calibration period. Results are presented as NSE