Scaling, similarity, and the fourth paradigm for hydrology

Abstract. In this synthesis paper addressing hydrologic scaling and similarity, we posit that the search for universal laws of hydrology is hindered by our focus on computational simulation (the third paradigm) and assert that it is time for hydrology to embrace a fourth paradigm of data-intensive science. Advances in information-based hydrologic science, coupled with an explosion of hydrologic data and advances in parameter estimation and modeling, have laid the foundation for a data-driven framework for scrutinizing hydrological scaling and similarity hypotheses. We summarize important scaling and similarity concepts (hypotheses) that require testing; describe a mutual information framework for testing these hypotheses; describe boundary condition, state, flux, and parameter data requirements across scales to support testing these hypotheses; and discuss some challenges to overcome while pursuing the fourth hydrological paradigm. We call upon the hydrologic sciences community to develop a focused effort towards adopting the fourth paradigm and apply this to outstanding challenges in scaling and similarity.

Introduction
This synthesis paper is an outcome of the "Symposium in Honor of Eric Wood: Observations and Modeling across Scales", held 2-3 June 2016 in Princeton, New Jersey, USA. The focus of this contribution is the heterogeneity of hydrological processes; their organization, scaling, and similarity; and the impact of the heterogeneity on water and energy states and fluxes (and vice versa). We argue here that the growth of hydrologic science, from empiricism (first paradigm), via theory (second paradigm), to computational simulation (third paradigm), has yielded important advances in understanding and predictive capabilities, but that accelerating advances in hydrologic science will require us to embrace the fourth paradigm of data-intensive science, to use emerging datasets to synthesize and scrutinize theories and models, and to improve the data support for the mechanisms of Earth system change.
The fourth paradigm is a concept that focuses on how science can be advanced by enabling full exploitation of data via new computational methods. The concept is based on the idea that computational science constitutes a new set of methods beyond empiricism, theory, and simulation, and is concerned with data discovery in the sense that researchers and scientists require tools, technologies, and platforms that seamlessly integrate into standard scientific methodologies and processes. By integrating these tools and technologies for research, we provide new opportunities for researchers and scientists to share and analyze data and thereby encourage new scientific discovery. As shown in Fig. 1, the scientific method applied to hydrology is not a linear process; rather, because hydrology is already in the third paradigm, empiricism (the first paradigm) and theoretical development (the second paradigm) both lead to new theories and hypotheses that are embodied in computational models. These hypotheses may not be rigorously tested with many datasets, either because the datasets have not been gathered into an effective, accessible platform, or because the datasets require additional processing and information theoretic techniques to apply them to the model predictions for hypothesis testing. Further, as noted by Pfister and Kirchner (2017), hypothesis testing with models is fraught with challenges that require not only consideration of the data required to test a given hypothesis, but also careful consideration of how to encode hypotheses as uniquely falsifiable predictions (Fig. 1). Advances in data science now allow the fourth paradigm to inject "big data" into the scientific method using rigorous information theoretic methods without eliminating the other parts of the scientific method.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Our focus here on scaling and similarity directs attention to one of the most challenging problems in the hydrologic sciences. Scale has been defined as a "characteristic length (or time) of process, observation, model" and scaling as a "transfer of information across scales" (see also Bierkens et al., 2000; Grayson and Blöschl, 2001). Functional relationships between hydrologic variables may also exist and these may be scale-independent (or scale-invariant). Similarity is present when characteristics of one system can be related to the corresponding characteristics of another system by a simple conversion factor, called the scale factor. We should note that the terms "scaling" and "similarity" used here are specific to the hydrology literature and distinct from the general notions of self-similarity, fractals, and emergent behavior in the nonlinear dynamics literature. Classic examples of similarity include the ratio of catchment areas (Willgoose et al., 1991; Smith, 1992) and the topographic index ln(a / tan β) (Beven and Kirkby, 1979), which are used for relating the flows of two catchments and for relating topographic slopes and contributing areas to water table depths, respectively. Other examples include the hillslope Péclet number (Berne et al., 2005; Lyon and Troch, 2007) and the catchment seasonal water balance (Berghuijs et al., 2014). Heterogeneity or variability in hydrology manifests itself at multiple spatial scales (e.g., Seyfried and Wilcox, 1995), from local (O(1 m); e.g., macropores) to hillslope (O(100 m); e.g., preferential flowpaths) to catchment (O(10 km); e.g., soils) and regional (O(1000 km); e.g., geology). Similarly, temporal variability is reflected on event, seasonal, and decadal timescales (e.g., Woods, 2005).
Hydrol. Earth Syst. Sci., 21, 3701-3713, 2017 www.hydrol-earth-syst-sci.net/21/3701/2017/
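The two classic similarity measures mentioned above can be illustrated with a minimal Python sketch. The function names, the per-cell input arrays, and the use of a plain area ratio (exponent of one) as the scale factor are assumptions made for illustration; they are not taken from the cited studies.

```python
import numpy as np

def topographic_index(specific_area, slope_radians):
    """Compute ln(a / tan(beta)) for each cell.

    specific_area: upslope contributing area per unit contour length (hypothetical input).
    slope_radians: local slope angle beta in radians (hypothetical input).
    """
    a = np.asarray(specific_area, dtype=float)
    tan_beta = np.tan(np.asarray(slope_radians, dtype=float))
    return np.log(a / tan_beta)

def area_ratio_flow(q_gauged, area_gauged, area_target):
    """Transfer a flow between two catchments using the area ratio as the scale factor.

    Assumes flow scales linearly with catchment area, the simplest possible choice.
    """
    return q_gauged * (area_target / area_gauged)
```

Under these assumptions, cells with a large contributing area and a gentle slope receive a high index value, which is the sense in which the index marks areas of topographic convergence.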
Understanding scaling and similarity requires understanding how the interactions among multiple processes across scales affect the (emergent) hydrologic behavior on other space-time scales; such understanding underpins methods for computational simulation. The scaling and similarity problem is nevertheless very difficult. As asserted by Dooge (1986), "within the physical sciences and the Earth sciences there is and can be no universal model for water movement." Despite numerous attempts at integrating local models across soils (e.g., Kim et al., 1997), hillslopes (Troch et al., 2015), and watersheds (e.g., Reggiani et al., 1998, 1999, 2000, 2001), universal laws in hydrology and the required closure relations remain elusive because the physics are likely scale-dependent (e.g., Bierkens, 1996) and the data required to test these hypotheses are either not readily available or not easily synthesized, or, even worse, would never be observable (Beven, 2006). Further, computational advances have enabled so-called "hyper-resolution" or, using an alternative term that is not necessarily equivalent, "hillslope-resolving" modeling (e.g., Chaney et al., 2016; Wood et al., 2011); but as noted in the discussion between Beven and Cloke (2012) and Wood et al. (2012), and later discussed in Beven et al. (2015), the ability to provide meaningful information from hillslope-resolving models is limited both by a lack of tested parameterizations on a given model scale and by a lack of data for model evaluation (e.g., Melsen et al., 2016a).
In principle, moving to finer spatial and temporal resolutions may improve accuracy simply by reducing the truncation error in the numerical solution of the system of partial differential equations. In an analogy with fluid mechanics and the atmospheric sciences where "large-eddy simulations" are designed to capture the most energetic motions and thereby reduce the sensitivity to turbulence closure, one might ask whether "hillslope-resolving" models might resolve the most energetic components (in an information theoretic or entropy sense) of the terrestrial water storage spectrum such that the closure problem may be simplified. As discussed in many of the studies cited above, topography is fractal and this, combined with scaling between the pedon and the hillslope, drives much of the scaling behavior seen in hydrology. Most of the apparent fractal nature in relation to hydrology has been demonstrated on the scale of river networks (e.g., Tarboton et al., 1988), so a question that could be tested with data following the fourth paradigm is to what extent does resolving these river networks in models reduce the information loss. Further, proposed scaling relationships may be appropriate above a given scale, but as we move downward in scales from watershed to hillslope to local, these relationships may break down.
These current tactics in the hydrologic sciences are representative of the third paradigm of scientific investigation (Hey et al., 2009), characterized by applying computational science to simulate complex systems. The so-called third paradigm builds on the earlier first (empirical) and second (theoretical) paradigms. As discussed by Clark et al. (2017), computational science approaches to modeling hydrologic systems have been discussed for decades. With the advent of high-resolution Earth observing systems (McCabe et al., 2017), proximal sensing (Robinson et al., 2008), sensor networks, and advances in data-intensive hydrologic science (e.g., Nearing and Gupta, 2015), there is now an opportunity to recast the hydrologic scaling problem into a data-driven hypothesis testing framework (e.g., Rakovec et al., 2016a). By embracing such a framework, hydrologic analysis can become explicitly "scale-aware" by testing specific parameterizations on a given model scale. Now is the time for a fourth paradigm in hydrologic science.
With this goal in mind, this paper addresses the following questions:
1. What are the key scaling and similarity concepts (hypotheses) that require testing?
2. What framework could we use to test these hypotheses?
3. What are the data requirements to test these hypotheses?
4. What are the model requirements to test these hypotheses?

Scaling and similarity hypotheses
Most scaling work to date has built on the representative elementary area (REA) concept (Fan and Bras, 1995) and its extension to the representative elementary watershed (REW) introduced by Reggiani et al. (1998, 1999, 2000, 2001). The REA-REW concept seeks to define physically meaningful control volumes for which it is possible to obtain simpler descriptions of the rainfall-runoff process (i.e., simpler than those on the point scale). An alternative, but related, concept is the representative hillslope (RH; Berne et al., 2005; Hazenberg et al., 2015). The REA-REW approach is conceptually similar to Reynolds averaging, and relies on the fundamental assumption that the physics are known on the smallest scale considered (e.g., Miller and Miller, 1956). Critically, the fluxes at the boundaries of the model control volumes require parameterization (the so-called "closure" relations). These closure assumptions are typically ad hoc and include subgrid probability distributions, scale-aware parameters, or new flux parameterizations. Fundamentally, these approaches conform to the third paradigm, in the sense that they take as given a set of conservation equations that govern behavior at the fundamental (patch, tile, grid, hillslope, or REW) scale (Fig. 2). Testing both the scaling and closure assumptions as hypotheses using data would move hydrology towards the fourth paradigm.
The examples above represent the classic "Newtonian" approach in hydrology, but the fourth paradigm advocated here is not specific to testing hypotheses derived from that approach and, as shown in Fig. 1, represents an augmentation to the scientific method in hydrology. Foundational (Sivapalan, 2005; McDonnell et al., 2007) and more recent work (Thompson et al., 2011; Harman and Troch, 2014) on "Darwinian" hydrology has used scale and similarity concepts to synthesize catchments across scales, places, and processes. As noted in McDonnell et al. (2007), there has been a call for a reconciliation of the Newtonian and Darwinian approaches, starting first in the ecology community (Harte, 2002), and we believe that moving to a fourth paradigm with the augmented scientific method depicted in Fig. 1 will embody the wishes of Darwin from his "Structure of Coral Reefs", as quoted in Harman and Troch (2014): ". . . In effect, what an immense addition to our knowledge of the laws of nature should we possess if a tithe of the facts dispersed in the Journals of observant travellers, in the Transactions of academies and learned societies, were collected together and judiciously arranged! From their very juxtaposition, plan, correlation, and harmony, before unsuspected, would become instantly visible, or the causes of anomaly be rendered apparent; erroneous opinions would at once be detected; and new truths - satisfactory as such alone, or supplying corollaries of practical utility - be added to the mass of human knowledge. A better testimony to the justice of this remark can hardly be afforded than in the work before us."
An important avenue to advance hydrologic understanding and predictive capabilities is through attention to hypotheses of hydrologic scaling and similarity, i.e., different ways to relate processes and process interactions across spatial scales.
One of the foundational works in hydrologic similarity is the topographic index (Beven and Kirkby, 1979), which defines local areas of topographic convergence and is used to relate the probability distribution of local water table fluctuations to catchment-average surface runoff and subsurface flow. Building on this topographic similarity, this index was expanded to include soils and study runoff production (Sivapalan et al., 1987, 1990) and was further applied to examine scaling of evaporation (Famiglietti and Wood, 1994) and soil moisture (Peters-Lidard et al., 2001). Such controls of water table depth on runoff production and evapotranspiration on catchment scales represent just one hypothesis of similarity and scaling behavior. An alternative hypothesis, used in the variable infiltration capacity (VIC) model (Liang et al., 1994), is the description of how subelement variability in soil moisture affects the development of saturated areas in a catchment and the partitioning of precipitation into surface runoff and infiltration (Moore and Clarke, 1981; Dümenil and Todini, 1992; Wood et al., 1992; Hagemann and Gates, 2003). Other scaling hypotheses are used for other physical processes, for example, how small-scale variability in snow affects large-scale snow melt (Luce et al., 1999; Liston, 2004; Clark et al., 2011a) and how energy fluxes for individual leaves scale up to the vegetation canopy (de Pury and Farquhar, 1997; Wang and Leuning, 1998).
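The subelement-variability hypothesis described above can be sketched in a few lines of Python. The sketch assumes the commonly used variable infiltration capacity curve, i = i_max [1 - (1 - A)^(1/b)], inverted for the saturated fraction A. The function names are hypothetical, and the partitioning step is deliberately crude: it treats all precipitation falling on the saturated fraction as runoff, which is a simplification and not the integration over the curve used in VIC itself.

```python
import numpy as np

def saturated_fraction(i, i_max, b):
    """Fraction of the model element with infiltration capacity <= i.

    Inverts the assumed curve i = i_max * (1 - (1 - A)**(1 / b)) for A.
    """
    ratio = np.clip(np.asarray(i, dtype=float) / i_max, 0.0, 1.0)
    return 1.0 - (1.0 - ratio) ** b

def partition_precipitation(precip, i, i_max, b):
    """Crude split: rain on the saturated fraction runs off, the rest infiltrates."""
    a_sat = saturated_fraction(i, i_max, b)
    runoff = precip * a_sat
    infiltration = precip - runoff
    return runoff, infiltration
```

The shape parameter b controls how quickly the saturated area grows as the element wets up, which is one way in which a closure relation encodes a scaling hypothesis that could be tested against data.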
The critical issue here is the interplay between the scale of the model elements and the choice of the closure relations: as computational resources permit higher resolution simulations across larger domains (Wood et al., 2011), more physical processes can be represented explicitly, and the closure relations must be tailored to fit the spatial scale of the model simulation. To some extent such hyper-resolution approaches abandon the quest for physically meaningful control volumes that characterizes the REA and REW concepts, and the representation of subelement processes in fully 3-D simulation of watersheds (e.g., Kollet and Maxwell, 2008; Maxwell and Miller, 2005) is becoming less and less obvious, and perhaps less and less necessary. A key question now is whether hyper-resolution applications through explicit 3-D models, or (at least for some variables) with clustered 2-D simulations (e.g., the HydroBlocks of Chaney et al., 2016), provide reasonable representations of scaling and similarity. Considering infiltration excess and saturation excess runoff generation processes, high-resolution numerical studies indicate that excess infiltration does not appear to have an ergodic limit (e.g., Maxwell and Kollet, 2008), while excess saturation processes scale with the geometric mean of subsurface saturated hydraulic conductivity (e.g., Meyerhoff and Maxwell, 2011). Similarly, one might imagine different scaling relations for evapotranspiration depending on the nature of controls due to radiation (topography), vegetation, and/or soil moisture (e.g., Rigden and Salvucci, 2015). For example, as recently shown by Maxwell and Condon (2016), the interplay of water table depths with rooting depths along a given hillslope exerts different controls on evaporation and transpiration, which links the water table dynamics with the land surface energy balance, even on continental scales.
This finding is based on limited data, and would benefit from formal hypothesis testing in an information-based framework, as described in the next section.

A hypothesis testing framework for hydrologic scaling and similarity
As demand increases for hillslope-resolving or hyper-resolution modeling (e.g., Beven et al., 2015; Beven and Cloke, 2012; Bierkens et al., 2015; Wood et al., 2011, 2012), the question arises as to whether the physics in our models, the parameters that are used in the models, and the input data (e.g., "forcings") are adequate to support such endeavors (e.g., Melsen et al., 2016b). Following from Nearing and Gupta (2015), we can formulate a framework for testing hypotheses based on measuring information provided by a model (e.g., parameterizations based on similarity concepts) as distinct from information provided to a model (e.g., forcing data or parameters). We should note that this is not hypothesis testing in the traditional sense, but rather a framework for scrutinizing hydrological scaling and similarity hypotheses with data. This concept was demonstrated by Nearing et al. (2016), who evaluated the information loss due to forcing data, parameters, and physics in the North American Land Data Assimilation System (NLDAS) model ensemble.
In this example, information was first measured using point data for soil moisture and evaporation. It was then compared to regressions, which are kernel density estimators of the conditional probability densities and which represent the upper bound of information available about a given variable from the forcing data alone, and from the forcing data and parameters together. As shown in Fig. 3, we can measure the total information about a given variable $z$ contained in observations ($H(z)$, left bar) and then measure the information about that variable provided by a given model simulation ($I(z; y_M)$, right bar). The intermediate bars represent losses of information due to forcing data (boundary conditions) and due to parameters. If we take this example, and expand it to conceptualize a framework for hypothesis testing in hydrology, we can imagine multiple instances of $H(z)$ computed on different spatial scales, as well as multiple instances of mutual information $I(z; y_M)$, computed for models employing different representations of processes on that scale. One concrete example hypothesis described in the previous section is the use of TOPMODEL parameterizations for groundwater, versus representative hillslopes, versus "HydroBlocks" (Chaney et al., 2016), versus explicit 3-D modeling.
Critical to this exercise is the availability of forcing data, such as precipitation, radiation, humidity, temperature, and wind speed, that have sufficient information content on the scale being evaluated such that they can adequately characterize the variable (e.g., soil moisture) or process (e.g., evapotranspiration, runoff) being studied (e.g., Berne et al., 2004). Similarly, the parameters provided to the model must also contain information about the variable or process being studied on a particular spatial and temporal scale. The Nearing and Gupta approach provides a framework for explicitly measuring the information available from observations, comparing that to information provided by a model and attributing lost information to forcings, parameters, and physics, and hence provides a rigorous method to test our physics assumptions by confronting them with observations. Clearly, this leads to requirements for data that can support such a framework.
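The information accounting described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of Nearing et al. (2016): it uses simple histogram (plug-in) estimators rather than kernel density estimators, and it takes the forcing-only and forcing-plus-parameter benchmark predictions as given arrays. All function and argument names are hypothetical.

```python
import numpy as np

def entropy(z, bins=10):
    """Shannon entropy (bits) of z after histogram discretization."""
    counts, _ = np.histogram(z, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(z, y, bins=10):
    """Plug-in estimate of I(z; y) in bits from a 2-D histogram of paired samples."""
    joint, _, _ = np.histogram2d(z, y, bins=bins)
    p_zy = joint / joint.sum()
    p_z = p_zy.sum(axis=1, keepdims=True)   # marginal of z, shape (bins, 1)
    p_y = p_zy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = p_zy > 0
    ratio = p_zy[nonzero] / (p_z @ p_y)[nonzero]
    return float(np.sum(p_zy[nonzero] * np.log2(ratio)))

def information_budget(z_obs, y_forcing, y_forcing_params, y_model, bins=10):
    """Attribute lost information to forcings, parameters, and model structure.

    y_forcing and y_forcing_params stand in for benchmark regressions on the
    forcing data alone and on forcing data plus parameters (assumed inputs).
    """
    h = entropy(z_obs, bins)
    i_u = mutual_information(z_obs, y_forcing, bins)
    i_u_theta = mutual_information(z_obs, y_forcing_params, bins)
    i_model = mutual_information(z_obs, y_model, bins)
    return {
        "H(z)": h,
        "loss_forcing": h - i_u,
        "loss_parameters": i_u - i_u_theta,
        "loss_structure": i_u_theta - i_model,
        "I(z; y_M)": i_model,
    }
```

By construction the three losses and the model information sum to the entropy of the observations. Histogram estimators are biased for small samples, so the bin count and sample size would need care in any real application; repeating the calculation for observations aggregated to different spatial scales is one way the multi-scale comparison described above might be set up.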

Data requirements
As shown in Fig. 1, the fourth paradigm for hydrology is characterized by the rigorous application of large datasets towards testing hypotheses as encapsulated in models. The process of constructing models requires observations both as input data and for model and process validation or hypothesis testing. A distinguishing characteristic of data for model and process validation will be that we are observing spatial and temporal patterns of fluxes and states represented in our modeling framework, for example, soil moisture, snow pack or evapotranspiration. As discussed by McCabe et al. (2017), there has been a dramatic increase in the type and density of hydrologic information that is becoming available on multiple scales, from point to mesoscale and regional to global. For example, the number of remote sensing missions dedicated to observing the water cycle allows further development of (large scale) hydrological models and data assimilation frameworks for more accurate soil moisture, evaporation, and streamflow prediction. In particular, there are exciting developments in mesoscale (i.e., hillslope to catchment) observations, which are critical for testing hypotheses about scaling (REA, RH, REW) by connecting point measurements, hydrological models, and remote sensing observations.
Figure 3 (caption fragment). The term $H(z)$ represents the total uncertainty (entropy) in the benchmark observations, and $I(z; u)$ represents the amount of information about the benchmark observations that is available from the forcing data. Uncertainty due to forcing data is the difference between the total entropy and the information available in the forcing data. The information in the parameters plus forcing data is $I(z; u, \theta)$, and $I(z; u, \theta) < I(z; u)$ because of errors in the parameters. The term $I(z; y_M)$ is the total information available from the model, and $I(z; y_M) < I(z; u, \theta)$ because of model structural error.
Examples include recent advances in cosmic ray neutron sensors (Franz et al., 2015; Köhli et al., 2016; Zreda et al., 2008), distributed temperature sensing (DTS; Steele-Dunne et al., 2010; Bense et al., 2016; Dong et al., 2016), soil moisture observations, the use of crowdsourcing (de Vos et al., 2016) and microwave signal propagation from telecommunications towers for precipitation (Leijnse et al., 2007), and the rise in the use of unmanned autonomous vehicles to characterize the landscape on the centimeter scale (Vivoni et al., 2014). These alternative data sources enhance our ability to observe, understand, and simulate the hydrological cycle. Advances in citizen science (Buytaert et al., 2014; Hut et al., 2016) and the use of so-called "soft" data for hydrological modeling (Van Emmerik et al., 2015; Seibert and McDonnell, 2002) show that even though these new data are collected on nontraditional spatiotemporal scales, they might give us new insights into how processes on different scales are coupled. Advances in hydrogeophysical characterization of the subsurface (Binley et al., 2015), such as electrical methods, ground-penetrating radar, and gravimetry, offer non-invasive mesoscale information that can be used to provide parameters or to infer boundary conditions, states, or fluxes. Recently, Christensen et al. (2017) demonstrated that dense airborne electromagnetic data can be used to map hydrostratigraphic zones, which is an encouraging capability. Imaging the subsoil may be feasible on local scales, but it is a challenge on river basin or continental scales. Hence, we encourage more joint efforts in hydrogeophysical imaging for integrated characterization of the subsurface.
Combined, these observations may be used in a benchmarking exercise similar to Nearing et al. (2016). Synthesizing hydrogeophysical methods with point observations and laboratory or field techniques for estimating "effective" soil hydraulic functions and parameters is a challenging opportunity (e.g., Kim et al., 1997), but one which might be tractable using a data-driven hypothesis testing framework. These new data sources allow us to understand and apply scaling between data sources (point scale to remotely sensed data) and between model scales and provide the critical data required to test alternative scaling hypotheses.
Beyond the new mesoscale observations, extensive catchment databases now exist to support hypothesis testing including the TERENO (Zacharias et al., 2011), MOPEX (Duan et al., 2006), contiguous USA benchmarking (Newman et al., 2015a), GRDC (http://www.bafg.de/GRDC/EN/01_GRDC/13_dtbse/database_node.html), and EURO-FRIEND databases (Stahl et al., 2010). Recent similarity studies have systematically analyzed large numbers of catchments focusing on streamflow-oriented signatures such as the runoff coefficient, baseflow index, and slope of the flow duration curve and have then explored relationships between these signatures and model process timescales. Coopersmith et al. (2012) generalized this work with four nearly orthogonal signatures that included aridity, seasonality of rainfall, peak rainfall, and peak streamflow and demonstrated that 77 % of MOPEX catchments can be described by only six classes, which are themselves defined by combinations of the four signatures. Clearly there is information contained in these catchment databases about not just the coevolution of climate (forcing) and landscape properties (parameters), but also the physics of the catchment responses. Comparative hydrology (e.g., Kovács, 1984; Falkenmark and Chapman, 1989; Gupta et al., 2014) takes the first necessary step in the direction of the fourth paradigm, and following the framework described above, we can explicitly quantify the mutual information in the signatures, parameters, and forcings to help elucidate these connections beyond classification. One of the crucial factors that complicate scaling is the anthropogenic effect on catchments. Recent advances in modeling the coevolution of the human-water system (see, e.g., Troy et al., 2015; Ciullo et al., 2017) focused on identifying generic key processes and relations. Yet, it is unknown how these relate to systems on larger (and smaller) scales.
To arrive at new understandings of scaling and similarities in human-influenced catchments, studying these issues from a sociohydrological point of view should be an integrated part of the way forward (e.g., Van Loon et al., 2016).
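Two of the streamflow-oriented signatures mentioned in this section, the runoff coefficient and the slope of the flow duration curve, can be computed with a short Python sketch. The definitions used here are assumptions: the runoff coefficient is taken as total streamflow over total precipitation in the same depth units, and the flow duration curve slope is taken between the 33 % and 66 % exceedance flows in log space, a convention found in some signature studies. The baseflow index is omitted because it depends on a choice of baseflow separation method.

```python
import numpy as np

def runoff_coefficient(streamflow, precipitation):
    """Total streamflow divided by total precipitation (same depth units assumed)."""
    return float(np.sum(streamflow) / np.sum(precipitation))

def fdc_slope(streamflow, lower=33.0, upper=66.0):
    """Slope of the flow duration curve between two exceedance percentages.

    Flows must be strictly positive because the slope is taken in log space.
    """
    q = np.asarray(streamflow, dtype=float)
    # The flow exceeded `lower` % of the time is the (100 - lower)th percentile.
    q_lower_exceedance = np.percentile(q, 100.0 - lower)
    q_upper_exceedance = np.percentile(q, 100.0 - upper)
    log_drop = np.log(q_lower_exceedance) - np.log(q_upper_exceedance)
    return float(log_drop / ((upper - lower) / 100.0))
```

Signatures computed in this way for many catchments could then be passed to a mutual information calculation together with forcing and parameter data, in the spirit of the framework described in the previous section.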

Modeling framework requirements
Embracing the fourth paradigm in hydrology will face several challenges. First, it is necessary to implement and/or extend a hydrologic modeling framework with sufficient flexibility to evaluate competing hypotheses of similarity and scaling behavior (Clark et al., 2011b). One possible framework is the Structure for Unifying Multiple Modeling Alternatives (SUMMA), which has the capability to incorporate alternative spatial configurations and alternative flux parameterizations. Frameworks like SUMMA, which pursue the method of multiple working hypotheses, enable the decomposition of complex models into the individual decisions made as part of model development so that attention can be focused on specific decisions (e.g., related to scaling and similarity) while keeping all other components of a model constant, hence enabling users to isolate and scrutinize specific hypotheses. One confounding issue is that models with parameterizations designed to represent subgrid processes may not add information in a manner proportional to increased information in the inputs, while models that have a single column tile or subtile form may show a more direct relationship between information in inputs and information in outputs. Similarly, integrated models with lateral flow of water in surface and subsurface systems that generate runoff directly will have a different spatial sensitivity to the resolution of the input data than more traditional land surface models with no lateral flow and a parameterized runoff generation. Hence, the modeling framework must be able to isolate the role that surface and subsurface connectivity play in processing information on different scales. A second challenge consists of understanding how to deal with different uncertainties and errors of different observational products and hydrologic models when comparing them for the purpose of studying the scaling behavior.
Several papers have highlighted the problem of different climatologies or sensitivities of remote sensing products (e.g., Albergel et al., 2012; Brocca et al., 2011), gridded meteorological products (Clark and Slater, 2006; Newman et al., 2015b), and streamflow observations (Di Baldassarre and Montanari, 2009; McMillan et al., 2010). A true correspondence of these remotely sensed variables with model results is often hampered by vertical mismatches in the soil column between the different products (Wilker et al., 2006), approximations in the structure of the hydrological model used, its parameterization and discretization, the initial conditions, and errors in forcing data (De Lannoy et al., 2007). Because of this, modeled variables often do not correspond well to observations; nevertheless, similar trends and dynamics between the different products are found (Koster et al., 2009). In several data assimilation studies, the problem of differences in climatologies is resolved by bias-correcting the observations towards the model (e.g., Crow et al., 2005; Kumar et al., 2014; Lievens et al., 2015a, b; Martens et al., 2016; Reichle and Koster, 2004; Sahoo et al., 2013; Verhoest et al., 2015). Yet, such (statistical) operations may not be appropriate for scaling studies. First of all, these methods only rescale the remotely sensed value, yet the uncertainties in these products need rescaling as well. Second, depending on the bias-correction method used (ranging from only correcting for the first moment to full cumulative distribution function (CDF) matching), different scaling relations may be found. Ideally, multiscale data should be used in a way that best demonstrates the ability of the models to reproduce processes at the scales at which those data are available, particularly with respect to reproducing attributes of dynamics (such as the time rate of decorrelation using an information metric) and the mutual information across variables, space, and time.
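The two ends of the bias-correction range mentioned above, first-moment correction and full CDF matching, can be contrasted with a minimal Python sketch. The implementations are generic textbook versions with hypothetical function names; they are not the specific procedures used in the cited data assimilation studies.

```python
import numpy as np

def match_first_moment(obs, model):
    """Shift the observations so that their mean equals the model mean."""
    obs = np.asarray(obs, dtype=float)
    return obs - obs.mean() + np.mean(model)

def cdf_match(obs, model):
    """Map each observation to the model value with the same empirical quantile."""
    obs = np.asarray(obs, dtype=float)
    model_sorted = np.sort(np.asarray(model, dtype=float))
    # Rank of each observation within its own sample (0 = smallest).
    ranks = np.argsort(np.argsort(obs))
    obs_probs = (ranks + 0.5) / obs.size
    model_probs = (np.arange(model_sorted.size) + 0.5) / model_sorted.size
    return np.interp(obs_probs, model_probs, model_sorted)
```

The first function leaves the variance and higher moments of the observations untouched, while the second imposes the full model distribution on them. Any scaling relation derived from the corrected data can therefore depend on which correction was applied, which is the concern raised in the text.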
Testing hypotheses with multiple scale information also requires assimilation-modeling frameworks that allow integration of data into models at their native resolution so that simulations and observations can be compared without the need to introduce ad hoc downscaling or upscaling rules. One such framework has recently been proposed by Rakovec et al. (2016b). This framework uses the multiscale parameter regionalization (MPR; Samaniego et al., 2010) technique to link the resolutions of the various data sources with the target modeling resolution, keeping a single set of model transfer parameters that are applicable to all scales. As a result, seamless, flux-matching simulations can be obtained. The MPR-based assimilation framework proposed by Rakovec et al. (2016b) is general and can be used within any land surface or hydrologic model. It was originally tested with the mesoscale hydrological model (mHM) (Kumar et al., 2013; Samaniego et al., 2010) in order to test hypotheses related to model transferability across scales and locations as well as process description. The approach could also be used, for example within the SUMMA modeling framework, to test hypotheses related to the appropriate model complexity on a given scale. A model-agnostic MPR system called MPR-flex has recently been applied to the VIC model to estimate seamless parameter and flux fields over the contiguous USA (Mizukami et al., 2017). This symbiosis of model parameterization (MPR-flex) and simulation frameworks (e.g., SUMMA, mHM) is a very promising avenue to test scaling laws as well as the uncertainty decomposition described above. Finally, subjective modeling decisions (e.g., the choice of time step, spatial resolution, numerical scheme, study region, time period for calibration and validation, and performance metrics) and their associated uncertainties require further attention (e.g., Krueger et al., 2012).
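The core idea of keeping one set of transfer parameters across scales can be sketched in a few lines of Python. The sketch below is a simplified illustration and not the implementation of MPR, MPR-flex, or mHM: the linear transfer function, the predictor fields (`sand`, `clay`), the coefficient vector `gamma`, and the helper names are all hypothetical. A fine-resolution parameter field is computed once from the predictors and then aggregated to several coarser model resolutions with an upscaling operator.

```python
import numpy as np

def transfer_function(sand, clay, gamma):
    """Hypothetical linear transfer function from fine-scale soil texture
    fractions to a fine-scale parameter field. gamma holds the global
    transfer parameters shared by all target resolutions."""
    return gamma[0] + gamma[1] * sand + gamma[2] * clay

def upscale(field, factor, operator="arithmetic"):
    """Aggregate a 2-D fine-scale field to blocks of factor x factor cells."""
    ny, nx = field.shape
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    if operator == "harmonic":
        return 1.0 / (1.0 / blocks).mean(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

# Synthetic 8 x 8 fine-scale predictors (assumed values for illustration).
rng = np.random.default_rng(1)
sand = rng.uniform(0.1, 0.8, size=(8, 8))
clay = rng.uniform(0.05, 0.2, size=(8, 8))
gamma = np.array([0.2, 0.5, -0.3])

# One fine-scale parameter field, then the same field at three coarser
# model resolutions, all derived from the same gamma.
fine = transfer_function(sand, clay, gamma)
params = {factor: upscale(fine, factor) for factor in (2, 4, 8)}

for factor, field in params.items():
    print(factor, field.shape, field.mean())
```

Because the regionalization is applied at the finest resolution and only the aggregation step changes, the domain-mean parameter under arithmetic upscaling is identical at every target resolution in this sketch. The choice of upscaling operator (arithmetic versus harmonic here) is itself a modeling decision that can be tested.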

Summary and next steps
In this paper we review advances in hydrologic scaling and similarity. Beginning with the challenge of Dooge (1986), we posit that the search for universal laws of hydrology is hindered by our third-paradigm approach, and assert that it is time for hydrology to embrace a fourth paradigm of data-intensive science. Building on other synthesis papers in this issue (McCabe et al., 2017), advances in data-intensive hydrologic science (e.g., Nearing and Gupta, 2015) have laid the foundation for a data-driven hypothesis testing framework for scaling and similarity. To achieve this goal, we have (1) summarized important scaling and similarity concepts (hypotheses) that require testing; (2) described a mutual information framework for testing these hypotheses; (3) described boundary condition, state, flux, and parameter data requirements across scales to support testing these hypotheses; and (4) discussed some challenges to overcome while pursuing the fourth hydrological paradigm. Figure 1 illustrates the concept of embracing a fourth paradigm in hydrology, in which we enable a rigorous confrontation of the hypotheses embodied within our models with a range of data types across many locations and spatial-temporal scales. This paradigm represents a union and extension of previous scientific methods within a formal hypothesis-driven framework. Models are a synthesis of all that we have learned (e.g., conservation equations, constitutive relationships for soil infiltration), while data, particularly through first-paradigm examples like comparative hydrology, yield empirical relationships, signatures, and fingerprints that help lead to new understanding and theory (second paradigm). By coupling traditional (e.g., in situ) and new data sources (e.g., satellites), we can use the power of information theory and rigorous hypothesis testing to elucidate the causes of behaviors that may not be evident in the analysis of individual sites or catchments.
In this sense, a move to the fourth paradigm means that we seek modeling-driven monitoring and, simultaneously, monitoring-driven modeling. The formal hypothesis-driven framework will indicate where our understanding of processes is weak because we cannot explain the data obtained at high resolution. In other cases, comprehensive integrated simulations and big-data relationships would allow us to identify where the measurement errors are too large (i.e., the data have little information content, or entropy) and to point out what kinds of new measurements and sensors are needed to improve our physical understanding. These are the feedback loops in Fig. 1, and they represent two important paths to optimizing the use of models and data to enhance hydrologic science.
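The link between measurement error and information content can be made concrete with a small numerical experiment. The Python sketch below is an illustration under stated assumptions: it uses a simple histogram (plug-in) estimator of mutual information, a synthetic Gaussian signal, and arbitrary noise levels, none of which come from the studies discussed here. It estimates how much information a noisy measurement retains about the underlying signal as the measurement error grows.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in (histogram) estimate of mutual information I(X;Y) in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nz = pxy > 0                              # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Synthetic "true" signal and measurements with increasing error
# (assumed values, for illustration only).
rng = np.random.default_rng(42)
signal = rng.normal(size=5000)

mi_by_noise = {}
for noise_std in (0.1, 1.0, 10.0):
    measured = signal + rng.normal(scale=noise_std, size=signal.size)
    mi_by_noise[noise_std] = mutual_information(signal, measured)

for noise_std, mi in mi_by_noise.items():
    print(f"noise std {noise_std:5.1f}: estimated I(signal; measured) = {mi:.3f} bits")
```

In this sketch the estimated mutual information falls towards zero as the noise standard deviation grows. Histogram estimators are biased for finite samples and sensitive to the bin count, so values from such a sketch are best compared relative to one another rather than read as absolute information content.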
As a next step, we propose a focused community effort to shape the development of the fourth paradigm for hydrology. To this end, a workshop following the publication of this special issue would be a good first step.
Data availability. No data sets were used in this article.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Observations and modeling of land surface water and energy exchanges across scales: special issue in Honor of Eric F. Wood". It does not belong to a conference.