Catchment classification: empirical analysis of hydrologic similarity based on catchment function in the eastern USA. Hydrology and Earth System

Hydrologic similarity between catchments, derived from similarity in how catchments respond to precipitation input, is the basis for catchment classification, for transferability of information, for generalization of our hydrologic understanding and also for understanding the potential impacts of environmental change. An important question in this context is: to what extent can widely available hydrologic information (precipitation-temperature-streamflow data and generally available physical descriptors) be used to create a first-order grouping of hydrologically similar catchments? We utilize a heterogeneous dataset of 280 catchments located in the eastern US to understand hydrologic similarity in a six-dimensional signature space across a region with strong environmental gradients. Signatures are defined as hydrologic response characteristics that provide insight into the hydrologic function of catchments. A Bayesian clustering scheme is used to separate the catchments into 9 homogeneous classes, which enable us to interpret hydrologic similarity with respect to similarity in climatic and landscape attributes across this region. We finally derive several hypotheses regarding controls on individual signatures from the analysis performed here.


Introduction
Catchments provide a sensible (though not the only possible) unit for a hydrological classification system. Despite the degree of uniqueness and complexity that each catchment exhibits (Beven, 2000), we generally assume that some level of organization, and therefore a degree of predictability of the functional behavior of a catchment, exists (Dooge, 1986). This organization may be a result of natural self-organization or the co-evolution of climate, soils, vegetation and topography (Sivapalan, 2005). The uniqueness of catchments limits the success of hydrological regionalization, but the long-term use of statistical methods in hydrology suggests that some information transfer is possible. Hydrology has thus far not established a common catchment classification system that would provide order and structure to the global assemblage of these heterogeneous spatial units (McDonnell and Woods, 2004; Wagener et al., 2007) and which would provide a first-order grouping of hydrologically similar catchments with implications for hydrological theory, observations and modeling (Gupta et al., 2008; McMillan et al., 2010).
Identifying and categorizing dominant catchment functions as revealed through a suite of hydrologic response characteristics, such as those extracted from observed streamflow-precipitation-temperature datasets, is one strategy to quantify the degree of similarity that may exist between catchments (McIntyre et al., 2005; Wagener et al., 2007; Oudin et al., 2008, 2010; Samaniego et al., 2010; Lyon and Troch, 2010; Haltas and Kavas, 2011). Understanding how and why certain functional behavior occurs in a given catchment would ultimately shed new light on the reasons for the degree of similarity or dissimilarity exhibited between catchments (Gottschalk, 1985; Dooge, 1986). A range of benefits would be obtained if both catchment functions and their causes could be understood and formalized in a similarity framework, and therefore in a classification scheme (Grigg, 1965, 1967):
1. To give names to things, i.e., the main classification step.
2. To permit transfer of information, i.e., regionalization of information.
3. To permit development of generalizations, i.e., to develop new theory.
In the light of increasing concerns about non-stationarity of the responses of hydrologic systems (Milly et al., 2008; Wagener et al., 2010), we add a fourth benefit:
4. To provide a first-order environmental change impact assessment, i.e., the hydrologic implications of climate, land use and land cover change.
All four of the above-listed benefits are objectives of a catchment classification system to achieve order and new understanding while also providing predictive power. Achieving a generalization of knowledge beyond individual catchments or beyond a particular dataset has been a particular struggle in hydrology, as well as in other sciences related to the natural world (Beven, 2000; Harte, 2002). We believe that catchment classification will be an essential element in this much-hoped-for generalization. So how should one define hydrologic similarity or dissimilarity in a catchment classification system? Past strategies for classification have largely focused on physical similarity (e.g., similarity in physical characteristics, or how the catchments look) or on similarity of some (narrowly defined) characteristic of the streamflow record (mainly based on flow regimes). Below we argue that both approaches fall short of achieving all four benefits listed above, and that the general idea of catchment function (Black, 1997; Sivapalan, 2005; Wagener et al., 2007) can bridge the gap between these strategies and help fulfill the needs of a more general classification system. Previous studies have demonstrated the usefulness of empirical analysis of large datasets through clustering to identify homogeneous groups of hydrologically relevant entities. Examples include Ramachandra Rao and Srinivas (2006) for catchments, Bormann et al. (1999) for hydrologic response units, Bormann (2010) for soil types, McNeil et al. (2005) for water bodies and Panda et al. (2006) for chemical water types. At the catchment scale, Winter (2001) introduced the idea of hydrologic landscapes, which are defined on the basis of similarity of climate, topography and geology, assuming that catchments that are similar with respect to these three criteria will behave similarly in a hydrological sense.
This approach clusters the USA into 20 non-contiguous regions using over 40 000 units of about 200 km² each (Wolock et al., 2004). In a similar manner, Buttle (2006) suggests that, within a particular hydro-climatic region, three factors should provide first-order controls on the streamflow response of catchments: (1) typology, i.e., hydrologic partitioning between vertical and lateral pathways, (2) topology, i.e., drainage network connectivity, and (3) topography, i.e., hydraulic gradients as defined by basin topography. Bormann et al. (1999) applied a classification to hydrological quantities simulated with a physically based model (a soil-vegetation-atmosphere transfer scheme) to examine the hydrologic behavior of different ecotypes (hydropedotopes) within one catchment in Germany. Bormann (2010) used a hydrologic classification system based on soil texture groupings, assuming soil to be a major control of hydrologic similarity. Similarly, Ramachandra Rao and Srinivas (2006) investigated catchments in Indiana using physical features (area, channel length, channel slope, etc.) in order to classify catchments as physically similar. These studies make the implicit assumption that the physical (climate and landscape) properties considered are the dominant controls on the "hydrologic behavior" of a catchment and are therefore sufficient to group catchments that are hydrologically similar. However, Merz and Blöschl (2009) showed that land use, soil types, and geology did not seem to fully define the process controls on catchment behavior when analyzing over 400 catchments in Austria. In addition, the uniqueness problem discussed above can lead to unexpected behavior of some catchments that is difficult to predict a priori without very detailed information about the catchment.
Therefore, to fully permit (hydrologic) information transfer and to achieve a generalization of the relationships between catchment attributes, climate and hydrologic responses, an explicit quantitative assessment of such relationships (a mapping) is required and has to be tested, rather than an implicit one as used by Winter (2001) and Buttle (2006).
Alternatively, assessing similarity in terms of certain streamflow characteristics has been particularly popular in aquatic ecology, due to the importance of flow characteristics for aquatic habitats (e.g., Poff et al., 2006; Olden and Poff, 2003; Monk et al., 2007), and in regime studies (e.g., Haines et al., 1988; Bower et al., 2004; Moliere et al., 2009). However, these studies are typically not aimed at understanding the behavior of the catchment, including the causes of a particular regime beyond climatic differences between regions. Black (1997) introduced the idea of hydrologic function, defined as the actions of the catchment exerted on the precipitation it collects. Wagener et al. (2007, 2008) expanded this idea by viewing catchments as non-linear space-time filters, which perform a set of common hydrologic functions, broadly consisting of partitioning, storage, and release of water. Partitioning is defined as the process whereby incoming precipitation is partitioned at the land surface into several components (e.g., infiltration, interception and surface runoff). Storage refers to the mechanisms by which incoming precipitation is held in temporary storage before its eventual release from the catchment (e.g., soil moisture, groundwater or interception). Release of stored water is defined as the pathway (and state) through which water ultimately leaves the catchment (e.g., evaporation, transpiration or surface runoff). Wagener et al. (2007) suggested that (to a degree) these functional characteristics should be revealed and hence observable in selected signatures of the catchment responses to precipitation input, i.e., in characteristics of the streamflow hydrograph, soil moisture and vegetation patterns, and other hydrologic variables. Different observed characteristics will enable a more or less detailed view of catchment function.
Other observations, e.g., isotopic tracers, likely provide more information on internal pathways than can be derived from streamflow data alone (e.g., McGlynn et al., 2003; McGuire et al., 2005; Tetzlaff et al., 2009; Broxton et al., 2009). However, the limited availability of such tracer data makes it necessary to understand to what extent more generally available data, such as streamflow, can provide first-order insight.
This paper provides the first ever cluster analysis to be performed with respect to hydrologic similarity, derived from the notion of catchment function, across a large geographic region with strong environmental gradients. The objective is to understand how catchments group and whether hypotheses regarding controls on similarity can be generated. We use an empirical approach to cluster catchments based on hydrologic similarity as defined by six key signatures. None of the signatures themselves are novel, but their combined use to quantify hydrologic function, and hence hydrologic similarity, is. The choice of streamflow as output variable, with all its limitations as discussed above, means that while we can utilize many catchments, the similarity analysis is limited to a first-order classification. Some functional equifinality, i.e., a limited ability to uniquely characterize hydrologic function, will necessarily remain. The results are valid within the hydro-climatic and landscape characteristic gradients in our dataset. We use the clustering result to speculate on more general signature controls that will have to be generalized through additional study, e.g., using numerical models as used by Carrillo et al. (2011).

Study catchments and data
A total of 280 catchments, spanning the eastern half of the United States, were used in this study. Catchments range in size from 67 km² to 10 096 km² (though only a few very large catchments are included) and have aridity indices (long-term ratio of annual potential evapotranspiration to annual precipitation) between 0.41 and 3.3, hence representing a heterogeneous dataset (see the Supplement for further details). All catchments are well above the size of about 10 km² beyond which the influence of hillslope-scale controls has been shown to decline (Robinson et al., 1995), so such controls are not expected to affect any of the signatures. Ecoregions are delineated based on climatic and land cover information. The ones found in our study region are Level I ecoregions 5, 8 and 9, which are defined as Northern Forests, Eastern Temperate Forests, and Great Plains, respectively (Omernik, 1987).
Time-series data of daily streamflow, precipitation, and temperature for all catchments were provided by the MOPEX project.
The catchments within this dataset are minimally impacted by human influences. Streamflow data within this dataset were originally provided by the United States Geological Survey (USGS), while precipitation and temperature data were supplied by the National Climatic Data Center (NCDC). A total of 10 hydrologic years of data (1949 to 1958) were used to calculate the signatures. This period was assumed to be long enough to capture climatic variability, but short enough not to be affected by climatic trends. An analysis of the impact of trends on the classification is outside the scope of this paper. To ensure precipitation quality, the MOPEX dataset requires a minimum precipitation gauge density within each catchment, defined by

$$N \geq 0.6\,A^{0.3}$$

where N is the number of precipitation gauges and A is the area of the catchment (km²) (Schaake et al., 2000). This guideline provides mean areal precipitation estimates at each time step that should have less than 20 % error 80 % of the time. The MOPEX dataset has been used widely for hydrologic model comparison studies (see references in Duan et al., 2006).
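The gauge-density guideline is simple enough to check per catchment. The sketch below assumes the criterion takes the form N ≥ 0.6 A^0.3 with A in km² (our reading of Schaake et al., 2000); the function names are hypothetical and not part of any MOPEX software.

```python
def min_gauge_count(area_km2: float) -> float:
    """Minimum number of precipitation gauges for a catchment of the given area.

    Assumption: the MOPEX criterion N >= 0.6 * A**0.3, with A in km^2.
    """
    return 0.6 * area_km2 ** 0.3


def meets_gauge_density(n_gauges: int, area_km2: float) -> bool:
    """True if a catchment with n_gauges satisfies the assumed density criterion."""
    return n_gauges >= min_gauge_count(area_km2)


# Required gauge counts for the smallest and largest catchments in the study
for area in (67.0, 10096.0):
    print(f"A = {area:7.0f} km^2 -> at least {min_gauge_count(area):.2f} gauges")
```

Because the exponent is small, the required count grows slowly with area: roughly 2 gauges for the smallest catchment and fewer than 10 for the largest.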

Signatures
Signatures quantify characteristics of the hydrologic response that provide insight into the functional behavior of the catchment. In this paper we will limit ourselves to signatures derived from widely available time-series information such as streamflow, precipitation, and temperature as the basis of a first-order analysis. These key signatures were chosen, from a much larger list of possible indices (see list in Yadav, 2007), by selecting those that are largely uncorrelated and that have an interpretable link to catchment function as discussed in the next sections. None of the chosen signatures scales with catchment size. The chosen signatures are: runoff ratio, baseflow index, snow day ratio, slope of the flow duration curve, streamflow elasticity, and rising limb density.
In the remainder of this section we will provide a brief definition of each of the six signatures.

Runoff ratio
Runoff ratio (R_QP [-]) is defined as the ratio of long-term average streamflow, Q, to long-term average precipitation, P:

$$R_{QP} = \frac{Q}{P}$$

It represents the long-term water balance partitioning between water released from the catchment as streamflow and as evapotranspiration (assuming no net change in storage) (Milly, 1994; Olden and Poff, 2003; Poff et al., 2003; Sankarasubramanian et al., 2001; Yadav, 2007). A high runoff ratio identifies a catchment from which a large fraction of water exits as streamflow (streamflow-dominated or blue-water-dominated), whereas a low runoff ratio indicates that a large fraction exits the catchment as evapotranspiration (ET-dominated or green-water-dominated).
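A minimal sketch of the runoff ratio computation, assuming daily streamflow and precipitation are both expressed as depths over the catchment (e.g., mm/day) for the same days; the function name and the handling of missing values are illustrative choices, not the authors' code.

```python
import numpy as np


def runoff_ratio(q_mm, p_mm):
    """Runoff ratio R_QP = mean(Q) / mean(P).

    q_mm, p_mm: daily streamflow and precipitation over the same period,
    both as depths over the catchment (e.g., mm/day).
    Days with a missing value (NaN) in either series are dropped.
    """
    q = np.asarray(q_mm, dtype=float)
    p = np.asarray(p_mm, dtype=float)
    valid = ~(np.isnan(q) | np.isnan(p))
    return q[valid].mean() / p[valid].mean()
```

Dropping days jointly keeps both means over an identical period, which matters for the water-balance interpretation.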

Slope of the Flow Duration Curve
The flow duration curve (FDC) gives the probability that streamflow equals or exceeds a specified magnitude. The FDC is typically derived from hourly or daily (and sometimes monthly) streamflow data (e.g., Vogel and Fennessey, 1994; Jothityangkoon et al., 2001; Jencso et al., 2009). As an index of flow variability, the slope of the FDC (S_FDC [-]) is calculated between the 33rd and 66th exceedance percentiles, since on a semi-log scale this is a relatively linear part of the FDC (Yadav et al., 2007; Zhang et al., 2008). A high slope value indicates a variable flow regime, while a low slope value indicates a more damped response. A damped response can result from persistent (widespread and year-round) rainfall and/or a dominant groundwater contribution to streamflow. The signature is defined as

$$S_{FDC} = \frac{\ln(Q_{33\%}) - \ln(Q_{66\%})}{0.66 - 0.33}$$

where S_FDC is the slope of the flow duration curve, Q_33% is the streamflow equaled or exceeded 33 % of the time, and Q_66% is the streamflow equaled or exceeded 66 % of the time.
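The sketch below computes the FDC slope under two stated assumptions: flows are strictly positive (the logarithm of zero flow is undefined), and exceedance flows are read from the record with NumPy's default linear interpolation between order statistics. Other plotting-position conventions give slightly different values; the function name is hypothetical.

```python
import numpy as np


def fdc_slope(q):
    """Slope of the flow duration curve between the 33 % and 66 % exceedance flows.

    Q_33% is the flow equaled or exceeded 33 % of the time, i.e. the 67th
    (non-exceedance) percentile of the record; Q_66% is the 34th percentile.
    Assumes strictly positive flows.
    """
    q = np.asarray(q, dtype=float)
    q = q[~np.isnan(q)]
    q33 = np.percentile(q, 100 - 33)  # exceeded 33 % of the time
    q66 = np.percentile(q, 100 - 66)  # exceeded 66 % of the time
    return (np.log(q33) - np.log(q66)) / (0.66 - 0.33)
```

A constant flow record gives a slope of 0 (fully damped), and larger values indicate a more variable flow regime.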

Baseflow index
Baseflow index (I_BF [-]) is the ratio of long-term baseflow to total streamflow (e.g., Arnold et al., 1995; Vogel and Kroll, 1992; Kroll et al., 2004). A high value of I_BF indicates a catchment with a higher baseflow contribution, i.e., more water travelling along long flowpaths through the catchment. A range of algorithms has been proposed and compared for separating quick flow and baseflow using streamflow observations alone (Kroll et al., 2004; Eckhardt, 2005; Institute of Hydrology, 1980; Arnold et al., 1995; Arnold and Allen, 1999; Wittenberg and Sivapalan, 1999; Laaha and Blöschl, 2006). In this study we use the one-parameter single-pass digital filter method (DFM) based on previous studies reported by Arnold et al. (1995) and Lim et al. (2005). We do not consider the specific choice of filter crucial in this study, since we focus on the relative differences between catchments rather than on the actual values. The filter applied is defined as

$$Q_{D,t} = c\,Q_{D,t-1} + \frac{1+c}{2}\left(Q_t - Q_{t-1}\right)$$

where Q_D,t is the direct flow at time step t, Q_t is the total flow at time step t, and c is a parameter. The parameter c was set to 0.925 based on a comprehensive case study by Eckhardt (2007). The baseflow Q_B,t at time step t is then given by

$$Q_{B,t} = Q_t - Q_{D,t}$$

and the baseflow index is therefore

$$I_{BF} = \frac{\sum_t Q_{B,t}}{\sum_t Q_t}$$

where the summation is carried out over all time steps of the study period.
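A minimal sketch of the single-pass filter and the resulting baseflow index. Two details are assumptions not specified in the text: the direct flow is initialized to zero at the first time step, and the filtered direct flow is clipped to [0, Q_t] at each step (a common convention) so that baseflow never becomes negative or exceeds total flow. The function name is hypothetical.

```python
import numpy as np


def baseflow_index(q, c=0.925):
    """Baseflow index from a one-parameter, single-pass recursive digital filter.

    Direct flow:  Q_D,t = c * Q_D,t-1 + (1 + c) / 2 * (Q_t - Q_t-1)
    Baseflow:     Q_B,t = Q_t - Q_D,t
    I_BF        = sum(Q_B,t) / sum(Q_t)
    """
    q = np.asarray(q, dtype=float)
    qd = np.zeros_like(q)  # assumption: the record starts with zero direct flow
    for t in range(1, len(q)):
        qd[t] = c * qd[t - 1] + 0.5 * (1.0 + c) * (q[t] - q[t - 1])
        # assumption: constrain direct flow to [0, Q_t] so baseflow stays within [0, Q_t]
        qd[t] = min(max(qd[t], 0.0), q[t])
    qb = q - qd
    return qb.sum() / q.sum()
```

A perfectly steady hydrograph yields I_BF = 1, while a single storm peak on a low base (e.g., `[1, 1, 5, 3, 2, 1.5, 1]`) yields a value of roughly 0.58.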

Streamflow elasticity
Streamflow elasticity (E_QP [-]) describes the sensitivity of a catchment's streamflow response to changes in precipitation at the annual time scale. It is calculated as the inter-annual change in annual streamflow divided by the inter-annual change in annual precipitation, normalized by the long-term runoff ratio (Schaake, 1990; Dooge, 1992). Based on the study by Sankarasubramanian et al. (2001), we assume that the median provides a more stable estimate of this index, i.e.,

$$E_{QP} = \operatorname{median}\left(\frac{dQ}{dP}\,\frac{P}{Q}\right)$$

where E_QP is the streamflow elasticity, dQ (dP) is the difference between the previous year's and the current year's streamflow (precipitation), P is the mean annual precipitation, and Q is the mean annual streamflow. The median value of E_QP is considered a robust measure, since it filters out outliers that may significantly affect the mean (Sankarasubramanian et al., 2001; Sankarasubramanian and Vogel, 2003). E_QP is the percentage change in streamflow divided by the percentage change in precipitation: a value of 1 indicates that a 1 % change in precipitation leads to a 1 % change in streamflow. A value greater than 1 defines the catchment as elastic, i.e., sensitive to changes in precipitation, and a value less than 1 as inelastic, i.e., insensitive to changes in precipitation.
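The median elasticity can be computed from annual totals as sketched below. Year-to-year differences are taken with `np.diff` (current minus previous year); because the same convention applies to dQ and dP, the sign cancels in their ratio. Skipping year pairs with identical precipitation (dP = 0) is our assumption, and the function name is hypothetical.

```python
import numpy as np


def streamflow_elasticity(q_annual, p_annual):
    """Median streamflow elasticity E_QP = median((dQ/dP) * (P_mean / Q_mean)).

    q_annual, p_annual: annual streamflow and precipitation totals for
    consecutive years, in the same units.
    """
    q = np.asarray(q_annual, dtype=float)
    p = np.asarray(p_annual, dtype=float)
    dq, dp = np.diff(q), np.diff(p)
    ok = dp != 0.0  # assumption: skip year pairs with identical precipitation
    # P_mean / Q_mean is a positive constant, so it can be applied after the median
    return float(np.median(dq[ok] / dp[ok]) * p.mean() / q.mean())
```

For example, if annual streamflow is always precipitation minus a fixed 500 mm loss, with P = 1000, 1100, 900 and 1050 mm, then E_QP = 1012.5 / 512.5 ≈ 1.98, i.e., an elastic catchment. If streamflow is a fixed fraction of precipitation, E_QP is exactly 1.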

Snow day ratio
The snow day ratio (R_SD [-]) is defined as the number of days with precipitation on which the average daily air temperature is below 2 °C, divided by the total number of days with precipitation. This value provides an indicator of the amount of precipitation that falls and is stored as snow. It can be related to the seasonality of the catchment response (Woods, 2003). The ratio is defined as

$$R_{SD} = \frac{N_S}{N_P}$$

where N_S is the number of days with precipitation and a daily average temperature below 2 °C, and N_P is the number of days with precipitation. A high value of R_SD suggests more snow storage, with a significant impact on the intra-annual variability of streamflow.
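A minimal sketch of the snow day ratio from daily series. The text does not state a minimum precipitation depth for a "day with precipitation", so any depth above `wet_threshold` (default 0, an assumption) counts. The function name is hypothetical.

```python
import numpy as np


def snow_day_ratio(p_daily, t_mean_daily, t_threshold=2.0, wet_threshold=0.0):
    """R_SD = N_S / N_P.

    N_P: days with precipitation (p > wet_threshold)
    N_S: of those, days with mean air temperature below t_threshold (deg C)
    """
    p = np.asarray(p_daily, dtype=float)
    t = np.asarray(t_mean_daily, dtype=float)
    wet = p > wet_threshold
    snow = wet & (t < t_threshold)
    return snow.sum() / wet.sum()
```

For five days with precipitation `[0, 5, 3, 0, 2]` and mean temperatures `[-5, 1, 3, -1, -4]`, three days are wet, and two of those are colder than 2 °C, giving R_SD = 2/3.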

Rising limb density
The sixth signature considered in this study is the rising limb density (R_LD). R_LD describes the flashiness of the catchment response and is defined as the ratio of the number of rising limbs (N_RL) to the total amount of time the hydrograph is rising (T_R) (Morin et al., 2002; Shamir et al., 2005):

$$R_{LD} = \frac{N_{RL}}{T_R}$$

R_LD describes hydrograph shape and smoothness without regard to flow magnitude. A small value of the signature indicates a smooth hydrograph.
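The sketch below counts rising limbs in a regularly sampled hydrograph. It assumes a time step is "rising" whenever Q_t > Q_(t-1), with no smoothing or minimum-increase threshold. Published implementations may filter small fluctuations, which would change the counts. The function name is hypothetical.

```python
import numpy as np


def rising_limb_density(q):
    """R_LD = N_RL / T_R for a regularly sampled (e.g., daily) hydrograph.

    A time step is 'rising' when Q_t > Q_{t-1}; a rising limb is an unbroken
    run of rising steps. T_R is the number of rising steps, so R_LD has
    units of 1/time step and equals 1 / (mean rising-limb duration).
    """
    q = np.asarray(q, dtype=float)
    rising = np.diff(q) > 0
    t_r = int(rising.sum())
    if t_r == 0:
        return float("nan")  # no rising limbs in the record
    # a limb starts where a rising step is not preceded by another rising step
    previous = np.concatenate(([False], rising[:-1]))
    n_rl = int((rising & ~previous).sum())
    return n_rl / t_r
```

The hydrograph `[1, 2, 3, 2, 1, 2, 1]` has two rising limbs spanning three rising steps (R_LD = 2/3). A single long rise such as `[1, 2, 3, 4]` gives 1/3, consistent with a smoother response.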

Methods: cluster analysis
Cluster analysis is the process of grouping similar entities (catchments) according to one or more chosen similarity measures (signatures in our case), while concurrently separating those entities that are different. There are three common types of clustering algorithms: agglomerative hierarchical clustering, k-means clustering, and fuzzy partition clustering. All three strategies of unsupervised clustering require some subjective choices that define the clustering process, e.g., the distance metric used, and there is consequently no single solution to this kind of analysis. The objective in this study is therefore to use an empirical analysis to investigate how the similarity between catchments defined by the six signatures might create groupings. It is not to arrive at an ultimate classification result, which would in any case depend on the choices made and the dataset used. To account for the uncertainty in the classification process, we used a fuzzy partitioning algorithm that enables us to analyze the uncertainty in the resulting classification.
The method chosen for this study is a fuzzy partitioning Bayesian mixture clustering algorithm implemented in the AutoClass C software package (version 3.3.4) (Stutz and Cheeseman, 1995; Cheeseman and Stutz, 1996; Achcar et al., 2009; Kennard et al., 2010). Bayesian mixture modeling is a probabilistic approach in which marginal likelihoods for different classification realizations are estimated and ranked against all other realizations. The classification with the highest posterior probability is ultimately chosen as the most likely realization (Webb et al., 2007). Each catchment is therefore assigned to a particular class with a certain probability, called here the probability of class assignment. The number of classes is determined automatically during the classification process. Because of the probabilistic nature of the algorithm, a catchment can be allocated to several classes; only the primary class assignment, i.e., the class with the highest probability, is listed. The input variables characterizing the catchments, i.e., the signatures, were log-transformed and modeled as normally distributed continuous variables with an associated degree of uncertainty. Additionally, these variables were scaled so that magnitude differences between signatures do not cause any additional weighting in the calculation of the distance metric. The output of the clustering process is analyzed with respect to the probability of each catchment being a member of a particular class, the class strength (calculated as a heuristic measure, where a high class strength means a narrow range of signature values), and the importance of each signature in separating the different classes (calculated using the Kullback-Leibler distance metric; Kullback and Leibler, 1951). Another advantage of the clustering algorithm used here is its ability to consider correlation between signatures.
We account for the covariate information from two correlated similarity measures, in our case the signatures SFDC and IBF, using a multivariate normal model. We therefore chose this particular algorithm because, in contrast to other clustering strategies (Jain et al., 1999), it can account for both uncertainty and correlation.
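AutoClass C is a standalone program and is not reproduced here. The sketch below only illustrates two ingredients described above, under stated assumptions. First, signatures are log-transformed and scaled to zero mean and unit variance. Second, given the fitted weights, means and standard deviations of a Gaussian mixture (here hypothetical inputs, not AutoClass output), each catchment's probability of class assignment follows from Bayes' rule. For simplicity, the sketch uses diagonal covariances, so it does not reproduce the multivariate normal treatment of S_FDC and I_BF or AutoClass's search over the number of classes.

```python
import numpy as np


def standardize_log(signatures):
    """Log-transform strictly positive signatures and scale each column to
    zero mean and unit variance, so no signature dominates by magnitude."""
    x = np.asarray(signatures, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log transform requires strictly positive signature values")
    z = np.log(x)
    return (z - z.mean(axis=0)) / z.std(axis=0)


def class_probabilities(z, weights, means, sds):
    """Posterior P(class k | z) for a Gaussian mixture with diagonal covariances.

    z: (n, d) scaled signatures; weights: (K,) mixing proportions;
    means, sds: (K, d) per-class means and standard deviations.
    """
    z = np.atleast_2d(np.asarray(z, dtype=float))
    means, sds = np.asarray(means, dtype=float), np.asarray(sds, dtype=float)
    # log-density of each catchment under each class, summed over signatures
    resid = (z[:, None, :] - means[None, :, :]) / sds[None, :, :]
    log_like = -0.5 * (resid ** 2 + np.log(2 * np.pi * sds[None, :, :] ** 2)).sum(axis=2)
    log_post = np.log(np.asarray(weights, dtype=float))[None, :] + log_like
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

The primary class assignment of a catchment is then the column with the largest probability in its row, e.g. `class_probabilities(z, w, mu, sd).argmax(axis=1)`.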
Because of the probabilistic nature of the AutoClass C algorithm, classification realizations change across multiple runs. To test the stability of the results across these realizations, we use the Adjusted Rand Index (ARI; Rand, 1971; Hubert and Arabie, 1985). The ARI has an expected value of 0 if the agreement between two classification schemes is no better than chance and a value of 1 if the two classification schemes agree perfectly. We use the ARI to test the similarity of classification results when the algorithm is initiated multiple times. Figure 1 shows the relationships between the signatures both visually and numerically. In addition to the linear correlation coefficient, C_Lin, the Spearman rank correlation coefficient, C_SR, has also been calculated to show potential non-linear relationships. In the set of signatures used, baseflow index (I_BF) and slope of the flow duration curve (S_FDC) show the highest linear correlation (0.67). While this correlation is partly driven by a small number of very high S_FDC values, it is considered during clustering as discussed in the methods section.
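The ARI can be computed directly from the contingency table of two partitions using the Hubert and Arabie (1985) formula. The sketch below uses only the Python standard library. The function name and the convention of returning 1 in the degenerate case (e.g., both partitions putting every catchment in one class) are our choices.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index (Hubert and Arabie, 1985) between two partitions.

    labels_a, labels_b: class labels for the same catchments, in the same order.
    Label names need not match between the two partitions.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same items")
    n = len(labels_a)
    # pairs of items placed together in both partitions (contingency table cells)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. both partitions are trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

Relabeling classes does not change the index, so two runs that differ only in class numbering score 1. Partitions that disagree more than expected by chance can score below 0, e.g. `[0, 0, 1, 1]` versus `[0, 1, 0, 1]` gives -0.5. In a stability test like the one described above, the function would be applied to each pair of runs.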

Signature relationships and spatial variability
The different spatial patterns that the six signatures produce across the study domain can be seen in Fig. 2. Runoff ratio, RQP, shows high values in the humid region along the Appalachian Mountains and connected plateau regions, which decrease with increasing distance from this area, especially towards the central US (see also Sankarasubramanian and Vogel, 2003). Figure 2b shows that the smallest values of the slope of the FDC are located on the southeastern side of the Appalachian mountain range and west of this area. Values of streamflow elasticity and rising limb density show much greater heterogeneity than the first two signatures (Fig. 2c and f). High values of baseflow index can be found along the eastern coastal US and around the Great Lakes region (Fig. 2d), where more permeable soils and bedrock dominate (Wolock et al., 2004; Santhi et al., 2008). Values decrease when moving towards the east where soils and bedrock are more impermeable. As expected, the ratio of snow days correlates strongly with latitude (Fig. 2e). These spatial patterns further underline the relative independence of the different signatures and attest to their suitability for the similarity analysis. Figure 2 also shows that, taken individually, several of the signatures (Runoff Ratio, Ratio of Snow Days, Baseflow Index, Slope of FDC) vary with strong regional patterns. The variability in the spatial patterns also suggests that different variables control the different signatures. Some characteristics change slowly in space (e.g., Runoff Ratio), which is likely due to the slowly varying climate. Other signatures show much more abrupt changes, which suggests controls related to soil or geological characteristics.

Cluster analysis
The cluster analysis was applied to all 280 catchments within the six-dimensional signature space. The analysis aimed at addressing the following questions: (1) how do the catchments group with respect to the signatures used? (2) What spatial patterns of clusters emerge? (3) What hypotheses regarding the function of catchments and what physical or climatic controls on this functional behavior can be derived?
The cluster analysis identified 9 different classes of varying size for the most likely classification. The classification process was repeated 20 times, and the Adjusted Rand Index (ARI) between the classification schemes of 15 of the 20 runs was above 0.90. Furthermore, 7 of these 20 runs were found to be nearly identical, and one of these 7 runs was therefore used as the final classification. Results were also screened for extremely small or large classes and for generally high probabilities of class membership, to ensure that no unreasonable solutions were used.
The heuristic measures describing the algorithm and classification performance (discussed in the methods section) are visualized in Fig. 3 for the chosen result. Figure 3a shows a histogram of the probabilities that a catchment belongs to its assigned class. The histogram indicates that the vast majority of catchments are assigned with probabilities above 0.9, and hence that their class assignments are made with high confidence. The number of catchments per class varied between 5 and 82 (Fig. 3b). All classes show a relatively high class strength (Kennard et al., 2010), i.e., the variability of signature values within each class is rather low. Classes 2 and 3 have the highest values, while those for classes 5, 8, and 9 are somewhat lower.
K. Sawicz et al.: Catchment classification
The relative attribute influence of each signature describes its contribution to the classification. This measure represents the separation of classes due to each signature. It is calculated from the average Kullback-Leibler distance between the attribute distributions in individual classes and the overall distribution found in the full dataset (Webb et al., 2007). The attribute influence increases as the variance between the class means of a signature increases. Its values range from 1 (highest contribution) to 0 (no contribution). The order of signature influence on the clustering result was: Streamflow Elasticity (1), Ratio of Snow Days (0.98), Runoff Ratio (0.862), Slope of the FDC (0.462), Baseflow Index (0.462), and Rising Limb Density (0.201). These values suggest that all the signatures provided information for the classification, though RLD was not very influential. They also suggest that predominantly climate-controlled signatures dominate the classification. This adds to the evidence for the dominant role of climate in controlling catchment behavior (see also Rosero et al., 2010).
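A rough sketch of a KL-based attribute influence, under several assumptions that may differ from AutoClass's internal computation. Each signature is modeled as a univariate normal within each class and over the full dataset. The divergence is taken from the class distribution to the global distribution. Class divergences are weighted by class size. The result is rescaled so the most influential signature scores 1. The function names and the `sd_floor` guard for single-member classes are hypothetical.

```python
import numpy as np


def gaussian_kl(mu_p, sd_p, mu_q, sd_q):
    """KL divergence KL(N(mu_p, sd_p^2) || N(mu_q, sd_q^2)) for univariate normals."""
    return np.log(sd_q / sd_p) + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2) - 0.5


def attribute_influence(x, labels, sd_floor=1e-6):
    """Relative influence of each signature (column of x) on a given classification.

    For every class, the class-conditional normal of each signature is compared with
    the normal fitted to the full dataset; the KL divergences are averaged with weights
    proportional to class size and rescaled so the most influential signature is 1.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    mu_all, sd_all = x.mean(axis=0), x.std(axis=0)
    raw = np.zeros(x.shape[1])
    for k in np.unique(labels):
        xk = x[labels == k]
        sd_k = np.maximum(xk.std(axis=0), sd_floor)  # guard against single-member classes
        raw += len(xk) / len(x) * gaussian_kl(xk.mean(axis=0), sd_k, mu_all, sd_all)
    return raw / raw.max()
```

A signature whose class means are far apart relative to their spread receives an influence near 1. A signature whose class distributions resemble the overall distribution scores near 0.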
The distribution of signature values in each cluster is shown as box-and-whisker plots in Fig. 4. The classes in Fig. 4 are sorted from left to right by increasing median value of the signature shown. The spatial distribution of the clusters is shown as a map with a corresponding heatmap in Fig. 5. The heatmap shows the distribution of signature values, indicated by colors, for all catchments. Figures 6 and 7 are used to analyze possible controls on the signature-based classes identified. Geographic location is used to structure the discussion of the resulting classification, starting in the northeastern US.

Catchments in the northeastern United States (class C2) are characterized by high ratios of streamflow to precipitation (high RQP) and large amounts of snow (high RSD) (Fig. 4). These catchments are located in a humid continental climate with low energy availability and hence low evapotranspiration. Snow storage is important in controlling the seasonal variability of runoff; these catchments have the highest ratio of snow days in the dataset. Catchments in class C2 are found in an area ranging from Maine to Pennsylvania. These catchments have the smallest basin sizes and the steepest slopes in the dataset (max SLOPE and min DA), long and frequent storms (max SD and NP), and the lowest maximum temperatures (TMAX). This class consists mainly of catchments with a very low precipitation seasonality index (PSI; Figs. 6 and 8), meaning that precipitation is distributed relatively evenly throughout the year (a uniform distribution would give a value of 0). PSI values are generally low for the eastern US (less than 0.6) and lowest in the northeastern US, compared to the southwestern US, where rain falls mainly in winter (Pryor and Schoof, 2008).
All catchments in this class are energy-limited, with the smallest aridity indices in the study region (AI = PE/P below 1). This suggests that these catchments are controlled by low average energy availability throughout the year, but with strong seasonal variability, as well as by a relatively uniform distribution of moisture input in time.
Catchments slightly further to the south (in Pennsylvania and Virginia), which form cluster C3 and extend westward to Indiana, have lower runoff ratios (median of 40 % versus about 55 % for class C2) and less dominant snow storage, while their SFDC values remain relatively similar to those of the catchments further north (Fig. 4). There are generally strong similarities in physical and climatic characteristics between the catchments of classes C3 and C2. Both contain the smaller catchments, with a high fraction of poorly drained soils (high HGC), the longest storm durations and the lowest aridity indices. Differences from the previous class lie mainly in topography and land use. C3 catchments are at a lower elevation, have less slope, and have more agricultural land use and therefore shallower root zones (Figs. 6 and 8).

[Figure 6; class axis order: C8, C4, C7, C3, C5, C9, C6, C1, C2]

Within the extent of the C3 catchments, a small group of catchments belonging to class C9 is found in northwestern Ohio. This class is distinguished by very low streamflow elasticity values, with the smallest variability of this signature in any class. Class C9 also has the lowest median slope of the FDC and the highest median baseflow index, although the within-class variability of these two signatures is the largest of any class. It also shows more snow than C3, suggesting a climatic separation between these regions. Climatically, C9 experiences the lowest temperatures and the shortest storm durations in the dataset, along with some of the highest precipitation seasonality.
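Streamflow elasticity EQP describes how strongly annual streamflow responds, in relative terms, to a change in annual precipitation. The section does not state the estimator; the sketch below assumes a median-based nonparametric form, EQP = median over years of [(Q_t − Q̄)/(P_t − P̄) · P̄/Q̄], applied to hypothetical annual series.

```python
from statistics import mean, median


def streamflow_elasticity(annual_q, annual_p):
    """Median-based (assumed) estimator of streamflow elasticity to precipitation.

    For each year t: ((Q_t - Qbar) / (P_t - Pbar)) * (Pbar / Qbar); the
    median over years is returned. Years with P_t == Pbar are skipped.
    """
    q_bar, p_bar = mean(annual_q), mean(annual_p)
    ratios = []
    for q, p in zip(annual_q, annual_p):
        dp = p - p_bar
        if dp == 0:  # undefined ratio for a year exactly at the mean
            continue
        ratios.append(((q - q_bar) / dp) * (p_bar / q_bar))
    return median(ratios)


years_p = [900.0, 1000.0, 1100.0, 1200.0, 800.0]       # hypothetical P, mm/yr
proportional_q = [0.4 * p for p in years_p]             # Q scales with P
buffered_q = [400.0 + 0.1 * (p - 1000.0) for p in years_p]  # storage damps Q

print(round(streamflow_elasticity(proportional_q, years_p), 2))  # 1.0
print(round(streamflow_elasticity(buffered_q, years_p), 2))      # 0.25
```

A catchment whose streamflow is damped by storage, as in the second series, yields a low elasticity comparable in spirit to the low values reported for C9.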
Continuing southeastward along the Appalachian Mountains into North Carolina, we observe lower values of the ratio of snow days in class C1, although the variability of this signature increases in this class, which also contains the largest number of catchments. The slope of the FDC also decreases slightly (and, conversely, the baseflow index is slightly higher). Catchments in this class are spread along roughly the same latitude and are separated from classes C2 and C3 mainly by a lower snow day ratio attributable to their location (median of about 20 %, Fig. 4). Figure 6 further shows that these catchments have low precipitation seasonality indices, a low fraction of poorly drained soils together with a high percentage of sand (and hence high soil permeability), and the lowest relief ratio (RRM) of all classes. Classes C1 and C5 (the next class further south) contain the most baseflow-dominated catchments in the dataset.
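The slope of the flow duration curve (SFDC) summarizes how flashy a catchment's daily flow regime is. As an assumption (the definition is not restated in this section), the sketch takes the slope of the log-transformed FDC between the flows exceeded 33 % and 66 % of the time, using a linear-interpolation percentile; the function names and series are hypothetical.

```python
import math


def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0..100) of a list of numbers."""
    xs = sorted(values)
    k = (len(xs) - 1) * pct / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return xs[lo]
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def fdc_slope(daily_q):
    """Assumed SFDC: slope of ln(Q) between 33 % and 66 % exceedance.

    Flows must be strictly positive because of the log transform.
    """
    q33 = percentile(daily_q, 67)  # flow exceeded 33 % of the time
    q66 = percentile(daily_q, 34)  # flow exceeded 66 % of the time
    return (math.log(q33) - math.log(q66)) / (0.66 - 0.33)


flashy = [2.0 ** i for i in range(11)]    # hypothetical flashy regime
damped = [100.0 + i for i in range(11)]   # hypothetical baseflow-dominated regime
print(fdc_slope(flashy) > fdc_slope(damped))  # True
```

A lower slope, as reported for C1 relative to C2 and C3, corresponds to a flatter middle section of the FDC and hence a more sustained, storage-supported flow regime.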
Further south of class C1, in class C5, high IBF and low SFDC values persist. The ratio of snow days decreases even further, suggesting a climate-based separation between classes C1 and C5. Catchments in C5 also have some of the lowest elasticity values (Fig. 4c), in line with their low flow duration curve slopes (suggesting high storage in the catchment): the higher baseflow fraction produces a smaller streamflow response to changes in precipitation. Catchments in this class experience high maximum temperatures and large amounts of rainfall (> 1200 mm yr⁻¹).
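The ratio of snow days (RSD) tracks the latitudinal decline in snow from C2 through C1 to C5. The exact definition is not restated here; the sketch below assumes RSD is the fraction of days with precipitation on which the mean daily temperature is below a 2 °C threshold. The threshold, the wet-day cutoff and the example series are assumptions for illustration.

```python
def snow_day_ratio(daily_p_mm, daily_t_c, t_threshold_c=2.0, wet_day_mm=0.0):
    """Assumed RSD: share of precipitation days colder than t_threshold_c.

    daily_p_mm and daily_t_c are aligned daily series of precipitation (mm)
    and mean air temperature (deg C). A day counts as wet if P > wet_day_mm.
    """
    wet_temps = [t for p, t in zip(daily_p_mm, daily_t_c) if p > wet_day_mm]
    if not wet_temps:
        return 0.0
    snowy = sum(1 for t in wet_temps if t < t_threshold_c)
    return snowy / len(wet_temps)


# Hypothetical six-day record: four wet days, two of them below 2 deg C.
p = [0.0, 5.0, 3.0, 0.0, 10.0, 2.0]
t = [-5.0, -1.0, 4.0, -3.0, 1.0, 8.0]
print(snow_day_ratio(p, t))  # 0.5
```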
West of class C5, in the southern Mississippi River basin, lie the catchments of class C6. Although climatically similar to class C5 (similarly water limited, with negligible snow), class C6 has a lower baseflow index, together with a lower percentage of sand and deeper mean root zone depths (Fig. 6). With respect to geologically controlled signatures (e.g., IBF), this class is also quite similar to class C1, even though C1 lies further north, but it has a lower runoff ratio.

The catchments located furthest west in the study region (classes C8, C7 and C4) have the lowest runoff ratios: generally less than 20 % of precipitation leaves these catchments as streamflow (Fig. 4a). They are water-limited catchments (AI > 1) that receive less precipitation than the rest of the study region, and the fraction of snow increases from south to north (C8 to C7 to C4) (Fig. 4e). Lying along the western boundary of the study region, these classes approach a more arid zone of the Köppen classification system (Peel et al., 2007). This area also has a high precipitation seasonality index, i.e., a non-uniform distribution of precipitation through the year. Land use in these catchments is primarily cultivated, as shown by a very high percentage of agriculture within 800 m of the stream (riparian zone) (Figs. 6 and 8). Class C7, south of Iowa, exhibits the highest flow duration curve slopes (SFDC) and elasticity values (EQP), as well as the lowest baseflow indices in the dataset. The latter is likely related to the very low percentage of soils classified as sand and the highest fraction of very poorly drained soils.
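The baseflow index (IBF) is the fraction of total streamflow attributed to baseflow. The separation method is not described in this section; purely as an illustration, the sketch swaps in a single forward pass of a one-parameter recursive digital filter (Lyne-Hollick type, with an assumed α = 0.925). Both the filter choice and the example hydrographs are assumptions.

```python
def baseflow_index(daily_q, alpha=0.925):
    """Baseflow index from a one-pass recursive digital filter (assumed method).

    Quickflow: qf_t = alpha * qf_{t-1} + (1 + alpha) / 2 * (Q_t - Q_{t-1}),
    constrained to 0 <= qf_t <= Q_t; baseflow is Q_t - qf_t.
    Returns total baseflow divided by total streamflow.
    """
    if not daily_q:
        raise ValueError("empty streamflow series")
    quick_prev, q_prev, baseflow_total = 0.0, daily_q[0], 0.0
    for i, q in enumerate(daily_q):
        if i == 0:
            quick = 0.0  # no quickflow assumed on the first day
        else:
            quick = alpha * quick_prev + 0.5 * (1.0 + alpha) * (q - q_prev)
            quick = min(max(quick, 0.0), q)  # keep 0 <= quickflow <= Q
        baseflow_total += q - quick
        quick_prev, q_prev = quick, q
    return baseflow_total / sum(daily_q)


steady = [3.0] * 7                                # hypothetical groundwater-fed stream
storm = [1.0, 1.0, 10.0, 5.0, 2.0, 1.0, 1.0]      # hypothetical flashy response
print(baseflow_index(steady))                     # 1.0
print(baseflow_index(storm) < baseflow_index(steady))  # True
```

A sharp storm hydrograph, as expected for poorly drained, low-sand soils like those of C7, pushes the index well below that of a steady, groundwater-fed record.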

Discussion
Overall, signatures that vary along a climatic gradient (RQP, EQP, RSD) appear to exert a stronger control on the separation of catchments into classes than signatures that are likely to be more affected by topographic, geological and land cover variability (IBF, SFDC, RLD). This highlights one problem with this type of empirical analysis: the result is controlled by the particular gradients present in the analyzed dataset, which makes it difficult to generalize beyond the data at hand. The result further suggests that a single regionalization of signatures across the whole region might not be the best strategy for some signatures, and that the region may have to be broken up into smaller subregions (see the example by Laaha and Blöschl, 2006). The equifinality of controls might also be reduced if further variables characterizing the functional behavior of the catchments were included. For example, tracer data would be expected to separate flowpaths and residence times better and hence allow a refined description of the hydrologic function of the catchments.
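One way to quantify which signatures separate the classes most strongly, not the procedure used in this study, is to compare between-cluster and within-cluster variance for each signature with a one-way ANOVA F statistic. The sketch below uses hypothetical values for three clusters; a signature with large between-cluster differences and tight clusters (here the runoff ratio) ranks above one whose clusters overlap.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F statistic of one signature across clusters.

    groups: one list of signature values per cluster (at least two clusters
    and more catchments than clusters in total).
    """
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand = mean(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical signature values: three clusters of three catchments each.
signatures = {
    "RQP": [[0.55, 0.57, 0.53], [0.40, 0.42, 0.38], [0.15, 0.17, 0.13]],
    "SFDC": [[1.2, 2.0, 1.5], [1.4, 1.9, 1.3], [1.6, 1.1, 1.8]],
}
ranking = sorted(signatures, key=lambda s: f_ratio(signatures[s]), reverse=True)
print(ranking)  # ['RQP', 'SFDC']
```

Because the clusters were formed from these same signatures, such F values are descriptive only and should not be read as significance tests.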
This conclusion was not unexpected: as stated at the outset of this paper, no cluster analysis can produce a general classification system, because the results depend on the dataset used and on subjective decisions (mainly the choice of algorithm and distance metric). However, the clustering results help to understand controls across the study region and potentially enable us to derive a small number of hypotheses. These hypotheses should be analyzed further in a more idealized (i.e., non-empirical) setting to understand the generality of the results found here. Multiple authors have advocated the use of "virtual experiments" for this purpose, i.e., analyzing modeled or synthetic realities rather than actual systems (e.g., Bashford et al., 2002; Weiler and McDonnell, 2004; Winter et al., 2004; van Werkhoven et al., 2008). So what does the empirical analysis above suggest? The main issue we focus on is the suggested variability of controls on similar hydrologic signatures, and hence, in the context of this paper, on hydrologic function.
Streamflow elasticity with respect to precipitation is modified by the permeability characteristics of a catchment. Our results suggest that high elasticity values (clusters 8 and 7) coincide with low IBF values, and vice versa (clusters 9, 5, 2 and 1) (Figs. 4 and 7). Clusters 8 and 7 have a low % Sand and the highest percentage of poorly drained soils (HGD), and hence the smallest potential for buffering precipitation variability. Clusters 5, 2 and 9 have a high % Sand, and clusters 5 and 2 also have a high percentage of well-drained soils (HGA), and therefore a high potential for buffering. This result is similar to the conclusions of Sankarasubramanian and Vogel (2003), who analytically derived a parameter (parameter b of the abcd model) that they refer to as soil moisture holding capacity; they found that it buffers streamflow variability and consider it regionalizable using soil permeability. The classes with the highest elasticity values (8, 4, 7) are also those with the shallowest roots (lowest RZD) and the lowest runoff ratio (highest fraction of precipitation evaporated). Class C2, on the other hand, has the highest root zone depth and the highest runoff ratio (highest fraction of precipitation becoming streamflow). This interaction between climate, soils and vegetation is also shown in the five-dimensional plot of Fig. 9, which shows that deep-rooted vegetation coincides with high runoff ratios (energy-limited catchments), but only if the catchments have mainly poorly or very poorly drained soils. Results like these hint at a co-evolution of soil, climate and vegetation, which is further explored numerically in the parallel classification study by Carrillo et al. (2011).
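The inverse relationship between elasticity and baseflow index described above could be summarized across clusters with a rank correlation. The sketch below implements Spearman's coefficient (Pearson correlation of average ranks, so ties are handled) in plain Python; the cluster medians in the example are hypothetical and are not taken from Figs. 4 or 7.

```python
import math
from statistics import mean


def ranks(xs):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical cluster medians of EQP and IBF.
eqp = [2.1, 1.9, 1.2, 0.9, 0.6]
ibf = [0.35, 0.40, 0.55, 0.60, 0.70]
print(spearman(eqp, ibf))  # -1.0
```

With only nine clusters, such a coefficient is a compact description of the pattern rather than strong statistical evidence.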
Spatial proximity is a valuable first indicator of hydrologic similarity because it reflects the strong climatic control on catchment behavior, which varies slowly in space. Many researchers have commented on the value (… Blöschl, 2004, 2005; Parajka et al., 2005; Oudin et al., 2008) or lack of value (for example with respect to drought characteristics, see Tallaksen and van Lanen, 2004) of catchment spatial proximity in predicting hydrological similarity. The empirical results shown here suggest that spatial proximity clearly plays an important role as an indicator of similarity. However, they also suggest that spatial proximity generally reflects similarity in other characteristics. The clusters either show strong spatial connectedness (clusters 4, 7 and 2), form large patches with "outliers" (6, 5, 3, 8, 9), or are relatively widely distributed (1 and 6). Combining Figs. 4, 5 and 7, we can see that cluster 4 is a connected group of catchments with very low runoff ratios and high ratios of snow days, with very little variability in either signature. Figure 6 shows little variability and extreme (within the dataset) values for this cluster in both landscape characteristics (lowest root zone depth, highest % HGC and HGD) and climatic characteristics (lowest MAP, highest PSI), suggesting that the catchments in this cluster are very different from the rest. Another connected cluster is C2 in the northeastern US, which exhibits the highest runoff ratio and the highest ratio of snow days. This cluster also shows extreme values with little variability, but this time mainly in climatic characteristics (lowest TMAX and aridity index, AI; low PSI and highest NP). It also has the highest root zone depth. Cluster C5, on the other hand, forms scattered patches in different parts of the study region. It has the highest baseflow index values in the dataset (aside from C9), and shows both little variability in and the highest values of % Sand. At the same time, it shows considerable variability in climate (e.g., TMAX and MAP) and landscape characteristics (% AG and RZD).
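The spatial connectedness discussed above could be made quantitative, although this is our illustration rather than part of the analysis, by counting how often a catchment's nearest neighbor belongs to the same cluster. The sketch assumes gauge coordinates in decimal degrees and great-circle (haversine) distances on a sphere of radius 6371 km; the coordinates and cluster labels are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2.0 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def nn_same_cluster_fraction(sites):
    """Share of sites whose nearest other site carries the same cluster label.

    sites: list of (lat, lon, label) tuples with at least two entries.
    """
    hits = 0
    for i, (lat, lon, label) in enumerate(sites):
        nearest = min(
            (j for j in range(len(sites)) if j != i),
            key=lambda j: haversine_km(lat, lon, sites[j][0], sites[j][1]),
        )
        hits += sites[nearest][2] == label
    return hits / len(sites)


# Hypothetical gauges forming two spatially compact clusters.
sites = [(44.0, -70.0, "A"), (44.2, -70.3, "A"), (44.1, -69.8, "A"),
         (35.0, -82.0, "B"), (35.2, -82.1, "B")]
print(nn_same_cluster_fraction(sites))  # 1.0
```

Comparing this fraction with the values obtained after randomly shuffling the labels would indicate whether a cluster is more spatially connected than chance alone would produce.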
The discussion in this section supports the earlier statement that such an empirical analysis cannot be the endpoint for classification, but rather a step along the way. The focus on streamflow limits the degree of detail regarding hydrologic function that we can extract from such an integrated measure. However, it also allows the signatures used to be regionalized, and thus enables an extension of the classification scheme to ungauged basins (Yadav et al., 2007). The limited availability of detailed descriptors of geology (especially for a dataset covering a large region) also limits our understanding of subsurface controls (see similar issues in Oudin et al., 2008). Finally, the variability and environmental gradients in the dataset define which controls can emerge at all. There is no guarantee that different datasets, with different gradients, would show the same relationships between signatures and climate/landscape, or that these relationships would not change with the scale of analysis (Kennard et al., 2010). The potential for generalization therefore lies in the physical interpretation rather than in the empirical result itself, and model- or theory-based analyses such as that of Carrillo et al. (2011) can be a very useful supplement to empirical studies.

Conclusions
The lack of a generally accepted catchment classification framework brought the question of what defines hydrologic similarity to the forefront of hydrologic science.  suggested that a classification framework, which is both descriptive and predictive, can be derived if it is based on the notion of catchment function and contains an explicit mapping between function (as observed in so-called signatures), climate and landscape characteristics. Here we provide a first test of this proposition in an empirical study utilizing 280 catchments across the Eastern US. This work provides insight into hydrologic similarity of catchments in the Eastern United States, and offers suggestions for controls of their hydrologic behavior.
We defined six signatures that can be derived from precipitation-temperature-streamflow data and used a Bayesian clustering algorithm to identify groups of similar catchments. Nine clusters with a relatively clear separation were identified. Spatially, most of the clusters exhibited some degree of connectivity, suggesting that spatial proximity is a good indicator of similarity. This result is likely due to climatic and some landscape characteristics changing slowly in space. Further, the results suggest that permeable soils buffer how strongly a catchment responds to climate variability. Our results therefore suggest that soil properties will modify the impacts of climate change on hydrologic regimes, so that changes in precipitation and temperature will not affect the streamflow response of all catchments equally. Assessing the implications of climate variability and change for hydrologic similarity will be the subject of future research. Overall, the physical interpretation of why the members of a particular class behave similarly is very encouraging and demonstrates the merit of this kind of clustering analysis for understanding hydrological similarity and its causes. Expanding this kind of study to much larger, even global, datasets has the potential to provide further insight into catchment similarity and, in combination with numerical modeling, could lead to a general catchment classification framework.
The limitations of the study presented here are its purely empirical nature and its focus on streamflow as the only hydrologic response variable. However, signatures such as flow duration curves have been used for many years to define the hydrologic character of catchments and hence provide an excellent starting point for catchment classification. We further believe that the limitations of empirical studies can be addressed by numerical experiments in which idealized systems are tested using catchment models. See the companion paper by Carrillo et al. (2011) for an example application of this type.