Interactive comment on "Controls on hydrologic drought duration in near-natural streamflow in Europe and the USA" by E. Tijdeman

Abstract. Climate classification systems, such as Köppen-Geiger and the aridity index, are used in large-scale drought studies to stratify regions with similar hydro-climatic drought properties. What is currently lacking is a large-scale evaluation of the relation between climate and observed streamflow drought characteristics. In this study we explored how suitable common climate classifications are for differentiating catchments according to their characteristic hydrologic drought duration and whether drought durations within the same climate classes are comparable between different regions. This study uses a dataset of 808 near-natural streamflow records from Europe and the USA to answer these questions. First, we grouped drought duration distributions of each record over different classes of four climate classification systems and five individual climate and catchment controls. Then, we compared these drought duration distributions of all classes within each climate classification system or classification based on individual controls. Results showed that climate classification systems that include absolute precipitation in their classification scheme (e.g., the aridity index) are most suitable for differentiating catchments according to drought duration. However, differences in duration distributions were found for the same climate classes in Europe and the USA. These differences are likely caused by differences in precipitation, in catchment controls as expressed by the base flow index, and in climate beyond the total water balance (e.g., seasonality in precipitation), which have been shown to exert a control on drought duration as well.
Climate classification systems that include an absolute precipitation control can be tailored to drought monitoring and early warning systems for Europe and the USA to define regions with different sensitivities to hydrologic droughts, which, for example, have been found to be higher in catchments with a low aridity index. However, stratification of catchments according to these climate classification systems would likely need to be complemented with information from other climate classification systems (Köppen-Geiger) and individual climate and catchment controls (precipitation and the base flow index), especially in a comparative study between Europe and the USA.


1 Introduction
Droughts are natural disasters that originate from a temporary deficit of water or from abnormal temperatures. They are multifaceted phenomena and are often grouped into four main types: meteorological, agricultural, hydrologic and socio-economic. Hydrologic drought relates to "effects of dry spells on surface and subsurface water" (Wilhite and Glantz, 1985). In the absence of human influences, hydrologic droughts are often triggered by anomalies in climatic conditions. Their duration often depends on the persistence of these anomalies and on seasonal transitions, such as a shift from the rain to the snow season or from the wet to the dry season (Van Loon and Van Lanen, 2012). However, climatic conditions alone do not determine the onset, persistence and recovery of a hydrologic drought. Storage-related processes (like snow accumulation or groundwater storage) play an important role as well (e.g., Haslinger et al., 2014; Staudinger et al., 2014; Van Loon and Laaha, 2015).
Knowledge of a region's hydro-climate is important for drought-related research (Tallaksen and Van Lanen, 2004). For example, short-term precipitation deficits can lead to a hydrologic drought event in a catchment with little storage, whereas a catchment with a lot of storage is likely to be little affected by such a dry spell. The Köppen-Geiger climate classification system (Geiger, 1961) is a popular way to describe a region's (hydro-)climate in a broad range of disciplines (Rubel and Kottek, 2011). However, it may not be the optimal way of grouping catchments with similar hydrologic behavior, partly because it fails to distinguish between catchments with different "filtering behaviors" (Coopersmith et al., 2012). More recent hydro-climatic classification schemes build on the ideas of the Köppen-Geiger climate classification system. For the USA, such classification schemes are based on controls like the amount, seasonality and timing of precipitation, (potential) evaporation, the timing of maximum runoff and the fraction of precipitation falling as snow (e.g., Berghuijs et al., 2014; Coopersmith et al., 2012). Both of these studies suggest that in the USA, climate is the dominant control on hydrologic behavior; however, Berghuijs et al. (2014) also found similarity between clusters of catchments and soil, ecosystem and vegetation classes.
Apart from climatic controls, catchment controls also play a role in the propagation from climatic input to streamflow (e.g., Barker et al., 2016; Haslinger et al., 2014) and could thus be useful for grouping catchments with similar hydrologic behavior. For example, variability in precipitation and temperature is dampened when it propagates to streamflow (Gudmundsson et al., 2011b), which that study suggests is related to physical catchment characteristics. Gudmundsson et al. (2011a) found support for a stronger control of physical catchment characteristics during low flows, as shown by a reduced cross-correlation of low flows compared with high flows.
To improve our understanding of these climatic and catchment controls on hydrologic droughts, the drought characteristics of interest need to be quantified. Commonly, hydrologic droughts are characterized by duration, deficit volume, frequency and areal extent (Andreadis et al., 2005). Quantifying these properties helps to compare historical drought events and can be used to place current and predicted drought events in a historical context. One method to compare these characteristics is severity–area–duration (SAD) curves, which have been used to compare major soil moisture and runoff drought events in the USA (Andreadis et al., 2005) and major soil moisture drought events on a global scale (Sheffield et al., 2009). Knowledge about past drought characteristics can further be used to estimate return periods of hydrologic drought events with certain characteristics, using so-called severity–area–frequency (SAF) curves (e.g., Hisdal and Tallaksen, 2003). Furthermore, these drought characteristics have been used to study the propagation of drought through the hydrologic cycle (overview in Van Loon, 2015) and to investigate the impact of climatic and catchment controls on droughts (e.g., Van Lanen et al., 2013; Van Loon et al., 2014).
Climate-related differences in modeled drought characteristics were found between the major classes of the Köppen-Geiger climate classification system: droughts in snow, polar and arid climates have longer durations than those in equatorial and temperate climates (Van Lanen et al., 2013). The major classes of the Köppen-Geiger classification can be further divided into sub-classes that take into account seasonality in precipitation and the occurrence of cold or hot seasons (Kottek et al., 2006). Van Loon et al. (2014) found that, for these sub-climates, long-duration droughts occurred more often within classes with seasonal properties. Droughts starting before annually recurring periods of low precipitation or of high or low temperature are less likely to recover, owing to a low influx of precipitation, temporary storage of precipitation as snow or a high level of evaporation (Van Loon and Van Lanen, 2012). Climate classification systems, like the Köppen-Geiger climate classification, are based on long-term average climatic conditions. However, drought durations are modified when meteorological droughts propagate through the hydrologic cycle. For example, drought duration increases with increasing groundwater response time (Van Lanen et al., 2013; Van Loon et al., 2014). Both these studies showed that this drought-prolonging effect was visible for different climates, suggesting a combined influence of climatic and catchment controls on drought duration, where neither climate nor physical catchment structure seemed to be dominant.
Studies based on modeled catchments may lead to a better theoretical understanding of controls on hydrologic droughts since they enable isolated research on the effect of one control at a time. However, modeling incorporates uncertainties, e.g., in climatic forcing and due to modeling assumptions (Sheffield et al., 2009). It is therefore questionable how representative models are of the real world. This highlights the importance of using observed streamflow data in research about controls on hydrologic droughts. However, outside the modeling environment, a comparative study on the isolated effect of one individual control is nearly impossible due to the unique combination of catchment and climate properties of each real-world catchment. For example, in Austria, the propagation of drought (from precipitation to streamflow) was found to depend more on climatic forcing under humid conditions and more on storage properties under more arid conditions (Haslinger et al., 2014). Therefore, research about controls on observed hydrologic drought durations is limited to finding the dominant ones. Tallaksen and Hisdal (1997) showed for a set of 52 Nordic catchments that the distribution of drought durations varies between catchments, which they hypothesized to be controlled by climate. In contrast, Van Loon and Laaha (2015) showed that storage-related processes mainly control the duration of drought for a set of Austrian catchments. They showed that the base flow index (BFI, representing several different storage-related processes) has the highest correlation with average streamflow drought duration. Elevation is another catchment control that is hypothesized to exert a control on streamflow droughts since it can be related to seasonal snow storage (Van Loon and Laaha, 2015). However, the influence of elevation might not be uniform around the world due to differences in geographical settings. For example, in some areas there is a relation between aridity and elevation, and in others there is a relation between snow processes and elevation (Salinas et al., 2013). Catchment area is negatively correlated with the variance in catchment runoff (Skøien et al., 2003). It is therefore hypothesized that low flow conditions are generally more persistent in larger catchments, although the latter study also found evidence that the temporal smoothing of catchment runoff, as precipitation propagates into runoff, is mainly attributable to runoff generating processes. Catchment area also showed a positive correlation with mean drought duration, although it was not the most important catchment control (Van Loon and Laaha, 2015).
To extend the knowledge about controls on streamflow droughts and to evaluate the suitability of climate classification systems for describing regions with different hydrologic drought characteristics, large-scale studies based on observed streamflow data are needed. Therefore, we evaluated the suitability of four climate classification systems for differentiating catchments according to hydrologic drought duration in near-natural streamflow records from Europe and the USA. Furthermore, we tested whether drought duration distributions of the same climate classes were comparable between the USA and Europe, which answers the question of whether these four classification systems are transferable between the two regions. A similar analysis was done for five individual climate and catchment controls. However, these controls do not have commonly accepted grouping approaches, so we needed another (more arbitrary) grouping approach for them. The individual controls therefore serve as a complement when interpreting the suitability of the four climate classification systems for differentiating catchments according to drought duration. For both analyses, we used a hypothesis testing approach to systematically compare cumulative drought duration distributions (hereafter called drought duration curves) between classes of the four climate classification systems and classes of individual controls. Duration is preferred over other drought characteristics like severity or magnitude since it is less influenced by systematic measurement errors and relies on ranks of data rather than on accurately gauged quantities.
Based on the above-mentioned studies, we hypothesize that the following climate or catchment characteristics exert a control on drought duration:
- Occurrence and length of a precipitation deficit season
- Occurrence and length of a cold season
- Climatic controls (precipitation (P) and temperature (T))
- Catchment controls (base flow index (BFI), area (A) and elevation (E))

The following four climate classification systems are therefore hypothesized to be suitable for differentiating catchments with different hydrologic drought duration characteristics, since they include one or more of these controls: the Köppen-Geiger climate classification system (KG), the aridity index (AI), the number of months with an average temperature below zero (T < 0) and the number of months with a climatic water deficit, i.e., when the average potential evaporation is larger than the average precipitation (E_POT > P). However, none of these climate classification systems considers catchment controls, so their suitability for differentiating catchments according to drought duration in observed streamflow was investigated in this study under a wide variability of catchment characteristics.
2 Data and methods

Streamflow data and potential controls
The analysis was based on 808 near-natural streamflow records from Europe (n = 347) and the contiguous USA (n = 461). The streamflow records for the USA were selected from the Hydro-Climatic Data Network (HCDN-2009; Lins, 2012) and those for Europe from the European Water Archive (EWA; Stahl et al., 2010). Only records meeting the following criteria were selected for further analysis:
1. Forty years of continuous daily data for the period 1965–2004 for Europe and 1970–2009 for the USA. Different time periods were chosen to optimize the number of stations while incorporating recent times.
2. The percentage of zero streamflow occurrence at each weekly time step is ≤ 20 %, since the chosen drought identification method was not designed to deal with more frequently occurring zero streamflow.
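The two selection criteria can be checked with a short script. This is a minimal sketch, not the authors' code. It assumes that each record has already been aggregated to 52 weekly totals per year and is stored as a list of years, with `None` marking a missing value. It also assumes one reading of criterion 2: the share of years with zero flow in each week of the year. The function name `passes_selection` is hypothetical.

```python
from typing import Optional, Sequence

N_YEARS = 40               # criterion 1: forty years of continuous data
WEEKS_PER_YEAR = 52        # assumption: each year is split into 52 weekly totals
MAX_ZERO_FRACTION = 0.20   # criterion 2: at most 20 % zero flow per weekly time step


def passes_selection(weekly: Sequence[Sequence[Optional[float]]]) -> bool:
    """Hypothetical check of both selection criteria for one streamflow record."""
    # Criterion 1: exactly 40 complete years without missing weeks
    if len(weekly) != N_YEARS:
        return False
    for year in weekly:
        if len(year) != WEEKS_PER_YEAR or any(q is None for q in year):
            return False
    # Criterion 2 (assumed reading): for every week of the year, the fraction
    # of years with zero streamflow in that week must not exceed 20 %
    for week in range(WEEKS_PER_YEAR):
        zero_years = sum(1 for year in weekly if year[week] == 0)
        if zero_years / N_YEARS > MAX_ZERO_FRACTION:
            return False
    return True


# Usage: 8 zero-flow years in week 0 (8/40 = 20 %) is still acceptable
record = [[1.0] * WEEKS_PER_YEAR for _ in range(N_YEARS)]
for y in range(8):
    record[y][0] = 0.0
print(passes_selection(record))
```

The per-week check matters because the drought threshold is a weekly 20th percentile. If more than 20 % of the values in a given week are zero, that threshold becomes zero.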
Individual controls were assembled from various sources for both regions. Climatic (annual and monthly P and T) and topographic (mean E and A) controls were obtained for the USA from the GAGES-II dataset (Falcone, 2011). For Europe, climatic controls were obtained from the E-OBS dataset (Haylock et al., 2008), and topographic controls originate from the pan-European River and Catchment Database CCM2 (Vogt et al., 2007). The BFI was calculated from the entire daily streamflow records based on the procedure described in Gustard and Demuth (2009). The four climate classification systems were calculated from the individual climatic controls as follows:
- KG: according to the method of Kottek et al. (2006);
- AI: following the method of de Martonne (1926) (P divided by (T + 10)), with a grouping interval of 10 (similar to the map presented at the FAO website; Grieser et al., 2006);
- T < 0: sum of months with average T below zero; and
- E_POT > P: sum of months with average E_POT (calculated following the method of Thornthwaite, 1948) above the average P.
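The three indices other than KG reduce to a few lines of arithmetic on monthly climatologies. The sketch below is illustrative only and rests on assumptions that are not from the text:

- Inputs are 12 long-term monthly means (T in °C, P in mm).
- Each AI class is labeled by the lower bound of its 10-unit interval.
- E_POT uses the unadjusted Thornthwaite (1948) formula for a 30-day month with 12 h of daylight. The latitude-dependent day-length correction and the separate formula for T > 26.5 °C are therefore omitted.

All function names are hypothetical.

```python
import math
from typing import List, Sequence


def de_martonne_ai(p_annual_mm: float, t_annual_c: float) -> float:
    """Aridity index of de Martonne (1926): P / (T + 10)."""
    return p_annual_mm / (t_annual_c + 10.0)


def ai_class(ai: float, width: float = 10.0) -> int:
    """Lower bound of the 10-unit AI interval, e.g. 34.2 -> 30 (assumed labeling)."""
    return int(math.floor(ai / width) * width)


def months_below_zero(t_monthly: Sequence[float]) -> int:
    """T < 0 class: number of months with an average temperature below 0 °C."""
    return sum(1 for t in t_monthly if t < 0.0)


def thornthwaite_pet_unadjusted(t_monthly: Sequence[float]) -> List[float]:
    """Monthly E_POT (mm) after Thornthwaite (1948), without day-length correction."""
    # Annual heat index I, summed over months with a positive mean temperature
    heat_index = sum((t / 5.0) ** 1.514 for t in t_monthly if t > 0.0)
    if heat_index == 0.0:
        return [0.0] * len(t_monthly)
    a = (6.75e-7 * heat_index**3 - 7.71e-5 * heat_index**2
         + 1.792e-2 * heat_index + 0.49239)
    # Months with a mean temperature at or below 0 °C get zero E_POT
    return [16.0 * (10.0 * t / heat_index) ** a if t > 0.0 else 0.0 for t in t_monthly]


def months_with_deficit(pet_monthly: Sequence[float], p_monthly: Sequence[float]) -> int:
    """E_POT > P class: number of months with average E_POT above average P."""
    return sum(1 for e, p in zip(pet_monthly, p_monthly) if e > p)


# Usage with a made-up monthly climatology (Jan..Dec)
t = [-2.0, -1.0, 3.0, 8.0, 13.0, 17.0, 19.0, 18.0, 14.0, 9.0, 4.0, 0.0]
p = [50.0, 45.0, 50.0, 55.0, 65.0, 75.0, 80.0, 75.0, 60.0, 55.0, 55.0, 55.0]
ai = de_martonne_ai(sum(p), sum(t) / 12)
print(ai_class(ai), months_below_zero(t), months_with_deficit(thornthwaite_pet_unadjusted(t), p))
```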
The KG classification system classifies catchments with two- or three-letter codes. For the considered regions, distinctions are based on three properties:
- the minimum of the average monthly temperature: first letter C for a minimum temperature > −3 °C and D for a minimum temperature ≤ −3 °C;
- seasonality in precipitation: second letter f for precipitation all year round and s for a relatively low amount of precipitation in summer; and
- summer temperatures: third letter a for hot summers, b for warm summers and c for cool summers.

Figure 1 shows the locations of the selected catchments and their classification according to the KG and AI climate classification systems.
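A simplified version of the KG rules described above can be sketched as follows. It covers only the C and D climates with second letter f or s and third letter a, b or c. It assumes the Northern Hemisphere (summer = April–September) and ignores the B, A and E classes, the w (dry winter) letter and the d (very cold winter) letter. The thresholds follow our understanding of Kottek et al. (2006): −3 °C for the coldest month, the dry-summer rule with its 40 mm limit, 22 °C for hot summers and four months ≥ 10 °C for warm summers. It is not a complete KG implementation, and the function name is hypothetical.

```python
from typing import Sequence


def koppen_geiger_cd(t_monthly: Sequence[float], p_monthly: Sequence[float]) -> str:
    """Simplified KG code for C/D climates from 12 monthly means (Jan..Dec).

    t_monthly in °C, p_monthly in mm; Northern Hemisphere assumed.
    """
    t = list(t_monthly)
    p = list(p_monthly)
    t_min, t_max = min(t), max(t)

    # First letter: coldest month above -3 °C -> C, otherwise D
    first = "C" if t_min > -3.0 else "D"

    # Second letter: dry summer (s) versus precipitation all year round (f)
    summer = p[3:9]            # April..September
    winter = p[9:] + p[:3]     # October..March
    ps_min = min(summer)
    pw_min, pw_max = min(winter), max(winter)
    dry_summer = ps_min < pw_min and pw_max > 3.0 * ps_min and ps_min < 40.0
    second = "s" if dry_summer else "f"

    # Third letter: hot (a), warm (b) or cool (c) summer
    if t_max >= 22.0:
        third = "a"
    elif sum(1 for x in t if x >= 10.0) >= 4:
        third = "b"
    else:
        third = "c"
    return first + second + third
```

For example, a mild oceanic climatology with a coldest month of 3 °C, a warmest month of 17 °C and 70 mm of precipitation every month falls in Cfb under these rules.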

Drought duration curves
The goal of this step is to extract drought duration distributions from the streamflow records. Daily streamflow records were transformed to weekly data (total streamflow volume per week). Defining droughts at this temporal resolution is in line with other studies (e.g., Tallaksen and Stahl, 2014) and with the US Drought Monitor classification scheme (Svoboda et al., 2002). Hydrologic drought events were identified from these weekly records using the threshold level approach, following the principles of Zelenhasić and Salvai (1987): a drought event starts when the streamflow record is at or below a certain threshold level and ends when the record rises above the threshold again. The threshold level used in this study was the 20th percentile of streamflow, calculated for each week of the year. This is a common threshold in large-scale drought studies (e.g., Andreadis et al., 2005; Tallaksen and Stahl, 2014; Van Lanen et al., 2013; Van Loon et al., 2014). Drought durations, defined as the number of consecutive weeks the streamflow record is at or below the threshold, were extracted for each record. As for flow duration curves, these drought durations (in weeks) were sorted from shortest to longest, and the fraction of non-exceedance was calculated for each drought duration. The resulting cumulative drought duration distributions were linearly interpolated so that each percentile (from 1 to 100) has a value; these are the drought duration curves. As an example, the drought duration curves of all catchments (drought duration curve ensembles) for the USA and Europe are presented in Fig. 2a. In this study we only take into account long-duration droughts, which are defined in a relative way. We focus on long-duration droughts because they are hypothesized to affect natural and socio-economic systems more severely. Furthermore, drought duration curves differ more from each other from the 81st percentile onward (Fig. 2a). We hence only consider the drought duration curves between the 81st and 100th percentiles for further analysis. For simplicity, we hereafter use the term drought duration curves to refer to drought duration curves between the 81st and 100th percentiles.
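The drought identification and the construction of a drought duration curve can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code:

- The weekly record is a list of years of equal length.
- The weekly 20th percentile uses linear interpolation between sorted values, one of several common percentile definitions.
- Drought runs may continue across year boundaries.
- The non-exceedance fraction of the k-th shortest of n durations is taken as k/n.
- A drought still ongoing at the end of the record is counted as an event.

The function names are hypothetical.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) by linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def drought_durations(weekly: Sequence[Sequence[float]], q: float = 20.0) -> List[int]:
    """Durations (weeks) of runs at or below a threshold that varies by week of year."""
    n_weeks = len(weekly[0])
    # Threshold for each week of the year: q-th percentile over all years
    thresholds = [percentile([year[w] for year in weekly], q) for w in range(n_weeks)]
    durations: List[int] = []
    run = 0
    for year in weekly:                      # runs may span year boundaries
        for w, flow in enumerate(year):
            if flow <= thresholds[w]:
                run += 1                     # drought starts or continues
            elif run > 0:
                durations.append(run)        # drought ends: flow rises above threshold
                run = 0
    if run > 0:
        durations.append(run)                # drought ongoing at end of record
    return durations


def drought_duration_curve(durations: Sequence[int]) -> List[float]:
    """Durations at non-exceedance percentiles 1..100 (index 0 = 1st percentile).

    Assumes at least one drought event.
    """
    d = sorted(durations)
    n = len(d)
    probs = [100.0 * (k + 1) / n for k in range(n)]  # k/n plotting positions, in %
    curve: List[float] = []
    for p in range(1, 101):
        if p <= probs[0]:
            curve.append(float(d[0]))
            continue
        for k in range(1, n):
            if p <= probs[k]:
                # Linear interpolation between neighbouring sorted durations
                frac = (p - probs[k - 1]) / (probs[k] - probs[k - 1])
                curve.append(d[k - 1] + frac * (d[k] - d[k - 1]))
                break
    return curve


def long_duration_part(curve: Sequence[float]) -> List[float]:
    """Part of the curve used in the analysis: percentiles 81..100."""
    return list(curve[80:])
```

Other plotting positions, such as k/(n + 1), would shift the curve slightly, mostly for records with few drought events.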

Grouping drought duration curves
To test whether drought duration curves differ between classes of the four climate classification systems and five individual controls, we grouped them accordingly. For the four climate classification systems, this means that drought duration curves were grouped according to the predefined classes. Since no such straightforward classification systems exist for the selected individual controls, we had to use another approach. In a first step, we combined all values of an individual …

Table 1. Considered classes of the four climate classification systems (Köppen-Geiger (KG), aridity index (AI), number of months with an average temperature below zero (T < 0) and number of months where the average potential evaporation was larger than the average precipitation (E_POT > P)) and five individual controls (precipitation (P), temperature (T), area (A), elevation (E) and the base flow index (BFI)), and the corresponding number of catchments in each class (USA/Europe).

Comparing DDC
Drought duration curves (DDC) of the different classes were compared with each other both visually and statistically. For the visual comparison, the DDC ensemble average per class (e.g., per KG class) was calculated. Instead of showing the absolute values of the average DDC per class, we plot them as departures from the overall average to make differences easier to discern (Fig. 2c1). For the statistical analysis, we systematically compared, for each climate classification system or individual control, the DDC values of each class at each percentile between 81 and 100 with those of all other classes (boxplots in Fig. 2c2). This percentile-based comparison was preferred over a statistical comparison of average DDC ensembles because the latter does not take into account the variability in DDC ensembles at the different percentiles (Fig. 2a). As a final measure of statistical similarity in DDC of the different classes, we used the number of percentiles with nonsignificant differences (p ≥ 0.05) according to either the two-sample Kolmogorov–Smirnov (KS) test or the Mann–Whitney U (MWU) test (Eqs. 1 and 2):

$$S_{\mathrm{KS}} = \sum_{i=81}^{100} \mathbf{1}\left(p_{\mathrm{KS},i} \geq 0.05\right) \tag{1}$$

$$S_{\mathrm{MWU}} = \sum_{i=81}^{100} \mathbf{1}\left(p_{\mathrm{MWU},i} \geq 0.05\right) \tag{2}$$

where S_KS and S_MWU are the numbers of similar percentiles, ranging between 0 and 20 (0 = no percentiles similar and 20 = all percentiles similar). p_KS,i and p_MWU,i are the p values of the two tests at percentile i (Fig. 2c2), and 1(·) equals 1 if its argument holds and 0 otherwise. A high value of S_KS or S_MWU thus indicates more similarity between the DDC of two classes. In addition to the comparison of DDC between all classes of each climate classification system and individual control, for both the entire dataset and the two regional subsets, DDC of the same climate classes were compared between Europe and the USA (e.g., DDC of KG class Cfb in the USA vs. DDC of the same class in Europe). For the visual comparison, the difference in average DDC of the same classes between the USA and Europe was used (average DDC USA minus average DDC Europe). For the statistical comparison, the number of percentiles with similar DDC values between classes with the same classification (according to both S_KS and S_MWU) was again used as a measure of statistical similarity between DDC.
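Equations (1) and (2) amount to looping over the 20 percentiles, testing the two samples of DDC values at each percentile and counting the non-significant results. The sketch below is an illustration, not the authors' implementation. DDC are stored as lists of 100 values (index 0 = 1st percentile). To stay within the Python standard library, the default test is a hypothetical two-sided Mann–Whitney U test with a normal approximation and a tie correction but no continuity correction. `scipy.stats.ks_2samp` or `scipy.stats.mannwhitneyu` could be passed in through the `test` argument instead (e.g. `lambda a, b: ks_2samp(a, b).pvalue`). Note that SciPy's defaults may give slightly different p values from this approximation.

```python
import math
from typing import Callable, Sequence

PValueTest = Callable[[Sequence[float], Sequence[float]], float]


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(list(x) + list(y))
    # Average rank of each distinct value, plus the tie term sum(t^3 - t)
    avg_rank = {}
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        t = j - i + 1
        avg_rank[pooled[i]] = (i + j) / 2.0 + 1.0
        tie_term += t**3 - t
        i = j + 1
    u1 = sum(avg_rank[v] for v in x) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0.0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2.0))


def similarity_count(ddcs_a: Sequence[Sequence[float]],
                     ddcs_b: Sequence[Sequence[float]],
                     test: PValueTest = mann_whitney_p,
                     alpha: float = 0.05) -> int:
    """S in Eqs. (1)-(2): number of percentiles 81..100 with p >= alpha (0..20)."""
    count = 0
    for pct in range(81, 101):
        a = [ddc[pct - 1] for ddc in ddcs_a]  # all catchments of class A at this percentile
        b = [ddc[pct - 1] for ddc in ddcs_b]  # all catchments of class B at this percentile
        if test(a, b) >= alpha:
            count += 1
    return count
```

Testing percentile by percentile, rather than comparing one summary curve per class, keeps the spread of the DDC ensemble at each percentile in the comparison. This is the argument the text gives for the approach.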

[Figure: climate classification systems; columns: entire dataset, subset USA, subset Europe, USA − Europe]

3 Results

Visual comparison of DDC
Figure 3 presents the average DDC (for long-duration droughts) of all classes of the four climate classification systems. In general, the patterns displayed for the entire dataset and for the two regional subsets (USA and Europe) are comparable. However, the average DDC of catchments in the same climate classes are mostly higher in the USA, i.e., biased towards longer drought durations (the average DDC of the USA minus the average DDC of Europe is mostly positive; Fig. 3, right column).
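The quantities shown in these average-DDC plots can be computed as in the following sketch: departures of each class-average DDC from the overall average, and USA minus Europe differences for shared classes. It assumes that each DDC has already been cut to percentiles 81–100 and that curves are grouped in a dictionary keyed by class label. The overall average is taken over all catchments in the selection rather than over class averages, which is our reading of the Fig. 4 caption. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

Curves = Sequence[Sequence[float]]


def mean_curve(curves: Curves) -> List[float]:
    """Percentile-wise ensemble average of a set of DDC."""
    n = len(curves)
    return [sum(c[i] for c in curves) / n for i in range(len(curves[0]))]


def departures_by_class(curves_by_class: Dict[str, Curves]) -> Dict[str, List[float]]:
    """Average DDC of each class minus the average DDC of all catchments in the selection."""
    all_curves = [c for curves in curves_by_class.values() for c in curves]
    overall = mean_curve(all_curves)
    return {cls: [m - o for m, o in zip(mean_curve(curves), overall)]
            for cls, curves in curves_by_class.items()}


def usa_minus_europe(usa: Dict[str, Curves], europe: Dict[str, Curves]) -> Dict[str, List[float]]:
    """Average DDC (USA) minus average DDC (Europe) for classes present in both regions."""
    shared = sorted(usa.keys() & europe.keys())
    return {cls: [u - e for u, e in zip(mean_curve(usa[cls]), mean_curve(europe[cls]))]
            for cls in shared}
```

A positive value returned by `usa_minus_europe` at a given percentile means that catchments of that class have longer long-duration droughts in the USA than in Europe.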
The KG reveals the lowest average DDC for catchments in the non-seasonal temperate and snow climates (Cfc, Cfb and Dfb) for both the entire dataset and the two regional subsets of the USA and Europe. Higher average DDC are displayed for catchments in the hot-summer, cold and seasonal climates (Cfa, Dfa, Csb, Dfc, Dsb, Dsc). Catchments in the Dfc and Dfb climates of the USA have higher average DDC than those in Europe, whereas the average DDC of catchments in the Cfb climate are higher in Europe. The AI shows the highest average DDC for catchments in the lowest (most arid) AI classes. Generally, average DDC decrease with increasing AI class, apart from an occasional exchange between some neighboring classes. Average DDC are higher for catchments in the same AI classes in the USA (USA minus Europe is positive), especially for catchments in the lower AI classes. For T < 0, average DDC are generally highest for catchments with the most months T < 0, intermediate for catchments with the fewest months T < 0 and lowest for catchments with 3 or 4 months T < 0. This ordering of DDC was found for both the entire dataset and the two regional subsets; however, differences in average DDC between classes are small compared to the differences in average DDC between classes of the other climate classification systems. E_POT > P displays a general pattern of higher average DDC for catchments with a high number of months E_POT > P and lower average DDC for catchments with a low number of months E_POT > P. As for the AI, this systematic ordering of average DDC (from low for catchments in low classes to high for catchments in high classes of E_POT > P) is occasionally interrupted by an exchange between the average DDC of catchments in neighboring classes. Catchments in lower classes of E_POT > P are comparable between the two regions, whereas catchments in classes with more months E_POT > P show distinctly higher average DDC for the USA.

Figure 4. Averages of the ensembles of subsets of drought duration curves between the 81st and 100th percentiles (average DDC) for catchments in different classes of individual controls (rows) for the entire dataset (first column), the USA (second column) and Europe (third column). Average DDC are displayed as departures from the overall average of DDC for the specific selection of catchments, i.e., the overall average of all catchments (first column), all catchments in the USA (second column) and all catchments in Europe (third column). The fourth (right) column shows the difference in average DDC of catchments in the same classes of individual controls for the USA and Europe (average DDC USA minus average DDC Europe).
Figure 4 … classification systems, not all individual controls exert a similar control on drought duration in both regions.
For the individual control P, in both the entire dataset and the two regional subsets (USA and Europe), the class with the highest average DDC is the class with the lowest P and vice versa. Average DDC decrease from the lowest to the highest P class. Classes of T show the highest average DDC for catchments in both the lowest and the highest T classes. Longer drought events are thus found for catchments with temperatures in the tails of the temperature distribution. However, differences in average DDC between classes of T are not as distinct as between precipitation classes. Even smaller differences in average DDC are found between the different classes of A. In Europe, small catchments display the lowest average DDC and large catchments the highest. This is different in the USA, where both small and large catchments exhibit the highest average DDC. Similar to A, E shows differences in the ordering of average DDC between the two regions. For the USA, the highest average DDC are displayed for catchments in the highest E class, whereas in Europe the highest average DDC are displayed for catchments in the lowest E class. These distinct differences are averaged out for the entire dataset. For the BFI, a high BFI coincides with higher average DDC and a low BFI with lower average DDC.

Statistical comparison
Figure 5 shows the measures of statistical similarity (S_KS and S_MWU) between ensembles of DDC for catchments in different climate classes. Patterns are again mostly comparable between the entire dataset and the two regional subsets (USA and Europe). Differences occur for some specific combinations. For example, within the USA, DDC of catchments in the Dfc climate are comparable with DDC of catchments in the Dsb climate according to S_KS. According to the same measure, however, DDC of catchments in these two climates are not comparable for the entire dataset, where the DDC of catchments in the Dfc climate of the USA are combined with the lower DDC of catchments in the European Dfc climate.
For the KG, catchments in the Cfc climate have significantly lower DDC values at most percentiles than catchments in all other climates. DDC of catchments in the Cfb climate are only similar to DDC of catchments in the Dfb climate according to both S_KS and S_MWU. DDC of catchments in this Dfb climate show little similarity to DDC of catchments in the other, seasonally influenced climates. This again indicates the distinction between shorter droughts for catchments in climates with no or small seasonal influences (Cfc, Cfb and Dfb) and longer droughts for catchments in the other climates. However, DDC of catchments in these other climates (Cfa, Dfa, Csb, Dfc, Dsb) mostly do not differ notably from each other according to both measures of statistical similarity. Of these climates, catchments in the Dsb climate, which reveal the highest average DDC, also have the most distinctive DDC. For the entire dataset, they only show similarity in DDC to catchments in the Dsc climate (and, at some percentiles, to catchments in the Csb and Dfa climates); for the regional subset of the USA, they only show similarity to the Dfc climate. Regarding the differences between the USA and Europe, catchments in the Dfb and Cfb climates have similar DDC in the two regions according to both S_KS and S_MWU (presented in the diagonal of the matrices in the two right columns of Fig. 5). Catchments of the Dfc climate of the USA show significantly higher DDC values for most percentiles. The differences in DDC of catchments in different AI classes are most distinct between the lowest AI classes. The higher the AI class, the more neighboring classes show similarity in DDC, whereas for catchments in the lower AI classes, only DDC of catchments in directly neighboring classes occasionally show similarity. For the comparison between Europe and the USA, catchments in the lower AI classes (< 50) have higher DDC in the USA according to both measures of similarity, whereas catchments in the higher AI classes do not show many notable differences between the two regions. The small differences in average DDC of catchments in different classes of T < 0 are also reflected by the corresponding measures of statistical similarity, especially for Europe. For this region, DDC of catchments in almost all classes are similar to each other. Catchments in the same classes of T < 0 are mostly comparable between the USA and Europe. Differences in DDC for catchments in different classes of E_POT > P are notable: S_KS and S_MWU indicate similarity only in DDC of catchments in neighboring classes. Differences between the USA and Europe are only found for the DDC of catchments in the two highest classes of E_POT > P. For the other classes, the DDC are similar.
Figure 6 displays the statistical comparison of DDC grouped by individual controls. According to both S_KS and S_MWU, DDC of catchments in different classes of P mostly differ from each other. Classes 3 and 5 (higher P) are comparable between the two regional subsets. In classes 1 and 2 (lower P), DDC are higher for catchments in the USA according to both measures of similarity. For the entire dataset and for the regional subset of the USA, DDC of catchments in intermediate T classes are similar to each other. The same holds for DDC of catchments in the lowest and highest temperature classes, which confirms that long-duration droughts are longer in both colder and warmer catchments. These differences are less distinct for Europe, where both S_KS and S_MWU indicate a high number of similar DDC classes. Differences in DDC between Europe and the USA are found for classes of catchments with a lower T. Catchments grouped by A hardly show differences in DDC, and only for the entire dataset do the largest catchments have different DDC. According to both S_KS and S_MWU, catchments in the highest E class of the USA have higher DDC than catchments in the other E classes. For Europe, in contrast, catchments in the lowest E class have higher DDC. These region-specific patterns of statistical similarity are not found for the entire dataset. For the BFI, both measures of statistical similarity indicate that DDC of catchments in different classes often differ from each other, apart from some similarity between neighboring classes.
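to their characteristic drought duration distribution, which is in line with the results of Barker et al. (2016) and Van Loon and Laaha (2015). These individual controls could therefore be seen as the dominant controls on drought duration. This confirms the findings of Van Loon et al. (2014) that drought duration is modified by both catchment controls (groundwater response time) and climate controls (seasonality in precipitation and the occurrence of hot or cold seasons). Our results also fit with findings by Zaidman et al. (2002), who found that the 1976 drought in Europe was more persistent in regions with a high BFI or low P. The distributions of dominant individual controls, however, are not always comparable between the classes of the four different climate classification systems, as can be seen in the boxplots of Fig. 7. In the end, these differences in dominant individual controls across classes of climate classification systems affect the overall suitability of these systems for differentiating catchments according to drought duration in observed streamflow. They also partly explain why DDC of catchments in the same climate classes are not always comparable between the two regional subsets (USA and Europe).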
For the KG climate classification system, catchments in the two climates that are not influenced by seasonality in precipitation or by a cold or hot season, Cfb and Cfc, show the lowest average DDC (shortest droughts). According to the two measures of similarity used in this study, DDC of catchments in the Cfc climate (generally wetter than most other climates; Fig. 7) were distinctly different from DDC of catchments in the other climates. DDC of catchments in the Cfb climate were comparable only with DDC of catchments in the Dfb climate. Catchments in the Dfb climate were expected to have longer drought durations, because their cold season causes low streamflow through temporary snow storage (Van Loon et al., 2014). Our tests show that this influence is visible in the average DDC but is often not statistically significant when DDC values are compared at the individual percentiles. The difference in average DDC between Europe and the USA for catchments in the Cfb climate was also notable. This was the only climate class for which average DDC of catchments in Europe were distinctly higher than in the USA, possibly because the Cfb catchments in the USA have wetter conditions (Fig. 7). Catchments in the Dfc climate, on the other hand, have higher average DDC in the USA than in Europe. This is likely related to differences in dominant climate and catchment controls between the two regional subsets (lower P and higher BFI for catchments in the USA; Fig. 7).
Catchments in hot summer climates without seasonality in precipitation (Cfa, Dfa) have higher average DDC than catchments in warm summer climates (Cfb, Dfb), which contrasts with Tijdeman et al. (2012). This difference could be attributed to the fact that Tijdeman et al. (2012) used global data, whereas this study covers Cfa and Dfa catchments only in the USA. The differences in P between the hot and warm summer climates in the USA (Cfa and Dfa have lower P values; Fig. 7) may not reflect those on a global scale. Other reasons might be related to the modeling assumptions needed in large-scale gridded models. Nevertheless, the results of this study indicate that the occurrence of a hot summer is also an important control on long-duration droughts. Measures of statistical similarity show few differences between DDC of catchments in the hot summer climates and DDC of catchments in the other seasonal climates (Csb, Dfc, Dsb, Dsc). The results thus indicate that the KG is mainly suitable for distinguishing catchments in climates with seasonal influences from those in climates without them.
The KG climate classes with the highest average DDC were the snow climates with cool summers or seasonality in precipitation (Dfc, Dsb and Dsc), which matches findings by Tijdeman et al. (2012), Van Lanen et al. (2013) and Van Loon et al. (2014). A climate classification system that specifically reflects the length of the cold season was therefore expected to be suitable for differentiating catchments according to drought duration. Such a system is based on the number of months with an average temperature below zero (T < 0). However, this was not the case. Differences between average DDC were small, and the measures of statistical similarity did not indicate strong differences between classes of catchments, especially for Europe. The European catchments with the most months of T < 0 are partly located in Scandinavia and the Alps, regions that have previously been associated with short drought durations (Hannaford et al., 2011). Altogether, a climate classification system that only includes cold-season dynamics is not the most suitable for differentiating catchments with different drought duration characteristics. Such a system ignores other drought-prolonging processes, for example the total amount and seasonality of precipitation or the occurrence of hot summers.
Climate classification systems that account for the dominant annual precipitation control are more suitable for such a differentiation of catchments. These are the number of months with average potential evaporation larger than the precipitation (E_POT > P) and the aridity index (AI). Note that the KG does not include such an annual precipitation term. E_POT > P not only takes the total precipitation into account but is also influenced by seasonality in precipitation and by hot summer temperatures. Average DDC across the classes of E_POT > P followed the hypothesized pattern: higher DDC for catchments in higher E_POT > P classes and lower DDC for catchments in lower classes. This makes E_POT > P a suitable climate classification system for differentiating catchments according to drought duration. In the lower E_POT > P classes, catchments in Europe and the USA show similar DDC. In the higher E_POT > P classes, however, DDC values are significantly higher for the USA at most percentiles. One possible explanation is that KG climates are distributed differently within these higher E_POT > P classes in the two regions (Fig. 8). Catchments in high E_POT > P classes in Europe mainly belong to the Cfb climate. In the USA, catchments in these classes mostly belong to hot summer (Dfa and Cfa) and seasonal (Csb, Dsb) climates, which have been shown to have longer drought durations.
The difference in latitude between Europe and the USA might also explain these differences between classes. For the same E_POT > P classes, the lower-latitude USA has shorter summer days with higher temperatures, whereas Europe has longer summer days with lower temperatures. In addition, Van der Schrier et al. (2011) showed that a simple water balance model using the Thornthwaite formula to compute E_POT underestimates annual actual evaporation in parts of the USA and overestimates it in northwestern Europe. Estimating E_POT with another method may therefore lead to more comparable classes between the USA and Europe.
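The two month-count classifications discussed above (T < 0 and E_POT > P) can be illustrated with a short sketch. It is not the procedure used to build the classes in this study. It assumes that E_POT is estimated with the Thornthwaite formula from 12 long-term monthly mean temperatures, and it omits Thornthwaite's separate treatment of temperatures above 26.5 °C. The function names `thornthwaite_pet` and `month_counts` are hypothetical. The optional day-length factor is where latitude enters the Thornthwaite estimate.

```python
def thornthwaite_pet(temp_c, daylight_h=None, days=None):
    """Monthly potential evaporation (mm) with the Thornthwaite formula.

    temp_c: 12 long-term monthly mean air temperatures (deg C).
    daylight_h, days: optional mean day length (h) and month length (days);
    the defaults of 12 h and 30 days give the unadjusted estimate.
    Simplification (assumption): the separate formula for T > 26.5 deg C is not used.
    """
    if len(temp_c) != 12:
        raise ValueError("expected 12 monthly mean temperatures")
    daylight_h = daylight_h or [12.0] * 12
    days = days or [30] * 12
    # Annual heat index I: only months above 0 deg C contribute.
    heat = sum((t / 5.0) ** 1.514 for t in temp_c if t > 0.0)
    if heat == 0.0:
        return [0.0] * 12
    a = 6.75e-7 * heat**3 - 7.71e-5 * heat**2 + 1.792e-2 * heat + 0.49239
    # Months at or below 0 deg C get zero potential evaporation.
    return [
        16.0 * (h / 12.0) * (n / 30.0) * (10.0 * t / heat) ** a if t > 0.0 else 0.0
        for t, h, n in zip(temp_c, daylight_h, days)
    ]


def month_counts(temp_c, precip_mm, **pet_kwargs):
    """Return (months with E_POT > P, months with T < 0) for one catchment."""
    pet = thornthwaite_pet(temp_c, **pet_kwargs)
    n_pet_gt_p = sum(1 for e, p in zip(pet, precip_mm) if e > p)
    n_below_zero = sum(1 for t in temp_c if t < 0.0)
    return n_pet_gt_p, n_below_zero
```

Because the two counts come from the same monthly climatology, a catchment's classes under both systems can be derived in one call.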
The AI also proved suitable for differentiating catchments according to drought duration. Average DDC across the AI classes clearly followed the expected pattern: higher average DDC for catchments in lower AI classes and lower average DDC for catchments in higher AI classes. Previous studies that applied the AI focused mainly on the arid (low) end of this index (e.g., Spinoni et al., 2015) and grouped all non-arid regions (higher AI) into one humid class. The results of this study indicate, however, that the wetter range of the index is also suitable for differentiating catchments according to drought duration. Catchments in the lower three AI classes (<50) have higher average DDC in the USA than in Europe. The dominant controls do not explain this difference: for catchments in these classes, P is lower and BFI is higher in Europe (Fig. 7), and both would favor longer droughts in Europe. The difference in the KG climates that fall into the lowest three AI classes (Fig. 8) is a more likely explanation. Catchments in the lower AI classes in Europe mainly belong to the Cfb climate. In the USA, they represent a mixture of climates, including the climate classes that have shown a drought-prolonging control.
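A minimal sketch of the AI classification follows. It assumes that the AI is computed from annual totals as 100 × P / E_POT. The class boundary of 50 mentioned above suggests this scaling, but this section does not define it. The boundary at 50 between the three driest classes and the rest is the only edge taken from the text. The other edges in the hypothetical `AI_EDGES` list are placeholders.

```python
from bisect import bisect_right

# Hypothetical class edges for AI = 100 * P / E_POT. Only the boundary at 50
# (upper limit of the three driest classes) is suggested by the text.
AI_EDGES = [20.0, 35.0, 50.0, 80.0, 120.0]


def aridity_index(annual_p_mm, annual_pet_mm):
    """Aridity index scaled by 100: low values are dry, high values are wet."""
    if annual_pet_mm <= 0:
        raise ValueError("annual potential evaporation must be positive")
    return 100.0 * annual_p_mm / annual_pet_mm


def ai_class(ai, edges=AI_EDGES):
    """Class number 1..len(edges)+1; a value equal to an edge goes to the wetter class."""
    return bisect_right(edges, ai) + 1


print(ai_class(aridity_index(400.0, 1000.0)))  # AI = 40, third class -> 3
```

With `bisect_right`, class 1 is the driest. That ordering matches the expected pattern of longer droughts in lower classes.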
Overall, the results of this study show that long-duration droughts are modified by both climate and catchment controls. Still, several climate classification systems proved suitable for differentiating catchments according to long-duration droughts in observed streamflow across a wide range of catchment properties. This suggests that, for the selected catchments, catchment controls were not dominant over climatic controls, which is in line with the previous catchment classification studies of Berghuijs et al. (2014) and Coopersmith et al. (2012). Climate classification systems are thus useful for identifying regions with different sensitivities to long-duration droughts in observed streamflow. They do not, however, necessarily distinguish regions with unique hydrologic drought duration characteristics. The differences in DDC of catchments in the same climate classes in Europe and the USA confirm this (e.g., the KG climates Cfb and Dfc), and they are likely caused by differences in the dominant individual controls P and BFI. Climate classification systems that include an absolute water balance term (AI or E_POT > P) are most suitable for differentiating catchments according to drought duration within both Europe and the USA. However, both systems show differences in DDC between Europe and the USA for catchments in the same low AI and high E_POT > P classes. For large-scale drought studies, combining information from the different climate classification systems with individual climate and catchment controls seems the most suitable way to stratify regions, especially when comparing the USA with Europe.

Evaluation of the method
This study compared DDC of catchments in classes of four climate classification systems and five individual controls, using a dataset of near-natural streamflow records. Because the dataset is based solely on observations, its catchments are not uniformly distributed over the two regions. For Spain, for example, only a small number of streamflow records met the selection criteria of being near-natural and not falling dry too often. Despite this unequal coverage, the dataset includes catchments with a large variety of climatic and catchment properties, which allowed for a detailed comparison within and between classes of catchments. Furthermore, this study only considered near-natural catchments, which are potentially biased towards smaller headwater catchments. In larger catchments, catchment controls such as lakes and wetlands might have a stronger effect. However, anthropogenic controls on streamflow drought characteristics might dominate the natural ones in such catchments, and they were therefore excluded from this study. For the final selection of catchments, the BFI was calculated following the approach of Gustard and Demuth (2009). This approach uses turning points in the minimum flows of non-overlapping 5-day blocks to define base flow. It was originally designed for rainfall-dominated regimes and might represent base flow differently in some snowmelt- or glacier-melt-dominated catchments. In these catchments, the long-lasting seasonal melt peaks and recessions are more related to climate than to catchment controls (Gustard and Demuth, 2009). Other calculation procedures could give a representation of base flow that reflects catchment controls more specifically, although this was outside the scope of this research.
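The base flow separation referred to above can be sketched as follows. This is a minimal illustration of the standard procedure described by Gustard and Demuth (2009). Daily flow is split into non-overlapping 5-day blocks, and a block minimum counts as a turning point when 0.9 times its value does not exceed either neighboring minimum. The turning points are then connected linearly. The BFI is the ratio of base flow volume to total flow volume between the first and last turning point. The function name and the handling of record edges (an incomplete final block is ignored, no gap filling) are assumptions of this sketch, not details taken from the study.

```python
def base_flow_index(flow, block=5, factor=0.9):
    """Base flow index of a daily flow series (sketch of the 5-day minima method)."""
    n_blocks = len(flow) // block  # an incomplete trailing block is ignored
    minima = []  # (day index, minimum flow) of each non-overlapping block
    for b in range(n_blocks):
        seg = list(flow[b * block:(b + 1) * block])
        q_min = min(seg)
        minima.append((b * block + seg.index(q_min), q_min))
    # A block minimum is a turning point if 0.9 x its value is <= both neighbours.
    turning = [
        minima[i]
        for i in range(1, len(minima) - 1)
        if factor * minima[i][1] <= minima[i - 1][1]
        and factor * minima[i][1] <= minima[i + 1][1]
    ]
    if len(turning) < 2:
        raise ValueError("too few turning points to draw a base flow line")
    base_total = flow_total = 0.0
    # Interpolate linearly between turning points and cap base flow at the
    # observed flow; only days from the first to the last turning point count.
    for (d0, q0), (d1, q1) in zip(turning, turning[1:]):
        for d in range(d0, d1):
            q_base = q0 + (q1 - q0) * (d - d0) / (d1 - d0)
            base_total += min(q_base, flow[d])
            flow_total += flow[d]
    d_last, q_last = turning[-1]
    base_total += min(q_last, flow[d_last])
    flow_total += flow[d_last]
    if flow_total == 0.0:
        raise ValueError("zero total flow between turning points")
    return base_total / flow_total


spiky = [1.0] * 30
spiky[12] = 10.0  # a single quick-flow peak on top of a constant base flow
print(round(base_flow_index(spiky), 2))  # 0.64
```

A high BFI (close to 1) indicates that flow is dominated by slowly responding stores. This is the catchment property associated above with longer droughts.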
Droughts were identified from near-natural streamflow records using a threshold-based approach. This study focused solely on drought duration. Other characteristics also quantify properties of hydrological drought, however, such as the (standardized) deficit volume, which is of interest for the water supply sector, for example. Although these characteristics were outside the scope of this research, the proposed method could also be used to investigate the effect of climate and catchment controls on other drought properties such as deficit volume.
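For illustration, threshold-level drought identification can be written as a single pass over the flow series. This sketch assumes that flow and threshold are given as equally long daily series. How the threshold itself is derived, for example from a flow percentile, is not specified here and is left to the caller. The sketch returns both the duration and the deficit volume of each event, since deficit volume is named above as a possible extension. In line with the choices described for this study, it applies no pooling, threshold smoothing or removal of minor events. The function name `identify_droughts` is hypothetical.

```python
def identify_droughts(flow, threshold):
    """Return a list of (duration_days, deficit) for each drought event.

    A drought day is a day with flow strictly below the threshold, and
    consecutive drought days form one event. The deficit is summed in
    flow units x days (multiply by 86400 to get m3 when flow is in m3/s).
    No pooling, minimum-duration filter or threshold smoothing is applied.
    """
    if len(flow) != len(threshold):
        raise ValueError("flow and threshold must have the same length")
    events = []
    duration, deficit = 0, 0.0
    for q, q_thr in zip(flow, threshold):
        if q < q_thr:
            duration += 1
            deficit += q_thr - q
        elif duration:
            events.append((duration, deficit))
            duration, deficit = 0, 0.0
    if duration:  # event still ongoing at the end of the record (truncated)
        events.append((duration, deficit))
    return events


flow = [5.0, 3.0, 2.0, 6.0, 1.0, 1.0, 7.0]
print(identify_droughts(flow, [4.0] * len(flow)))  # [(2, 3.0), (2, 6.0)]
```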
The drought identification method was specifically chosen to avoid artificial drought events caused by methodological choices rather than by water deficits (Beyene et al., 2014). Drought durations computed with this method were transformed to cumulative distributions and displayed as a function of their fraction of non-exceedance (comparable to Tallaksen et al., 2009). Alternatively, these cumulative drought duration distributions could be displayed as a function of the total number of drought events, as in Fleig et al. (2011). That alternative conserves the frequency of drought events, but the approach used here was preferred because it allows a systematic comparison between all classes of DDC. Because this approach loses information about frequency, however, the drought identification method must not introduce artificial drought events and must conserve an equal fraction of time in drought for all streamflow records. Procedures that influence this fraction, such as smoothing of the threshold, pooling of drought events or exclusion of minor drought events, were therefore not applied in this study.
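The conversion of event durations to a drought duration curve could look like the following sketch, together with the averaging behind the departures shown in Fig. 3. The sketch assumes a nearest-rank definition of the percentiles, because this section does not state the interpolation rule used in the study. It stores each catchment's DDC as a list of 20 values for the 81st to 100th percentiles. The names `ddc`, `average_ddc` and `departure` are hypothetical.

```python
PERCENTILES = range(81, 101)  # the 81st to 100th percentiles used for DDC


def ddc(durations, percentiles=PERCENTILES):
    """Drought durations at the given fractions of non-exceedance (nearest rank).

    Frequency information is discarded: a record with 5 events and one with
    50 events are both summarized by the same number of percentile values.
    """
    ordered = sorted(durations)
    n = len(ordered)
    if n == 0:
        raise ValueError("no drought events")
    # Nearest rank = ceil(p * n / 100), computed with integers to avoid rounding.
    return [ordered[max(1, -(-p * n // 100)) - 1] for p in percentiles]


def average_ddc(curves):
    """Element-wise mean of several DDC (one list per catchment)."""
    return [sum(values) / len(values) for values in zip(*curves)]


def departure(class_curves, all_curves):
    """Average DDC of one class minus the overall average DDC."""
    return [c - a for c, a in zip(average_ddc(class_curves), average_ddc(all_curves))]


print(ddc(range(1, 101))[:3])  # [81, 82, 83]
```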
Both the KS and the MWU test were applied for the statistical comparison of DDC. Using two tests increases the robustness of the analysis, as they focus on different aspects of the distribution. However, one assumption of the MWU test, that the distributions of DDC values of the two classes have an equal shape, did not hold for all combinations of classes and percentiles. Results of this test were therefore interpreted as a difference in mean ranks and not as a difference in medians (Bergmann et al., 2000). A strength of the statistical design of this study is that it shows whether differences occur between neighboring classes (possibly related to our grouping criteria) or between non-neighboring classes. This systematic comparison also shows which classes of predefined climate classification systems are similar to each other, for example which KG climates have similar DDC. This information would be lost with a test such as the Kruskal–Wallis test, which only detects whether at least one group differs from the others.
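The per-percentile comparison that yields S_KS and S_MWU can be sketched with the standard library only. For two classes, the sketch counts the percentiles at which the DDC values are not significantly different. Several details are assumptions rather than settings reported in this section:

- the significance level of 0.05;
- the asymptotic approximations used for both p-values: the Kolmogorov distribution with a common small-sample correction for KS, and the normal approximation with tie correction for MWU;
- applying the minimum of 10 catchments per class to every comparison (the captions of Figs. 5 and 6 state this minimum for the regional comparison).

The function names are hypothetical. Established implementations such as SciPy's `ks_2samp` and `mannwhitneyu` can use exact methods for small samples and may give slightly different p-values.

```python
import math


def ks_pvalue(x, y):
    """Two-sample Kolmogorov-Smirnov test, asymptotic two-sided p-value."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        v = min(xs[i], ys[j])
        while i < n1 and xs[i] == v:
            i += 1
        while j < n2 and ys[j] == v:
            j += 1
        d = max(d, abs(i / n1 - j / n2))  # largest gap between the two ECDFs
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the Kolmogorov series equals 1 to within rounding here
        return 1.0
    q = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, q))


def mwu_pvalue(x, y):
    """Mann-Whitney U test, two-sided p-value from the normal approximation
    with tie correction (no continuity correction)."""
    n1, n2 = len(x), len(y)
    if n1 < 1 or n2 < 1:
        raise ValueError("both samples must be non-empty")
    pooled = sorted(list(x) + list(y))
    n = n1 + n2
    # Average rank of each distinct value and the tie term sum(t^3 - t).
    avg_rank, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        avg_rank[pooled[i]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    u1 = sum(avg_rank[v] for v in x) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # all values identical
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def similarity(class_a, class_b, pvalue, alpha=0.05, min_catchments=10):
    """Number of percentiles at which two classes have similar DDC values.

    class_a, class_b: lists of DDC (one list of 20 values per catchment).
    Returns None when a class has too few catchments (combination not considered).
    """
    if len(class_a) < min_catchments or len(class_b) < min_catchments:
        return None
    columns = zip(zip(*class_a), zip(*class_b))  # one pair of samples per percentile
    return sum(1 for a, b in columns if pvalue(a, b) >= alpha)
```

S_KS and S_MWU for a pair of classes would then be `similarity(class_a, class_b, ks_pvalue)` and `similarity(class_a, class_b, mwu_pvalue)`, each ranging from 0 to 20.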

Conclusions
This study evaluated four climate classification systems and five classified individual controls for their suitability for differentiating catchments according to drought duration characteristics. The results show that, of the individual controls, precipitation and the base flow index were the most suitable differentiators. Climate classification systems that include an absolute precipitation term were most suitable for differentiating catchments according to drought duration. These are the aridity index and the number of months with average potential evaporation larger than the precipitation. The Köppen-Geiger climate classification system distinguished the drought durations of catchments in seasonally influenced climates (dry, cold or hot seasons) from those in climates with no or little seasonal influence. However, many of its seasonal climate classes have similar DDC, so it is not the most suitable differentiator. DDC of catchments in the same climate classes were not always comparable between Europe and the USA. For the Köppen-Geiger climate classification system, this is likely related to differences in dominant controls (precipitation and base flow index) within the same Köppen-Geiger classes. In the USA, more of the catchments in the low aridity index classes and in the classes with many months of E_POT > P are located in climates influenced by seasonality in precipitation and temperature. This is likely the cause of the differences in DDC between these classes in the two regions.
Hydrol. Earth Syst. Sci., 20, 4043-4059, 2016 www.hydrol-earth-syst-sci.net/20/4043/2016/
Climate classification systems that include an absolute precipitation control are most suitable for differentiating catchments according to drought duration. Their power to differentiate is nevertheless likely to improve when they are complemented with information from other climate classification systems and from individual climate and catchment controls. Such a combination is also likely to make the same classes more comparable between Europe and the USA. Knowledge about differences in sensitivity to hydrologic drought events can be applied in drought monitoring and early warning systems, for example by tailoring such systems to regions with a similar sensitivity to hydrologic drought. Furthermore, a better differentiation of catchments according to drought duration allows for more accurate stratification in comparative drought studies. However, further research is needed to combine these insights into one classification system that is specifically designed to classify the sensitivity to observed hydrologic drought duration.

Figure 1. Catchment locations and two climate classifications (Köppen-Geiger and the aridity index). A description of these two climate classification systems is presented in Sect. 2.1.

Figure 2. Conceptual approach. (a) Total ensemble of drought duration curves for both Europe (left) and the USA (right), as an example. (b) Left: example of the grouping of drought duration curves based on precipitation classes, with boxplots of precipitation values for catchments in the USA (blue) and Europe (red) and background colors indicating the class ranges. (b) Right: corresponding example ensembles of DDC groups for precipitation classes 1, 2 and 3 for the USA. (c1) Average DDC of catchments in the three example classes, displayed as departures from the overall average DDC of the USA. (c2) Statistical comparison of the distributions of DDC at each percentile between 81 and 100 (boxplots displayed for percentiles 81, 91 and 100). The matrices below indicate the significance of differences in DDC values per percentile (1 = significant, 0 = not significant). The final measure of similarity (sum of significance scores over the 81st-100th percentiles) is shown on the right.

Figure 3. Averages of the ensembles of subsets of drought duration curves between the 81st and 100th percentiles (average DDC) for catchments in different classes of climate classification systems (rows), for the entire dataset (first column), the USA (second column) and Europe (third column). Average DDC are displayed as departures from the overall average DDC for the specific selection of catchments, i.e., the average of all catchments (first column), all catchments in the USA (second column) and all catchments in Europe (third column). The fourth (right) column shows the difference in average DDC of catchments in the same climate classes between the USA and Europe (average DDC USA minus average DDC Europe).
Legend units: departure from the average, or difference between USA and Europe (in weeks).

Figure 5. Number of percentiles with similar DDC values of catchments in different classes of climate classification systems according to the KS and MWU tests, reflected by two measures of statistical similarity (S_KS and S_MWU). The left two columns show these measures of similarity for the entire dataset (in green), and the right two columns show them for the two regional subsets: USA (blue, above the diagonal of each matrix) and Europe (red, below the diagonal of each matrix). Measures of similarity between DDC of catchments in the same climate classes of Europe and the USA are displayed in the diagonal cells of the matrices (purple). No data (grey) indicates combinations that were not considered (i.e., when fewer than 10 catchments were available in one of the two regions).

Figure 6. Number of percentiles with similar DDC values of catchments in different classes of individual controls according to the KS and MWU tests, reflected by two measures of statistical similarity (S_KS and S_MWU). The darker the color, the more percentiles are similar (the legend is presented in Fig. 5). The left two columns show these measures of similarity for the entire dataset (in green), and the right two columns show them for the two regional subsets: USA (blue, cells above the diagonal of each matrix) and Europe (red, cells below the diagonal of each matrix). Measures of similarity between DDC of catchments in the same classes of Europe and the USA are displayed in the diagonal cells of the matrices (purple). No data (grey) indicates combinations that were not considered (i.e., when fewer than 10 catchments were available in one of the two regions).

Figure 7. Distribution of individual controls P (upper row) and BFI (lower row) over classes of different climate classification systems for the USA (blue), Europe (red) and the entire dataset (white).Background colors indicate the ranges of classes of the individual controls (see Fig. 4 for class ranges).Box: percentiles 25, 50 and 75.End of whiskers: percentiles 5 and 95.Points: outliers.
presents the average DDC of catchments grouped by individual controls. Average DDC of catchments in the same classes are again mostly higher for the USA than for Europe. However, in contrast to the four climate
Figure 8. Distribution of KG climates for all catchments with an AI smaller than 50 (left) or with 5 or more months of E_POT > P (right), for both the USA and Europe.