Technical Note: Testing an improved index for analysing storm discharge–concentration hysteresis

Abstract. Analysis of hydrochemical behaviour during storm events can provide new insights into the process controls on nutrient transport in catchments. The examination of storm behaviours using hysteresis analysis has increased in recent years, partly due to the increased availability of high temporal resolution data sets for discharge and water quality parameters. A number of these analyses involve the use of an index to describe the characteristics of a hysteresis loop in order to compare storm behaviours both within and between catchments. This technical note reviews the methods for calculation of the hysteresis index (HI) and explores a new more effective methodology. Each method is systematically tested and the impact of the chosen calculation on the results is examined. Recommendations are made regarding the most effective method of calculating a HI which can be used for comparing data between storms and between different water quality parameters and catchments.


Introduction
The analysis of hysteresis patterns is a key tool for the interrogation of in-stream physical and chemical responses to storm events, which have been shown to be important periods for the transport of nutrients and sediment within catchments (Bowes et al., 2003;Jarvie et al., 2002;Jordan et al., 2007;Burt et al., 2015;Evans and Johnes, 2004). In the context of this paper, hysteresis is defined as the nonlinear relationship between discharge and concentration of nutrients or sediment. When discharge-concentration data are plotted a cyclic pattern is often observed, and the size and shape of the loop is dependent on the lag in response between the discharge and water quality variables. Quantification of hysteresis allows multiple storm behaviours to be examined between and within catchments as well as under varying antecedent conditions, for discharge and a wide range of hydrochemical parameters. This can provide insight into catchment function, allowing the development and testing of process-based hypotheses. This type of analysis has been used in recent years by many authors investigating nutrient concentrationdischarge relationships in catchments of differing environmental character (e.g. Bowes et al., 2015;Darwiche-Criado et al., 2015;Cerro et al., 2014;Rodriguez-Blanco et al., 2013;Oeurng et al., 2010;Eder et al., 2010;Evans and Johnes, 2004) but, traditionally, has been used for the examination of turbidity or suspended sediment data (e.g. Ziegler et al., 2014;House and Warwick, 1998;Williams, 1989;Tena et al., 2014;Klein, 1984;Whiting et al., 1999). Hysteresis analysis has been used to support the investigation of the temporal variations in nutrient transport to streams as a means of characterising the likely contributing source areas and flow pathways linking source to stream in complex landscapes (Outram et al., 2014;Bowes et al., 2015;Lloyd et al., 2016a). Hysteresis patterns with similar characteristics can be observed for a variety of different reasons; however, it is generally assumed that clockwise hysteresis, caused by concentrations increasing more rapidly than discharge during the rising limb, suggests a source close to the monitoring point. Conversely, anticlockwise hysteresis generally signifies a longer lag between the discharge and concentration peak, suggesting that the source was located further from the monitoring point. Williams (1989) provides a detailed summary of  Figure 1. (a) Impact of storm initial concentration, (b) storm initial discharge on the value of the calculated HI when the mid-point in discharge and raw data is used and (c) an idealised and normalised storm illustrating the impact of measuring different quantiles of flow on the HI calculated, where HI L and HI LA are the original and adapted Lawler et al. (2006) methods, respectively and HI new , the proposed new method. Colours represent different discharge intervals measured. different shape hysteresis plots and the possible underlying mechanisms.
For hysteresis analysis to be effective and easy to interpret there is a need to develop a robust method of classifying storms according to their hysteretic behaviour. Many papers have classified storms into clockwise or anticlockwise responses, and described the strength of the hysteresis as small or large (Bowes et al., 2015;Evans and Davies, 1998;Butturini et al., 2008). Other authors have used an index approach, which allows a dimensionless quantification of the hysteresis, and thus comparison of hysteresis indices between catchments of differing size, morphology, and hydrological function. An index approach is also useful as it provides information about both the direction and strength of the hysteresis. Hysteretic indices proposed by Butturini et al. (2008) provide semi-quantitative methods to describe whether the measured parameter is enriched or diluted during a storm event and to assess the area inside the hysteresis loop, along with its direction. Langlois et al. (2005) propose a quantitative method which involves splitting the discharge hydrograph into the rising and falling limb and fitting regression lines to each data set. The hysteresis index is calculated as the ratio (rising:falling) of the areas under the regression curves. Whilst this index provides a quantitative solution, the authors suggest that the method should only be applied to simple uni-directional loops, i.e. not those which exhibit figure-of-eight or more complex behaviours. A quantitative index was also proposed by Lawler et al. (2006), which uses the ratio of the turbidity (or other parameter) concentration on the rising and falling limb, at the mid-point in the discharge. The mid-point in discharge is defined as 50 % of the range in discharge during the storm event. This index has been used by a number of other authors (McDonald and Lamoureux, 2009;Outram et al., 2014), as it is flexible and can be applied to hysteresis loops of all shapes. However it is not without limitations. In a recent paper, Aich et al. (2014) highlight that the index of Lawler et al. (2006) in its current form becomes skewed at higher concentrations, with a smaller index calculated for loops of the same shape and area in the case of storms commencing at a higher concentration (Fig. 1a). In addition, the calculation of the index using only the mid-point (50 %) in discharge can be problematic. Lawler et al. (2006) state that the mid-point was used as it avoids the often noisy sections at the beginning and end of the loops. However, the result of the calculated index may be misleading in many figure-of-eight scenarios, especially those which cross close to the mid-point in discharge (see Fig. 1b). The example shown in Fig. 1b illustrates that a hysteresis index (HI) calculated at the mid-point in discharge would suggest that there was very little hysteresis, even though there is a strong effect but in different directions during different periods of the storm event. As suggested by Lawler et al. (2006), the HI can be calculated at multiple increments through the flow range and an average HI value gained. Against the above background, this technical note reports the impact of the chosen method on the index values generated from a series of storms of varying size and hysteretic shapes, using an adapted version of the Lawler et al. (2006) index (HI LA ). The paper also introduces a new method for calculating the hysteresis index (HI new ) and, as a result of this analysis, suggests a recommendation for the most appropriate calculation for a HI for storm-driven nutrient transport in catchments.

Data sets
The example uses a series of storms extracted from high temporal resolution (15 min) data collected on the River Wylye at Brixton Deverill (Wiltshire, UK) as part of the Defra Demonstration Test Catchment project (McGonigle et al., 2014) from March 2012 to March 2014. Detailed descriptions of the field site and the data sets are available in previously published work (Lloyd et al., 2016a, b) . For the purposes of this study, discharge data were obtained from the Environment Agency gauge (Gauge Number 43806) and turbidity data were collected using a YSI 6-Series sonde, which was  cleaned and calibrated once a month over the monitoring period. Turbidity (measured in Nephelometric Turbidity Units (NTU)) was chosen for this study as it is the most widely examined parameter in terms of hysteresis and the storms selected from the data set exhibit a wide range of turbidity values and hysteretic shapes. A total of 66 storms were extracted for this analysis from the 2-year observational data. A storm was classified as an increase in discharge of more than 20 % above baseflow and the end of the storm was determined by either a return to baseflow conditions or when discharge began to rise again if another storm occurred before the system had returned to baseflow conditions. Previous work had quantified the uncertainty associated with the discharge and turbidity measurements (Lloyd et al., 2016a, b) and this provided 100 resampled iterations of each measured parameter for every storm, accounting for observational uncertainties, for this analysis. Figure 2a-f(I) shows some example storms, where the boxes represent the 5th-95th percentile uncertainty range for each data point.

Lawler et al. (2006) method and modification
The HI was then calculated according to the standard method of Lawler et al. (2006) (HI L ) for combinations of all 100 iterations of each of the storms to provide a distribution of HI when the mid-point in discharge was calculated (50 %). The Lawler et al. (2006) method was also adapted (HI LA ), where HI was calculated at every 25, 10, 5, and 1 % increments of the discharge (see Fig. 3 for visualisation) as shown below.
If T RL >T FL (clockwise hysteresis): Or, if T RL < T FL (anticlockwise hysteresis): where T RL is the value of turbidity at a given point in flow on the rising limb of the hydrograph and T FL is the value on the falling limb. When multiple sections per storm were calculated, the average value was taken to represent the HI of the complete storm event. In some cases there were no corresponding values on both the falling and rising limbs -when this occurs the maximum number of available pairs of data were used to calculate the index. This usually only occurred at lowest discharges and when a large number of intervals were being analysed. This meant that the number of missing pairs was small compared with the available pairs (< 5 %) and as a result had little impact on the overall calculation. The analyses were completed for both the raw data and for normalised storms to assess the impact of the different analysis methods on the HI values obtained. The data were normalised using the following equations: where Q i / T i is the discharge/turbidity at timestep i, Q min / T min is the minimum storm parameter value and Q max / T max is the maximum storm parameter value.

Proposed new hysteresis index method (HI new )
A new method of calculating a HI was also tested (HI new ) with the aim of eliminating the impact of a changing baseline value on the ratio as multiple measurements are taken from the same storm. The new index uses the difference between the turbidity values on the rising and falling limbs of the normalised storms, rather than a ratio, and effectively normalises the rising limb at every measurement point, thereby resulting in an index between −1 and 1: As with the other methods, the analysis was carried out using different intervals of discharge (25, 10, 5, and 1 %) and the mean was used as the final HI value for the storm. The impact of this number of chosen intervals of discharge on the magnitude of the resulting HI was tested. The resulting distributions of HI values for each method were then scrutinised using boxplots. Differences between the distributions of data for each storm were analysed statistically using ANOVA where normality and variance assumptions were met, and the non-parametric alternative Kruskal-Wallis H on ranked data where the ANOVA assumptions did not hold. When a significant difference between the groups was detected, a pairwise Tukey test was used to establish which of the groups were contributing to the effect. The main aim of the analysis was to determine the point at which sufficient intervals of discharge were used so that there was no statistically significant difference between the different data sets for each storm.

Results and discussion
A total of 66 storms were analysed using the three methods for calculating the HI, which included 35 anticlockwise loops, 11 clockwise loops, 12 figure-of-eight loops which were mainly anticlockwise, and 8 figure-of-eight loops which were mainly clockwise (loop shapes were identified by visual inspection). The peak turbidity during the storms ranged between 10 and 392 NTU (mean = 91 NTU) and the starting values were between 2 and 31 NTU (mean = 8 NTU). Figure 2 shows six example storms (a-f, panel I) from the range of behaviour identified above, each with varying shape and size.  Figure 2a-f(II) shows that the distributions of HI values (using HI L ) measured at only 50 % of discharge are often very different from the analyses which measure multiple sections across the loop (HI LA ). The more complex the shape of the loop, the more measured sections are needed to represent it adequately. The analysis shows that by using 5 % increments of discharge (19 sections), 98 % of the storms analysed showed stable distributions and therefore no significant changes were observed when additional increments were included. While including more increments of the loop in the analysis does improve the HI results, it does not solve all of the issues highlighted earlier. Both HI L and HI LA are sensitive to the size of the storm and, as a result, for a similar pattern in hysteresis but a larger magnitude of storm, a comparatively smaller value would be calculated for the index, as shown in Fig. 1a. This means that the results generated for a series of storms are very difficult to interpret and it is difficult to compare between individual storms and catchments. By normalising the storms as described above and continuing to use the HI LA method, the comparability of the outputs between storms is improved as they are all assessed on the same scale. However, if multiple increments of discharge are included, which has been shown to be beneficial, then effectively each of the individual measured sections of the storm need to be normalised, otherwise the problem is reduced but not eradicated. This problem is illustrated in Fig. 1c, which shows an example of an idealised and normalised storm where the width of the loop remains constant through most of the storm. However, at different quantiles of flow, HI value varies due to the loop gradient, the HI is inflated towards the lower and reduced at higher quantiles of discharge. The HI new was designed to overcome this problem. The new index uses the difference between the normalised turbidity values on the rising and falling limb at each increment of discharge rather than the ratio, thereby directly quantifying the width of the loop. Figure 4 shows how the new index effectively normalises the rising limb and examines the relative behaviour of the falling limb, thereby identifying the proportion of the storm occurring in a clockwise or anticlockwise phase. For this new method to be robust, it is necessary to normalise the data as described earlier before the analysis. Figure 2a-f(III) show the example storms in their normalised forms. The new index produces a value between −1 and 1, where 0 represents no hysteretic pattern and positive values clockwise and negative values, anticlockwise hysteresis. A figure-of-eight storm will be represented as a weighted average of the intervals of discharge measured when the storm was in a clockwise phase and when it was in an anticlockwise phase. Therefore, for example, if the storm exhibits anticlockwise behaviour for a large proportion of the storm event the average HI new will produce a negative number. It should be noted that in the unusual case that an exactly symmetrical figure-of-eight storm is presented the index would produce a value of 0, suggesting no hysteresis. Using the HI value in conjunction with loop area will however provide clarification, as a storm which has an HI of 0 but a positive loop area has to be a complex loop shape. The advantage of our new technique is that the user can choose to interrogate other output metrics within these results, such as the quantified loop area and the distribution of HI values calculated for each section of the loop in addition to the averaged HI value. By looking at the distribution of values it is simple to identify complex loop shapes such as figure-of-eight (due to both positive and negative values calculated for the various loop sections) and this ensures correct interpretation of the HI values. Although we do not explore the advantage of these further analyses here, we suggest that they potentially provide a richer analysis of hysteresis dynamics that we aim to explore in future papers. We suggest that the new index provides a consistent approach to the core loop characteristics and therefore is more easily interpretable by the user when comparing behaviour between storms or field sites. Figure 2a-f(IV) show the resulting distributions of HI new generated using varying increments of discharge. The analysis shows that the distribution of calculated values was generally more stable compared with the HI LA method and, in many cases, fewer increments of discharge were necessary to produce a statistically stable representation of the storm loop shape (Table 1). The results demonstrate that increasing the increments to every 10 % of discharge allowed 95 % of storms and using 5 % increments allows 100 % of storms to be robustly characterised in terms of their loop shape, meaning that the addition of more sections did not significantly alter the distribution of HI results.

Conclusions and recommendations
The concept of using an index to aid the quantification of storm hysteresis has been established for over two decades. However, few papers have chosen to use them, perhaps due to the limitations associated with the most common meth-ods. This technical note was designed to test systematically, for the first time, the way that the HI is calculated and to quantify the impact of the chosen method on the results. This technique is useful when the user's interest is in the relative characteristics of the loop geometries. The analysis has led to a number of recommendations concerning how the HI should be calculated in order to produce results which are both statistically robust and comparable between storms and field sites: 1. Storms should be normalised before analysis so that multiple storms can be robustly compared.

2.
A difference method, such as the new index (HI new ) proposed here, should be used in preference to a ratio method as it produces results which are easier to interpret, allowing quantification of the extent of the hysteresis effect that can be directly compared between contrasting catchments even when the magnitude of the storms varies greatly.
3. Multiple sections of each loop should be analysed so that the extent and direction of the hysteresis can be accounted for throughout the flow range. Sections should be calculated at least every 10 % of the discharge range, although every 5 % is recommended as it is likely, based on our analysis, to produce robust results for almost all storm sizes and shapes.
4. The distribution of HI values calculated across the sections should be examined in addition to the averaged value, as this aids robust classification of complex loop shapes, including figure-of-eight loops.
Undertaking the analysis of hysteresis loops using these guidelines improves the clarity of the hysteresis index as a diagnostic tool for the analysis of storms and how dischargeconcentration patterns vary. The new index (HI new ) is able to describe robustly the shape and direction of a hysteretic pattern in storms of any size, and can be used to compare storms from multiple catchments. This means that the index becomes more useful as it has the potential to become a standardised analytical technique that can be utilised by the water quality research community. Lloyd et al. (2016a) illustrates the use of the new hysteresis index to investigate storm behaviours across different water quality parameters and between contrasting catchments. The cited study exemplifies the power of having such a summary statistic, as different parameters and field sites can be rapidly and robustly compared. The information provided by the HI new can be used in conjunction with other common metrics such as storm maximum concentration to produce a useful and robust quantitative representation of storm hydrochemical behaviour. This is timely given the marked increase in the number of catchment-scale water quality monitoring initiatives, which are now employing high temporal resolution monitoring to improve understanding of pollution sources and delivery pathways. Our ongoing research is exploring the use of this new index in understanding differences in catchment dynamics associated with storm behaviours.