Does nonstationarity in rainfall require nonstationary intensity–duration–frequency curves?

In Canada, the risk of flooding due to heavy rainfall has risen in recent decades; notable recent examples include the July 2013 storm in the Greater Toronto region and the May 2017 flood of the Toronto Islands. We investigate nonstationarity and trends in short-duration precipitation extremes at selected urbanized locations in Southern Ontario, Canada, and evaluate the potential of nonstationary intensity–duration–frequency (IDF) curves, which form an input to civil infrastructure design. Despite apparent signals of nonstationarity in precipitation extremes at all locations, the stationary and nonstationary models do not exhibit significant differences in design storm intensity, especially for short recurrence intervals (up to 10 years). Signatures of nonstationarity in rainfall extremes therefore do not necessarily imply that nonstationary IDFs must be used for design. When comparing the proposed IDFs with current design standards, for return periods typical of urban drainage design (10 years or less), current design standards require an update of up to 7 %, whereas for longer recurrence intervals (50–100 years), relevant for critical civil infrastructure design, updates ranging between ~2 and 44 % are suggested. We further emphasize that these findings need re-evaluation in the light of climate change projections, since the intensity and frequency of extreme precipitation are expected to intensify due to global warming.

Figure: IPCC AR5 conceptual representation of changes in probability density functions of daily temperature (a–c) and precipitation. The previous and the new distributions are marked by the solid and the dashed lines, respectively. The frequency (probability of occurrence) of extremes is denoted by the shaded areas (Source: IPCC AR5 Working Group I report, Figure 1.8, p. 134).

SI 1 Multiplicative Random Cascade (MRC) Models for Temporal Disaggregation of Rainfall
Multiplicative random cascades were first developed in studies of turbulence (Mandelbrot, 1999; Yaglom, 1966), motivated by the need for mathematical models that produce time series with statistically scale-invariant properties. In general, a random cascade model for rainfall divides a known rainfall total R_L, occurring over an interval of time, among a number of smaller intervals of fixed size. This implies a successive fine-graining process that starts from an original, large-scale resolution R_L and continues until a target small-scale resolution is reached. The approach is based on scaling laws, which describe the scale-invariant properties or relationships that connect the statistical properties of rainfall across timescales (Willems, 2012). The number of sub-intervals is defined by the branching number b, here set to 2, so that the total rainfall R_{i,r} in period i at resolution r is redistributed between its first and second halves by multiplication with the cascade weights W_1 and W_2, respectively.
Here we implement a micro-canonical (exact conservation of mass in each cascade branching) cascade-based temporal disaggregation model as proposed by Olsson (1998), in which daily rainfall is disaggregated using a uniformly distributed generator dependent on rainfall intensity and the position within the rain sequence. The technique was later successfully applied by Güntner et al. (2001), Jebari et al. (2012) and Rana et al. (2013) for temporal disaggregation of point rainfall and for the development of IDF curves from short-duration rainfall extremes. In the disaggregation process, each time interval (box) at a given resolution (for example, 1 day) is split into two halves of the original length (1/2 day). The procedure is continued as a cascade until the desired time resolution is reached, i.e., to 1/4 day, then to 1/8 day, and so on. Each step is termed a cascade step, with cascade step 0 being the longest time period with only one box (i.e., a day). The distribution of the volume between the two sub-intervals (or smaller boxes) is computed by multiplication with the cascade weights, so that the first half receives W_1 R_{i,r} and the second half W_2 R_{i,r}, with W_1 + W_2 = 1. In each branching two possibilities exist: (1) W_1 = 0 or W_1 = 1; (2) 0 < W_1 < 1. The occurrence of (1) and (2) may be expressed in terms of probabilities, P_01 = P(1/0) or P(0/1) = P(W_1 = 0 or W_1 = 1), and P_xx = P(x/x) = P(0 < W_1 < 1) = 1 − P_01.
Depending on the range of resolutions involved, P_01 can either be assumed resolution independent or parameterized as a scaling law, P_01(r) = c_1 r^{c_2}, where c_1 and c_2 are constants. The distribution of W_{i,r} is termed the cascade generator and is assumed to follow a one-parameter beta distribution (Olsson, 2012). Following Olsson (1998), the probabilities P(0/1), P(1/0) and P(x/x) and the probability distribution of the cascade generator are assumed to be related to (1) the position in the rainfall sequence and (2) the rainfall volume. The wet boxes, with a rainfall volume V > 0, can be characterized by their position in the rainfall series: (1) a starting box is preceded by a dry box (V = 0) and succeeded by a wet box (V > 0); (2) an enclosed box is preceded and succeeded by wet boxes; (3) an ending box is preceded by a wet box and succeeded by a dry box; and (4) an isolated box is preceded and succeeded by dry boxes. Regarding the volume dependence, if the volume is large it is more likely that both halves of the sub-intervals contribute a nonzero volume than if the volume is small. Following Olsson (1998), a partition into three volume classes (v_c = 1, 2, 3) was used, separated by the 33rd and 67th percentiles of the values at each cascade step. The variation of P(x/x) with volume class is then parameterized as a linear relationship, P(x/x) = a_p + b_m·v_c (see SI 1.1). For implementation details of the MRC-based disaggregation tool, interested readers are referred to Olsson (1998).
For calibration and application of the disaggregation model from an original resolution R_l, two situations are considered: (1) when representative data at the target resolution R_s are available, as in this case for disaggregation from daily to hourly time steps, for which hourly data were available. Parameters are then calibrated over the actual resolution interval R_l < r < R_s using 5 cascade steps. This implies 5 successive "halvings" from one day, generating 45-minute (2700 s) data.
(2) When no representative high-resolution data are available, parameters are estimated by coarse graining from the lower resolution R_l towards the higher resolution R_s through successive disaggregation steps. In this study this is the disaggregation from daily to minute scales (sub-hourly time steps), for which no sub-hourly data were available at any of the representative sites. In this case, parameters are calibrated from daily rainfall data using 7 cascade steps, i.e., disaggregating by halving from 1 day to 11-minute-15-second (675 s) data. After calibration, Monte Carlo simulation is performed to gradually fine-grain the data and generate realizations at the desired resolution R_s. Since R_s is not reached exactly by repeated resolution doubling from R_l, the target resolutions are obtained by geometric interpolation of the disaggregation model output at the final time step. In the following sub-sections we demonstrate the performance of the MRC-based disaggregation tool using two different sets of observations. Table S1 shows the results of the disaggregation experiments. We find a satisfactory agreement between observed and simulated model output, especially for the simulation of the percentage of zero values. However, we find a slight underestimation of the simulated standard deviations of event volume and duration, whereas the variance of the mean inter-arrival time is overestimated. Figure S2 presents the variation of the probabilities with volume class, which often shows substantial differences between the classes.
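The cascade logic described above can be condensed into a short sketch. The Python example below is illustrative only: the parameter values and the helper sample_weight are placeholders, and the dependence of the probabilities on box position and volume class used in the actual model is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_weight(p01, beta_a):
    """Draw a micro-canonical cascade weight W1.
    With probability p01 all mass goes to one half (W1 = 0 or 1);
    otherwise W1 is drawn from a symmetric Beta(a, a) distribution."""
    if rng.random() < p01:
        return float(rng.integers(0, 2))        # W1 = 0 or W1 = 1
    return rng.beta(beta_a, beta_a)             # 0 < W1 < 1

def disaggregate(series, n_steps, p01=0.4, beta_a=1.5):
    """Disaggregate a rainfall series by n_steps successive halvings.
    Each box of volume V is split into W1*V and (1 - W1)*V, so mass is
    conserved exactly in every branching (micro-canonical cascade)."""
    out = np.asarray(series, dtype=float)
    for _ in range(n_steps):
        halves = []
        for v in out:
            if v == 0.0:                         # dry boxes stay dry
                halves.extend([0.0, 0.0])
            else:
                w1 = sample_weight(p01, beta_a)
                halves.extend([w1 * v, (1.0 - w1) * v])
        out = np.array(halves)
    return out

# Example: 5 halvings take a 1-day box down to 45-minute boxes.
daily = np.array([0.0, 12.5, 3.2, 0.0, 20.1])   # mm/day, toy values
sub_daily = disaggregate(daily, n_steps=5)
print(sub_daily.reshape(len(daily), -1).sum(axis=1))  # recovers `daily`
```

The printed daily totals equal the input, illustrating the exact mass conservation of the micro-canonical scheme.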

SI 1.1 Performance Evaluation of the MRC-based Disaggregation Tool for McMaster Weather Station Data
Figure S2 shows that the linear relationship between the probability P and the volume class v_c changes with cascade step. The mean regression line P = a_p + b_m·v_c (where a_p is the intercept and b_m is the slope) is shown as a dashed line with squares (Figure S2). Figures S3 and S4 show the variation of the intercepts with cascade step. The probabilities P(x/x) and P(0/1) for daily to minute-scale disaggregation of the four different types of boxes are shown in Tables S2 and S3. P(x/x) can be modelled assuming a linear dependence on the volume class, P(x/x) = a_p + b_m·v_c, with the intercept estimated as a_p = c_1 + c_2·C_s, where c_1 and c_2 are the intercept and the slope of a linear regression on the cascade step C_s. Since P(0/1) [or P(1/0)] is relatively independent of the cascade step, it can likewise be modelled as a_p + b_m·v_c. Finally, P(0/1) is estimated as P(0/1) = 1 − [P(x/x) + P(1/0)] {or P(1/0) = 1 − [P(x/x) + P(0/1)]}. The empirical histograms (the observed, shown as bars) and the fitted beta distributions (shown as lines) [Figures S5 and S6] show good agreement in the overall fit. However, at higher cascade steps, i.e. finer time scales, the number of bins in the histogram is small and the fits appear naturally uncertain.
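For illustration, the one-parameter (symmetric) beta generator can be fitted per cascade step from the empirical weights; the sketch below uses toy weights rather than the station data, with a method-of-moments estimate of the symmetric shape parameter and a SciPy maximum-likelihood fit as a cross-check of the symmetry assumption.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical sample of cascade weights W1 in (0, 1) collected at one
# cascade step for one box type / volume class (toy data for illustration).
rng = np.random.default_rng(0)
weights = rng.beta(2.3, 2.3, size=500)

# One-parameter (symmetric) beta: a = b.  Method-of-moments estimate from
# the sample variance, since Var(W) = 1 / (4 * (2a + 1)) when a = b.
a_mom = 0.5 * (1.0 / (4.0 * weights.var()) - 1.0)

# Cross-check: maximum-likelihood fit with both shape parameters free.
a_mle, b_mle, _, _ = beta.fit(weights, floc=0, fscale=1)

print(f"method of moments (a = b): {a_mom:.2f}")
print(f"MLE shapes: a = {a_mle:.2f}, b = {b_mle:.2f}")
```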
Figure S7 shows time series of disaggregated versus observed annual maxima (15-min and 1-hour) for Toronto International Airport. Quantile mapping (QM) is employed to adjust the occasional overestimation due to disaggregation. Except for a few extremes, we find close agreement between the observed and the bias-corrected disaggregated annual maxima time series. For example, the algorithm overestimates the wettest event in Toronto (137.4 mm of rainfall) during Hurricane Hazel (1954) at the hourly disaggregation time step. However, owing to a lack of observations, we could not validate the 15-min disaggregation model performance. Figure S8 shows the 1-hour disaggregated model performance for the nine sites. Although we find evidence of occasional overestimation in the disaggregated model output, the adjustment of extremes by QM corrects the biases to some extent. However, the algorithm fails to correct extreme wet biases for London International Airport (during 1954) and Trenton Airport.
Figure S3. The third column indicates the fraction of 0/1-divisions of all "non-x/x-divisions" summed over volume classes.
Figure S4. Same as in Figure S3, but for daily to minute-time-step disaggregation.
Figure S6. Same as in Figure S5, but for daily to minute-time-step disaggregation.

SI 2.1 Detection of Monotonic Trend: Mann–Kendall Test Statistic
The null hypothesis H_0 of the test assumes that no temporal trend exists in the data, and the alternative hypothesis H_1 assumes that a significant temporal trend (upward or downward) exists. The test statistic is computed as (Hirsch et al., 1982)

S = Σ_{k=1}^{n−1} Σ_{j=k+1}^{n} sgn(x_j − x_k),   with sgn(θ) = 1 if θ > 0, 0 if θ = 0, and −1 if θ < 0,

where x_j and x_k are the data points in time periods j and k (j > k), respectively, and n is the number of observed data points. For n ≥ 10, the test statistic S is approximately normally distributed with mean E(S) = 0 and variance

Var(S) = [n(n − 1)(2n + 5) − Σ_{i=1}^{g} t_i(t_i − 1)(2t_i + 5)] / 18,

where g is the number of tied groups and t_i is the number of data points in the i-th group. The standardized statistic is then Z_MK = (S − 1)/√Var(S) if S > 0, Z_MK = 0 if S = 0, and Z_MK = (S + 1)/√Var(S) if S < 0.
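As a concrete illustration of these formulas, a minimal Python implementation of the Mann–Kendall test might look as follows (illustrative only; the study's analysis was carried out with established statistical packages):

```python
import numpy as np
from scipy.stats import norm

def mann_kendall(x, alpha=0.05):
    """Mann-Kendall test for a monotonic trend.
    Returns the standardized statistic Z_MK, the two-sided p-value and
    a flag indicating a significant trend at level alpha."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    # S = sum of signs over all pairs (j > k)
    s = sum(np.sign(x[j] - x[k]) for k in range(n - 1) for j in range(k + 1, n))

    # Variance with correction for tied groups
    _, counts = np.unique(x, return_counts=True)
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

    # Standardized statistic with continuity correction
    if s > 0:
        z = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:
        z = 0.0

    p_value = 2 * (1 - norm.cdf(abs(z)))      # two-sided
    return z, p_value, p_value < alpha

# Toy usage on a synthetic annual-maxima-like series with a weak trend
rng = np.random.default_rng(1)
series = 10 + 0.05 * np.arange(60) + rng.normal(0, 1, 60)
print(mann_kendall(series))
```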

SI 2.2 Detection of Abrupt or Step Change
SI 2.2.1 General Formulation of the Sequential Change Point Test
When a sequence of random variables is divided into two segments, x_1, ..., x_k and x_{k+1}, ..., x_t, a two-sample hypothesis test can be used to test for a change point at location k using a test statistic N_{k,t}. The maximum of this statistic over all candidate locations, N_{max,t} = max_k N_{k,t}, is compared with a critical value; if it exceeds the critical value, the corresponding location k is declared a significant change point at level α. The analysis was performed using the R statistical software with the add-on package "trend". A nonparametric change point test for location shifts is then defined by replacing N_{max,t} with the Mann–Whitney U statistic.
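A sketch of this maximize-over-splits idea, using the Mann–Whitney U statistic for a location shift, is given below. It is illustrative only and is not the exact formulation implemented in the R "trend" package.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def location_change_point(x):
    """Scan all candidate split points k, compute a standardized
    Mann-Whitney statistic for the two segments, and return the split
    with the largest absolute statistic (candidate change point)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    best_k, best_stat = None, -np.inf
    for k in range(5, n - 5):                 # keep a few points per segment
        u, _ = mannwhitneyu(x[:k], x[k:], alternative="two-sided")
        # standardize U so segments of different lengths are comparable
        m1, m2 = k, n - k
        mu_u = m1 * m2 / 2.0
        sd_u = np.sqrt(m1 * m2 * (m1 + m2 + 1) / 12.0)
        z = abs(u - mu_u) / sd_u
        if z > best_stat:
            best_k, best_stat = k, z
    return best_k, best_stat

# Toy example: a mean shift halfway through the series
rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(10, 2, 30), rng.normal(14, 2, 30)])
print(location_change_point(y))
```

Note that the significance of the maximized statistic cannot be read from a single two-sample p-value; the critical value must account for the maximization over k (e.g., via permutation or the tabulated thresholds used by the package).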

SI 2.2.2 Change Point in Scale: Mood Test
The Mood test uses a test statistic that measures the extent to which the rank of each observation deviates from its expected value (Ross et al., 2011).
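For reference, SciPy provides the two-sample Mood test of scale directly, so a change point in scale can be screened in the same maximize-over-splits fashion as above (illustrative sketch, not the study's exact procedure):

```python
import numpy as np
from scipy.stats import mood

def scale_change_point(x):
    """Scan candidate splits and return the one maximizing the absolute
    Mood z-statistic, i.e. the strongest evidence of a change in scale."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    best_k, best_z = None, 0.0
    for k in range(5, n - 5):
        z, _ = mood(x[:k], x[k:])             # two-sample Mood scale test
        if abs(z) > abs(best_z):
            best_k, best_z = k, z
    return best_k, best_z

# Toy example: variance triples halfway through the series
rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(0, 1, 40), rng.normal(0, 3, 40)])
print(scale_change_point(y))
```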

SI 2.3 Non-parametric Trend-Free Pre-Whitening (TFPW) for Correction of Autocorrelation
To correct for the autocorrelation present in the data, we followed the non-parametric trend-free pre-whitening (TFPW) procedure suggested by Petrow and Merz (2009):


First, the trend of the annual maxima time series of a particular duration is estimated by the nonparametric trend slope estimator β suggested by Sen (1968), which is the median of all pairwise slopes, β = median[(x_j − x_k)/(j − k)] for all k < j. Secondly, the computed trend is removed from the original series, Y_t = X_t − β·t, where X_t is the original time series and t is the time.


Next, the lag-1 autocorrelation r_1 is computed from Y_t (using the MATLAB function 'autocorr()'). If no statistically significant autocorrelation is found (significance is checked at the 5 and 10 % significance levels), the trend and change-point detection algorithms are applied directly to the original time series. Otherwise, the lag-1 autocorrelation is removed from the time series, Y'_t = Y_t − r_1·Y_{t−1}. Finally, the trend removed in the first step is added back to the pre-whitened series, Y''_t = Y'_t + β·t. The resulting time series Y''_t retains the original trend but is free from autocorrelation.
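The three TFPW steps can be condensed into a short sketch (illustrative; the significance check of the lag-1 autocorrelation performed in the study before pre-whitening is omitted here):

```python
import numpy as np

def sens_slope(x):
    """Sen's slope: median of all pairwise slopes (x_j - x_k) / (j - k)."""
    n = len(x)
    slopes = [(x[j] - x[k]) / (j - k) for k in range(n - 1) for j in range(k + 1, n)]
    return np.median(slopes)

def tfpw(x):
    """Trend-free pre-whitening: detrend, remove lag-1 autocorrelation,
    then add the trend back (sketch of the procedure described above)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))

    beta = sens_slope(x)
    y = x - beta * t                              # detrended series

    r1 = np.corrcoef(y[:-1], y[1:])[0, 1]         # lag-1 autocorrelation
    y_pw = y[1:] - r1 * y[:-1]                    # pre-whitened residuals

    return y_pw + beta * t[1:]                    # trend added back

# Toy series: linear trend plus AR(1) noise
rng = np.random.default_rng(4)
ar = np.zeros(60)
for i in range(1, 60):
    ar[i] = 0.5 * ar[i - 1] + rng.normal()
series = 0.3 * np.arange(60) + ar
print(tfpw(series)[:5])
```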

SI 3.1 Estimation of GEV Parameters
GEV parameters are estimated using Bayesian inference, in which a prior distribution is imposed on the parameters. Posterior samples are drawn with the Differential Evolution Markov Chain (DE-MC) algorithm (Ter Braak, 2006; Ter Braak and Vrugt, 2008), in which multiple chains (here, the number of parallel chains n is fixed at 5) are run in parallel. The resulting Monte Carlo simulations are run to equilibrium. It is standard practice to discard the initial iterations (often referred to as the burn-in period) of the simulated samples, since they are strongly influenced by the starting values and do not provide usable information on the target distribution. Here we run the DE-MC simulations for 3000 iterations and keep the 2001st–3000th iterations of each chain. The convergence of the MC simulation is checked with the potential scale reduction factor (R̂) as in Gelman et al. (2011), which suggests that R̂ should remain below the threshold value of 1.1. The post-burn-in random draws from the posterior distribution are then used to construct predictive distributions. For the annual maxima time series of each duration, the mean and the associated 95 % credibility interval of the parameters (µ(t), σ(t)) are derived by computing the 50th (the median), 2.5th and 97.5th (bounds) percentiles of the post-burn-in random draws (for example, the 50th percentile of the sampled values of µ(t) at each time step). The derived model parameters are then used to compute the corresponding design rainfall quantiles at the T-year return period. For more details of Bayesian inference and parameter estimation by DE-MC simulation, interested readers are referred to Gelman et al. (2014) and Renard et al. (2013).
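A minimal sketch of the post-processing described above (Gelman–Rubin R̂ across chains, followed by posterior percentiles of each parameter) is given below. It is illustrative only and assumes the DE-MC sampler has already produced an array of draws of shape (chains, iterations, parameters); the toy draws below are not the study's posteriors.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one parameter.
    `chains` has shape (m, n): m chains, n post-burn-in iterations."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    b = n * chain_means.var(ddof=1)              # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    var_hat = (n - 1) / n * w + b / n
    return np.sqrt(var_hat / w)

# Toy posterior draws: 5 chains x 1000 kept iterations x 3 GEV parameters
rng = np.random.default_rng(5)
draws = rng.normal([30.0, 8.0, 0.1], [1.0, 0.5, 0.05], size=(5, 1000, 3))

for j, name in enumerate(["mu", "sigma", "xi"]):
    rhat = gelman_rubin(draws[:, :, j])
    lo, med, hi = np.percentile(draws[:, :, j], [2.5, 50, 97.5])
    print(f"{name}: R-hat = {rhat:.3f}, median = {med:.2f}, "
          f"95% CI = ({lo:.2f}, {hi:.2f})")
```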

SI 3.2 Model Selection
SI 3.2.1 Bayes Factor
To evaluate the fit of the stationary model (null model, M1) relative to the nonstationary model (alternative model, M2), the Bayes factor is computed from the posterior distributions of the sampled parameters:

B_21 = Pr(D | M2) / Pr(D | M1) = [∫ Pr(D | θ_2, M2) π(θ_2 | M2) dθ_2] / [∫ Pr(D | θ_1, M1) π(θ_1 | M1) dθ_1],

where D denotes the input data and θ_1 and θ_2 denote the parameters of the stationary and nonstationary models, respectively, as described in the previous section. The marginal likelihood Pr(D | M) can be evaluated using the Monte Carlo integration estimators described in Kass and Raftery (1995). A value of B_21 > 1 indicates that the nonstationary model M2 fits the data better than the stationary model M1.
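One simple Monte Carlo estimator of the marginal likelihood, sampling the parameters from their prior, is sketched below with a toy Gaussian likelihood; it is illustrative only, and in practice the GEV likelihoods and priors of the study would be substituted.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
data = rng.normal(1.0, 1.0, size=50)            # toy data

def log_marginal_likelihood(data, prior_sampler, loglik, n_draws=20000):
    """Simple Monte Carlo estimate: Pr(D|M) ~ mean over prior draws of
    the likelihood Pr(D|theta). Returned on the log scale for stability."""
    thetas = prior_sampler(n_draws)
    loglikes = np.array([loglik(data, th) for th in thetas])
    return np.logaddexp.reduce(loglikes) - np.log(n_draws)

# M1: fixed mean 0;  M2: mean is a free parameter with a N(0, 2) prior
loglik_m1 = lambda d, th: norm.logpdf(d, loc=0.0, scale=1.0).sum()
loglik_m2 = lambda d, th: norm.logpdf(d, loc=th, scale=1.0).sum()

log_m1 = log_marginal_likelihood(data, lambda n: np.zeros(n), loglik_m1)
log_m2 = log_marginal_likelihood(data, lambda n: rng.normal(0, 2, n), loglik_m2)

print("Bayes factor B21 =", np.exp(log_m2 - log_m1))  # > 1 favours M2
```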

SI 3.2.2 Akaike Information Criterion (AIC) for Small Samples
The AIC (Anderson et al., 1994; Bozdogan, 2000) is defined as

AIC = n ln(MSE) + 2m,

where n is the number of observations and m denotes the number of fitted model parameters. MSE is the mean square error of the fitted distribution against the empirical distribution (Dawson et al., 2007; Hu, 2007; Karmakar and Simonovic, 2007). The empirical (plotting-position) probabilities are obtained using the Gringorten formula (Yue, 2001),

P_i = (i − 0.44)/(n + 0.12),   i = 1, 2, ..., n,

where i is the rank in ascending order and x_i is the i-th largest variate in a data series of size n.
For small sample sizes (n/m < 40), the corrected AIC is given by (Hurvich and Tsai, 1995)

AICc = AIC + 2m(m + 1)/(n − m − 1).

As the sample size increases, the last term of the second-order statistic (AICc) approaches zero, and AICc tends to give the same conclusion as the traditional AIC (Burnham and Anderson, 2003). Since the length of the data series in the present study varies from 46 to 66 years, we used AICc instead of the traditional AIC; AICc is also a widely used method for model selection in hydrology (Caroni and Panagoulia, 2016; Gu et al., 2017; Panagoulia et al., 2014).
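For illustration, the MSE-based AICc selection between candidate fits could be sketched as follows. The distributions and parameter values below are toy choices (a stationary GEV versus a Gumbel fit), not the stationary versus nonstationary GEV comparison performed in the study.

```python
import numpy as np
from scipy.stats import genextreme, gumbel_r

def aicc_from_mse(data, cdf, n_params):
    """MSE-based AIC with small-sample correction (AICc).
    Empirical probabilities follow the Gringorten plotting position."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    p_emp = (i - 0.44) / (n + 0.12)              # Gringorten formula
    mse = np.mean((cdf(x) - p_emp) ** 2)
    aic = n * np.log(mse) + 2 * n_params
    return aic + 2 * n_params * (n_params + 1) / (n - n_params - 1)

# Toy comparison: GEV (3 parameters) vs Gumbel (2 parameters)
rng = np.random.default_rng(7)
sample = genextreme.rvs(c=-0.1, loc=30, scale=8, size=60, random_state=rng)

gev_fit = genextreme.fit(sample)
gum_fit = gumbel_r.fit(sample)

print("AICc GEV   :", aicc_from_mse(sample, lambda v: genextreme.cdf(v, *gev_fit), 3))
print("AICc Gumbel:", aicc_from_mse(sample, lambda v: gumbel_r.cdf(v, *gum_fit), 2))
```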

SI 4 Estimation of Design Storm Intensity (DSI)
The DSI, often referred to as the return level in the literature, is the value expected to be exceeded on average once every 1/p periods, where 1 − p is the non-exceedance probability associated with the quantile (Gilleland and Katz, 2014). We obtain the (1 − p) quantile of the annual maximum rainfall fitted with the stationary GEV distribution using the following expression (Coles and Tawn, 1996):

q_p = µ + (σ/ξ)·[(−ln(1 − p))^(−ξ) − 1],

where q_p is the DSI associated with the T-year return period (T = 1/p). The IDF curves currently used in Canada are based on the Gumbel probability distribution (CSA, 2010), for which the quantile takes the frequency-factor form (Wang et al., 2015)

q_p = x̄ + K_p·s,

where x̄ and s are the sample mean and standard deviation and K_p denotes the frequency factor depending on the return period T, obtained from (Wang et al., 2015)

K_p = −(√6/π)·{0.5772 + ln[ln(T/(T − 1))]}.

Environment Canada uses this method to estimate rainfall frequency at a given duration and to obtain nationwide IDF curves.
Figure S13. Uncertainty in DSI for the 10-year return period for the stationary (blue) versus nonstationary (red) models. The boxplots indicate the 95 % credibility interval of the DSI estimate.
Figure S15. Estimated nonstationary versus EC-generated IDFs for return periods T = 2, 5, 10, 25, 50 and 100 years for the urbanized and moderately urbanized locations in Southern Ontario. The updated and EC IDFs are shown using solid and dotted lines, respectively.
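The two quantile expressions above can be compared directly; the sketch below is illustrative (toy annual maxima and arbitrary GEV parameters, not the station-specific estimates of the study).

```python
import numpy as np

def gumbel_dsi(sample, T):
    """Design storm intensity via the Gumbel frequency-factor method:
    q_T = mean + K_T * std, with
    K_T = -(sqrt(6)/pi) * (0.5772 + ln(ln(T/(T-1))))."""
    x = np.asarray(sample, dtype=float)
    k_t = -(np.sqrt(6) / np.pi) * (0.5772 + np.log(np.log(T / (T - 1.0))))
    return x.mean() + k_t * x.std(ddof=1)

def gev_dsi(mu, sigma, xi, T):
    """Stationary GEV return level:
    q_p = mu + (sigma/xi) * ((-ln(1-p))**(-xi) - 1), with p = 1/T."""
    p = 1.0 / T
    return mu + (sigma / xi) * ((-np.log(1 - p)) ** (-xi) - 1.0)

rng = np.random.default_rng(8)
annual_max = rng.gumbel(loc=30, scale=8, size=60)   # toy annual maxima (mm/h)

for T in (2, 10, 100):
    print(T, round(gumbel_dsi(annual_max, T), 1), round(gev_dsi(30, 8, 0.1, T), 1))
```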

SI 5 Statistical Significance of Nonstationary versus Stationary DSI
The statistical significance of the difference in DSI is assessed with a test for the difference between two means, i.e., the standardized z-statistic (Madsen et al., 2009; Mikkelsen et al., 2005),

z = (q̂_NS − q̂_S) / √(s_NS² + s_S²),

where q̂_NS and q̂_S are the T-year event estimates from the best-fitted nonstationary and stationary models, respectively, and s_NS and s_S are their standard errors. Values of z = ±1.64 correspond to the 10 % significance level. The null hypothesis of the test is that the T-year event estimate obtained using the best-fitted nonstationary model is not significantly different from its best-fitted stationary counterpart.
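As a worked illustration of this test (toy numbers, not the study's estimates; in practice the standard errors would come from the posterior or bootstrap spread of each DSI estimate):

```python
import numpy as np
from scipy.stats import norm

def z_difference(q_ns, q_s, se_ns, se_s):
    """Standardized difference between two T-year estimates.
    q_ns, q_s: nonstationary and stationary DSI estimates;
    se_ns, se_s: their standard errors."""
    z = (q_ns - q_s) / np.sqrt(se_ns ** 2 + se_s ** 2)
    p_two_sided = 2 * (1 - norm.cdf(abs(z)))
    return z, p_two_sided

# Toy numbers: a 10-year DSI of 42 mm/h (nonstationary) vs 40 mm/h (stationary)
z, p = z_difference(42.0, 40.0, 2.5, 2.2)
print(f"z = {z:.2f}, p = {p:.2f}, significant at 10%: {abs(z) >= 1.64}")
```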