Variations in the correlation between teleconnections and Taiwan ’ s streamflow

Interannual variations in catchment streamflow represent an integrated response to anomalies in regional moisture transport and atmospheric circulations and are ultimately linked to large-scale climate oscillations. This study conducts correlation analysis to calculate how summertime (July–September, JAS) streamflow data derived at 28 upstream and 13 downstream gauges in Taiwan correlate with 14 teleconnection indices in the current or preceding seasons. We find that the western Pacific (WP) and Pacific– Japan (PJ) patterns, both of which play a critical role in determining cyclonic activity in the western North Pacific basin, exhibit the highest concurrent correlations (most significant r = 0.50) with the JAS flows in Taiwan. Alternatively, the Quasi-Biennial Oscillation (QBO) averaged over the period from the previous October to June of the current year is significantly correlated with the JAS flows (most significant r =−0.66), indicating some forecasting utility. By further examining the correlation results using a 20-year moving window, peculiar temporal variations and possible climate regime shifts (CRSs) can be revealed. A CRS test is employed to identify suspicious and abrupt changes in the correlation. The late 1970s and 1990s are identified as two significant change points. During the intermediate period, Taiwan’s streamflow and the PJ index exhibit a marked inphase relationship (r > 0.8). It is verified that the two shifts are in concordance with the alteration of large-scale circulations in the Pacific basin by investigating the changes in pattern correlation and composite maps before and after the change point. Our results suggest that empirical forecasting techniques should take into account the effect of CRSs on predictor screening.


Caveats of using the CRS detection method:
As already indicated by the manuscript, there are several classes of approaches available for the detection of regime shifts, such as parametric methods (e.g., the classical t-test), non-parametric methods (e.g., the Mann-Whitney-Pettitt test), regression-based methods, cumulative sum methods, and sequential methods.Rodionov (2005) pointed out the pros and cons of some common approaches as well as his sequential method (Rodionov, 2004).The pros of his method are the automatic, early detection of a regime shift and the ability to monitor a possibility of a regime shift in real time.He has shown that his method can outperform Lanzante's method (another robust, non-parametric procedure developed by Lanzante, 1996).Therefore, since developed, Rodionov's method has been used in many studies in climate sciences.Nevertheless, the cons of his method are the requirement of some experimentation on the two parameters used (i.e., the cut-off length l and probability level p) and inability to account for a gradual regime shift and data with obvious autocorrelation (or red noise, but this issue was later ameliorated by a prewhitening procedure introduced by him, Rodionov, 2006).According to Rodionov (2004), while the probability level p is known to determine the critical value of t (the Student's t-distribution), "the cut-off length l determines the minimum length of the regimes, for which the magnitude of the shifts remains intact."It has been tested that, the larger l is set, the fewer change points can be identified.By contrast, a smaller l does not necessarily lead to more change points since only those significant change points can be identified based on the t-test.In other words, if there is no strong regime shift in the data series, the method with some variations in p and l simply cannot identify any change point (Rodionov, 2015 and verified by us too).In any case, we should still use Rodionov's method with caution since the CRS detection method is purely statistical, and the CRS results could be meaningless without solid evidence from related research.

Hindcasting experiment using linear regression:
Linear regression is widely used in climate forecasting studies (e.g., Hastenrath, 1995;Hastenrath et al., 2004;Chen and Georgakakos, 2015) to generate forecasts based on a calibrated, linear equation that depicts how a hydro-climatic predictand (Y, the streamflow in our case) responds to selected predictors (X, teleconnection indices in our case): where  represents coefficients estimated by ordinary least squares.In each catchment, we develop a linear regression equation as the prediction model.In terms of predictors (i.e., independent variables), we adopt 13 teleconnection indices described in the paper (with AAO excluded as relatively short in record) and perform stepwise model selection-based AIC (Akaike Information Criteria).Regarding model selection, we use both forward and backward directions to ensure a thorough search in the variable space.Afterwards, to avoid possible multicollinearity issues resulting from some highly correlated teleconnection indices, the variance inflation factor (VIF) is assessed: where 2 j R is the coefficient of determination from a regression of the j th predictor on any other predictors.According to the literature (Chen and Georgakakos 2014;Hidalgo-Muñoz et al. 2015), the VIF tolerance threshold is set to be 4 for small samples.The final model is thus determined and used to generate hindcasts (i.e., retrospective forecasts) for that catchment.The generation of hindcasts is subject to the leave-one-out cross-validation (LOOCV) procedure to circumvent artificial skill.Lastly, the LOOCV correlation and Gerrity Skill Score (GSS, Gerrity, 1992) are calculated to assess the prediction skill in that catchment.The GSS, unlike other conventional skill scores (e.g., the Heidke skill score), is an "equitable skill score" with a specific scoring matrix known for rewarding correct forecasts in the less ordinary categories (e.g., high and low flows in our case).The GSS does not reward random or constant forecasts.The GSS has been adopted by a number of practitioners of climate forecasts (e.g., CPC and WMO) and many studies (Chen and Georgakakos 2014;Hidalgo-Muñoz et al. 2015).The above framework is repeated for all 41 catchments.
Figure S2 shows some hindcasting results for selected upstream and downstream catchments.LOOCV correlations vary from one catchment to another and can be as high as ~0.6.As a more stringent metric, cross-validated GSS values are generally lower but most of the time pass the significant threshold (e.g., ~0.25 for data size 30, determined by the bootstrap-based hypothesis testing, Chen and Georgakakos, 2014).Overall, using large-scale circulation indices can produce fair to good prediction skills in summer streamflow prediction in Taiwan.Among those many climate indices, the PDO and PJ indices are selected most frequently as the predictors for the catchments [while the former is selected seven times (except for SGL), the latter is selected five times (except for Catchments 3 and 9 and BG) for the results shown in Figure S2].This result is, to a certain extent, consistent with our general correlation assessments (e.g., Table 3 in the main text) and indicates the general dominance of summer climate in Taiwan.Regarding the origin of the predictability, Chen and Chen (2011) indicated that the PDO coincides with the specific meridional SST contrast (i.e., warming in the tropical central and eastern Pacific and cooling in the extratropical North Pacific), which plays a dominant role in modulating summer rainfall in Taiwan.Choi et al. (2010), Kosaka et al. (2013), and Kubota et al. (2016) all provided sufficient evidence of the significant impact of the PJ on tropical cyclone activity and rainfall over the WNP during summer.Based on our findings, the predictability for summer rainfall can be extended to streamflow in Taiwan.
From Figure S2, we can note that the relatively better performance of each LOOCV time series seems to occur during the period from the late 1970s to the late 1990s, coinciding with the CRS epoch discussed earlier.Hindcasts during the pre-regime shift epoch seem to still be able to capture the general variability of the observed runoff (with relatively poorer performance), whereas hindcasts during the post-regime shift epoch appear to present more opposite signals and apparent departures from the observed runoff.It is worth noting that some of the departures occur in years when JAS typhoon activity is abnormally high.For example, in 2007, typhoons Pabuk, Sepat, and Wipha together generated the highest amount of cumulative rainfall for some watersheds over the past decade, and in 2008, typhoons Kalmaegi, Fung-Wong, Sinlaku, Hagupit, and Jangmi made a record of continuous invasions of intense typhoons (all Category 2 and above) in JAS.To further illustrate the effect of CRSs on streamflow prediction, we fitted a new regression model using the data from 1979 to 1998 and then evaluated how the fitted model performed in the remaining years.Using the SGL watershed as an example, Figure S2-i shows the new hindcasting result.In comparison with Figure S2-h, the new fitted model exhibits some definite improvement during the period from 1979 to 1998, showing the outstanding CV correlation and GSS values as 0.84 and 0.56, respectively.However, the fitted model can generate nothing but extremely poor hindcasts for the remainders.In fact, both skill metrics show a reverse sign, clearly illustrating distinct climate regimes over the temporal horizon.In contrast with the above experiment, if we fit another regression model using the data outside the time frame of 1979-98, the stepwise model selection scheme discloses no climate indices that are qualified to be a predictor.To summarize, the linear regression experiment indicates that the assumption of a stable predictor-predictand relationship could be quite problematic for hydro-climatic forecasting due to the observation of CRS.

Figure S2 .Figure S3 .
Figure S2.(a) to (h) are selected hindcasting results for upstream and downstream catchments in Taiwan using linear regression.Time series in red are model estimates based on the leave-one-out cross-validation (LOOCV) procedure.Cross-validated (CV) correlation and GSS values are also denoted in each plot.(i) is similar to (h), but the linear regression model is trained/fitted with the data from 1979 to 1998, and then the fitted model is tested with the rest of the data points (i.e., 1969-78 and 1999-2013)