Downscaling future precipitation extremes to urban hydrology scales using a spatio-temporal Neyman–Scott weather generator

Abstract. Spatio-temporal precipitation is modelled for urban application at 1 h temporal resolution on a 2 km grid using a spatio-temporal Neyman–Scott rectangular pulses weather generator (WG). Precipitation time series used as input to the WG are obtained from a network of 60 tipping-bucket rain gauges irregularly placed in a 40 km  ×  60 km model domain. The WG simulates precipitation time series that are comparable to the observations with respect to extreme precipitation statistics. The WG is used for downscaling climate change signals from regional climate models (RCMs) with spatial resolutions of 25 and 8 km, respectively. Six different RCM simulation pairs are used to perturb the WG with climate change signals resulting in six very different perturbation schemes. All perturbed WGs result in more extreme precipitation at the sub-daily to multi-daily level and these extremes exhibit a much more realistic spatial pattern than what is observed in RCM precipitation output. The WG seems to correlate increased extreme intensities with an increased spatial extent of the extremes meaning that the climate-change-perturbed extremes have a larger spatial extent than those of the present climate. Overall, the WG produces robust results and is seen as a reliable procedure for downscaling RCM precipitation output for use in urban hydrology.


Introduction
Pluvial flooding of urban areas is often caused by very local extreme precipitation at sub-daily temporal scale (Berndtsson and Niemczynowicz, 1988;Schilling, 1991).Tradi-tionally, historical gauge measurements of precipitation at minute-scale temporal resolution are thus used as input for the design and analysis of urban water infrastructure (Mikkelsen et al., 1998;Madsen et al., 2009;Arnbjerg-Nielsen et al., 2013).Climate change is, however, expected to change the occurrence rate and magnitude of extreme events causing urban pluvial flooding (Fowler and Hennessy, 1995;Larsen et al., 2009;Olsson et al., 2009;Sunyer et al., 2014a), and high-resolution input time series representing future climates are therefore needed.Even though the overall qualitative features of precipitation are reproduced realistically by regional climate models (RCMs) (Christensen and Christensen, 2007) they are not able to capture the very finescale spatio-temporal features of precipitation satisfactorily and yield output that is too spatially correlated (Tebaldi and Knutti, 2007;Gregersen et al., 2013).To overcome this, either dynamic downscaling with climate models has to operate at much finer scales in order to properly describe convective precipitation dynamics (Kendon et al., 2014;Mayer et al., 2015) or further statistical downscaling of the climate model output has to be performed (Olsson and Burlando, 2002;Wood et al., 2004;Cowpertwait, 2006;Molnar and Burlando, 2008;Willems et al., 2012;Sunyer et al., 2012;Arnbjerg-Nielsen et al., 2013).Fine-scale dynamic downscaling is computationally extremely expensive and statistical downscaling is therefore often favoured (Maraun et al., 2010).Several approaches exist within statistical downscaling, each with its pros and cons (Wilks and Wilby, 1999;Willems et al., 2012;Arnbjerg-Nielsen et al., 2013).In the present study a stochastic weather generator (WG) is used for statistical downscaling.
WGs can take different forms (Cowpertwait, 2006;Vrac et al., 2007;Burton et al., 2008;Arnbjerg-Nielsen and Onof, 2009;Chen et al., 2010;Cowpertwait et al., 2013) but they generally work by analysing observed precipitation (and possibly other weather-related variables) and use the obtained statistics to create artificial stochastic precipitation (or weather) time series that replicate the behaviour of the observations with respect to these statistics (Maraun et al., 2010;Sunyer et al., 2012).Perturbation of the WG to yield output time series representing future climates is then possible by application of climate change factors calculated from output from RCMs (operation at too large space scales and timescales) to relevant parameters of the WG (that operates at the right space scales and timescale).
Several WGs exist that model precipitation as a stochastic point process where the given observations are considered single realizations of an underlying precipitation process (Waymire and Gupta, 1981).Rodríguez-Iturbe et al. (1987a, b) developed the stochastic point process model in a way to better characterize and describe the precipitation process at the event level.Implementations of the stochastic point process models for spatio-temporal precipitation seem to work satisfactorily at temporal resolutions down to 1 h (Cowpertwait and O'Connell, 1997;Cowpertwait, 2006;Burton et al., 2008Burton et al., , 2010a;;Cowpertwait et al., 2013).Also, downscaling to finer resolution than 1 h is inherently problematic as the scaling properties change below this point (Nguyen et al., 2002;Molnar and Burlando, 2008).Thus, for downscaling of extreme precipitation at sub-daily level and subsequent application of climate change signals from climate models, stochastic weather generators implementing stochastic point process models seem useful (Cowpertwait, 1998;Furrer and Katz, 2008;Hundecha et al., 2009;Verhoest et al., 2010;Sunyer et al., 2012).The trade-off is that the models do not involve rainfall movement and, hence, that the spatio-temporal scale of the model has to be such that rainfall movement is not the main descriptor of the spatial rainfall pattern.
At the daily level, the Neyman-Scott rectangular pulses (NSRP) and the spatio-temporal Neyman-Scott rectangular pulses (STNSRP) models (Burton et al., 2008(Burton et al., , 2010a, b;, b;Cowpertwait et al., 2013) have shown good skill in downscaling point precipitation extremes.This applies for individual gauges (Sunyer et al., 2012) as well as for spatially averaged precipitation covering large areas considered to have a uniform climate described by relatively few gauges (e.g.five gauges for a 4000 km 2 basin in the Pyrenees; Burton et al., 2010a; three gauges used to calibrate a regional model covering a catchment of 342 km 2 in the Basque Country; Cowpertwait et al., 2013).This is, however, inadequate in urban hydrology where the rainfall dynamics that cause the effects under study occur on much smaller time and space scales.
In the present study, the STNSRP weather generator (WG) in the form of the software package RainSim (version 3.1.1, Burton et al., 2008) is used in a new, urban hydrology context focusing on much smaller space and timescales than what has been done in previous studies.Due to the limitations in scalability of both RCM model output and precipitation measurements discussed above, a temporal resolution of 1 h is adopted, even though a higher resolution would be preferable from an urban hydrology perspective.Hourly data from 60 rain gauges from a dense rain gauge network in Denmark are used to estimate parameters for the WG, which is used to generate synthetic precipitation data series on a regular dense grid covering approximately 2400 km 2 .The synthetic precipitation data are then evaluated with respect to its applicability for urban hydrological purposes.A 1 h temporal resolution on a 2 km grid is chosen as realistic and sufficient performance scales of the model for fine-scale precipitation data in urban hydrology.The evaluation of the WG is done from an engineering perspective with respect to its ability to reproduce rainfall features relevant for urban hydrological modelling.We will thus focus on -the WG's ability to produce realistic extreme event intensities at point scale; -the WG's ability to reproduce the seasonal distribution of extreme events at point scale; -the WG's ability to reproduce small-scale spatiotemporal correlation structures of the extreme events.
This study uses the presented WG to analyse climate change in precipitation at scales comparable to the observational data sets traditionally used today for urban water infrastructure design and analysis.The WG is perturbed with climate change information obtained from a collection of temporal high-resolution RCMs.Six RCM runs using three different RCMs, driven by three different global circulation models (GCMs) and covering three different emission scenarios (ranging from average to very high emissions) are included in the analysis and four of the RCM runs are run as highresolution models at an 8 km grid.Finally, climate change at urban scale is assessed based on the perturbed WG output.

Observational data
The model area is a 40 km × 60 km region covering the north-eastern part of Zealand (Denmark) including Copenhagen; see Fig. 1.This study uses two different observational data sets; Table 1 summarizes their main characteristics.The area is highly urbanized and has a dense but irregular network of rain gauges designed and used for urban hydrology applications.The main observational precipitation data set, SVK (abbreviation for Spildevandskomiteen, the Water Pollution Committee of the Society of Danish Engineers), is obtained from this dense network of high-resolution tipping bucket rain gauges (Jørgensen et al., 1998;Sunyer et al., 2013).Data from 60 stations that have been active between 2 and 34 years in the period 1979 and 2012 are included in the analysis; see Fig. 1 for locations within the study area.Figure 2 shows the temporal development of the number of active stations (top panel), the average distance between the nearest neighbouring stations through the measuring period (middle panel), and shows the distribution of record lengths by 2012 (bottom panel).Generally, there has been an increase in the number of stations and a densification of the network over the years.Some studies impose a minimum length of the time series to be included in regionalization studies, e.g.Madsen et al. (2009), but in this study the cross-correlation is of key interest and hence all gauges are included in the analysis regardless of their record length.The original data resolution is 1 min and 0.2 mm but for the present study, data are aggregated to an hourly time series.This data set is used to estimate most of the parameters of the WG.
The second observational data set included in the analysis is referred to as the climate grid Denmark (CGD) (Scharling, 2012).It comprises spatially averaged daily data in a uniform 10 km grid for all of Denmark from 1989 to 2010 inclusive (cf.Fig. 1).These data are generated based on a national network of gauges with 27 gauges within the study  site (Scharling, 1999) and are only used to estimate the spatial component in the WG.

Regional climate model data
Precipitation output from 12 different RCM runs representing present and future condition is used in this study; see Table 2. Four of the model runs are identical to the ones used by Gregersen et al. (2013), namely the two SRES A1B scenarios forcing the RCM RACMO (version 2.1, Meijgaard et  Christensen et al., 2006) and their present counterparts.All RCM runs are driven by the GCM ECHAM5 (Roeckner et al., 2003) and are part on the ENSEMBLES project (van der Linden and Mitchell, 2009).All have a spatial resolution of 25 km and a temporal output resolution of 1 h.These were the ENSEM-BLES runs we had available through personal contacts for the present study at 1 h resolution.The more generally available data series with only daily maximum 1 h intensity are not sufficient for the employed downscaling procedure.The other eight simulations used in this study are run with the RCM HIRHAM driven by the GCM EC-EARTH (Hazeleger et al., 2012) and the RCM WRF (Skamarock et al., 2005) driven by the GCM NorESM (Bentsen et al., 2013).The four future simulations use the RCP4.5 and RCP8.5 scenarios (van Vuuren et al., 2011); see Table 2.The spatial resolution of these simulations is 8 km and the output frequency is again 1 h (Fox Maule et al., 2014;Mayer et al., 2015).The SRES A1B and RCP4.5 scenarios are considered comparable moderate forcing scenarios, whereas the RCP8.5 scenario is a very strong forcing scenario.All future RCM runs are related to RCM runs driven by the same GCM for present conditions when climate factors are calculated (Table 2).As in Gregersen et al. (2013), climate change is considered uniform for all land cells over Denmark; this results in 87 considered grid cells for the ENSEMBLES SRES A1B simulations and 648 for the RiskChange RCP4.5 and 8.5 simulations.

Weather generator data
The last data set is the output from the applied weather generator (described in Sect.3).A total of 10 data sets comprising sets of 50-years time series in the 2 km grid (as shown on Fig. 1) are simulated as output from the WG.These data sets are used to corroborate the WG by refitting and rerunning it, evaluating the output variability, and comparing the output statistics to those of observations.
3 Weather generator Burton et al. (2008) provided a thorough description of the weather generator and its components, Burton et al. (2010a) an introduction to the application of the model, and Burton et al. (2010b) an introduction to incorporation of climate change into the WG; the remainder of this section is, thus, only giving a brief introduction to the WG and a more indepth description of the workflow associated with working with the model is given in the supplement.Generally, the approach by Burton et al. (2010a) is followed with inclusion of climate change as described by Burton et al. (2010b) using the software presented by Burton et al. (2008).

Parameters
The RainSim WG (version 3, Burton et al., 2008) describes the spatio-temporal rain field as discs of rain (rain cells) with uniform rain intensity that temporarily occur and overlap in space and time to produce output that realistically describes the statistical properties of precipitation.As the calibration data set consists of point observations, the time series from the simulations are not grid cell averages but strictly comparable to what a gauge would have measured if present in a grid point.
The WG parameters and their meaning and interactions are described in depth in Burton et al. (2008) where a schematic representation of the WG is also found (Burton et al., 2008: Fig. 1).A uniform Poisson process governed by λ describes the storm occurrences.For each storm a random number of rain cells are produced, which occur at independent time intervals after the storm origin and where the time intervals follow an exponential distribution with parameter β.A uniform spatial Poisson process governed by ρ describes the density of the rain cells in space.The cell radii are randomly drawn from an exponential distribution described by γ , and the duration and intensity of each rain cell is independent and follows an exponential distribution with parameters η and ξ , respectively.The rain intensity at a given point is therefore the sum of all overlapping rain cell intensities at a given time.In all, seven parameters describe the WG: λ −1 , the mean waiting time between storm origins (in hours); β −1 , the mean waiting time for rain cell origins after storm origin (in hours); η −1 , the mean duration of rain cells (in hours); ρ, the spatial density of rainfall cell centres (cells per km 2 ); ξ −1 , the mean intensity of the rain cells (in mm h −1 ); γ −1 , the mean radius of the rain cells (in km); -, the non-homogeneous intensity scaling field describing how the mean monthly rainfall intensity varies in space within the model area (-).
The non-homogeneous intensity scaling field, , is a proxy for the spatial variation of mean monthly precipitation and is used for relative scaling of the precipitation in space; for this study it is interpolated from the CGD data set using inverse distance weighting.Regional modelling of short-duration extreme precipitation for Denmark, using the SVK data set, has shown that the only significant parameter that can explain the geographical variation of point extremes statistically is the corresponding mean annual precipitation (Madsen et al., 2002(Madsen et al., , 2009)).Thus, taking as the only spatially varying parameter in the WG, and as such the only parameter describing spatial differences within the WG, is considered to be an acceptable approximation.The actual spatial variation of mean monthly precipitation calculated from the CGD data set is considerable (see Fig. 3), even though the model area is small in size and relatively flat.Especially in June and July there is a clear north-south gradient with 75-80 mm month −1 in the north of the area and 55-60 mm month −1 in the south.

Parameter estimation
The parameters for RainSim (see Sect. 3.1) are estimated based on daily and hourly statistics for each calendar month from the observed time series (SVK).The objective function is adopted from Burton et al. (2010b: Eq. 2) and the weights are chosen to best reproduce features at both hourly and daily levels, as described by Burton et al. (2008Burton et al. ( , 2010a, b), b).The custom weighing scheme used is constructed to support the features of rainfall that are important in the context of the present study (i.e. the higher-order moments are assigned more weight to secure a realistic fit for the extremes; see Table 3).The statistics used for fitting the WG are -the mean daily precipitation intensity from the individual gauges (24 h mean); -the variance of the intensity of the daily and hourly observations from the individual gauges (1 and 24 h variance); -the skewness of the intensity of the daily and hourly observations from the individual gauges (1 and 24 h skewness); -the probability of dry days and of dry hours based on the observations from the individual gauges and with thresholds of 1.0 and 0.1 mm, respectively, as suggested by Burton et al. (2008); -the lag-1 auto-correlation of the hourly precipitation intensity calculated from the observations at the individual gauges; -the cross-correlation between observations of hourly precipitation intensity at the individual gauges.
The chosen weighing scheme favours the higher-order moment statistics, variance and skewness, over the mean as the extreme characteristics of the simulated precipitation is prioritized.Furthermore, the cross-correlation and autocorrelation are given high weights to ensure a realistic representation of the spatio-temporal extent of the simulated precipitation.The different observation time series are furthermore weighted relative to each other according to the effective length of the time series to give more weight to longer time series.This is done to increase the data basis for crosscorrelation analysis, utilising the fact that a great deal of the short time series are from recent years and thus overlap in time; see Fig. 2. The standard fitting bounds suggested by Burton et al. (2008) are applied in the fitting procedure to ensure that the WG is fitted with values that are considered realistic by the model developers for a northern European climate.

Perturbation of the weather generator with climate change signals
The WG is perturbed with climate change signals by application of change factors, α i,j,k 's, to the statistics, Y Present i,j,k 's, calculated from the SVK data set and used for the original parameter estimation for the present climate.In this manner new statistics are produced for future climate, Y Future i,j,k 's, as (Fowler et al., 2007;Burton et al., 2010b) where one climate change factor, α i,j,k , is calculated for each statistic, i, for each month, j , for each RCM, k.The change factors are calculated using the methodology introduced by Burton et al. (2010b: Eqs.1-3), which includes transformations that ensure that the bounded statistics (probabilities of dry days and hours and auto-correlation) stay within their prescribed boundaries (further described in the supplement).
No change factor is calculated for the cross-correlation as this statistic is described poorly by the RCMs (Gregersen et al., 2013).

Evaluation of simulated time series
The evaluation of the simulated time series will be in line with previous studies such as Olsson and Burlando (2002), Cowpertwait (2006), and Molnar and Burlando (2008).This implies that simulated time series are not evaluated directly against the observations with the expectation of a perfect fit; the expectation is rather that the simulated series have the same statistical properties as the measured precipitation.In practise this is achieved by analysis of the statistics used in the fitting procedure and through analysis of statistics, which are independent of the fitting statistics as will be outlined in Sect.4.2.
For evaluation of all realizations of the WG the 60 grid cells closest to the observational gauges are extracted and evaluated point-wise with respect to all the fitting statistics as recommended by Burton et al. (2008).Furthermore, the WG is refitted to the simulated data sets to evaluate if the realization is representative and results in model parameters that are comparable to the parameters estimated from the SVK observational data set.
Ten realizations of the WG, named WG1 to WG10, are used in this study.The actual simulation time is very short, but the process of writing data to text files for the complete grid takes a long time, making it a rather cumbersome approach, which limits the number of realizations evaluated in this study.
The refitted WG data are evaluated with respect to the fitting statistics, Y WG i,j,k for each statistic, i, for each month, j , for each WG realization, l, through discussion of the density plots for the normalized error against the SVK data set: . (2)

Evaluation of extremes
Gregersen et al. ( 2013) compared extreme precipitation observations with RCM output.One issue is the difference in absolute magnitude of the extremes, which can partly be explained by the inherent difference between gridded data and point observations; another issue is the spatial correlation structure of the extremes, where extremes calculated from RCM output are much more spatially correlated at the subdaily timescale.In this study, a data set simulated with a WG will be considered better than using RCM data directly for the specified purpose, if it better resembles the observations with respect to both the absolute magnitude and the spatial correlation structure of the extremes.
The statistics used in this study to evaluate the WG's performance with respect to simulating extreme precipitation are based on the identification of independent rainfall events, as done when estimating intensity-duration-frequency relationships; see e.g.Madsen et al. (2002).Individual events are separated by dry periods equal to or longer than the chosen event duration (i.e. 1 h events have at least 1 h of dry weather between them and 24 h events have at least 24 h of dry weather between them) and the maximum-averaged event intensities over the chosen durations are calculated.Furthermore, the peak-over-threshold (POT) approach from Mikkelsen et al. (1996) and Madsen et al. ( 2002) is adopted with a global constant intensity threshold (i.e.type I censoring) to define the extreme events for each gauge/grid point.In this study, extreme precipitation events are evaluated for 11 distinct durations of 1,2,3,4,6,8,12,24,48,72, and 120 h with thresholds ranging (approximately log-linearly) from 7.6 to 0.34 mm h −1 (the same as used by Gregersen et al. (2013) for the SVK data set).Three different event-based indices of extreme precipitation are evaluated as explained below.

Magnitude of extreme events
To evaluate the magnitude of the extreme events intensityduration-frequency relationships are calculated for all data sets.First, the return periods of extreme events extracted from an observed or simulated rainfall time series are calculated using the California plotting position formula: where T m is the return period of the event (years) with rank m and T obs is the observation period (years) of the time series.
T m is obviously affected by sampling variability and is biased, especially for large return periods.There are more elaborate methods to estimate T m than Eq. ( 3), but we use Eq. ( 3) here because it allows for comparing extreme value curves from multiple sites (including sampling variability and spatial variability) in a straightforward way.Second, a generalized Pareto distribution is fitted to extremes from every single time series: where z T is the intensity for extreme event with return period T , z 0 is the threshold, µ is the mean intensity of the extreme events, λ is the mean number of extremes per year, κ is the shape parameter and T is the return period.Finally, based on z(T )'s intensity-duration-frequency curves are calculated for each data set.
For the climate change scenarios, climate factors for the intensity of the extreme events are calculated as a function of the return period for different T year event durations.This is done as a simple ratio between the present and future levels for a given return period as (5)

Seasonality of extreme events
The seasonality of the extreme events is determined to further evaluate the realism of the behaviour of the WG.This is done to evaluate whether the WG data set constructed with individual monthly model parameters results in a realistic distribution of the extremes throughout the year.The same extreme events used in the evaluation of the magnitude are used in this analysis.The determination is in practice performed by counting the number of extremes from the POT analysis that occur within each month for the SVK and WG data sets.These are then normalized and compared with a χ 2 test (Wilks, 2011) where the normalized counts C for the SVK data act as the expected values for the WG data set and where the summation is done over months giving a test statistic x: x then follows a χ 2 distribution with (12 − 1)(2 − 1) = 11 degrees of freedom.

Unconditional spatial correlation of extremes
The unconditional spatial correlation (Mikkelsen et al., 1996), ρ, between the intensities of extreme events that are considered concurrent at different sites A and B is estimated.
The methodology follows Mikkelsen et al. (1996) with the i th extreme intensity Z Ai measured at site A being concurrent with the j th extreme event Z Bj measured at site B if Eq. ( 7) is fulfilled.In this framework the precipitation process is considered to generate random occurrences of precipitation that are treated as correlated random variables, Z A and Z B , and two events are considered concurrent if they are overlapping in time or at most separated by a lag time t, which is introduced to account for the travel time of rain storms between sites.
Z Ai , Z Bj : Here t s 's are the start times of the events and t e 's are the end times of events.A lag time of t = 11 h + the duration of the event is adopted in accordance with Gregersen et al. (2013).The introduction of this lag time, in combination with lack of knowledge of the movement direction of precipitation, implies that an individual event at one site can be correlated to more than one event at another site.The unconditional covariance is then estimated by also accounting for non-concurrent extreme events at the two sites as with U being a Boolean operator taking the value of U = 1 if events are concurrent and U = 0 otherwise.Thus, E{Z|U }s are not a single values, but two values for U = 0 and U = 1, respectively, and a covariance between them can be calculated.
Finally, the unconditional spatial correlation is obtained by division of Eq. ( 8) with the sample standard deviations of the two sites (Mikkelsen et al., 1996): The unconditional spatial correlation values are grouped together in bins where the distance between the points considered are approximately the same, and an exponential model is fitted to describe the unconditional spatial correlation's dependence on distance between sites using the e-folding distance measure as proposed by Gregersen et al. (2013).

Weather generator parameter estimation
The parameter estimates (cf.Sect.3.2) for the model fitted to SVK data, the parameter estimates for the model refitted to the 10 realizations of the WG (WG1-WG10) and the used boundary values are given in Fig. 4. All parameters values shown in Fig. 4 are given in the supplement.All parameters vary over the course of the year, some more smoothly than others.Note that the β parameter (the parameter controlling the arrival time of cells after a storm origin) is constrained at its prescribed minimum value for 4 months (February, September, October, and December).However, rain events can easily last for several days at these times of the year in Denmark, and this fitting artefact is therefore considered to have limited influence on those features of rainfall, which are of interest for this study.Figure 4 shows that all the refitted values are different and especially the β parameter does not always seem to follow the same structural pattern as the SVK data set.As β −1 controls the arrival time of cells after storm origin, it will be heavily dependent on the actual realization of weather from the WG and this is not considered to be important for the realized extreme events.The ξ parameter seems to be slightly biased in the same direction for all WGs.ξ −1 controls the mean intensity of the rain cells and the difference in fit suggests that the rain in the WG data sets are slightly more intense during summer than what is seen in the SVK data set.Generally, the WG data sets, however, represent the SVK data set well.The fitting statistics (cf.Sect.3.1) resulting from the direct analysis of the observations (SVK data set) and the simulations (WG data sets that are simulated based on fitting the WG to the SVK and CGD data) are compared in Fig. 5 through the normalized error (Eq.2) and directly in Table 4. Generally, the fit seems reasonable for all variables with a mean of the normalized errors close to zero.For the moment statistics, the WG data sets seem to have a slight positive bias, and the variance and skewness distributions are also slightly positively skewed (Fig. 5a-e).However, the WG fit is still within the bounds reported for the SVK data set in Table 4.The lag-1 auto-correlation and the probabilities of dry hours seem to be fitted well even though the probability of dry days also seem to have some skewness in the error distribution.The probability of dry days is the only parameter that seems to differ between observations and WGs, indicate that the WG concentrates the precipitation on too few days.Also, it seems that none of the WG realizations performs differently than the others with respect to reproduction of the fitting statistics.Hence, the discrepancies observed in Fig. 4 do not seem to impede the use of the WGs as good proxies for observed precipitation.
The cross-correlation of the 1 h intensities is shown in Fig. 6 for each month of the year.The 10 WG data sets seem to reflect the overall behaviour of the SVK data set very well and also capture most of the variability seen in the SVK data set.The very low correlations observed in the SVK data set for some "traces" of points, especially in March, October, and November, are due to some time series only overlapping for very short time periods in recent years where the number of stations has increased dramatically (see Fig. 2); hence, the correlation is depending on only very few precipitation events.There is no evidence of a systematic pattern in these  and 6, the WG fit is considered satisfactory given the complex data set used and the purpose of this study.For analysis of extremes at event level, this WG reproduces well the higher-order moment statistics, which are the features expected to have the highest influence on the produced extremes.

Evaluation of extremes for present climate conditions
For durations of 1 to 120 h, the extreme events are extracted from the SVK data set at each gauge and from the WG data sets in each grid cell closest to the SVK observation points and ranked according to return period (Eq.3). Figure 7 shows intensity-duration-frequency curves estimated for WG realization along with the SVK data set.For both 100-and 10year events, the WG data sets result in comparable extreme intensity values for all considered durations well within the shown 68 % confidence interval (corresponding to a 1 standard deviation envelope) for the SVK IDF curve.
Figure 8 shows that the seasonal distribution of these extreme events is captured very well by the considered grids from the simulated WG data sets for all considered event durations.The χ 2 tests furthermore confirm that there are no significant differences between distributions for the WG and the SVK data sets for all event durations.
Figure 9 shows the unconditional spatial correlation for the SVK and for the selected WG grid points calculated according to Eq. ( 9) and grouped in selected bins.Table 5 furthermore compares the e-folding distances based on the fitted exponential models with a set of values calculated from RCM data representing a slightly larger area, as seen in from Gregersen et al. (2013).Gregersen et al. (2013) showed, using data from the whole of Denmark (range 0-350 km), that the spatial correlation pattern is not the same when considering output from climate models compared to SVK data as the climate model output maintains too long spatial correlation lengths at scales below approximately 150 km and 12 h (see Table 5).Both Fig. 9 and Table 5 indicate that the WG better reproduces the Table 5. e-folding distances for the SVK and WG maximumaveraged intensities of extremes for 1, 6, 12, and 24 h duration, based on the fitted exponential models (cf.Fig. 8) as well as for a regional climate model (HIRHAM/ECHAM) from the study by Gregersen et al. (2013) for comparison.
spatial correlation pattern of the SVK data within the spatial range (0-60 km) covered by the observations included in this study.The e-folding distances computed in this study for the SVK data set are somewhat lower than the ones calculated by Gregersen et al. (2013).This is a consequence of inclusion of fewer gauges and, most importantly, that the time series in the SVK data set for this study have been aggregated into hourly time series prior to the smoothing and POT analysis.Gregersen et al. (2013) conducted the smoothing and POT analysis directly on the original time series that have a 1 min resolution.The WG data sets represent the space-time features of precipitation of crucial importance for urban hydrology applications much better than the climate model output; the WG data set is considered realistic at this small-scale spatio-temporal resolution.
Overall, the results show that the WG is able to realistically simulate extreme precipitation statistics down to the hourly scale at a 2 km × 2 km spatial resolution.

Perturbation of the weather generator with climate change signals from RCMs
As the different realizations of the WG produce very similar output, only one 30-year realization is generated for each perturbation with climate change signals from each of the RCMs.Furthermore, all grid cells are used for both present and future evaluations as no comparisons are made to the observational data.For each RCM run and each statistic the change factors, α i,j,k 's, are calculated.All change factors and all parameters values for WGs representing future climate are given in the supplement.They are primarily above 1 for the momentderived statistics (Fig. 10a-e) but the different RCM runs appear different.For the 24 h mean (Fig. 10a) the α i,j,k 's are mostly above 1 with all RCM runs showing some months with values below 1 in an unsystematic pattern.For both the 24 and 1 h variances (Fig. 10b and d) the number of RCM runs and months that show a decrease is very limited and in general the variance will increase for all seasons.The HIRHAM RCP8.5 simulation differs from the other RCM runs with very high α i,j,k 's for the summer months.The 24 and 1 h skewness (Fig. 10c and e) show more clear seasonality than the mean and variance with higher α i,j,k 's from May to September for all RCM runs clearly indicating a shift in the distribution of precipitation intensities towards more extremes.Again the HIRHAM RCP8.5 run stands out with very high α i,j,k 's for the 1 h skewness for most of the year.This means that the extreme precipitation intensities are expected to be higher during summer and especially the sub-daily extremes for the HIRHAM RCP8.5 perturbation could have very high intensities as a combination of a large increase in both variance and skewness will result in many severe precipitation events with a high mean intensity.
For the lag-1 hour auto-correlation (Fig. 10h) the α i,j,k are mostly below 1 indicating more variations from one hour to the next and thus a possibility of more abrupt changes in the rainfall at the hourly level.For the probability of dry days and dry hours (Fig. 10f and g) the pattern is less clear.The RCM simulations show some variation around 1 (approximately between 0.7 and 1.7) but do not agree with respect to the season of these changes or their relative magnitude.This suggests that future rainfall will follow the same overall patterns as today but as all RCM runs have months with α i,j,k below 1 there will also be more severe periods since the precipitation is concentrated on fewer days and hours.For instance, the peaks for the WRF RCP8.5 perturbation in August for both probability of dry days and hours (Fig. 10f and g) in combination with the increases in variance and skewness (Fig. 10b to e) are expected to result in very severe extremes, as the increased rainfall amount is expected to occur on fewer days.On the whole, the α i,j,k 's indicate that for all RCM runs there will be more rainfall on average and it will be more variable resulting in more (and more severe) extreme events.This is in accordance with general findings from studies based on direct output from RCMs (Christensen and Christensen, 2007;Sunyer et al., 2014b).

Changes in climate-changed extremes from the weather generator
Calculating the climate factors, CFs (Eq.5), from the perturbed and original WG using the T year event estimates calculated with Eq. ( 4) shows that despite the differences observed in the α i,j,k for the input statistics (Fig. 10), the perturbation schemes based on RCM simulations modelling comparable climate change (HIRHAM SRES A1B, RACMO SRES A1B, HIRHAM RCP4.5 and WRF RCP4.5) result in  similar changes to extremes after downscaling with the WG (Fig. 11).Clearly, and as expected from the results in Fig. 10, the HIRHAM RCP8.5-perturbedWG results in a much more severe change in extreme precipitation than the other perturbation schemes for both 10-and 100-year return periods.It is interesting that the WG perturbed with HIRHAM SRES A1B results in a rather stable CF in the range 1.35-1.55with seemingly little dependence on return period and event duration, The WGs perturbed with RACMO SRES A1B, HIRHAM RCP4.5, and WRF RCP4.5 show similar CF values that are higher for 100-year extremes than for 10-year extremes but still not depend significantly on the event duration.Both the HIRHAM RCP8.5 and WRF RCP8.5-perturbedWGs yield CF values that depend on the event duration with higher CF for short-duration precipitation extremes.This indicates that this high-end scenario is changing the climate more drastically than the more moderate scenarios (SRES A1B and RCP4.5) and that the observed extreme effects are not linearly scalable from moderate to high end scenarios.For event durations above 48 h, the different WGs yield similar CFs, but surprisingly the high-end scenario WRF RCP8.5 perturbation scheme results in the smallest CF for the long duration events.This may indicate that the direct output from the RCMs underestimate the changes occurring at high spatio-temporal resolutions.Despite the observed differences between WGs perturbed with different RCM runs and different forcing scenarios the results show an upwards change for all event durations (see Fig. 11).The change seems to increase with the return period with a projected change factor of the order of 1.2-1.3 for T = 10 years and 1.4-1.5 for T = 100 years for the moderate scenarios (SRES A1B and RCP4.5).Furthermore, the RCP8.5 scenario-perturbed WG runs suggest that shortduration extreme events become relatively more severe compared to the WG runs perturbed with the other, moderate forcing scenarios.

Unconditional spatial correlation of climate-changed T year events
All the perturbed WG runs produce T year precipitation events with reasonable spatial correlation structure (Fig. 12, Table 6) including calculated e-folding distances, and it is noteworthy that the e-folding distance for present conditions is somewhat shorter for the full WG data set compared to the sub-sets closest to the observations shown in Fig. 9.The HIRHAM RCM and WRF RCM-perturbed WG runs present similar results for all event durations, whereas the RACMO SRES A1B-perturbed WG run yield slightly larger correlations lengths for the very short durations (Fig. 12a).Generally, all the perturbed WG runs have larger correlation lengths than for the present climate, suggesting that the WG implicitly expects that more severe events on average also results in events with a larger spatial extent.This behaviour has recently been observed by Kendon et al. (2014) using a high-resolution regional climate model (1.5 km resolution).This difference, however, is limited, and in general the WG produces extremes with a spatial extent much closer to that of observations than RCMs.Online resource 1 includes an animation of extreme precipitation events generated directly as output from the 25 km resolution RCM HIRHAM SRES A1B, the 8 km resolution RCM HIRHAM RCP4.5, and the 2 km WG evaluated in this study.From these it is clear that the small-scale variability is much more pronounced for the WG output than for the output of the RCMs, but also that the WG output lacks rainfall movement.At the hourly scale this is not a problem for a catchment of the size presented in the online resource (same as shown in Fig. 1).
Only few apparent effects are observed with respect to choice of RCM, GCM, and RCM spatial resolution and it is not possible to detect any systematic patterns.The WG seems to produce robust results with respect to change in extreme precipitation due to climate change that are similar for similar climate forcing scenarios.

Conclusions
Precipitation time series based on high-resolution gauge measurements are presently used as input to design and analysis of urban water infrastructure, and time series representing future climates are needed in the future.Current RCMs operating at 25 and even 8 km spatial scales, however, yield too spatially correlated output that poorly represents the fine-scale precipitation features relevant for urban hydrology.The study indicate that statistical downscaling of precipitation output from RCMs using a stochastic weather generator (WG) is therefore a better solution.
This study demonstrates that the chosen spatio-temporal Neyman-Scott rectangular pulses weather generator (WG) fitted to a dense network of 60 rain gauges in a 40 km × 60 km region simulates realistic extreme precipitation of relevance to urban hydrology.Output is generated at the 1 h temporal scale at a 2 km spatial grid, which is finer than what previous studies using this WG have focused on.Even though urban hydrology literature claims that rain data are ideally needed at a timescale of minutes, the hourly scale chosen here can still be of much use when assessing climate change impacts in urban hydrology as it is much finer than what regional climate models can currently provide.
The WG generally reproduces statistics of the observations such as mean, variance, and skewness of the rainfall intensity distribution well at both the hourly and daily levels.It also produces realistic levels of lag-1 auto-correlation, cross-correlation output at different grid points and probabilities of dry days and hours.Evaluating the WG from an urban hydrological engineering perspective yields the following conclusions: -The extreme events of the simulated time series show realistic levels of intensity as well as a reasonable spatial variability for the full 60 × 40 km model area.Thus, the WG handles the large data set of spatially distributed observational input in a robust manner.
-The seasonal distribution of the extremes are not significantly different in the generated WG data sets compared to the observed SVK data set, implying that the applied procedure of individual monthly model fits results in a realistic seasonal behaviour of the WG.
-The spatial extent of the extreme events in the WG data set, as evidenced by the unconditional spatial correlation of extremes, is close to that of the observational SVK data set with e-folding distances in the same order of magnitude.This is much better than what is observed for regional climate model (RCM) output at a 25 and 8 km grid scale in previous studies.
-This indicates that the WG is a good way to downscale spatio-temporal precipitation output from RCMs to relevant urban scales and that the simulated output can be used directly as input to urban hydrological models.
Output from six different RCM runs representing average to high-emission scenarios are used to perturb the WG for different possible future climate scenarios.Two have a 25 km × 25 km spatial resolution and four have a very high 8 km × 8 km spatial resolution, and all RCM data sets are available at hourly temporal resolution.A clear increase in the magnitude of extreme precipitation is observed for all climate change perturbations of the WG.
This study highlights that different RCMs run with the same greenhouse gas emission scenario can result in different precipitation output and hence different CFs for perturbation of the WG.Despite these observed differences, downscaling with the WG results in similar extreme precipitation behaviour for similar emission scenarios.
Most perturbed WGs confirm that there is a more severe climate change signal for extreme events.The two WGs perturbed by the RCP8.5 scenario show a more severe climate change signal for short-duration events.However, this finding is not shared by the other emission scenarios, suggesting that extreme precipitation at T year event level is not scalable between emission scenarios.The spatial correlation structure of the WG output is slightly altered by the perturbation indicating a built-in correlation between intensity and spatial extent and suggesting that precipitation extremes in a future climate may have larger spatial extent than extremes in the present climate.
The Supplement related to this article is available online at doi:10.5194/hess-20-1387-2016-supplement.

Figure 1 .
Figure 1.Locations of the rain gauges (SVK), the gridded data set (CGD), and extent of the modelled grid (WG) in the northeastern part of Zealand (Denmark) including Copenhagen in the south-eastern part of the map where the concentration of SVK gauges is the highest.

Figure 2 .
Figure 2. Temporal development in the number of stations in the SVK data set (top panel) and the average distance between closest neighbouring stations (middle panel), and the distribution of record lengths (bottom panel).

Figure 3 .
Figure 3. Spatial variation of the mean monthly precipitation calculated from the CGD data set for the model area.Isohyets are separated by 3 mm.

Figure 4 .
Figure 4. Monthly variation of the model parameters estimated from the SVK data set and from the simulated 10 WG data sets.Upper and lower fitting bounds are shown in light grey.

Figure 6 .
Figure 6.Variation of cross-correlation of the 1 h intensity with distance between pairs of gauges in the SVK data set (black dots) and grid points in the WG data set (coloured dots).

Figure 7 .
Figure 7. Mean intensity-duration-frequency curves for 100-and 10-year return periods calculated from the SVK data set and for all 10 WG realizations; 68 % confidence interval for the SVK data set.

Figure 8 .
Figure8.Monthly variation for 1, 6, 12, and 24 h durations of the frequency of extreme events in the SVK and WG data sets.

Figure 9 .
Figure9.Unconditional spatial correlation for the SVK and WG data sets, calculated from maximum-averaged intensities of extreme events for 1, 6, 12, and 24 h duration.Fitted exponential models that highlight overall tendencies are shown.

Figure 10 .
Figure10.Change factors, α's, calculated on a monthly basis for each statistic and each RCM.Each set of α's from an RCM act as a perturbation scheme for the WG.

Figure 11 .
Figure 11.Climate factors for different return periods for the different perturbed WG runs.T = 10 years (left panel) and T = 100 years (right panel).

Figure 12 .
Figure12.The unconditional spatial correlation of all T year events for perturbed WG output for event durations of 1, 6, 12, and 24 h.

Table 1 .
Main characteristics of the two observational data sets used in this study.

Table 2 .
Regional climate model (RCM) runs from which precipitation output is used to calculate perturbations schemes for the WG used in this study.All have a temporal resolution of 1 h.

Table 3 .
The relative weights used in the fitting procedure.

Table 4 .
Comparison between observational (SVK) data and the simulated (WGs) statistics.Data are averaged over the full course of the year and over the full model domain.For the SVK data set the 50th percentile (50) is reported as well as the 16th to 84th percentiles (16-84) interval to emulate the empirical standard deviation.For the WGs one central 50th percentile is reported across the ten simulations.

Table 6 .
e-folding distances for all aggregation periods for all WG output.