The effect of empirical-statistical correction of intensity-dependent model errors on the temperature climate change signal

This study discusses the effect of empirical-statistical bias correction methods like quantile mapping (QM) on the temperature change signals of climate simulations. We show that QM regionally alters the mean temperature climate change signal (CCS) derived from the ENSEMBLES multi-model data set by up to 15 %. Such modification is currently the subject of considerable debate and is often regarded as a deficiency of bias correction methods. However, an analytical description reveals that this modification corresponds to the effect of intensity-dependent model errors on the CCS. Such errors cause, if uncorrected, biases in the CCS. QM removes these intensity-dependent errors and can therefore potentially lead to an improved CCS. An analysis similar to that for the multi-model mean CCS has been conducted for the variance of CCSs in the multi-model ensemble. It shows that this indicator for model uncertainty is artificially inflated by intensity-dependent model errors. Therefore, QM also has the potential to serve as an empirical constraint on model uncertainty in climate projections. However, any improvement of simulated CCSs by empirical-statistical bias correction methods can only be realized if the model error characteristics are sufficiently time-invariant.


Introduction
Society is increasingly demanding reliable projections of future climate change to analyze adaptation options and costs, to explore climate change mitigation benefits, and to support political decisions. Such climate projections are usually generated with general circulation models (GCMs) of rather coarse spatial resolution, which are refined by dynamical or statistical downscaling methods (e.g., Giorgi and Mearns, 1991; Fowler et al., 2007). Currently, an increasing number of climate change impact investigations rely on dynamical downscaling methods, i.e., the use of regional climate models (RCMs, e.g., Giorgi and Mearns, 1991, 1999; Wang et al., 2004; Rummukainen, 2010). However, even the newest generation of RCMs features considerable systematic errors (e.g., Kotlarski et al., 2014), which complicates the direct application of RCM results in climate change impact research. RCM output is therefore usually post-processed with empirical-statistical "bias correction" methods (e.g., Déqué, 2007; Themeßl et al., 2011) before it is used as input for impact models, such as hydrological models. Bias correction methods have been demonstrated to successfully reduce systematic model errors (i.e., the difference between historical model output and meteorological observations), but knowledge about how they influence the climate change signal (CCS; i.e., the long-term average difference between a future and a past climate simulation) is very limited so far.
A relation between model errors and CCS has been discussed by Christensen et al. (2008), who found that monthly temperature errors of RCMs over Europe often depend on the observed monthly mean temperature and that in warmer months, errors are often larger than in colder months (or vice versa). Such "intensity-dependent" errors can be shown to alter the temperature CCS (Christensen et al., 2008; Themeßl et al., 2012; Boberg and Christensen, 2012).
Bias correction methods like quantile mapping (QM) modify the CCS. For example, Themeßl et al. (2012) and Dosio et al. (2012) showed that QM modifies the CCS of RCMs in operation over Europe in some regions and seasons, and found a lower summer temperature CCS in eastern Europe as well as a higher winter temperature CCS in Scandinavia after bias correction with QM. Currently, such modifications are often regarded as an undesired deficiency of bias correction methods (e.g., Hempel et al., 2013). However, Maurer and Pierce (2014) recently claimed that QM may have no negative effect on the quality of the CCS and demonstrated that QM does not deteriorate the multi-model mean precipitation CCS in a GCM ensemble.
In this paper we go a step further and argue that, under the assumption of time-invariant model error characteristics, the modification of the CCS by QM can be interpreted as improvement, rather than as deterioration, since it is capable of mitigating intensity-dependent model errors. To support this hypothesis, we develop a linearized analytical description of the effect of intensity-dependent model errors on the CCS. This framework allows the impact of such errors to be investigated, not only on the multi-model mean CCSs in an ensemble of climate simulations, but also on the inter-model variability, which is often used as a measure of uncertainty in climate projections (e.g., Hawkins and Sutton, 2009, 2011; Prein et al., 2011). Furthermore, we compare the analytical correction of the CCS to the correction by QM.
In Sect. 2, the QM method is described and its effect on the temperature CCS of the ENSEMBLES multi-model data set is demonstrated. In Sect. 3, the error characteristics of the ENSEMBLES models are analyzed, and in Sect. 4 we present an analytical formulation of intensity-dependent model errors and their effects on the CCS. In Sect. 5 these effects are compared to the effects of QM on CCSs, and in Sect. 6 a summary is given and conclusions are drawn.

Quantile mapping
The basic assumption of QM is that model errors depend on the value of the simulated variable. This concept of intensity-dependent errors is a rough simplification of actual model error characteristics, since model errors are not only influenced by the local value of the simulated variable. However, we will demonstrate that errors and local values correlate well in many cases (Sect. 3). The concept is simple yet powerful, since it separates, e.g., cold from hot regimes, or drizzling from heavy precipitation regimes and therefore accounts for potentially very different model errors under the associated regimes. It should be emphasized that intensity-dependent model errors are equivalent to a misrepresentation of variability, i.e., to differences between the observed and modeled width of the density distribution. Figure 1 demonstrates that intensity-dependent error characteristics with a positive slope correspond to overestimated variability, if the model error is defined as the difference between the inverse modeled and observed empirical cumulative density functions (ECDF). Similarly, a negative error slope corresponds to underestimation of variability. QM is a distribution-based bias correction method (e.g., Panofsky and Brier, 1958; Wood et al., 2004) that maps a modeled historical ECDF to an observed ECDF, with the mapping function shown in Fig. 1c for an artificial example. It is a well-established method to prepare climate model output as input for hydrological models (e.g., Déqué, 2007; Maraun et al., 2010; Themeßl et al., 2011) and has been successfully applied to daily precipitation sums and air temperature of RCMs and GCMs by Dobler and Ahrens (2008), Piani et al. (2010a, b), Dosio and Parulo (2011), Dosio et al. (2012), Maurer and Pierce (2014) and others. Furthermore, Themeßl et al. (2011) showed for daily precipitation sums that QM outperforms six other prominent bias correction techniques.
In our study, a non-parametric version of QM is used (Themeßl et al., 2011, 2012; Wilcke et al., 2013), as suggested by Gudmundsson et al. (2012). The ECDFs are constructed from 930 values for each day of the year based on modeled and observed data of a 30-year reference period (1971-2000) and a 31-day moving window, centered on the day under consideration. Our implementation of QM is not restricted to the range of observed values in the reference period, since the correction is extrapolated beyond the calibration range by using the correction term of the highest and lowest quantile, respectively. Note that this implies constant (not intensity-dependent) error characteristics outside the calibration range. As discussed by Bellprat et al. (2013), such constant errors at high temperatures outside the calibration range may be more realistic in many cases than a linear extrapolation.
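The mapping and its constant extrapolation can be illustrated with a minimal sketch. It is not the implementation used in this study: the function name `quantile_map`, the choice of 99 quantiles, and the synthetic data are assumptions made for illustration, and the 31-day moving window is omitted.

```python
import numpy as np

def quantile_map(model_ref, obs_ref, model_new, n_quantiles=99):
    """Additive empirical quantile mapping (illustrative sketch).

    The correction term is the difference between observed and modeled
    quantiles in the reference period. Outside the calibration range the
    correction of the lowest/highest quantile is held constant.
    """
    probs = np.linspace(0.01, 0.99, n_quantiles)
    q_mod = np.quantile(model_ref, probs)   # modeled reference quantiles
    q_obs = np.quantile(obs_ref, probs)     # observed reference quantiles
    correction = q_obs - q_mod              # correction term per quantile
    # np.interp returns the first/last correction value for inputs below/above
    # the range of q_mod, which gives the constant extrapolation.
    return model_new + np.interp(model_new, q_mod, correction)

# Synthetic example: 930 values, model with overestimated variability.
rng = np.random.default_rng(42)
model_ref = rng.normal(10.0, 5.0, 930)
obs_ref = rng.normal(10.0, 4.0, 930)
corrected = quantile_map(model_ref, obs_ref, model_ref)
print(np.std(model_ref), np.std(obs_ref), np.std(corrected))
```

After correction, the spread of the synthetic model data should be close to that of the synthetic observations, while values far outside the calibration range receive only the correction of the outermost quantile.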
Some restrictions apply to the application of QM to climate scenarios: as pointed out by Eden et al. (2012), internal variability causes differences between a GCM simulation and observations, which cannot be separated from actual model errors, if QM is applied to GCM-driven RCMs, as in our case (see Sect. 2.2). By using rather long calibration periods (30 years) and by focusing on temperature, which is less affected by natural variability than, e.g., precipitation, we try to minimize this effect. In addition, our multi-model approach further reduces dependence on natural variability. However, in the interpretation of the results, some noise due to natural variability has to be taken into account. Similar to all empirical-statistical downscaling and bias correction methods, the application of QM to future climate simulations is based on the assumption of time-invariant model error characteristics. This stationarity assumption can obviously not be directly assessed for future periods and it can be expected to be violated to some degree. However, several studies demonstrate the skill of empirical-statistical bias correction methods, either for past periods independent of the calibration period under ongoing climate change (e.g., Piani et al., 2010a; Themeßl et al., 2012; Gudmundsson et al., 2012; Wilcke et al., 2013), or for future periods using a pseudoreality approach (Maraun, 2012). Furthermore, Teutschbein and Seibert (2013) show that correction methods like QM perform better under non-stationary conditions than widely used linear transformations or delta-change approaches. This gives confidence that empirical-statistical bias correction with QM is useful not only for historical simulations, but also, though with degraded performance, for future climate simulations. However, in a strict interpretation, the results and conclusions of this study are only valid under the assumption of time-invariant model errors and it is still subject to further investigation to determine the severity of this restriction. Although such investigation is outside the scope of our study, we want to mention that the new centennial reanalyses of ECMWF (ERA-20C) and NOAA-CIRES (V2c) offer a promising new test bed for the investigation of the long-term stability of model error characteristics.

Model and observational data
We apply QM to a set of 15 GCM-driven regional climate simulations for Europe from the ENSEMBLES multi-model data set (van der Linden and Mitchell, 2009). The ENSEMBLES models are operated on a 25 km grid and extend to 2100. In the following, we show the results for daily mean temperature, but the analysis of daily minimum and maximum temperatures gives very similar results. The application of our analysis to other parameters such as precipitation is basically straightforward, but the linearization applied in Sect. 4 can be expected to be less appropriate for precipitation than for temperature. Further investigation is needed to fully reveal the effect of QM on the precipitation CCS. The major motivation for focusing on temperature here is its relatively simple error characteristic and its significant climate trend, which facilitates the demonstration of the effect of QM on the CCS.
As observational reference, the ENSEMBLES gridded observational data set (E-OBS, Haylock et al., 2008) is used. It is a European land-only daily high-resolution (25 km grid spacing) data set for five meteorological parameters, including daily mean temperature.

The effect of QM on the CCS in ENSEMBLES
Subsequently, we show the effect of QM on the multi-model mean CCS and on the standard deviation of CCSs for the periods 2021-2050 and 2070-2099, both compared with the reference period 1971-2000. In Fig. 2 the spatial patterns of the difference between the uncorrected and the corrected multi-model mean temperature CCS are shown for different seasons in the middle (left) and at the end (right) of the 21st century. At the end of the century, differences exceed +0.5 K in summer (JJA) in large parts of southeastern Europe, France, and the Iberian Peninsula and −0.5 K in large regions of Scandinavia, which roughly corresponds to 15 % of the uncorrected CCS. These results are consistent with the analyses of Boberg and Christensen (2012) and Dosio et al. (2012) and indicate that summer warming in southeastern Europe is projected to be less severe, and warming in Scandinavia is projected to be more severe, after bias correction with QM. However, the differences remain in the order of 10 % of the uncorrected CCS and the basic pattern of temperature change is not strongly altered by QM.
Figure 3 shows the spatial pattern of the difference between the uncorrected and the corrected standard deviation of CCSs as a measure of model uncertainty. In most regions, model uncertainty is larger in the uncorrected model ensemble (orange colors), particularly in regions where the CCS is overestimated (see Fig. 2). The overestimation locally peaks at 0.5 K. However, in some regions (e.g., Scandinavia) and periods (e.g., late 21st century winter) model uncertainty is smaller in the uncorrected model ensemble, locally peaking at about −0.4 K.
After having demonstrated and quantified the effects of QM on the CCS and the model uncertainty in the ENSEMBLES multi-model ensemble, the rest of this paper is devoted to the explanation of these effects.
The following characterization of model errors is based on daily mean temperature ECDFs, which are averaged over each month and subregion. For each model, only the range between the 10th and 90th percentiles is used in order to avoid the noisy tails of empirical distributions. The ECDFs of the grid points in each subregion are sampled over this range on a daily basis and the daily model error characteristics are derived for each grid point by subtracting the inverse observed from the inverse modeled ECDF (see Fig. 1). Further, the grid point error characteristics are averaged over [...]
In order to analyze whether such single-model error slopes cancel out in the multi-model ensemble, the ensemble average error characteristics (bold lines) in SC and EA are shown in Fig. 5 together with those of all 15 individual models (light lines). In SC, a considerable negative multi-model average slope exists in most parts of the year (minimum in July). In contrast, positive slopes can be found in EA in summer (maximum in July). Several other regions, like AL, feature only minor multi-model average slopes, but in turn larger slope variability (see Figs. S9 to S12).
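How an error slope can be obtained from such an error characteristic is sketched below for the artificial example of Fig. 1 (modeled standard deviation of 5 °C, observed standard deviation of 4 °C). The function name `error_slope`, the number of sampled percentiles, and the synthetic normal data are assumptions for illustration only; the sketch does not reproduce the monthly and regional averaging described above.

```python
import numpy as np

def error_slope(model, obs, lower=0.10, upper=0.90, n_probs=81):
    """Fit a line to the error characteristic between two percentiles.

    The error is the inverse modeled ECDF minus the inverse observed ECDF,
    expressed as a function of the modeled value.
    """
    probs = np.linspace(lower, upper, n_probs)
    q_mod = np.quantile(model, probs)       # inverse modeled ECDF
    q_obs = np.quantile(obs, probs)         # inverse observed ECDF
    error = q_mod - q_obs                   # model error at each percentile
    slope, intercept = np.polyfit(q_mod, error, 1)
    return slope, intercept

rng = np.random.default_rng(0)
model = rng.normal(10.0, 5.0, 20000)        # overestimated variability
obs = rng.normal(10.0, 4.0, 20000)
slope, intercept = error_slope(model, obs)
print(f"error slope: {slope:.3f}")
```

For two normal distributions with equal means, the expected slope is 1 − 4/5 = 0.2, i.e., overestimated variability appears as a positive error slope; swapping the two standard deviations would give a negative slope.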

Analytical description of the effect of intensity-dependent model errors on the CCS
Having shown and quantified the intensity-dependence of model errors in the ENSEMBLES multi-model data set, we subsequently give a simplified analytical description to highlight the mechanism of how such errors act on the CCS in a multi-model ensemble.

CCS of a single climate simulation
Let $y^i_j$ be the value of a meteorological variable (e.g., temperature, precipitation sum, or any other simulated variable) on day $j$ simulated by model $i$. $\bar{y}^i$ is the 30-year average for a specific time of the year, e.g., for a month or a season. It can be expressed as a combination of the observed average value $\bar{x}$ and the deviation of the model from this value due to errors ($\bar{y}^i_e$) and due to natural and model internal variability ($\bar{y}^i_v$):

$$\bar{y}^i = \bar{x} + \bar{y}^i_e + \bar{y}^i_v.$$

The CCS $\Delta y^i$ ($\Delta y^i = \bar{y}^i_{\mathrm{future}} - \bar{y}^i_{\mathrm{past}}$) can then be written as

$$\Delta y^i = \Delta x + \Delta y^i_e + \Delta y^i_v. \tag{1}$$

$\Delta x$ denotes the deterministic part of the error-free CCS, $\Delta y^i_e$ the effect of model errors, and $\Delta y^i_v$ the random effect of internal variability. In many studies, the model error term is neglected ("delta-change approach"), since errors are expected to be time-invariant and to cancel out in the CCS. We demonstrate that this is not the case, even for time-invariant error characteristics, if they are intensity-dependent and the CCS is non-zero. The daily intensity-dependent errors can be written as a function of the meteorological variable under consideration: $y^i_{e,j} = f(y^i_j)$. For the sake of simplicity, we assume a linear error function with a constant bias $b^i$, error slope $s^i$, and residual $\varepsilon^i_j$:

$$y^i_{e,j} = b^i + s^i y^i_j + \varepsilon^i_j. \tag{2}$$

This linear error function is a good approximation of the error characteristics of the ENSEMBLES multi-model data set in most cases, since the median coefficient of determination of the linear regression to the error characteristics shown in Sect. 3 is high ($R^2 = 0.91$). However, it is not always suitable as, e.g., in the case of the HC model in SC in winter and the SMHI model in IP in summer (Fig. 4). Averaging over 30 years, taking the difference between a future and a past period, and neglecting the residual, yields the linearized effect of the intensity-dependent model error on the CCS:

$$\Delta y^i_e = s^i \Delta y^i. \tag{3}$$

The bias cancels out, since it is assumed to be time-invariant and not intensity-dependent. From Eqs. (1) and (3) the simulated CCS can be written as

$$\Delta y^i = \frac{1}{1 - s^i}\left(\Delta x + \Delta y^i_v\right). \tag{4}$$

Equation (4) shows that intensity-dependent model errors lead to a modeled CCS that is proportional to the error-free CCS ($\Delta x + \Delta y^i_v$) and a factor determined by the error slope ($1/(1 - s^i)$). Figure 6a illustrates this effect in relative terms: positive error slopes lead to an exaggeration of the error-free CCS and negative slopes dampen it, but to a smaller extent. For example, for slopes of 0.1 and −0.1 the error would amount to about 11 and −9 %, respectively. The depicted range of error slopes from −0.7 to 0.5 has been selected according to temperature error slopes found in the ENSEMBLES multi-model data set (see Sect. 5).
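The relative effect described by Eq. (4) can be evaluated with a few lines of code. The function name `relative_ccs_error` is hypothetical; the slope values are taken from the range quoted above.

```python
def relative_ccs_error(slope):
    """Relative error of a single-model CCS for a given error slope.

    Follows from Eq. (4): the modeled CCS equals the error-free CCS
    multiplied by 1 / (1 - slope).
    """
    if slope >= 1.0:
        raise ValueError("the linearized model requires slope < 1")
    return 1.0 / (1.0 - slope) - 1.0

for s in (-0.7, -0.1, 0.1, 0.5):
    print(f"slope {s:+.1f}: relative CCS error {100 * relative_ccs_error(s):+.1f} %")
```

The asymmetry between positive and negative slopes follows directly from the factor 1/(1 − s): a slope of +0.1 inflates the CCS by about 11 %, whereas a slope of −0.1 dampens it by only about 9 %.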

Multi-model mean CCS
For a multi-model ensemble, the ensemble mean CCS and the multi-model variance of the CCS are relevant. To derive the effect of intensity-dependent errors on these quantities, the error slope can be written as the sum of the ensemble mean error slope ($\bar{s}$) and a model-specific residuum error slope ($s'^i$), i.e., $s^i = \bar{s} + s'^i$. Combining this separation with the expanded form of Eq. (4) yields

$$\Delta y^i = \Delta x + \Delta y^i_v + \bar{s}\,\Delta y^i + s'^i \Delta y^i. \tag{5}$$

Accordingly, the multi-model mean CCS is

$$\overline{\Delta y} = \Delta x + \bar{s}\,\overline{\Delta y} + \mathrm{cov}(s'^i, \Delta y^i). \tag{6}$$

In Eq. (6) we could disregard the internal variability $\Delta y^i_v$ since it has the expectation zero (assuming a large number of models $n$). In addition, the expectation of the product $s'^i \Delta y^i$ equals the covariance of both terms, since the expectation of $s'^i$ is zero under the assumption of normally distributed error slopes. However, it is not independent from $\Delta y^i$ as the error slope influences the CCS according to Eq. (4). In a similar form as Eq. (4), Eq. (6) reads

$$\overline{\Delta y} = \frac{1}{1 - \bar{s}}\left(\Delta x + \mathrm{cov}(s'^i, \Delta y^i)\right). \tag{7}$$

Equation (7) shows that intensity-dependent errors influence the multi-model mean CCS via two terms: firstly, the error slope term, which scales with the error-free CCS ($\Delta x$) just like in the single-model case, and secondly the covariance term, which adds an offset. Figure 6b visualizes the corresponding error in the CCS in relative terms: positive multi-model mean error slopes lead to an exaggeration of the CCS, and negative slopes dampen it, just like in the single-model case (black line). The depicted range of multi-model error slopes from −0.16 to +0.13 has been selected according to the multi-model mean temperature error slopes of the ENSEMBLES multi-model data set (see Sect. 5). Positive and negative covariance terms create positive and negative offsets, respectively. Following Eq. (3), it can be expected that single-model error slopes and CCSs are generally positively correlated and that the covariance term is consequently positive. The depicted range of covariance terms corresponds to values found in the analysis of temperature errors of the ENSEMBLES multi-model data set, ranging from −0.02 (blue colors) to +0.21 (pink colors) (Sect. 5), and confirms this expectation. The absolute effect of the covariance term (Eq. 7) is independent from the error-free CCS and thus gets smaller with higher CCS in relative terms, which is indicated by lighter (small CCS) and darker colors (large CCS).
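The relative error of the multi-model mean CCS implied by Eq. (7) can be sketched as follows. The function name `relative_mean_ccs_error` is hypothetical; the input values are the extremes of the slope and covariance ranges quoted above, combined with error-free CCSs of 1 and 4 K.

```python
def relative_mean_ccs_error(s_bar, cov, dx):
    """Relative error of the multi-model mean CCS.

    s_bar: ensemble mean error slope
    cov:   covariance of residual error slopes and CCSs (K)
    dx:    error-free CCS (K)
    Rearranging Eq. (7) gives (1 + cov / dx) / (1 - s_bar) - 1.
    """
    return (1.0 + cov / dx) / (1.0 - s_bar) - 1.0

for dx in (1.0, 4.0):
    err = relative_mean_ccs_error(s_bar=0.13, cov=0.21, dx=dx)
    print(f"error-free CCS {dx:.0f} K: relative error {100 * err:+.1f} %")
```

Because the covariance term is an absolute offset, its relative contribution shrinks as the error-free CCS grows, while the contribution of the mean error slope stays the same.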

Variance of CCSs in a multi-model ensemble
The effect of intensity-dependent errors on the second important quantity in a multi-model ensemble, the variance of CCSs (which is often interpreted as a measure of uncertainty), can be described with the linearized model as well. Using Eqs. (5) and (6), the variance can be expressed as

$$\mathrm{var}(\Delta y^i) = \overline{\left(\Delta y^i - \overline{\Delta y}\right)^2} = \overline{\left(\Delta y^i_v + \bar{s}\left(\Delta y^i - \overline{\Delta y}\right) + s'^i \Delta y^i - \mathrm{cov}(s'^i, \Delta y^i)\right)^2}. \tag{8}$$

Expanding and simplifying Eq. (8) gives (see the Supplement for a detailed derivation)

$$\mathrm{var}(\Delta y^i) = \mathrm{var}(\Delta y^i_v) + \bar{s}^2\,\mathrm{var}(\Delta y^i) + \mathrm{var}(s'^i \Delta y^i) + 2\bar{s}\,\mathrm{cov}(\Delta y^i, s'^i \Delta y^i). \tag{9}$$

Since $\mathrm{var}(\Delta y^i_v)$ is the effect of natural variability, it can be interpreted as the variance of an error-free model ensemble. Compared to that, the variance of a model ensemble with intensity-dependent errors is always exaggerated by a positive offset $\bar{s}^2\,\mathrm{var}(\Delta y^i)$. For example, an ensemble mean error slope of ±0.1 results in about 1 % bias in variance. In addition, the positive additive term $\mathrm{var}(s'^i \Delta y^i)$, which represents the variability of the individual models' error slopes multiplied by the CCSs, further increases the positive bias. The last term $2\bar{s}\,\mathrm{cov}(\Delta y^i, s'^i \Delta y^i)$ is positive for positive slopes and negative for negative slopes, assuming a positive correlation of the simulated CCS and the residual error slope. It is difficult to estimate the relative importance of the different terms and in particular to judge if the possibly negative covariance term can counterbalance the otherwise positive terms, so all terms of Eq. (9) are quantified and analyzed for the ENSEMBLES multi-model ensemble in Sect. 5.
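The variance decomposition can be checked numerically on a synthetic ensemble. Everything in the sketch below is an assumption for illustration (ensemble size, error-free CCS of 3 K, distributions of slopes and internal variability); it is not ENSEMBLES data. In a finite sample, the cross terms between internal variability and the other terms do not vanish exactly, so they are reported separately as a remainder.

```python
import numpy as np

def cov(a, b):
    """Population covariance of two samples (variance if a is b)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

rng = np.random.default_rng(1)
n_models = 15
dx = 3.0                                   # hypothetical error-free CCS (K)
dy_v = rng.normal(0.0, 0.3, n_models)      # internal variability (K)
s = rng.normal(0.1, 0.1, n_models)         # single-model error slopes
dy = (dx + dy_v) / (1.0 - s)               # simulated CCS following Eq. (4)

s_bar = s.mean()                           # ensemble mean error slope
s_res = s - s_bar                          # residual error slopes

terms = {
    "var(dy_v)": cov(dy_v, dy_v),
    "s_bar^2 var(dy)": s_bar**2 * cov(dy, dy),
    "var(s_res dy)": cov(s_res * dy, s_res * dy),
    "2 s_bar cov(dy, s_res dy)": 2.0 * s_bar * cov(dy, s_res * dy),
}
# Cross terms involving internal variability, not part of the four terms above.
remainder = 2.0 * s_bar * cov(dy_v, dy) + 2.0 * cov(dy_v, s_res * dy)
total = cov(dy, dy)

for name, value in terms.items():
    print(f"{name:28s} {value:8.4f} K^2")
print(f"{'remainder (cross terms)':28s} {remainder:8.4f} K^2")
print(f"{'var(dy)':28s} {total:8.4f} K^2")
```

The four listed terms plus the remainder add up to the ensemble variance of the simulated CCS; the size of the remainder indicates how well the four-term decomposition holds for a given sample.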

Linearized correction
The linearized error characterization leads to a simple way to correct the CCS of single models following Eq. (3), the multi-model mean CCS following Eq. (6), and the multi-model variance of CCS following Eq. (9). Error slopes, climate change signals, their variability, and their covariance are calculated based on the comparison of historical simulations with observations and applied to results of future simulations. Such correction assumes not only a linear error slope, but also time-invariant error characteristics. The linearly corrected multi-model mean temperature CCS is listed in Table 1 ($\Delta x_{\mathrm{LC}}$) and the variance of the CCSs in Table 2 ($\mathrm{var}(\Delta x_{\mathrm{LC}})$). They are discussed in the following section.
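A minimal sketch of the linearized correction for single models and for the multi-model mean is given below. The function name `linearized_correction` and the three-model example values are hypothetical, and internal variability is neglected; the sketch only rearranges Eqs. (3) and (6).

```python
import numpy as np

def linearized_correction(dy, s):
    """Linearized correction of CCSs (illustrative sketch).

    dy: simulated CCS of each model (K)
    s:  error slope of each model
    Returns the corrected single-model CCSs and the corrected
    multi-model mean CCS.
    """
    dy = np.asarray(dy, dtype=float)
    s = np.asarray(s, dtype=float)
    # Single model: remove the error term s * dy of Eq. (3).
    single_lc = (1.0 - s) * dy
    # Multi-model mean: remove slope term and covariance term of Eq. (6).
    s_bar = s.mean()
    cov = np.mean((s - s_bar) * (dy - dy.mean()))
    mean_lc = (1.0 - s_bar) * dy.mean() - cov
    return single_lc, mean_lc

# Hypothetical three-model ensemble.
single_lc, mean_lc = linearized_correction(dy=[2.0, 2.5, 3.0], s=[-0.1, 0.0, 0.1])
print(single_lc, mean_lc)
```

In this formulation the corrected multi-model mean equals the mean of the corrected single-model CCSs, which provides a simple consistency check of the two correction paths.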

Correction of the CCS and its uncertainty
In Table 1 the terms contributing to errors in the multi-model mean CCS (see Eq. 6) are listed for all subregions and seasons. Multi-model mean error slopes ($\bar{s}$) are mostly negative in DJF and MAM, mostly positive in JJA and SON, and range from −0.16 in SC in MAM to 0.13 in EA in JJA. Accordingly, they inflate (positive slopes) or dampen (negative slopes) the CCS, depending on season and subregion. The errors stemming from the slope term ($\bar{s}\,\overline{\Delta y}$) range from −0.25 to 0.20 K in the mid-century and from −0.57 to 0.45 K at the end of the century. In contrast, the covariance term ($\mathrm{cov}(s'^i, \Delta y^i)$) is, with very few exceptions, positive and increases the CCS. It amounts to 0.04 K on average, ranges from −0.02 to 0.21 K in both periods, but usually does not exceed 0.10 K. Compared to the slope term, the covariance term is smaller in most cases, but cannot be neglected, as it sometimes equals or even exceeds the slope term. Table 1 also lists the uncorrected ($\overline{\Delta y}$) and corrected multi-model mean CCS (linearized correction (LC): $\Delta x_{\mathrm{LC}}$; quantile mapping: $\Delta x_{\mathrm{QM}}$) for each season and subregion. The difference between uncorrected and corrected CCS averaged over all seasons and regions is small (0.01 K), but can reach up to about 0.5 K (about 15 % of the uncorrected CCS) in specific regions and seasons. Figure 7 displays this estimated error in the multi-model mean temperature CCS. With few exceptions, both correction methods feature the same sign of correction and agree reasonably well in their magnitude. Major differences are found in the later period, when QM often indicates smaller errors than LC. This can probably be explained by the fact that LC extrapolates intensity-dependent errors, while our implementation of QM keeps the error constant outside the calibration range (see Sect. 2.1). This dampens the error slope under severe warming (i.e., at the end of the 21st century) when daily temperatures outside the calibration range frequently occur. Further discrepancies between QM and LC can be explained by the linear approximation of LC. Both correction methods agree that the uncorrected CCS is regionally biased up to +0.5 K in EA and FR in summer and about −0.5 K in SC. The qualitative agreement of QM with LC can be interpreted as a confirmation that the correction of intensity-dependent errors is the main reason for the modification of the CCS by QM.
In Table 2, the terms contributing to errors in the estimated variance of a multi-model ensemble (Eq. 9) are listed: two positive offset terms, $\bar{s}^2\,\mathrm{var}(\Delta y^i)$ and $\mathrm{var}(s'^i \Delta y^i)$, and the term $2\bar{s}\,\mathrm{cov}(\Delta y^i, s'^i \Delta y^i)$, which generally has the same sign as the error slope due to the positive correlation between single-model CCS and error slope. While $\bar{s}^2\,\mathrm{var}(\Delta y^i)$ is very small in both periods (smaller than 0.01 K$^2$ in most cases), $\mathrm{var}(s'^i \Delta y^i)$ amounts to 0.041 K$^2$ on average (range: 0.006-0.128 K$^2$) in the earlier period and to 0.223 K$^2$ on average (range: 0.026-0.697 K$^2$) in the later period. Given a modeled average variance of 0.342 K$^2$ in the earlier and 1.028 K$^2$ in the later period, this means that this term leads to an overestimation of variance by 12 and 22 % on average, respectively. In specific regions and seasons, the overestimation can amount to 50 % and more (e.g., SC in the later period). The average covariance term $2\bar{s}\,\mathrm{cov}(\Delta y^i, s'^i \Delta y^i)$ is very small in both periods (−0.002 and +0.002 K$^2$, respectively) and ranges from −0.028 to 0.136 K$^2$. In summary, the positive variability term, $\mathrm{var}(s'^i \Delta y^i)$, dominates and is mostly even enhanced by the covariance term. This leads to a general overestimation of ensemble variance. Table 2 also lists the uncorrected ($\mathrm{var}(\Delta y^i)$) and corrected variance of the CCSs in the multi-model ensemble (LC: $\mathrm{var}(\Delta x_{\mathrm{LC}})$; QM: $\mathrm{var}(\Delta x_{\mathrm{QM}})$) for each season and subregion. The average difference between uncorrected and corrected variance over all seasons and regions does not cancel out as in the case of the mean CCS, but amounts on average to 17 % in the case of LC and to 12 % in the case of QM. This demonstrates that time-invariant intensity-dependent errors inflate model uncertainty in multi-model ensembles. In Fig. 8 this error is expressed as standard deviation, which is overestimated by up to 0.4 K at the end of the century. This is particularly the case in regions where the mean CCS is overestimated like in EA in summer. However, the two correction methods disagree in some cases as, e.g., in SC in winter at the end of the century. These discrepancies are currently not fully understood and require further analysis. They could, e.g., be caused by the linearity assumption of LC, by the constant (not intensity-dependent) correction outside the calibration range of QM, or by time-variant model errors.

Summary and conclusions
Knowledge about the influence of empirical-statistical bias correction methods like QM on the CCS of climate simulations is very limited so far. For the ENSEMBLES multi-model data set it has been demonstrated that QM dampens projected summer warming in southeastern Europe and France by about 0.5 K and enhances projected warming in Scandinavia by about the same amount. This corresponds to about 15 % of the uncorrected CCS. Such modification is currently the subject of considerable debate and is often regarded as a deficiency of bias correction methods. However, we argue that under the assumption of time-invariant model errors, QM should generally lead to an improvement of the simulated CCS rather than deterioration.
To support this hypothesis, we analytically formulated the effect of intensity-dependent model errors on the CCS and showed that they erroneously modify the CCS. Positive error slopes lead to an exaggeration of the CCS and negative slopes dampen it. This is the case for a single model's CCS as well as for the multi-model mean CCS in a model ensemble, which is additionally exaggerated by high variability amongst the single models' CCSs. A comparison of this analytically determined error and the effect of QM on the mean CCS in the ENSEMBLES multi-model data set leads to largely similar results. This confirms that the effect of QM on the CCS is mainly caused by the correction of intensity-dependent errors and that such modification can be regarded as improvement, if roughly time-invariant model error characteristics can be assumed.
With regard to the variance of the CCSs in a multi-model ensemble, the analytical description reveals that intensity-dependent model errors lead to an overestimation of variance. Since variability of CCSs in a multi-model ensemble is often used as an indicator for model uncertainty, intensity-dependent model errors can be regarded as responsible for parts of the model uncertainty in the CCS. This further implies that the correction of intensity-dependent errors by QM should lead to a smaller variance and therefore constitute an empirical constraint on climate model uncertainty. However, we could only partly demonstrate this very desirable effect by the application of QM to the ENSEMBLES data set. In most regions and seasons, the analytical correction as well as QM reduce the variance as expected, but particularly in the winter season of longer-term simulations, QM often increases it, which could not be fully explained so far and needs further investigation.
Generally, our results indicate that empirical-statistical bias correction methods that correct for intensity-dependence in model errors can lead to improved estimates of future climate change. The improvements primarily refer to the mean CCS, but an empirical constraint on uncertainty in multi-model climate projections also seems to be feasible. A restriction to these results is the fact that any potential improvement can only be realized if the assumption of time-invariant model error characteristics sufficiently holds. It is still subject to further investigation to determine the severity of this restriction.
The Supplement related to this article is available online at doi:10.5194/hess-19-4055-2015-supplement.
Author contributions. A. Gobiet is responsible for the general concept and conduction of the study, for the analytical description presented in Sect. 4, for the interpretation of the results and for writing the text. M. Suklitsch contributed the analysis of the error characteristics of the ENSEMBLES models and G. Heinrich the analysis of the effect of QM on the climate change signal in the ENSEMBLES data set. Both were also involved in the discussion of the results and contributed to parts of the text.

Figure 1. Intensity-dependent model errors of a model that overestimates daily temperature variability (artificial data). (a) Modeled (red, standard deviation of 5 °C) and observed (green, standard deviation of 4 °C) empirical density functions; (b) modeled (red) and observed (green) ECDFs; (c) model error at different modeled values.

Figure 2. Differences between uncorrected and corrected (QM) multi-model mean temperature CCS. The reference period is 1971-2000. The left panels refer to CCSs in the mid-21st century (2021-2050), the right panels to the end of the 21st century (2070-2099). Blue colors indicate areas where the uncorrected model is colder than the corrected model; red colors vice versa.

Figure 3. Differences between uncorrected and corrected (QM) multi-model standard deviation. The reference period is 1971-2000. The left panels refer to CCSs in the mid-21st century (2021-2050), the right panels to the end of the 21st century (2070-2099). Blue colors indicate areas where the uncorrected ensemble features a smaller standard deviation; orange colors vice versa.

Figure 4. Temperature error characteristics (model minus observation) of the HC (left panels) and SMHI (right panels) RCMs in eight subregions of Europe (sub-panels) and each month of the year.

Figure 5. Temperature error characteristics (modeled minus observed) of the ENSEMBLES models in SC (left panels) and EA (right panels). The light lines show the error characteristics of the individual models, the bold line shows the ensemble average. The number in the lower right corner of each panel denotes the multi-model average error slope.

Figure 6. (a) Effect of the error slope on the single-model CCS. (b) Effect of the error slope on the multi-model mean CCS. Black line: covariance term of 0 K; blue lines: covariance term of −0.02 K; pink lines: covariance term of +0.21 K. The lightest colors correspond to an error-free CCS of 1 K, the darkest colors to a CCS of 4 K.

Figure 7. Estimated errors in the multi-model mean CCS due to intensity-dependent model errors. The reference period is 1971-2000. The orange colors refer to CCSs in the mid-21st century (2021-2050), the blue colors to the end of the 21st century (2070-2099). Light colors correspond to the estimation of the error by QM, dark colors to LC.

Figure 8. Estimated errors in the multi-model standard deviation of the temperature CCS due to intensity-dependent model errors. The reference period is 1971-2000. The orange colors refer to CCSs until the mid-21st century (2021-2050), the blue colors until the end of the 21st century (2070-2099). Light colors correspond to the estimation of the error by QM, dark colors to LC.

Table 2 .
Multi-model variance of the temperature CCSs (var( y i