Uncertainties in calculating precipitation climatology in East Asia

This study examines the uncertainty in calculating the fundamental climatological characteristics of precipitation in the East Asia region from multiple fine-resolution gridded analysis datasets based on in-situ rain gauge observations. Five observation-based gridded precipitation datasets are used to derive the long-term means, standard deviations in lieu of interannual variability, and linear trends over the 28-year period from 1980 to 2007. Both the annual and summer (June-July-August) mean precipitation are examined. The agreement amongst these precipitation datasets is examined using multiple metrics including the signal-to-noise ratio (SNR), defined as the ratio between long-term means and the corresponding standard deviations, and Taylor diagrams, which allow examination of the pattern correlation, the standard deviation, and the centered root mean square error. It is found that the five gauge-based precipitation analysis datasets agree well in the long-term mean and interannual variability in most of the East Asia region including eastern China, Manchuria, South Korea, and Japan, which are densely populated and have fairly high-density observation networks. The regions of large inter-dataset variations include the Tibetan Plateau, Mongolia, northern Indo-China, and North Korea. The regions of large uncertainties are typically lightly populated and are characterized by severe terrain and/or extremely high elevations. Unlike the long-term mean and interannual variability, agreement between datasets in the linear trend is weak, both for the annual and summer mean values. In most of the East Asia region, the SNR for the linear trend is below 0.5, i.e., the inter-dataset variability exceeds the multi-data ensemble mean. The uncertainty in the spatial distribution of long-term means among these datasets occurs both in the spatial pattern and variability, but the uncertainty for the interannual variability and time trend is much larger in the magnitude of variability than in the spatial pattern.


Introduction
Long-term means, standard deviations in lieu of interannual variability, and trends calculated from observed data are among the fundamental fields in representing the characteristics of regional climates. These climatological properties play crucial roles in defining climatological norms, occurrence of extreme events, detection of climate change, and projecting future climate variations and change as well as their impacts (Giorgi et al., 1994; Groisman et al., 2001; Kim, 2005). For example, reliability of the climate change detection is examined by comparing the long-term means and trends calculated from observations against those simulated in climate model sensitivity experiments (e.g., IPCC, 2001, 2007). In addition, the changes in key local hydrological fields such as precipitation are frequently measured relative to their climatological means. Thus, calculating reliable values of these properties is a critical step in climate research for identifying regional climate characteristics, through quantification of their changes due to external and/or internal forcings such as emissions of anthropogenic greenhouse gases, and the impacts of such changes on regionally important sectors.
Gridded representations of observed data on the basis of a variety of instruments, locations, platforms, retrieval algorithms, and analysis schemes are widely employed in climate research with various goals (Legates and Willmott, 1990; Mitchell and Jones, 2005; Shige et al., 2006; Schneider et al., 2014). Until recently, only a limited number of such data sets were available, and most climate studies employed a single data set that included the features needed for their analyses. Recently, a number of researchers and institutions have introduced newly developed observation-based gridded analysis data sets of global or regional coverage with fine spatial resolutions (Legates and Willmott, 1990; Adler et al., 2003; Mitchell and Jones, 2005; Shige et al., 2006; Yatagai et al., 2012; Pai et al., 2013; Schneider et al., 2014). These newly introduced analysis data sets provide precipitation and/or surface air temperatures over extended periods of multiple decades at spatial resolutions of 0.5° or finer, which are substantial improvements over previous-generation data sets that are typically at much coarser horizontal resolutions, for example, the 2.5° resolution GEWEX Global Precipitation Climatology Project (Adler et al., 2003). These recent fine-scale data sets allow us to better examine the regional precipitation and temperature climatology and to perform more reliable evaluations of today's high-resolution climate simulations, especially over the regions of complex terrain, that are important for climate-change impact assessments and climate model evaluations (Kim et al., 2013). These new data sets also introduce uncertainties in calculating regional climate characteristics because of the differences amongst them. Based on these concerns, two recent studies by Prakash et al. (2014) and Kim et al. (2015) examined uncertainty in calculating precipitation climatology over India and its surrounding regions using multiple precipitation analysis data sets. These two studies have revealed independently that there exist substantial differences amongst today's gridded precipitation data sets, resulting in uncertainties in the calculated precipitation climatology, and that the uncertainty and the spread amongst multiple data sets vary according to regions as well as seasons. Kim et al. (2015) further revealed that uncertainties in the calculated precipitation climatology defined relative to their climatological means are generally larger in the dry regions and/or local dry seasons. These two studies strongly suggest that uncertainty due to the differences between various data sets needs to be examined and quantified in all climate studies because the absolute accuracy of individual data sets cannot be quantified in practice.
In this study, we investigate the uncertainty in calculating fundamental properties of regional climate characteristics of precipitation over the Far East Asian region due to the differences amongst today's fine-resolution gridded data sets based on analyses of observed data. This study examines for the first time the uncertainty in calculating the standard deviation, a widely used second-order statistic, and the linear trend against that in calculating the mean, the first-order statistical moment. Examining the uncertainty in assessing the key precipitation characteristics from the currently available precipitation data can help interpret future precipitation projections. In East Asia, with huge populations and frequent hydrologic extremes, assessing long-term variations in precipitation has been an important concern. However, the effects of inter-data-set differences on such assessments have not been studied so far. The uncertainty analysis for the East Asia region in this study is also applicable to any other part of the world. The methodology and data are presented in Sect. 2, and results are given in Sect. 3. Section 4 summarizes and discusses the implications of the findings in this study.

Methodology and data
In this study, spatial variations in the long-term means, interannual variabilities, and linear trends over the region of interest are examined in terms of inter-data-set variability measured using signal-to-noise ratio (SNR) and the similarity with reference data.
Five gridded precipitation data sets are used to estimate the uncertainty in constructing regional climate characteristics over East Asia for the entire year and for the summer season (June-July-August). Only the data sets that cover more than 25 years are selected for analysis, for reliable calculations of the standard deviation in lieu of interannual variability and of the linear trends. The period of the recent three decades examined in this study corresponds to a period of quite steady (near monotonic) and large increases in the global mean temperature. The analysis was limited to the 28-year period (1980-2007) due to the length of the available data. Examination of the precipitation trend in a period of clear warming is of major scientific interest because of the link between the changes in precipitation and temperature.
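The three properties named above can be computed cell by cell from a stack of annual (or summer) totals. The following is a minimal sketch, assuming NumPy, an array layout of (year, lat, lon), the sample standard deviation (ddof=1), and an ordinary least-squares slope for the trend; the function name `climatology_stats` is hypothetical, and none of these choices is confirmed by the text as the exact procedure used in this study.

```python
import numpy as np

def climatology_stats(precip, years):
    """Return (mean, std, trend) for each grid cell.

    precip : array of shape (n_years, n_lat, n_lon), one field per year
    years  : array of shape (n_years,), e.g. 1980..2007
    The array layout and ddof=1 are assumptions made for this sketch.
    """
    precip = np.asarray(precip, dtype=float)
    years = np.asarray(years, dtype=float)

    mean = precip.mean(axis=0)            # long-term mean
    std = precip.std(axis=0, ddof=1)      # proxy for interannual variability

    # Ordinary least-squares slope per cell: sum(t' * p') / sum(t'^2),
    # where primes denote departures from the time mean.
    t_anom = years - years.mean()
    trend = np.tensordot(t_anom, precip - mean, axes=(0, 0)) / np.sum(t_anom ** 2)
    return mean, std, trend
```

Applied to each data set separately, this yields one mean, one standard deviation, and one trend field per data set, which are the inputs to the inter-data-set comparison.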
Based on the selection criterion, five high-resolution gridded data sets are selected, including the Climate Research Unit of the University of East Anglia (CRU), University of Delaware (UDEL), Global Precipitation Climatology Center (GPCC), the Asian Precipitation - Highly Resolved Observational Data Integration Towards Evaluation of water resources (APHRODITE), and the Modern Era Retrospective-analysis for Research and Applications (MERRA) land, that are either based on rain gauge data or assimilations. These data sets and references are summarized in Table 1. We also examined uncertainties including the coarse-resolution Global Precipitation Climatology Project (GPCP) data (Adler et al., 2003) and obtained essentially the same conclusions as with the original five data sets only; thus, the results including the GPCP data are not presented here, to focus on fine-resolution data sets. Note that there are some factors leading to differences among the data sets, e.g., the horizontal and/or vertical resolutions, the gridding procedure, and the analysis methods. Such inter-data-set differences may be an unavoidable source of uncertainty in this study. As seen in Table 1, observational data are available in various resolutions and discretizations. In fact, data sets of the same horizontal resolution can be defined in different grid structures. The gridding procedure might also be different for different data sets. The analysis data sets are usually based on different sets of station (observational) data, depending on the data availability at the time of analysis and specifics of the quality control procedures (e.g., Mitchell and Jones, 2005; Yatagai et al., 2012; Pai et al., 2013). Furthermore, the analysis methodology, essentially the interpolation scheme that varies for different analysis data sets, can contribute to the inter-data-set differences. However, assessing the effects of different data sets and/or the analysis schemes on the inter-data-set differences used here is beyond the scope of this study.
To alleviate the uncertainty related to the differences in resolution and grid structure, we have interpolated all data sets onto a common grid so that we can compare all data sets at the same locations. The spatial interpolation procedure can affect the characteristics of spatial variability of the interpolated data. This can be an important concern in deriving the characteristics of horizontal variability, e.g., spatial power spectra, but it is not expected to have serious effects on deriving temporal variability of the interpolated data. Because all of the properties we describe in this study are related to the temporal variability (e.g., temporal means, standard deviations, and trends), we expect the differences in the horizontal resolutions and subsequent spatial interpolation to have minimal impacts on the results. We have also created a multi-data-set ensemble by simple averaging of all observational data sets included in the analysis, using equal weights. The equal weighting is employed because the accuracy of individual data sets cannot be determined objectively.
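The regridding and equal-weight averaging described above can be illustrated with a short sketch. The interpolation scheme used in this study is not specified in the text, so separable linear interpolation on regular latitude-longitude grids is assumed here purely for illustration; the function names `regrid_linear` and `ensemble_mean` are hypothetical.

```python
import numpy as np

def regrid_linear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Interpolate a 2-D (lat, lon) field onto a common grid.

    Assumes regular grids with ascending coordinates. np.interp holds the
    edge value constant for targets outside the source range, so the
    common grid should lie inside every source grid.
    """
    field = np.asarray(field, dtype=float)
    # Step 1: interpolate each source latitude row along longitude.
    along_lon = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Step 2: interpolate each target longitude column along latitude.
    return np.array([np.interp(dst_lat, src_lat, col) for col in along_lon.T]).T

def ensemble_mean(fields):
    """Equal-weight average of fields that share the same grid."""
    return np.mean(np.stack(fields), axis=0)
```

Each data set would be passed through `regrid_linear` with its own source coordinates and the shared target coordinates before the regridded fields are averaged with `ensemble_mean`.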
Uncertainties in representing precipitation climatology due to the spread amongst today's observational data are examined in terms of the SNR. The SNR has been a key property in a number of climate studies in which the uncertainties of climate signals are estimated against noises stemming from various sources (e.g., Giorgi and Mearns, 2002; Covey et al., 2003; Meehl et al., 2005; Tebaldi and Knutti, 2007; Duan and Phillips, 2010). In climate and weather forecast research based on ensembles of multiple model or observation data sets, the SNR has been used to measure the reliability of the multi-data-set ensemble mean against the spread of the data sets in the ensemble. Within this context, the signal and noise are defined as the associated mean and standard deviation, respectively, of multiple data sets. The definition of "noise" can be complicated when the data reliability varies among data sets, and the weighting factor in constructing a multi-data-set ensemble can vary for different data sets (Duan and Phillips, 2010). Such complications in calculating "noise" frequently occur in climate projections where outputs from various models of varying performance are used to construct an ensemble mean using variable weighting (e.g., Giorgi and Mearns, 2002). Because it is practically impossible to rank the selected observational data sets in terms of their accuracy, the ensemble is constructed using an equal weighting.
The similarity between individual data sets and the reference data, defined as the multi-data-set ensemble, is measured in terms of the pattern correlation and the standard deviation of individual data sets relative to the reference data set. Measurements of these two properties are presented using Taylor diagrams (Taylor, 2001; Gleckler et al., 2008). The Taylor diagram was first introduced by Taylor (2001) to provide a way to intuitively present two properties simultaneously: the correlation coefficient of a data set with the reference data is presented as the azimuth angle (the angle for perfect agreement is zero), and the relative magnitude of the standard deviation of a data set with respect to that of the reference data is expressed as the radial distance (e.g., see Fig. 5a). Thus, a radial distance of 1 and an azimuthal angle of 0° imply that a sample datum has the same pattern and variability as the reference data. In addition, the distance between the point (0°, 1.0) and a data point in this diagram corresponds to the centered root mean square error (RMSE). This diagram has become one of the most widely used methodologies in climate studies for presenting the evaluations of multiple models and/or variables or intercomparison of multiple data sets (IPCC, 2001; Taylor, 2001; Duffy et al., 2006; Gleckler et al., 2008; Kim et al., 2013, 2015).
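The three quantities displayed in a Taylor diagram follow directly from the definitions in Taylor (2001). The sketch below, assuming NumPy and unweighted statistics over all valid grid cells (area weighting, which the text does not discuss, is omitted), computes them for one data set against the reference; the function name `taylor_stats` is hypothetical.

```python
import numpy as np

def taylor_stats(field, reference):
    """Return (pattern correlation, normalized std, normalized centered RMSE).

    Both inputs are flattened; cells missing in either field are dropped.
    Statistics are unweighted, which is an assumption of this sketch.
    """
    f = np.ravel(field).astype(float)
    r = np.ravel(reference).astype(float)
    valid = np.isfinite(f) & np.isfinite(r)
    f, r = f[valid], r[valid]

    f_anom = f - f.mean()                 # remove the spatial means
    r_anom = r - r.mean()
    sigma_f = np.sqrt(np.mean(f_anom ** 2))
    sigma_r = np.sqrt(np.mean(r_anom ** 2))

    corr = np.mean(f_anom * r_anom) / (sigma_f * sigma_r)   # azimuth = arccos(corr)
    ratio = sigma_f / sigma_r                               # radial distance
    crmse = np.sqrt(np.mean((f_anom - r_anom) ** 2)) / sigma_r
    return corr, ratio, crmse
```

By the law of cosines these values satisfy crmse**2 = 1 + ratio**2 - 2 * ratio * corr, which is why the centered RMSE appears in the diagram as the distance between a data point and the reference point.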
Figure 2. The signal-to-noise ratio (SNR) for the properties shown in Fig. 1, calculated from the corresponding properties of the five precipitation analysis data sets in Table 1.

Regional climatology
Figure 1 presents the three basic characteristics of the annual and summer (June-July-August) precipitation climatology over East Asia: long-term means, interannual variability, and trends, calculated from the ensemble mean of the multiple data sets in Table 1. The mean annual precipitation in the region is characterized by the wet regions in southeastern China and Japan (Fig. 1a). Precipitation over the Korean Peninsula is characterized by maxima in the southwestern and central regions and a rapid decrease towards the northwestern part of the peninsula bordering Manchuria. The driest region covers southern Mongolia, the Gobi desert, and the northern Tibetan Plateau. Interannual variability of the annual precipitation (Fig. 1b) shows a distribution similar to that of the annual means. The linear trend of the annual precipitation varies substantially according to geography (Fig. 1c). The most notable features include the positive trend in the driest region, including southern Mongolia, the Gobi desert, and the northern Tibetan Plateau, and the negative trend along the wet Yangtze River. Strong positive trends are also found in much of the Korean Peninsula, the coastal region of northern China to the west of the Shandong Peninsula, most of southern China, and eastern Japan. Decreasing precipitation trends also occur in the region between 45 and 50° N, extending from central Mongolia to the Russian Far East. The summer rainfall climatology (Fig. 1d-f) resembles the annual mean climatology but with larger magnitudes. This shows that the precipitation climatology over the East Asia region is primarily determined by the summer rainfall.

Uncertainties in precipitation climatology
The climatology presented in Fig. 1 varies for different data sets. This is inevitable because each data set utilizes different raw data, data quality control, and analysis methodology (Xie and Arkin, 1995). Because it is practically impossible to determine which data set is more accurate, assessing the reliability of climatological properties calculated from various data sets as well as the expected range of uncertainty due to the diversity of these data sets is crucial in calculating regional climatology (Kim et al., 2015). In this section, the range of uncertainty in the three precipitation characteristics is measured in terms of the SNR and the agreement between individual data sets and the multi-data ensemble mean in terms of the spatial pattern correlation and the magnitude of spatial variability following the methodology of Kim et al. (2015), using the Taylor diagram.
The SNR is calculated as the ratio between the multi-data ensemble mean and the inter-data-set variability, i.e., a measure of the magnitude of the multi-data-set ensemble mean relative to that of the inter-data-set variations. Thus, as SNR increases, these data sets agree more closely with each other. There is no established threshold value of SNR to distinguish "good" from "bad". However, we may use some subjective guidance to interpret the SNR values. For instance, if SNR < 1, the signal is smaller than the noise, and the signal is clearly not reliable. The case with SNR > 5 may indicate that the spread amongst the multiple data sets is small enough that we can take the multi-data ensemble as the representative value for the included data sets.
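The SNR defined above can be written compactly for a set of fields on a common grid. This is a minimal sketch under stated assumptions: the noise is taken as the population standard deviation across data sets (ddof=0), and the absolute value of the ensemble mean is used so that negative trends yield a positive ratio; neither detail is given in the text, and the function name `signal_to_noise` is hypothetical.

```python
import numpy as np

def signal_to_noise(fields, ddof=0):
    """SNR per grid cell: |ensemble mean| / inter-data-set standard deviation.

    fields : sequence of 2-D arrays on a common grid, one per data set.
    ddof=0 and the absolute value are assumptions made for this sketch.
    """
    stack = np.stack([np.asarray(f, dtype=float) for f in fields])
    signal = np.abs(stack.mean(axis=0))
    noise = stack.std(axis=0, ddof=ddof)
    # Cells where all data sets agree exactly have zero noise; the ratio
    # there is inf (or nan for 0/0) rather than raising a warning.
    with np.errstate(divide="ignore", invalid="ignore"):
        return signal / noise

def snr_fractions(snr):
    """Fractions of finite cells with SNR < 1 and with SNR > 5."""
    finite = snr[np.isfinite(snr)]
    return float(np.mean(finite < 1.0)), float(np.mean(finite > 5.0))
```

Calling `signal_to_noise` on the five mean fields, the five standard deviation fields, and the five trend fields in turn gives one SNR map per property, and `snr_fractions` summarizes each map against the subjective guidance values of 1 and 5.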
The SNRs for the annual mean precipitation (Fig. 2a) and its interannual variability (Fig. 2b) over the 28-year period exceed 5 in most of the study domain. Hence, the five data sets examined in this study agree well in terms of the annual mean precipitation and its interannual variability in the East Asia region. The regions of small SNR, i.e., showing poor agreement amongst the selected data sets, are located in the western part of the domain, which includes the eastern Tibetan Plateau, the Gobi desert, and northern Indochina bordering China. It is notable that the station density is relatively low in these regions. The SNR for the interannual variability is generally smaller than that for the mean; thus, uncertainty in calculating the interannual variability is larger than in calculating the mean climatology. Unlike the annual mean and its interannual variability, the SNR for the linear trend of the annual precipitation (Fig. 2c) is generally below 5 in most regions. Thus, the long-term annual precipitation trend in the region is highly uncertain except in a few small areas.
Figures 2d-f show the SNR for the summer mean precipitation. Overall, the reliability of the three characteristics of the summer precipitation calculated from these five data sets is similar to that of the annual precipitation. The SNRs for the summer precipitation climatology are somewhat smaller than those for the annual precipitation climatology, but they still largely exceed 5 in about the same region as for the annual precipitation. For the interannual variability (Fig. 2b vs. Fig. 2e) and linear trend (Fig. 2c vs. Fig. 2f), the five data sets agree more closely for the summer mean values than for the annual mean values. It is noteworthy that the positive tendency of the summer rainfall in southern China (Fig. 1f) is highly reliable as all five data sets agree closely (i.e., relatively smaller inter-data-set variations compared with the multi-data-set ensemble mean).
To evaluate the statistical significance of trends, we have plotted the p values from each data set in calculating the linear trend of the annual-mean precipitation and the summer-mean precipitation (see Figs. 3 and 4, respectively). The regions of large SNR correspond to the regions of small p values in calculating the linear trend. This suggests that some of the uncertainty in the multi-data-set ensemble may be inherited from the uncertainty in calculating the trend from individual data sets. Still, a significant portion of the region of small p values shows small SNR values. Thus, inter-data-set differences are the main cause of the uncertainty in calculating long-term trends.
Figure 5 measures the spatial variations in the three climatological properties represented by the five observational data sets using the Taylor diagrams and the simple multi-data-set ensemble as the reference. In these diagrams, the areas encompassed by the red polylines may be regarded as the range of uncertainty (see Kim et al., 2015). Thus, the smaller the area, the smaller the uncertainty due to the differences between the examined data sets. The spread in the azimuthal and radial directions indicates the spread in the spatial pattern and in the magnitude of spatial variability, respectively. Similar to Fig. 2, the uncertainties in the spatial variations of the annual and summer mean precipitation and their interannual variability are much smaller than the uncertainty in the spatial variations of the linear trend. The distances from the reference data at the point indicated by a star (i.e., the reference point with both standardized deviation and correlation being equal to 1.0) to individual data sets for the means (Fig. 5a and d) are similar to those for their interannual variability (Fig. 5b and e), indicating a similar level of spread amongst these data sets in representing these two properties of the precipitation climatology in the region. Regarding the linear trend (Fig. 5c and f), the distances between the reference point and individual data sets are much larger than for the means and their interannual variabilities. This is another indication of the larger uncertainties in the linear trend represented by these data sets.
One interesting feature in the examination of the uncertainties in the spatial variability in Fig. 5 is that the spreads in these data sets occur in both the spatial pattern and the magnitude for the annual and summer mean values; however, these data sets show more consistency in the spatial pattern than in the variability. Figures 5b and e show that the five data sets have similar spatial correlations with the reference data and that the predominant spread among these data sets is in the radial direction, i.e., the magnitude of the spatial variability. This feature is more pronounced for the linear trend (Fig. 5c and f), which shows a nearly linear distribution of the data points in the radial direction, i.e., much smaller spread in the azimuthal direction (pattern correlations) than in the radial direction (magnitude of variability relative to the reference data).

Summary and discussions
The uncertainties in three fundamental climatological characteristics of the precipitation over East Asia due to the differences among available fine-scale observation-based gridded analysis data sets have been examined using the metrics selected for objectively measuring the spread of these properties calculated from individual data sets. The three climatological characteristics include the means, interannual variabilities, and linear trends in the annual and summer mean precipitation, which are key fundamental climatological characteristics widely used in studies for examining regional climate characteristics and model evaluations. The spread and the magnitude of disagreements amongst the selected data sets are measured using the signal-to-noise ratio (SNR) and examined visually using the Taylor diagrams, which allow simultaneous evaluations of three properties: pattern correlation, standard deviations, and the centered root mean square errors between multiple data sets and a reference data set.
The SNR values calculated from the five selected precipitation data sets show that the mean climatology of the annual and summer mean precipitation values and their interannual variability are highly reliable in much of East Asia except in southern Mongolia, the Gobi desert, and the Tibetan Plateau, the regions of sparse population and complex terrain. Precipitation measurements in regions of dry climate and complex terrain require high-density networks (e.g., Kim et al., 2015). Unlike the climatological mean values and interannual variability, linear trends calculated over the 28-year period are highly uncertain except in a few limited areas. It is striking that the reliability of the estimated temporal trend of the annual mean precipitation (Fig. 2c) is very low compared to that of the means and the variability (Fig. 2a and b, respectively). Reliable calculation of linear trends is only possible over the southern China region for the summer mean precipitation. Thus, extra caution must be taken when analyzing precipitation trends over the East Asian region.
The uncertainty characteristics also vary according to the climatological properties. Figures 1 and 2 discussed above show that the reliability of calculating temporal variabilities is much lower than that of time mean values, especially for linear trends. In addition, the spatial pattern and variability of the calculated linear trend (Fig. 5c) show much larger spread (i.e., uncertainty) among these data sets compared to the annual means (Fig. 5a) and interannual variability (Fig. 5b). The consistency in the spatial pattern between individual data sets and the reference data measured in terms of the correlation is near or over 0.95 for the temporal means and variability, whilst it barely exceeds 0.8 for the linear trend. The range of spatial variability measured in terms of the standardized deviation (the ratio between the standard deviation of a data set and that of the reference data set) for the linear trend is over 0.5, which is more than twice the range of the means and the variabilities. It is also observed that uncertainties in the spatial distribution of the annual and summer mean precipitation (Fig. 5a and d, respectively) occur in both the spatial pattern and the magnitude of variability. For the interannual variability and linear trends, the spread in the standardized deviation (i.e., the magnitude of variability) is much larger than that in the spatial pattern. These may suggest that all of these data sets are affected by some common factors in determining the characteristics of these data sets. For example, the station data sets included in each analysis data set may provide high consistency in the spatial distribution pattern, but different analysis schemes may lead to a larger spread in the magnitude of their variability because of different basis functions employed in different interpolation schemes (e.g., Xie and Arkin, 1995; Prakash et al., 2014). This is just a hypothesis and needs close examination in future studies.
The uncertainty in calculating precipitation climatology in the regions including southern Mongolia, the Gobi desert, and the Tibetan Plateau is of special concern. These regions can respond sensitively to climate change because of disproportionately larger impacts of global warming on high-elevation regions and snow-ice processes (e.g., IPCC, 2007; Waliser et al., 2011). Because of rapid variations in the spatial precipitation distributions according to terrain during storms, accurate measurement of precipitation in the regions of extreme terrain requires high-density gauge networks (Xie and Arkin, 1995). The sparse population density in these regions may require higher cost to build and maintain additional gauges to reduce the uncertainties. Remote sensing of precipitation will play important roles in monitoring precipitation over these regions of sparse observations in addition to the investments for installing and maintaining additional surface observing stations.

Figure 1. The climatological properties of the annual (upper panels) and summer (lower panels) precipitation for the period 1980-2007 over East Asia: (a, d) the mean climatology, (b, e) the standard deviation, and (c, f) the linear trend of precipitation. These properties are derived from the ensemble of the corresponding properties calculated from the data sets in Table 1.

Figure 3. The p values in calculating the linear trend of the annual-mean precipitation from each data set.

Figure 4. Same as in Fig. 3, but for the summer-mean precipitation trend.

Figure 5. The spread amongst the five precipitation data sets in representing the spatial variability of the three climatological properties of the annual (upper panels) and summer (lower panels) precipitation over East Asia: (a, d) the mean, (b, e) the interannual variability, and (c, f) the trends of precipitation. They are presented in terms of their spatial pattern correlations (the azimuthal direction) and the standardized deviation, i.e., the standard deviation of individual data sets normalized by that of the reference data (the radial direction). The area within the red polyline represents the range of spread amongst these data sets.

Table 1. The precipitation data sets employed in this study.