An assessment of the performance of global rainfall estimates without ground-based observations

Abstract. Satellite-based rainfall estimates over land have great potential for a wide range of applications, but their validation is challenging due to the scarcity of ground-based observations of rainfall in many areas of the planet. Recent studies have suggested the use of triple collocation (TC) to characterize uncertainties associated with rainfall estimates by using three collocated rainfall products. However, TC requires the simultaneous availability of three products with mutually uncorrelated errors, a requirement which is difficult to satisfy with current global precipitation data sets.
In this study, a recently developed method for rainfall estimation from soil moisture observations, SM2RAIN, is demonstrated to facilitate the accurate application of TC within triplets containing two state-of-the-art satellite rainfall estimates and a reanalysis product. The validity of different TC assumptions is indirectly tested via a high-quality ground rainfall product over the contiguous United States (CONUS), showing that SM2RAIN can provide a truly independent source of rainfall accumulation information which uniquely satisfies the assumptions underlying TC. On this basis, TC is applied with SM2RAIN on a global scale in an optimal configuration to calculate, for the first time, reliable global correlations (vs. an unknown truth) of the aforementioned products without using a ground benchmark data set.
The analysis is carried out during the period 2007-2012 using daily rainfall accumulation products obtained at 1° × 1° spatial resolution. Results convey the relatively high performance of the satellite rainfall estimates in eastern North and South America, southern Africa, southern and eastern Asia, eastern Australia, and southern Europe, as well as complementary performances between the reanalysis product and SM2RAIN, with the first performing reasonably well in the Northern Hemisphere and the second providing very good performance in the Southern Hemisphere.
The methodology presented in this study can be used to identify the best rainfall product for hydrologic models in sparsely gauged areas and provide the basis for an optimal integration among different rainfall products.

Published by Copernicus Publications on behalf of the European Geosciences Union.

Introduction
Thanks to the combined use of microwave and infrared sensors, the quality of available satellite rainfall estimates over land has significantly increased in the last few decades. This strategy, also known as the multi-sensor approach, has produced a number of different satellite rainfall products that either map infrared (IR) radiances to more direct passive microwave (PMW) retrievals (generally termed "blended" algorithms) or morph PMW rainfall using IR measurements (generally termed "morphing" algorithms). The new Global Precipitation Measurement Mission (GPM; Hou et al., 2014) has successfully expanded the concept of multi-sensor integration. Through the Integrated Multi-satellitE Retrievals for GPM (IMERG) algorithm, rainfall estimates from the various precipitation-relevant satellite PMW and IR missions are intercalibrated, merged and interpolated with the GPM Combined Core Instrument product to produce rainfall accumulation estimates with an unprecedented accuracy. Despite these technical advancements, the precipitation community still struggles to show a clear picture of the actual increased accuracy of satellite rainfall estimates in many areas of the world because validation studies rely upon the availability of high-quality (and sufficiently dense) ground-based rainfall instrumentation (e.g. rain gauges and radars).
Many studies (e.g. Ebert et al., 2007; Sapiano and Arkin, 2009; Tian et al., 2007; Stampoulis and Anagnostou, 2012) have investigated the error associated with remotely sensed precipitation products by comparing their estimates with ground-based observations, assuming that the latter represent zero-error rainfall. However, the physical characteristics of precipitation, particularly at finer spatial and temporal resolutions, necessitate frequent, systematic and sufficiently dense validation measurements, requirements that are often not met within data-scarce regions of Africa, Asia and South America. Indeed, despite the relative accuracy of rain gauges, their distribution varies significantly around the world. Much of the land surface (representing 25-30 % of the Earth's surface) has measurement networks, although those networks with good gauge densities are limited (Kidd et al., 2017).
The current networks of surface observations are therefore often insufficient for the quantitative assessment of the error associated with satellite rainfall estimates. Moreover, despite the relatively higher accuracy of rainfall estimates that can be obtained by rain gauges, they are not error-free (Peterson et al., 1998; Villarini et al., 2008). Therefore, evaluating the performance of different satellite rainfall products with ground-based observations is challenging due to the scarcity of such observations and the inherent error contained in their estimates.
Based on the work of Adler et al. (2009), Tian and Peters-Lidard (2010) estimated the uncertainties of satellite rainfall estimates by using the measurement spread of coincidental and collocated estimates from an ensemble of six different satellite-based data sets, thus providing a globally consistent methodology that does not require ground-based validation data. The analysis yielded a lower bound estimate of the uncertainties, and a consistent global view of the error characteristics and their regional and seasonal variations. However, the authors showed that the analysis is able to provide only a relative estimation of the measurement uncertainties because these data sets are not entirely independent measurements.
An alternative approach for assessing the quality of satellite rainfall products was proposed by Roebeling et al. (2012) and Alemohammad et al. (2015) based on the triple collocation (TC) method (Stoffelen, 1998). The first applications of TC concerned geophysical variables such as ocean wind speed and wave height (Stoffelen, 1998). More recently, it has been used extensively to estimate errors in soil moisture (SM) products (Crow and Van Den Berg, 2010; Miralles et al., 2010; Dorigo et al., 2010; Draper et al., 2013; Su et al., 2014; Gruber et al., 2016). Given three estimates of the same variable, the main assumptions of the method are the (i) stationarity of the statistics, (ii) linearity between the three estimates (vs. the same target) across all timescales and (iii) existence of uncorrelated error between the three estimates.
Roebeling et al. (2012) determined the spatial and temporal error characteristics of three precipitation data sets over Europe (a visible/near-infrared data set, a weather radar data set and gridded rain gauge products), showing that TC can provide realistic error estimates. The authors ensured a Gaussian distribution of the error by averaging the data set over a sufficiently long period (10 days) and re-gridding to a sufficiently low spatial resolution (0.25° × 0.25°). Alemohammad et al. (2015) applied TC to 14-day cumulated rainfall estimates derived from satellite, gauges, radars and models in order to retrieve the error and the correlation of each data set in the United States. They also proposed the use of a logarithmic (i.e. multiplicative) error model which almost certainly provides a more realistic description of rainfall accumulation errors at fine space/timescales. In addition, they calculated the theoretical correlation of each product with the unknown truth by using the extended TC (ETC; McColl et al., 2014), i.e. by analysing the covariance matrix of the three data sets.
TC can theoretically provide error and correlations of three products (a triplet) without use of ground-based observations, provided that each of the three products is afflicted by mutually independent errors. However, given that state-of-the-art satellite rainfall products use a highly overlapping set of common sensors for the retrieval of rainfall (see Sect. 2.1 for further details), there is an inherent difficulty in obtaining triplets with mutually independent errors. Therefore, additional, highly independent sources of rainfall accumulation estimates are needed.
Recently, Brocca et al. (2014) developed a method for estimating rainfall accumulation amounts directly from satellite SM observations based on the principle that the soil can be treated as a "natural rain gauge". In contrast with classical satellite rainfall products, this new bottom-up approach attempts to measure rainfall by calculating the difference between two successive SM measurements derived from a satellite SM product. In this respect, SM2RAIN offers a unique opportunity for applying the TC analysis because, being wholly independent of any other rainfall estimate, it can be used in place of a ground-based product. This opportunity has not yet been explored and could provide an appropriate basis for applying TC on a global scale without requiring the availability of ground-based rainfall accumulation data.
In this study, TC is applied to the rainfall accumulation estimates derived from (1) ERA-Interim (Dee et al., 2011), (2) SM2RAIN (Brocca et al., 2014) via inversion of Advanced SCATterometer (ASCAT; Wagner et al., 1999) SM data, (3) the NOAA Climate Prediction Center morphing (CMORPH, raw version) (Joyce et al., 2004) and (4) the TRMM Multi-satellite Precipitation Analysis (TMPA) 3B42RT (Huffman et al., 2007) product over the CONUS (note that 3B42RT and CMORPH do not include gauge information in their retrieval algorithms). Thanks to the ability of TC to provide the correlation against the "unknown" truth (ETC; McColl et al., 2014), the assessment of the products will be carried out in terms of correlation against "true" rainfall values. As a result, the terms "performance" and "TC results" will hereinafter refer to this correlation (additional clarification is provided in Sect. 2.3).
The reliability of the TC results is assessed by comparing them with the analogous evaluation results obtained via direct comparisons with the Climate Prediction Center (CPC) Unified Gauge-Based Analysis of Global Daily Precipitation (hereafter CPC) product. These assessments will be carried out with and without the use of SM2RAIN rainfall accumulation products to isolate the value of SM-based rainfall estimates for the evaluation of global rainfall products. Note that, given the number of common sensors shared by CMORPH and TMPA 3B42RT, the application of TC to the triplet containing both products will serve to demonstrate the difficulties of using both of them in the same triplet within the TC analysis and evaluate the potential benefits of utilizing SM2RAIN-based accumulation products in a TC analysis.
The paper is organized as follows. Section 2 contains data and methods; in particular, the products used for the analysis are described in Sect. 2.1, the theoretical background for TC is in Sects. 2.2 and 2.2.1, the performance scores used for the evaluation of the results are discussed in Sect. 2.3, and Sects. 2.4 and 2.5 describe SM2RAIN and the experiment setup. Results are presented and discussed in Sect. 3 and final remarks are presented in Sect. 4.
2 Data and methods

2.1 Rainfall and soil moisture products

2.1.1 CPC
The 0.5° × 0.5° gauge-based CPC product is used to evaluate the satellite-based rainfall estimates over the CONUS and verify evaluations provided by TC. Given the high rain gauge density associated with this product across CONUS (Fig. 1), along with the common practice of using ground-based rainfall data to validate satellite-based rainfall retrievals (Huffman et al., 1997), CPC is expected to provide a reasonable proxy of true rainfall accumulation over the CONUS. Nevertheless, this assumption will be verified below. Figure 1 illustrates that the spatial density of CPC gauge coverage (calculated as average number of rain gauge observations per day) during 2007-2012 is high in the eastern CONUS and along the western coast of CONUS but relatively lower in many parts of the central CONUS. CPC rainfall observations are aggregated to a 1° × 1° spatial resolution by simple averaging.

ASCAT data
ASCAT (Bartalis et al., 2007) is a real-aperture radar instrument onboard the MetOp satellites which measures radar backscatter at C band (5.255 GHz) and VV polarization. It has a spatial resolution of 25 km (resampled at 12.5 km) and has been available since 2007. The surface SM product (equivalent to a depth of 2-3 cm of the soil) is calculated from the backscatter measurements through the time-series-based change detection approach described in Wagner et al. (1999). The SM is measured in relative terms (degree of saturation) with respect to historical minimum and maximum values. Here, we used the ASCAT data set produced using the Soil Water Retrieval Package (WARP) (Naeimi et al., 2009) (v5.5) from Vienna University of Technology (TU-Wien), and distributed as SM product H109 by the EUMETSAT Satellite Application Facility on Support to Operational Hydrology and Water Management (H-SAF). Prior to the application of SM2RAIN to ASCAT data, the points characterized by a surface state flag (SSF) of the ASCAT product that indicates frozen (SSF = 2), temporary melting/water on the surface (SSF = 3) or permanent ice (SSF = 4) were excluded from the analysis. For further details about the application of SM2RAIN to ASCAT, the reader is referred to Sect. 2.4.

TMPA 3B42RT
TMPA 3B42RT, version 7 (http://trmm.gsfc.nasa.gov), combines rainfall estimates from various satellite sensors. The multisatellite platform uses the TRMM Microwave Imager (TMI) on board the TRMM satellite, the Special Sensor Microwave Imager (SSM/I) on board the Defense Meteorological Satellite Program (DMSP) satellites, the Advanced Microwave Scanning Radiometer for Earth observing system (AMSR-E) on board the National Aeronautics and Space Administration (NASA) Aqua satellite, the Advanced Microwave Sounding Unit-B (AMSU-B) on board the National Oceanic and Atmospheric Administration (NOAA) satellite series and GEO IR rainfall estimates. The TMPA 3B42RT estimates are produced in three steps: (1) the PMW estimates are calibrated with sensor-specific versions of the Goddard Profiling Algorithm (GPROF; Kummerow et al., 1996) and combined, (2) IR rainfall estimates are created using the PMW estimates for calibration, and (3) PMW and IR estimates are then combined. The 3B42RT product is provided by NASA with a temporal resolution of 3 h and a spatial resolution of 0.25°. The cumulated daily rainfall, available from March 2000, is obtained by simply summing the eight 3 h time windows for each day. The product covers latitudes between +50° and −50°. To match the CPC spatial resolution, collocated TMPA 3B42RT estimates are aggregated to 1° spatial resolution by simple averaging.
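The temporal and spatial aggregation described above can be sketched as follows. This is a minimal illustration rather than the processing chain used in the study: the function names are hypothetical, the input is assumed to be a gap-free array of 3 h accumulations (in mm) on a regular 0.25° grid whose dimensions are multiples of four, and missing values are not handled.

```python
import numpy as np


def daily_from_3hourly(acc_3h):
    """Sum the eight 3 h windows of each day.

    acc_3h: array of shape (n_days * 8, ny, nx), accumulation per window (mm).
    Returns an array of shape (n_days, ny, nx), accumulation per day (mm).
    """
    acc_3h = np.asarray(acc_3h, dtype=float)
    n_steps = acc_3h.shape[0]
    if n_steps % 8 != 0:
        raise ValueError("expected a whole number of days (8 windows per day)")
    return acc_3h.reshape(n_steps // 8, 8, *acc_3h.shape[1:]).sum(axis=1)


def block_average(field, factor=4):
    """Simple averaging of non-overlapping factor x factor blocks.

    With factor=4, a 0.25 degree grid becomes a 1 degree grid.
    field: array of shape (..., ny, nx) with ny and nx multiples of factor.
    """
    field = np.asarray(field, dtype=float)
    *lead, ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    blocks = field.reshape(*lead, ny // factor, factor, nx // factor, factor)
    # Average over the two within-block axes.
    return blocks.mean(axis=(-3, -1))


# Usage: two days of synthetic 3 h data on an 8 x 8 quarter-degree grid.
three_hourly = np.ones((16, 8, 8))
daily_1deg = block_average(daily_from_3hourly(three_hourly))
```

The reshape-based averaging assumes that the fine grid is aligned with the coarse one; a product whose cell edges do not coincide with the 1° cells would need an explicit regridding step instead.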

CMORPH
CMORPH uses a Lagrangian approach to construct high-resolution global precipitation maps from the satellite IR and PMW observations (Joyce et al., 2004). This technique uses precipitation estimates that have been derived from PMW observations exclusively, and whose features are transported via spatial propagation information which is obtained entirely from IR data. It incorporates precipitation estimates derived from the PMW sensors on board the DMSP 13, 14 and 15 (SSM/I) and NOAA-15, 16, 17 and 18 (AMSU-B) satellites as well as AMSR-E and TMI aboard NASA's Aqua and TRMM spacecraft, respectively. Precipitation estimates are obtained as follows. First, advection vectors of cloud and precipitation systems are computed using consecutive geostationary IR images in 30 min intervals. These advection vectors are then applied to propagate the precipitating cloud systems observed by the PMW measurements in both forward and backward directions toward the target time of the precipitation analysis. The final precipitation analysis value at a grid box is defined as the weighted mean of the estimates from the forward and backward propagations with the weights inversely proportional to the time separation between the target analysis time and the PMW observations. In this study, we used the daily (derived from 3-hourly aggregation) estimates of precipitation at 0.25° latitude/longitude resolution, distributed over the globe (between +60° and −60° latitude) by the NOAA Center for Weather and Climate Prediction. Note that the CMORPH version used in this study is the raw version which does not use gauge information. To match the CPC spatial resolution, collocated CMORPH estimates are aggregated to 1° spatial resolution.

ERA-Interim
The European Centre for Medium-Range Weather Forecasts (ECMWF) produces the ERA-Interim atmospheric, ocean and land reanalysis. ERA-Interim provides medium-range global forecasts for environmental variables including soil temperature, evaporation, SM and rainfall. Products are available from 1 January 1979 to the present. The forecast model incorporated in the ERA-Interim reanalysis is based on the ECMWF Integrated Forecast System (Cy31r2) forecast model (Dee et al., 2011), with a spectral horizontal resolution of about 80 km and 60 vertical levels. The ERA-Interim forecast precipitation is the sum of two components which are computed separately in the model: large-scale stratiform precipitation (Tompkins et al., 2007) and smaller-scale precipitation which originates solely from the parameterization of convection (Bechtold et al., 2004). Further information can be found at the ECMWF website (http://www.ecmwf.int). In this study, daily precipitation values are obtained from the temporal aggregation of ERA-Interim 12-hourly precipitation accumulation estimates (http://apps.ecmwf.int/datasets/), while co-location with CPC observations is determined by the nearest-neighbour method. Note that we considered only liquid precipitation in the analysis. Solid precipitation was excluded by masking out periods experiencing snowfall (using the "large-scale snowfall" variable of ERA-Interim).

TC analysis: general concepts
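Here we apply the method of McColl et al. (2014) to robustly estimate the correlation of a particular rainfall measurement system with the truth. Suppose we have three systems $X_i$ measuring the true variable $t$ and afflicted by additive random error:

$$X_i = \alpha_i + \beta_i t + \varepsilon_i, \tag{1}$$

where $X_i$ ($i = 1, 2, 3$) are collocated measurement systems linearly related to the true underlying value $t$ with additive random errors $\varepsilon_i$, and $\alpha_i$ and $\beta_i$ are the ordinary least squares intercepts and slopes. Assuming that the errors from each system have zero mean ($E(\varepsilon_i) = 0$), are mutually uncorrelated ($\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$, with $i \neq j$) and orthogonal with respect to $t$ ($\mathrm{Cov}(\varepsilon_i, t) = 0$), the covariance between $X_i$ and $X_j$ is

$$Q_{ij} \equiv \mathrm{Cov}(X_i, X_j) = \begin{cases} \beta_i \beta_j \sigma_t^2, & i \neq j, \\ \beta_i^2 \sigma_t^2 + \sigma_{\varepsilon_i}^2, & i = j, \end{cases} \tag{2}$$

where $\sigma_t^2$ is the variance of $t$ and $\sigma_{\varepsilon_i}^2$ is the error variance of $X_i$. By defining the new variable $\theta_i = \beta_i \sigma_t$, known as the sensitivity of the variable $X_i$, Eq. (2) becomes

$$Q_{ij} = \begin{cases} \theta_i \theta_j, & i \neq j, \\ \theta_i^2 + \sigma_{\varepsilon_i}^2, & i = j, \end{cases} \tag{3}$$

which is a system of six equations in six unknowns from which we derive (McColl et al., 2014):

$$\sigma_{\varepsilon} = \begin{bmatrix} \sqrt{Q_{11} - \dfrac{Q_{12} Q_{13}}{Q_{23}}} \\[2ex] \sqrt{Q_{22} - \dfrac{Q_{12} Q_{23}}{Q_{13}}} \\[2ex] \sqrt{Q_{33} - \dfrac{Q_{13} Q_{23}}{Q_{12}}} \end{bmatrix}. \tag{4}$$

From Eq. (2), using the definition of the correlation and covariance we can write

$$\rho_{t,X_i} = \frac{\mathrm{Cov}(t, X_i)}{\sigma_t \sqrt{Q_{ii}}} = \frac{\beta_i \sigma_t}{\sqrt{Q_{ii}}} = \frac{\theta_i}{\sqrt{Q_{ii}}}, \tag{5}$$

where $\rho_{t,X_i}$ is the correlation coefficient between $t$ and $X_i$. Since $\sqrt{Q_{ii}}$ is already estimated from the data, and we can solve for $\theta_i$ using Eq. (4) (because $\theta_i^2 = Q_{ii} - \sigma_{\varepsilon_i}^2$), $\rho_{t,X_i}$ is given by (McColl et al., 2014)

$$\rho_{t,X} = \pm \begin{bmatrix} \sqrt{\dfrac{Q_{12} Q_{13}}{Q_{11} Q_{23}}} \\[2ex] \mathrm{sign}(Q_{13} Q_{23}) \sqrt{\dfrac{Q_{12} Q_{23}}{Q_{22} Q_{13}}} \\[2ex] \mathrm{sign}(Q_{12} Q_{23}) \sqrt{\dfrac{Q_{13} Q_{23}}{Q_{33} Q_{12}}} \end{bmatrix}, \tag{6}$$

which provides the temporal correlation of each product with the unknown truth. Hereinafter, $\rho_{t,X_i}$ and its squared value $\rho_{t,X_i}^2$ refer to the correlation of the product $X_i$ with the unknown truth; $\rho$ is also used to refer to this variable in more general terms.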
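The covariance-based estimator outlined in this section can be illustrated with a short sketch. It assumes three collocated, gap-free time series at a single grid cell; the function name and the synthetic test data are hypothetical, and the positive branch of the sign ambiguity is taken (i.e. the first product is assumed to be positively correlated with the truth).

```python
import numpy as np


def extended_triple_collocation(x1, x2, x3):
    """Estimate correlations with the unknown truth and error variances.

    x1, x2, x3: 1-D arrays of collocated estimates of the same variable.
    Returns (rho, err_var), each an array of length 3.
    """
    Q = np.cov(np.vstack([x1, x2, x3]).astype(float))  # 3 x 3 sample covariance
    q11, q22, q33 = np.diag(Q)
    q12, q13, q23 = Q[0, 1], Q[0, 2], Q[1, 2]

    # Squared sensitivities theta_i^2, from the off-diagonal covariances.
    theta2 = np.array([q12 * q13 / q23, q12 * q23 / q13, q13 * q23 / q12])

    # Error variances: diagonal covariance minus squared sensitivity.
    err_var = np.array([q11, q22, q33]) - theta2

    # Squared correlation with the truth, then the signed square root.
    rho2 = theta2 / np.array([q11, q22, q33])
    signs = np.array([1.0, np.sign(q13 * q23), np.sign(q12 * q23)])
    # Sampling noise can make rho2 negative; report NaN in that case.
    rho = np.where(rho2 >= 0, signs * np.sqrt(np.abs(rho2)), np.nan)
    return rho, err_var


# Usage with synthetic data (assumed error levels, for illustration only).
rng = np.random.default_rng(0)
truth = rng.standard_normal(200_000)
a = truth + 0.5 * rng.standard_normal(truth.size)
b = 2.0 + 0.8 * truth + 0.7 * rng.standard_normal(truth.size)
c = -1.0 + 1.2 * truth + 1.0 * rng.standard_normal(truth.size)
rho, err_var = extended_triple_collocation(a, b, c)
```

Because only covariances enter the estimator, the intercepts and slopes of the three products do not need to be known or removed beforehand.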

Rainfall error model
It is generally accepted that a multiplicative model is more appropriate for describing errors in rainfall estimates (Hossain and Anagnostou, 2006; Tian et al., 2013). Based on this assumption, Alemohammad et al. (2015) proposed the application of TC to the rainfall by introducing a multiplicative error model:

$$R_i = a_i T^{\beta_i} e^{\varepsilon_i}, \tag{7}$$

in which $R_i$ is the rainfall intensity estimate from product $i$, $T$ is the true rainfall intensity and $a_i$ is a multiplicative error. By transforming Eq. (7) into log space we obtain an equation equivalent to Eq. (1), where $X_i = \log(R_i)$, $t = \log(T)$ and $\alpha_i = \log(a_i)$. In this way, the development of TC expressed in Eqs. (2)-(6) can be applied to the (potentially more relevant) case of multiplicative rainfall accumulation errors.
The resulting log RMSE can then be back-transformed into linear rainfall accumulation errors by exploiting a Taylor series expansion of the logarithm operator (see Alemohammad et al., 2015 for further details).
The main difficulty of this approach is its inability to consider the presence of zero values in the rainfall time series. To reduce their presence, Alemohammad et al. (2015) considered fortnightly rainfall estimates and simply removed remaining zeros in this time series. This has two implications. First, the fortnightly rainfall error may differ from the error of a shorter accumulation period (e.g. daily) because the daily signal has a substantially different character with respect to the fortnightly one due to the higher presence of zero values. Second, the method may not be appropriate in very dry climates, where even fortnightly values of rainfall can contain a significant number of zero accumulation values.
For the reasons mentioned above, we apply TC in two different ways: (i) to the rainfall time series using an additive error model and (ii) to log-transformed rainfall estimates using the multiplicative error model (by first removing rainfall accumulation values equal to zero). Comparisons of these two different approaches will provide insights regarding the appropriateness of various error model assumptions for rainfall estimates at a daily accumulation timescale.
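The two ways of preparing a triplet can be sketched as below. The function name is hypothetical, and one detail is an assumption not stated in the text: for the multiplicative model, a day is discarded whenever any of the three products reports zero (or missing) rainfall, so that the three log-transformed series stay collocated.

```python
import numpy as np


def prepare_triplet(r1, r2, r3, model="additive"):
    """Return a (3, n_valid) array ready for a TC analysis.

    model="additive": keep rainfall as is, dropping only non-finite days.
    model="multiplicative": additionally drop days on which any product is
    zero (assumption: joint removal), then take the natural logarithm.
    """
    R = np.vstack([r1, r2, r3]).astype(float)
    valid = np.all(np.isfinite(R), axis=0)
    if model == "additive":
        return R[:, valid]
    if model == "multiplicative":
        valid &= np.all(R > 0, axis=0)
        return np.log(R[:, valid])
    raise ValueError("model must be 'additive' or 'multiplicative'")


# Usage: five days of made-up daily accumulations (mm).
p1 = [0.0, 1.0, 2.0, np.nan, 4.0]
p2 = [1.0, 1.0, 0.0, 3.0, 4.0]
p3 = [1.0, 2.0, 3.0, 4.0, np.e]
x_add = prepare_triplet(p1, p2, p3, "additive")        # 4 days retained
x_log = prepare_triplet(p1, p2, p3, "multiplicative")  # 2 days retained
```

The example makes the sample-size cost of the log transform visible: any zero in any product removes the whole day from the multiplicative analysis.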

Performance scores
In Sect. 2.2, it has been demonstrated that TC can provide both error variances and correlation against an unknown truth for three collocated estimates of the same variable. When dealing with error variances, the products have to be rescaled to a common reference data space. However, such a rescaling imposes spatial patterns within the derived error metric which reflect the climatology of the chosen reference (Gruber et al., 2016). To this end, McColl et al. (2014) noted that correlation coefficients can provide important new information about the performance of the measurement systems with respect to the absolute error variances obtained via Eq. (4), with the added advantage of not requiring the arbitrary definition of one system as a scaling reference. Indeed, $\rho^2$ represents the unbiased signal-to-noise ratio, scaled between 0 and 1, which provides a measure of the relative similarity between two signals, independently from their phase differences. This was also underlined by Gruber et al. (2016), who showed that $\rho^2$ is the complement of the fRMSE = $\sigma_\varepsilon^2/\sigma^2$ introduced by Draper et al. (2013). Gruber et al. (2016) also pointed out that the absolute error variance provides only limited information about the true data set quality because a certain amount of noise can be either acceptable or unacceptable depending on the strength of the underlying signal (i.e. its variance). Therefore, we focus here only on $\rho^2$ or, analogously, on its square root $\rho$, i.e. Eq. (6).
As discussed above, a key goal is determining the relative accuracy of TC correlations obtained with and without the use of SM2RAIN-based rainfall accumulation products. Assuming that $R_{X_i}$ (or simply $R$) is the Pearson correlation coefficient between the product $X_i$ and CPC, the main question is, how accurately can (TC-based) $\rho_{t,X_i}$, which utilizes no ground observations, reproduce spatial patterns in (CPC-based) $R_{X_i}$? We should expect a bias between the two (i.e. $R_{X_i} \leq \rho_{t,X_i}$) because, while relatively accurate, CPC estimates still contain representativeness errors (due to limitations in rain gauge density) and measurement errors due to wind and instrument inaccuracies. In contrast, Eq. (6) provides the correlations with an error-free truth. Nevertheless, if the TC hypothesis holds, the relative rank between the products predicted by TC should accurately reflect that obtained via direct comparisons with ground observations.
In order to evaluate the similarity between correlation-based maps of $\rho_{t,X_i}$ and $R_{X_i}$, a spatial correlation index SC was calculated as the spatial Pearson correlation coefficient between maps of $R_{X_i}$ and $\rho_{t,X_i}$. The closer SC is to 1, the more spatially similar the two maps are and the better the assumptions of TC are satisfied. In addition, based on the values of $\rho_{t,X_i}$ and $R_{X_i}$, we are able to sort the products according to their relative performance for each pixel in the analysis. That is, considering three products $X_i$, the rank value to be assigned to each product $i$ will be 1 if $\rho_{t,X_i}$ is the highest, 3 if it is the lowest and 2 if it is neither. If the same is done with $R_{X_i}$, the consistency of the resulting rank maps for each product provides feedback regarding the validity of assumptions underlying the application of TC. To quantify the agreement between these discrete rank maps, we also calculate the number of pixels providing equivalent relative sorting of the products based on $R_{X_i}$ vs. $\rho_{t,X_i}$.
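A minimal sketch of the two map-comparison scores follows. The function names and array layout (product, latitude, longitude) are assumptions made for illustration; pixels with missing values are excluded, and ties between products are not treated specially.

```python
import numpy as np


def spatial_correlation(map_a, map_b):
    """Spatial Pearson correlation (SC) between two maps, ignoring NaN pixels."""
    a = np.ravel(np.asarray(map_a, dtype=float))
    b = np.ravel(np.asarray(map_b, dtype=float))
    ok = np.isfinite(a) & np.isfinite(b)
    return np.corrcoef(a[ok], b[ok])[0, 1]


def rank_maps(scores):
    """Rank products at each pixel: 1 = highest score, 3 = lowest.

    scores: array of shape (3, ny, nx), one correlation map per product.
    """
    scores = np.asarray(scores, dtype=float)
    # A double argsort converts scores into ranks along the product axis.
    return np.argsort(np.argsort(-scores, axis=0), axis=0) + 1


def rank_agreement(rho_tc, r_cpc):
    """Count pixels where the TC-based and CPC-based rankings coincide.

    Returns (n_equal, n_valid), where n_valid excludes pixels with any NaN.
    """
    rho_tc = np.asarray(rho_tc, dtype=float)
    r_cpc = np.asarray(r_cpc, dtype=float)
    valid = np.all(np.isfinite(rho_tc), axis=0) & np.all(np.isfinite(r_cpc), axis=0)
    same = np.all(rank_maps(rho_tc) == rank_maps(r_cpc), axis=0) & valid
    return int(same.sum()), int(valid.sum())


# Usage on a tiny 1 x 2 grid with made-up correlation values.
rho_tc = np.array([[[0.9, 0.5]], [[0.7, 0.8]], [[0.6, 0.6]]])
r_cpc = np.array([[[0.8, 0.7]], [[0.6, 0.6]], [[0.5, 0.5]]])
n_equal, n_valid = rank_agreement(rho_tc, r_cpc)
```

SC compares the spatial patterns of one product at a time, whereas the rank agreement compares the ordering of all three products at each pixel, so the two scores answer slightly different questions.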

SM2RAIN and its application to ASCAT data
SM2RAIN (Brocca et al., 2014) is a method of rainfall estimation which uses two successive SM retrievals to estimate the rainfall accumulated between the two retrievals. It exploits the soil water balance equation with appropriate simplifications valid only for liquid precipitation (Tian et al., 2014):

$$Z^* \frac{\mathrm{d}s(\tau)}{\mathrm{d}\tau} = p(\tau) - r(\tau) - e(\tau) - g(\tau), \tag{8}$$

where $Z^*$ is the soil water capacity (soil depth times soil porosity); $s(\tau)$ is the relative saturation of the soil or relative SM; $\tau$ is the time; and $p(\tau)$, $r(\tau)$, $e(\tau)$ and $g(\tau)$ are the rainfall, surface runoff, evapotranspiration and drainage rates, respectively. Under unsaturated soil conditions, and assuming negligible evapotranspiration rate during rainfall and Dunnian runoff, solving Eq. (8) for rainfall yields

$$p(\tau) \simeq Z^* \frac{\mathrm{d}s(\tau)}{\mathrm{d}\tau} + a\, s(\tau)^b. \tag{9}$$

Note that in Eq. (9) the drainage rate has been expressed with a power law function of the type $g = a s^b$ (Famiglietti and Wood, 1994), where $a$ and $b$ are two model parameters. When the soil is fully saturated, no rainfall can be estimated from SM; however, at the scale of a satellite pixel, the soil is rarely saturated (except in some exceptional places like tropical forests).
The SM2RAIN parameters $a$, $b$ and $Z^*$ can be estimated either by using a rainfall data set as a reference or assigned based on soil properties. In this study, in order to maximize the independence of SM2RAIN predictions, SM2RAIN parameters were not calibrated and were instead assumed constant in space as in Koster et al. (2016). In particular, the drainage rate (the second term in Eq. 9) was assumed linearly related with SM ($b = 1$), with $a = 3.7$ mm day$^{-1}$ and $Z^* = 62$ mm based on results obtained in previous studies (Brocca et al., 2014). Note that $Z^*$ does not have a significant influence on the results because we are using a correlation-based metric. In addition, it should be noted that, while maximizing the independence of SM2RAIN rainfall accumulation estimates, the use of this default calibration approach results in sub-optimal SM2RAIN performance. Superior SM2RAIN performance can easily be obtained via calibration against existing satellite rainfall accumulation products.
Daily rainfall estimates from SM2RAIN were obtained by using linearly interpolated (at 00:00 UTC) ASCAT data with a maximum allowable data gap of 5 days. The obtained 0.25° × 0.25° rainfall estimates were then aggregated to the 1° × 1° spatial resolution through simple averaging of the pixels collocated with CPC. Finally, 1° × 1° grid cells were masked if more than 50 % of their sub-grid areas consisted of ASCAT observations characterized by an SSF equal to 2, 3 or 4. Hereinafter, the product thus obtained is referred to as SM2RAIN for simplicity.
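The inversion of the soil water balance for rainfall can be sketched in discrete form as below. This is an illustrative discretization and not the reference SM2RAIN code: the function name is hypothetical, the time derivative is approximated by a finite difference between successive retrievals, the drainage term is evaluated at the mean saturation of the two retrievals, and negative estimates are set to zero (all three choices are assumptions). The default parameter values are the uncalibrated ones quoted in this section.

```python
import numpy as np


def sm2rain_sketch(s, dt_days=1.0, a=3.7, b=1.0, z_star=62.0):
    """Estimate rainfall accumulated between successive SM retrievals.

    s: 1-D array of relative saturation (0-1) at regular time steps.
    dt_days: time step between retrievals (days).
    a (mm/day), b (-), z_star (mm): drainage and soil capacity parameters.
    Returns an array of length len(s) - 1 with rainfall in mm per step.
    """
    s = np.asarray(s, dtype=float)
    ds_dt = np.diff(s) / dt_days            # storage change term, 1/day
    s_mid = 0.5 * (s[1:] + s[:-1])          # assumption: midpoint saturation
    rate = z_star * ds_dt + a * s_mid ** b  # rainfall rate, mm/day
    # Assumption: negative rates (drying periods) are treated as no rain.
    return np.clip(rate, 0.0, None) * dt_days


# Usage: a wetting step, a steady step and a drying step.
rain = sm2rain_sketch([0.2, 0.4, 0.4, 0.3])
```

In this simplified form a steady soil moisture value still yields a small positive estimate, because the drainage term is assumed to be balanced by rainfall; how such cases are treated in practice is outside the scope of this sketch.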
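The temporal resampling step can be illustrated with a small sketch. It interprets the "maximum allowable data gap" as the time between the two observations that bracket each 00:00 UTC target, which is an assumption; the function name and the convention of expressing time in fractional days are also hypothetical.

```python
import numpy as np


def interpolate_to_midnight(obs_time_days, sm, max_gap_days=5.0):
    """Linearly interpolate irregular SM observations to 00:00 UTC each day.

    obs_time_days: increasing observation times in fractional days.
    sm: soil moisture values at those times.
    Targets whose bracketing observations are more than max_gap_days apart
    are set to NaN. Returns (target_days, interpolated_sm).
    """
    t = np.asarray(obs_time_days, dtype=float)
    s = np.asarray(sm, dtype=float)
    days = np.arange(np.ceil(t[0]), np.floor(t[-1]) + 1.0)
    out = np.interp(days, t, s)

    # Index of the first observation at or after each target time.
    idx = np.searchsorted(t, days, side="left")
    right = t[idx]
    left = t[np.maximum(idx - 1, 0)]
    # A target that coincides with an observation has no gap.
    gap = np.where(right == days, 0.0, right - left)
    out[gap > max_gap_days] = np.nan
    return days, out


# Usage: two close pairs of overpasses separated by a 7-day outage.
days, sm_daily = interpolate_to_midnight([0.4, 1.4, 8.4, 9.4], [0.2, 0.4, 0.6, 0.8])
```

Days that fall inside the outage are returned as NaN instead of being filled by a long linear ramp, which would otherwise translate into spurious, evenly spread rainfall.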

Experimental setup
A TC analysis was carried out using five different daily rainfall accumulation triplets: (1) ERA-Interim-SM2RAIN-3B42RT (Triplet A in the following), (2) [...] As a result, they will only be used for initial considerations about TC robustness and to evaluate the relative quality of the CPC product. Triplets A, B and C will then be used in the remainder of the paper to demonstrate the potential utility of SM2RAIN.
The analysis was carried out first across CONUS and then on a global scale using only ERA-Interim, 3B42RT, CMORPH and SM2RAIN during the period 2007-2012. Over CONUS it was confirmed that the available sample size was sufficient (about 500) over the entire study domain (Gruber et al., 2016), while for the global analysis, grid cells with inadequate sample size were individually masked out of the analysis. The extended TC analysis was applied for both additive and multiplicative error model assumptions. For the latter, we first removed days with zero rainfall, which constitute about 80 % of daily values, leaving approximately 450 non-zero daily values in the 2007-2012 time series, and then applied a log transformation to the remaining daily rainfall estimates. This reduction in sample size may affect TC results by making the analysis with log-precipitation estimates statistically less robust.

Results and discussion
In this section, we present the results obtained from the application of TC (for both additive and multiplicative error models) in three methodological steps: (1) calculating TC-based correlations ($\rho_{t,X_i}$) for Triplets A, B, C, D and E over the CONUS and providing an assessment of the CPC product (Sect. 3.1), (2) assessing the adequacy of TC results based on the spatial similarity between (TC-based) $\rho_{t,X_i}$ and (CPC-based) $R_{X_i}$ (along with their relative rank) over the CONUS in order to identify the optimal configuration for applying TC and (3) applying the optimally configured TC on a global scale to calculate $\rho_{t,X_i}$ globally for the selected rainfall products (Sect. 3.3).

Assessment of the CPC product
As described above, our first goal is to assess the relative performance of the CPC product. Table 1 shows mean $\rho_{t,X_i}$ (obtained via the spatial average of 0.25° CONUS grid cells). Regardless of the triplet or error model applied, the TC analysis summarized in Table 1 indicates that CPC is the most accurate product (mean TC-based correlation close to 0.9 for the additive error model and close to 0.8 for the multiplicative one), which strengthens our assumption that within CONUS, CPC can be used as a benchmark to evaluate the optimal TC configuration for rainfall product evaluation. In addition, its correlation spatial pattern (not shown) indicates very good performance almost everywhere except in the central US, where the spatial density of available rain gauges shown in Fig. 1 is relatively lower. Based on this, in the next section we will consider the CPC product as an appropriate benchmark for the selection of an optimal TC configuration which does not utilize a gauge-based precipitation product (and is therefore potentially applicable at a global scale).

Optimal TC configuration
Figure 2a-d plot CPC-based Pearson correlation coefficients (i.e. $R_{X_i}$) for ERA-Interim, 3B42RT, CMORPH and SM2RAIN obtained with the assumption of additive error model (for multiplicative error model results the reader is referred to Fig. S1 of the Supplement). A comparison of these results with TC-based correlations (i.e. $\rho_{t,X_i}$) shows that $\rho_{t,X_i}$ are biased high with respect to $R_{X_i}$. This is expected given that CPC is not free of errors, whereas TC should theoretically provide the correlation with respect to an error-free truth. The spatial agreement between $\rho_{t,X_i}$ and $R_{X_i}$ is examined in Table 2 and Fig. 2. In particular, Fig. 2 shows that Triplets A (panels e, f, g) and B (panels h, i, l) accurately reproduce the CPC-based results plotted in Fig. 2a-d, although they are characterized by higher values as underlined above (see Sect. 2.3 for further details). This similarity is higher in the eastern and western US and lower in the central US, especially for ERA-Interim and SM2RAIN. This lower agreement in the central US is likely due to the lower rain gauge density of CPC there (see Fig. 1), which degrades the quality of the CPC product as a benchmark. In contrast, TC results based on Triplet C show substantially different behaviour, with correlation patterns that differ markedly from the CPC-based benchmark results in Fig. 2a-d. This suggests that triplets not containing SM2RAIN (or CPC) provide unreliable results. In particular, the simultaneous use of two satellite-based rainfall products in Triplet C leads to an overly optimistic assessment of their performance. This is likely due to cross-correlated errors in the 3B42RT and CMORPH rainfall accumulation products, which cause TC to misinterpret their mutual consistency as an indication of high accuracy (Yilmaz and Crow, 2014).
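The high bias of the TC-based correlations relative to the CPC-based ones noted above can be made explicit with a short derivation. This is only a sketch: it assumes a linear error model for every product, including CPC, with positive scaling factors and with errors that are mutually uncorrelated and uncorrelated with the truth $t$. The symbols $\alpha$, $\beta$, $\varepsilon$ and $\sigma$ are introduced here for illustration.

```latex
X_i = \alpha_i + \beta_i\, t + \varepsilon_i, \qquad
X_{\mathrm{CPC}} = \alpha_{\mathrm{CPC}} + \beta_{\mathrm{CPC}}\, t + \varepsilon_{\mathrm{CPC}}

R_{X_i}
  = \frac{\mathrm{Cov}(X_i, X_{\mathrm{CPC}})}{\sigma_{X_i}\,\sigma_{X_{\mathrm{CPC}}}}
  = \frac{\beta_i\,\sigma_t}{\sigma_{X_i}} \cdot
    \frac{\beta_{\mathrm{CPC}}\,\sigma_t}{\sigma_{X_{\mathrm{CPC}}}}
  = \rho_{t,X_i}\,\rho_{t,\mathrm{CPC}} \;\le\; \rho_{t,X_i}
```

Under these assumptions, every $R_{X_i}$ in a grid cell is scaled by the same factor $\rho_{t,\mathrm{CPC}}$, so the relative ranking of the products is preserved even though the absolute values are lower. With a mean $\rho_{t,\mathrm{CPC}}$ close to 0.9 for the additive error model (Table 1), $R_{X_i}$ would be expected to sit roughly 10 % below $\rho_{t,X_i}$ wherever the assumptions hold.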
It is often important to understand which is the best rainfall product among those available in a specific location. As described in Sect. 2.3, we ranked the products based upon how well they compare relative to each other using both R and ρ. Figure 3 shows the distribution of the relative rank, three products at a time: panels d-f, k-m and r-t show the rank based on comparisons with the (CPC-based) $R_{X_i}$ of each triplet, while panels a-c, g-i and n-p of the same figure provide similar information except that the relative rank is based on TC (i.e. ρ). The latter shows a very similar pattern with respect to the CPC-based rank for Triplets A and B; however, Triplet C again yields a distinct pattern, with ERA-Interim being the worst product and 3B42RT and CMORPH providing complementary performances. As in the comparisons discussed for Fig. 2, this implies that triplets containing SM2RAIN (i.e. Triplets A and B) provide more robust evaluation information than triplets utilizing 3B42RT and CMORPH together.
Table 2. Spatial correlation between $\rho_{t,X_i}$ and $R_{X_i}$ and percentage of rank correctly identified obtained for various triplets considered in the study. The "Triplet" column refers to the naming convention applied in the text.

The same analysis carried out with the assumption of a multiplicative error model (see Fig. S2 in the Supplement) shows similar findings but larger differences between the spatial distribution of the rank obtained with CPC and the one obtained with TC, especially for Triplet B. To quantify this agreement, we have calculated the percentage of pixels which are ranked the same in both TC- and CPC-based results (% of rank identified in Table 2). The table confirms the patterns observed in Figs. 3 and S2 of the Supplement, with Triplets A and B yielding the highest percentage of pixels with a common rank, ranging from 65 to 81 % for the additive error model and from 48 to 71 % for the multiplicative error model. As discussed above, inferior results are obtained in both cases for Triplet C (percentage of correct ranking between 5 and 60 %).
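The "% of rank identified" score can be sketched as follows. This is an illustrative reading rather than the exact procedure of the study: it assumes that the score is computed per product, as the share of valid grid cells in which that product receives the same rank (1 = highest correlation) from TC and from CPC. The function names are hypothetical.

```python
import numpy as np


def rank_products(corr_stack):
    """Rank products per grid cell (1 = highest correlation).

    corr_stack has shape (n_products, ny, nx). Cells where any product is
    missing get rank 0 and are flagged as invalid.
    """
    corr_stack = np.asarray(corr_stack, dtype=float)
    valid = np.all(np.isfinite(corr_stack), axis=0)
    safe = np.where(valid, corr_stack, 0.0)
    # A double argsort converts values into ranks along the product axis.
    order = np.argsort(-safe, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable") + 1
    return np.where(valid, ranks, 0), valid


def percent_rank_identified(tc_stack, cpc_stack):
    """Per-product percentage of cells with the same TC- and CPC-based rank."""
    tc_rank, tc_valid = rank_products(tc_stack)
    cpc_rank, cpc_valid = rank_products(cpc_stack)
    both = tc_valid & cpc_valid
    same = (tc_rank == cpc_rank) & both
    return 100.0 * same.sum(axis=(1, 2)) / both.sum()
```

Here `tc_stack` would hold the three TC-based correlation maps of one triplet and `cpc_stack` the corresponding CPC-based maps, stacked in the same product order.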

Spatial correlation
A quantification of the agreement between the spatial variations of the correlations, for both additive and multiplicative error models, was also derived using the spatial correlation (SC) reported in Table 2. The table shows that for Triplets A and B, when TC is used with the assumption of an additive error model, SC is relatively high, with values ranging from 0.61 to 0.84, while Triplet C provides substantially lower SC for 3B42RT and CMORPH. A slightly different situation can be observed for the multiplicative error model. Here, SC values are generally lower than those obtained with an assumed additive error model, likely due to the necessity of removing zero-rain days, which modifies the original precipitation time series and reduces the sample size of TC calculations. In particular, ERA-Interim provides the worst score. This is not clearly evident in the spatial distribution of R and ρ (see Fig. S1 in the Supplement for further details), which show some similarities at least for Triplets A and B.
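A minimal sketch of the SC score follows. It assumes that SC is the Pearson correlation, computed across grid cells, between the TC-based map and the CPC-based map of one product; the function name `spatial_correlation` is hypothetical.

```python
import numpy as np


def spatial_correlation(tc_map, cpc_map):
    """Pearson correlation across grid cells between two correlation maps.

    Cells that are missing in either map are ignored.
    """
    a = np.asarray(tc_map, dtype=float).ravel()
    b = np.asarray(cpc_map, dtype=float).ravel()
    ok = np.isfinite(a) & np.isfinite(b)
    if ok.sum() < 3:
        return np.nan  # too few cells for a meaningful value
    return float(np.corrcoef(a[ok], b[ok])[0, 1])
```

Because the Pearson correlation is insensitive to a constant offset or positive scaling, this score measures pattern similarity only and is unaffected by the high bias of the TC-based values relative to the CPC-based ones.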
In summary, the application of TC to the different triplets shows the following:
1. The CPC product performs relatively well over the CONUS, with a TC-derived correlation vs. truth of 0.9 (assuming an additive error model), demonstrating its relatively high quality here and supporting its application as a benchmark data set within CONUS.
2. TC-based correlations are similar among the triplets except for Triplet C (i.e. ERA-Interim, 3B42RT and CMORPH). This is likely due to the existence of non-negligible cross-correlated errors between 3B42RT and CMORPH.
3. A comparison between $\rho_{t,X_i}$ and $R_{X_i}$ shows that $\rho_{t,X_i}$ are biased high with respect to $R_{X_i}$. In addition, the pattern of $\rho_{t,X_i}$ and $R_{X_i}$ is similar for all triplets except for Triplet C, which shows inconsistencies relative to the CPC benchmark for both the additive and multiplicative error model assumptions. The agreement, measured in terms of spatial correlation (Table 2), provides higher scores for an additive error model assumption relative to a multiplicative one. This is likely due to a reduction of sampling power associated with the removal of daily rainfall accumulations equal to zero, which are not acceptable in the log-transformation process. Therefore, it is possible that the observed differences in TC performance may shrink for larger sample sizes.
4. Retrieved spatial patterns of $\rho_{t,X_i}$ for the triplets containing SM2RAIN (Fig. 2) show a higher degree of similarity with (CPC-based) $R_{X_i}$ when we assume an additive (vs. multiplicative) error model for daily rainfall accumulations.
On this basis, we can conclude that (i) TC results are unreliable unless SM2RAIN is used in the triplets and (ii) the assumption of multiplicative error model in the application of TC at a daily timescale does not appear necessary.
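The effect of cross-correlated errors summarized above can be reproduced with synthetic data. The sketch below is purely illustrative: the "truth", the noise levels and the product names are invented, and the two "satellite" series are given a shared error component to mimic a Triplet C-like situation.

```python
import numpy as np


def etc_rho(x1, x2, x3):
    """Extended-TC correlation with the truth for three collocated series."""
    q = np.cov(np.vstack([x1, x2, x3]))
    rho2 = np.array([
        q[0, 1] * q[0, 2] / (q[0, 0] * q[1, 2]),
        q[0, 1] * q[1, 2] / (q[1, 1] * q[0, 2]),
        q[0, 2] * q[1, 2] / (q[2, 2] * q[0, 1]),
    ])
    return np.sqrt(rho2)


rng = np.random.default_rng(0)
n = 50_000
truth = rng.gamma(2.0, 1.0, n)           # synthetic "true" signal
shared = rng.normal(0.0, 1.0, n)         # error common to both satellites
sat_a = truth + shared + rng.normal(0.0, 0.5, n)
sat_b = truth + shared + rng.normal(0.0, 0.5, n)
model = truth + rng.normal(0.0, 1.0, n)  # independent errors
indep = truth + rng.normal(0.0, 1.0, n)  # independent errors


def true_rho(x):
    """Actual correlation with the truth, known only because data are synthetic."""
    return np.corrcoef(truth, x)[0, 1]


# Two products with cross-correlated errors in the same triplet.
rho_c = etc_rho(model, sat_a, sat_b)
# Only one of them, paired with two independent products.
rho_a = etc_rho(model, indep, sat_a)

print("shared errors   :", rho_c.round(2), "true:", round(true_rho(sat_a), 2))
print("independent errs:", rho_a.round(2), "true:", round(true_rho(sat_a), 2))
```

In this setup the triplet with shared errors overstates the correlation of both "satellite" series and understates that of the independent "model" series, whereas the triplet with mutually independent errors recovers values close to the actual correlations.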

Application of optimized TC approach
Based on the superior performance of Triplets A and B under the assumption of an additive error model, we will apply this particular TC configuration to assess the performance (in terms of ρ) of daily rainfall accumulation estimates derived from 3B42RT, CMORPH, SM2RAIN and ERA-Interim, first over the CONUS (Sect. 3.3.1 and Fig. 2) and then on a global scale (Sect. 3.3.2 and Fig. 4).
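Applying this configuration to gridded data amounts to repeating the TC calculation in every grid cell and masking cells with too few collocated days. A vectorized sketch under an additive error model is given below; the array layout (time, lat, lon), the function name `tc_correlation_maps` and the `min_samples` threshold of 100 are assumptions made for illustration and are not values reported in the study.

```python
import numpy as np


def tc_correlation_maps(p1, p2, p3, min_samples=100):
    """TC-based correlation maps for three products of shape (time, lat, lon).

    min_samples is a hypothetical threshold; cells with fewer collocated
    days are returned as NaN.
    """
    stack = np.stack([p1, p2, p3]).astype(float)   # (3, T, ny, nx)
    valid = np.all(np.isfinite(stack), axis=0)     # (T, ny, nx)
    n = valid.sum(axis=0)                          # collocated days per cell
    mean = np.where(valid, stack, 0.0).sum(axis=1) / np.maximum(n, 1)
    anom = np.where(valid, stack - mean[:, None], 0.0)

    def cov(i, j):
        # Sample covariance along time, using collocated days only.
        return (anom[i] * anom[j]).sum(axis=0) / np.maximum(n - 1, 1)

    q11, q22, q33 = cov(0, 0), cov(1, 1), cov(2, 2)
    q12, q13, q23 = cov(0, 1), cov(0, 2), cov(1, 2)
    with np.errstate(divide="ignore", invalid="ignore"):
        rho2 = np.stack([
            q12 * q13 / (q11 * q23),
            q12 * q23 / (q22 * q13),
            q13 * q23 / (q33 * q12),
        ])
        # Mask cells with too few samples or negative squared correlations.
        keep = (n >= min_samples) & (rho2 >= 0)
        return np.sqrt(np.where(keep, rho2, np.nan))
```

The result has shape (3, lat, lon), one map per product in the order the inputs were passed.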

CONUS
Over CONUS, ERA-Interim shows relatively better performance in the western and eastern US with respect to the central US, where SM2RAIN is slightly superior. 3B42RT and CMORPH perform reasonably well in the eastern US and along the west coast of the US while demonstrating worse performance in the central US. In contrast, SM2RAIN performs worse in the northern US, probably due to the lower accuracy of the ASCAT data at high latitudes. The spatial pattern of these correlations is similar to those found in Gottschalck et al. (2005) and Ebert et al. (2007), who showed a generally lower level of correlation of satellite-only rainfall products in the central US due to the effects of snow cover and frozen surface conditions. This corroborates results presented in Alemohammad et al. (2015) using TC, who found a similar pattern of correlation of 3B42RT in a box covering a large part of the southeastern US (however, the authors here assumed a multiplicative error model and fortnightly rainfall accumulation estimates).
Hydrol. Earth Syst. Sci., 21, 4347-4361, 2017, www.hydrol-earth-syst-sci.net/21/4347/2017/

Global
On a global scale, 3B42RT (Fig. 4a) shows relatively good performance in eastern and central South America, southern and central Africa, southern and eastern Asia, eastern Australia, and southern Europe, while it performs relatively worse in central Asia, western Australia and in the southern part of the Sahel. The performance of CMORPH (Fig. 4b) is similar to that of 3B42RT, with slightly lower correlations in Australia, in the Horn of Africa and in southern Asia. SM2RAIN (Fig. 4c) performs reasonably well in Africa (except in the tropical forest), Australia, Mexico, eastern South America and India and generally in the Southern Hemisphere, while worse results are obtained in the Northern Hemisphere, in the tropical forests and at high latitudes. In contrast, ERA-Interim (Fig. 4d) provides much better results in the Northern Hemisphere than in the Southern Hemisphere (e.g. South America and southern Africa) and performs relatively poorly in central and northern Africa as well as in the tropical forests.
The results for 3B42RT and SM2RAIN are similar to those obtained in Brocca et al. (2014), who calculated the Pearson correlation coefficient with the Global Precipitation Climatology Center (GPCC; Schamm et al., 2014) data set. Similar findings are also presented in Yong et al. (2015) (Table 2 of their study), who compared different versions of the 3B42RT product against global CPC observations in the US, East Asia, Europe and Australia. In their study, the best results were obtained in Australia and in East Asia (Europe showed slightly lower performance), while lower performances were obtained in the US, as in our analysis. Further comparisons can also be made with the recent work of Beck et al. (2017), who, in attempting to create a high-quality rainfall product specifically tailored for hydrological modelling, compared different satellite and modelled products globally with the Global Historical Climatology Network-Daily (GHCN-D; Menne et al., 2012) database. Their results (in terms of spatial pattern of correlation) are consistent with those obtained in our study over the US, East Asia and the Middle East for CMORPH and 3B42RT, while less agreement is observed in Australia. For ERA-Interim, the results agree with our study in the US and Europe and are generally better in the Northern Hemisphere, whereas they show some differences with SM2RAIN results in Australia, Africa and South America, although in these areas the low number of available rain gauges cannot provide a clear picture of the real performance of the analysed products. Substantial differences between our study and the studies of Beck et al. (2017) and Yong et al. (2015) can likely be attributed to the quality of the benchmark data set used for the evaluations. This is the main limitation of rainfall validation studies relying upon ground-based observations for assessment. With our proposed TC-based approach, this issue can be overcome because ground observations are no longer required.
C. Massari et al.: Performance of global rainfall estimates
An interesting feature of the global evaluation of the products (Fig. 4a-d), also visible over the CONUS between 3B42RT (or CMORPH) and SM2RAIN (Fig. 2, Triplets A and B), is the complementary nature of the products. In Fig. 4c and d in particular, it can be seen that ERA-Interim performs very well in the Northern Hemisphere and worse in the Southern Hemisphere, whereas SM2RAIN is relatively good in the Southern Hemisphere and worse in the Northern Hemisphere. Similar findings can be seen between the two state-of-the-art satellite rainfall products (i.e. 3B42RT and CMORPH) and SM2RAIN over the CONUS, with the first performing better in the eastern US and the second in the central and western US. This opens up new possibilities for the integration of multiple products to obtain a higher-quality merged rainfall estimate, as outlined in Ciabatta et al. (2015) and in Beck et al. (2017).

Summary and conclusions
The assessment of the performance of satellite rainfall products on a global scale is challenging due to significant limitations in the spatial coverage of high-quality, ground-based rain gauge observations. Provided that its underlying assumptions are respected (see Sect. 2.2), TC provides an alternative approach for evaluating global rainfall products without reliance on ground-based observations. Here, we describe how a new method for rainfall estimation based on SM observations (i.e. SM2RAIN) provides a rainfall product that is uniquely suited to satisfy the error-independence assumptions at the heart of the TC approach.
The extended version of TC introduced by McColl et al. (2014) was applied to provide the correlation with the (unknown error-free) truth for each of the products applied within a particular triplet. To assess the robustness of correlation-based results obtained with TC, we used an area characterized by a high-quality rainfall product (CPC data set over the CONUS; see Fig. 1) with the assumption that it represents a good proxy of the true rainfall field. Therefore, if TC assumptions hold, Pearson correlation coefficients computed against CPC should match those of TC, at least in terms of their relative values. Since we have two different error model options (i.e. additive and multiplicative) for the application of TC to rainfall data, we explored both.
Results demonstrate that daily rainfall accumulations provided by the CPC product are indeed of relatively high quality compared to competing products (Table 1), thus supporting the assumption that it provides an acceptable proxy of the true rainfall field. Once established as a credible benchmark, CPC is used to evaluate (1) what type of triplets can be considered for a robust application of TC, and (2) which error model assumption can be considered more appropriate. Triplets containing SM2RAIN and assuming an additive error model (Table 2) appear to provide the most robust TC results. Based on this, an optimal TC configuration was applied (for the first time) to globally evaluate daily rainfall accumulations derived from the 3B42RT, CMORPH, ERA-Interim and SM2RAIN products (Fig. 4a-d) without the use of any ground-based data. Results demonstrate the relatively high performance of daily rainfall accumulations derived from the satellite rainfall products (i.e. 3B42RT and CMORPH) in eastern North and South America, southern Africa, southern and eastern Asia, eastern Australia, and southern Europe, as well as complementary performances between ERA-Interim and SM2RAIN, with the first performing reasonably well in the Northern Hemisphere and the second providing very good performance in the Southern Hemisphere.
Based on the results obtained, we can therefore conclude the following:
1. Despite the abundance of satellite rainfall estimates, their relative dependency impedes their use within the same triplet for the TC analysis; thus alternative independent products must be used for obtaining meaningful TC results. In particular, the use of two remotely sensed rainfall products in a single triplet entails significant risk of a biased TC analysis.
2. Wholly independent daily rainfall accumulation products obtained from SM2RAIN are uniquely valuable for obtaining robust global evaluation statistics in the absence of ground-based gauge observations. This is important not only for simple validation purposes but also for hydrological studies and applications within developing countries, where ground-based rain gauge networks are often limited or absent and an alternative product has to be chosen.
3. At the time/space scales examined here, the assumption of an additive error model provides reasonable and robust results and no advantage is observed for a log transformation of the time series (which allows for the consideration of a multiplicative error model). However, this result is likely to be scale dependent and implies that the timescale of this analysis is sufficiently coarse that averaging produces approximately additive/Gaussian distributions (via the central limit theorem). Therefore, different results may be obtained at finer timescales.
4. Both state-of-the-art satellite rainfall estimates (i.e. 3B42RT and CMORPH) and SM-based rainfall estimates (i.e. SM2RAIN) are affected by the presence of snow cover and frozen soil conditions; thus these rainfall estimates may be unreliable at high latitudes and in mountainous regions. In these areas, a reanalysis product (i.e. ERA-Interim) provides higher-quality rainfall estimates and should be considered in place of satellite-based estimates. SM-based rainfall estimates also work reasonably well in semi-arid climates (e.g. Sahel, central Australia and Mexico), where the state-of-the-art satellite products report problems due to sub-cloud evaporation of hydrometeors (Ebert et al., 2007). Conversely, in wet climates (e.g. tropical forests) 3B42RT and CMORPH seem to be the only reliable option given that neither SM2RAIN nor ERA-Interim provides reasonable results.
5. Given the existence of complementary performances among the products, TC can potentially be a valuable tool for the characterization of their relative performances so as to be used for data fusion and assimilation experiments for obtaining more accurate rainfall estimates.
The question of whether this analysis is valid for different spatio-temporal scales remains open and will be addressed in future studies. Also, removing zeros to obtain log-transformed rainfall may not be ideal for testing the validity of the error model assumptions since it reduces the sample size, thus providing less robust TC results. Other strategies should be considered.

Figure 1. CPC gauge coverage during 2007-2012 expressed as average number of working rain gauges per day within each 0.25° spatial grid cell.

Figure 2. CPC-based Pearson correlation coefficients ($R_{X_i}$; panels a-d) for ERA-Interim, 3B42RT, CMORPH and SM2RAIN, together with TC-based correlations ($\rho_{t,X_i}$) for Triplets A (panels e, f, g) and B (panels h, i, l), obtained with the assumption of additive error model (for multiplicative error model results the reader is referred to Fig. S1 of the Supplement).

Figure 3. Rank based on CPC-based correlation (CPC-based rank in the figure) and TC-based correlation (TC-based rank in the figure) of the triplets: (i) ERA-Interim-SM2RAIN-3B42RT (Triplet A: a-c for TC-based rank and d-f for CPC-based rank), (ii) ERA-Interim-SM2RAIN-CMORPH (Triplet B: g-i for TC-based rank and k-m for CPC-based rank) and (iii) ERA-Interim-3B42RT-CMORPH (Triplet C: n-p for TC-based rank and r-t for CPC-based rank) during the period 2007-2012 using an additive error model.

Table 1. Mean correlation with CPC (R) and TC-based correlation (ρ) for various triplets assuming additive and multiplicative error models. The "Triplet" column refers to the naming convention applied in the text.