Evaluation of GPM IMERG Early, Late, and Final rainfall estimates using WegenerNet gauge data in southeastern Austria

The Global Precipitation Measurement (GPM) Integrated Multi-satellite Retrievals for GPM (IMERG) products provide quasi-global (60° N–60° S) precipitation estimates, beginning March 2014, from the combined use of passive microwave (PMW) and infrared (IR) satellites comprising the GPM constellation. The IMERG products are available in the form of near-real-time data, i.e., IMERG Early and Late, and in the form of post-real-time research data, i.e., IMERG Final, after monthly rain gauge analysis is received and taken into account. In this study, IMERG version 3 Early, Late, and Final (IMERG-E, IMERG-L, and IMERG-F) half-hourly rainfall estimates are compared with gauge-based gridded rainfall data from the WegenerNet Feldbach region (WEGN) high-density climate station network in southeastern Austria. The comparison is conducted over two IMERG 0.1° × 0.1° grid cells, entirely covered by 40 and 39 WEGN stations, respectively, using data from the extended summer season (April–October) for the first two years of the GPM mission. The data are divided into two rainfall intensity ranges (low and high) and two seasons (warm and hot), and we evaluate the performance of IMERG using both statistical and graphical methods. Results show that IMERG-F rainfall estimates are in the best overall agreement with the WEGN data, followed by IMERG-L and IMERG-E estimates, particularly for the hot season. We also illustrate, through rainfall event cases, how insufficient PMW sources and errors in motion vectors can lead to wide discrepancies in the IMERG estimates. Finally, by applying the method of Villarini and Krajewski (2007), we find that IMERG-F half-hourly rainfall estimates can be regarded as a 25 min gauge accumulation, with an offset of +40 min relative to its nominal time.


Introduction
The Global Precipitation Measurement (GPM) mission was launched in February 2014. This international mission is led by the National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA), as a successor to the Tropical Rainfall Measuring Mission (TRMM), to continue and improve satellite-based rainfall and snowfall observations on a global scale (Tapiador et al., 2012; Hou et al., 2014; Yong et al., 2015). The GPM mission consists of a core observatory satellite and a constellation of partner satellites to collect information from as many passive microwave (PMW) and infrared (IR) satellite platforms as available. Such a merged PMW-IR approach can mutually enhance the respective merits of individual PMW or IR satellite-based rainfall estimates; that is, IR satellite estimates can be adjusted with the greater accuracy of PMW data, and, conversely, PMW satellite estimates can be interpolated along cloud movements obtained by the high sampling rate of IR data (Kidd et al., 2003; Prigent, 2010; Kidd and Huffman, 2011; Kidd and Levizzani, 2011).
Published by Copernicus Publications on behalf of the European Geosciences Union. Sungmin O et al.: Evaluation of GPM IMERG Early, Late, and Final rainfall estimates.
Once observation data are received from the PMW and IR platforms, they are combined into half-hourly gridded fields through the Integrated Multi-satellite Retrievals for GPM (IMERG) system (Huffman et al., 2015a, b). The IMERG system mainly comprises the following rainfall retrieval algorithms: the Climate Prediction Center Morphing-Kalman Filter (CMORPH-KF) (Joyce et al., 2004; Joyce and Xie, 2011), the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System (PERSIANN-CCS) (Sorooshian et al., 2000; Hong et al., 2004), and the TRMM Multi-Satellite Precipitation Analysis (TMPA) (Huffman et al., 2007). Processed differently based on user requirements in terms of data latency and accuracy (see Sect. 2.1 for details), the IMERG system produces Early, Late, and Final runs (hereafter IMERG-E, IMERG-L, and IMERG-F runs).
Since the first release of IMERG-F data in April 2014, extensive studies have been devoted to the evaluation of the IMERG rainfall estimates compared to ground observations such as radars and gauges, or to other existing satellite rainfall data (e.g., Gaona et al., 2016; Guo et al., 2016; Liu, 2016; Prakash et al., 2016a, b; Sharifi et al., 2016; Tan et al., 2016; Tang et al., 2016). For instance, Tang et al. (2016) demonstrated, through an intercomparison study between the data using a hydrological model, that the IMERG products can adequately substitute for TMPA products, both statistically and hydrologically. Furthermore, Tan et al. (2016) presented a new validation approach for tracing rainfall errors to individual platforms or techniques within the IMERG system using ancillary variables provided in the products. Such analyses can provide useful information, not only for further improvements in processes of satellite rainfall retrieval but also for users in many relevant applications, from hydrological modeling and hazard studies to climate simulations (Barros et al., 2000; Nicholson et al., 2003; Bidwell et al., 2004; Wolff et al., 2005; Roca et al., 2010; Chen et al., 2013; Huang et al., 2013; Kirstetter et al., 2013; Lo Conti et al., 2014; Worqlul et al., 2014).
In this study, we evaluate and compare the rainfall data generated by all three IMERG runs using rain-gauge-based gridded data from the WegenerNet Feldbach region (WEGN) high-density climate station network in southeastern Austria (Kirchengast et al., 2014). Through this approach, the study aims to rigorously test the performance of IMERG runs and to explore differences between the data. The comparison is conducted over two 0.1° × 0.1° IMERG grid boxes, which are fully covered by 40 and 39 WEGN stations, respectively. We investigate the data during April–October in the years 2014–2015, the first two years after the launch of the GPM Core Observatory. While IMERG-F data are available from April 2014, IMERG-E and IMERG-L data are only available from April 2015 at the time of writing, so the evaluation of these two runs is restricted to 2015. Note that all the IMERG data will eventually be retrospectively processed to the start of the TRMM era.
Even though gauge data are considered to be ground reference in many existing validation studies, it is acknowledged that gauge measurements are also subject to uncertainties in terms of areal representativeness, owing to a limitation in spatial coverage (Morrissey et al., 1995; Villarini et al., 2008). Fortunately, this point is of much less concern for the WEGN data. Around 40 gauges in one IMERG grid box ensure a high reliability of data within the domain area, considering that a much smaller number of gauges, ranging from 5 to 15 gauges per 2.5° × 2.5° grid cell depending on the study (Rudolf et al., 1994; Xie and Arkin, 1995; Ali et al., 2005; Villarini, 2010), has been suggested to guarantee a monthly error of less than 10 %. The variance reduction factor (VRF) (Villarini et al., 2008), examined using half-hourly WEGN data from the 40-gauge grid box, is about 0.02 for 10 gauges (average of 40 random combinations), and little or no improvement is observed in the VRF beyond 10 gauges. Another concern is that the use of tipping-bucket gauges, as employed in the WEGN network, is associated with systematic errors caused by various factors such as wind speed and rainfall intensity (Nešpor and Sevruk, 1999; Duchon and Essenberg, 2001). To address this, the WEGN data are adjusted by a correction factor described by O et al. (2016), who found that WEGN tends to underestimate rainfall by about 10 % compared to reference gauges.
The paper is organized as follows. Following this introduction, Sect. 2 further introduces IMERG and WEGN data and Sect. 3 describes the methodologies adopted for the assessment of IMERG estimates. The results are detailed in Sect. 4, in terms of statistical evaluation and analysis of example rainfall events. Section 5 contains concluding remarks and plans for future studies.
The IMERG system is run twice in near-real time (NRT), first to produce IMERG-E data about 6 h after nominal observation time for users who need a quick answer related to potential flood or landslide warnings, and second to produce IMERG-L data with approximately 18 h latency for users working in agricultural forecasting or drought monitoring. Once the monthly gauge analysis is received, the final IMERG cycle is run to create the IMERG-F data approximately 3 months after the observation month. Note that both IMERG-E and IMERG-L runs only use some of the IMERG processing steps. For instance, instantaneous PMW rainfall estimates are only propagated forward in time by the morphing scheme of the IMERG-E run, whereas both forward and backward morphing schemes are used in IMERG-L and IMERG-F runs. In this way, IMERG-L and IMERG-F runs are expected to better describe changes in the intensity and shape of rainfall features. For bias adjustment, the IMERG NRT runs use climatological gauge data, while the IMERG-F run ingests monthly GPCC gauge analyses, so the IMERG-F estimates are supposed to be the most accurate and reliable (Huffman et al., 2015a, b). In this study, we use the calibrated estimates (precipitationCal) for all IMERG runs.
IMERG version 4 (V04) products have recently begun to be released, but the new data do not yet cover the time ranges used in this study at the time of writing (as of April 2017). However, the different version should not lead to significant changes in our conclusions, since the main aim of this study is to evaluate the three different IMERG runs relative to each other. Most of the changes in V04 are applied to all three IMERG runs (Huffman et al., 2017), so any improvements to the IMERG-F run should also result in a similar improvement to IMERG-E and IMERG-L runs. Furthermore, it is likely that the algorithmic and data differences between the runs (e.g., backward morphing and gauge adjustment) have a stronger influence than any differences between the versions.

WEGN gridded rain gauge data
The WEGN is a high-resolution network for weather and climate study and monitoring purposes, located in the Feldbach region, southeastern Austria (Kann et al., 2011; Kirchengast et al., 2014; Scheidl, 2014; Szeberényi, 2014; Kann et al., 2015). The region is part of the southeastern Alpine foreland, characterized by the river Raab valley and a moderate hilly landscape, with altitudes ranging from 260 to 600 m. The network comprises 153 weather stations in an area of about 300 km² (i.e., about one station per 2 km²), collecting rainfall measurement data every 5 min (Fig. 1). A total of 151 stations employ tipping-bucket gauges for rainfall measurements, and each gauge was equipped with one of three different sensors during the study period (Szeberényi, 2014). Since a major sensor replacement in 2016, all WEGN tipping-bucket gauges have employed the same type of sensor (O et al., 2016).
Once the WEGN processing system receives "Level 0" raw observations, with a latency of 1–1.5 h, the Quality Control System produces "Level 1" station-level data. Then, only best-quality Level 1 data are transferred to the Data Product Generator (DPG), and the DPG generates the general user data products, "Level 2" station time series, as well as 200 m × 200 m gridded data by an inverse-distance-weighted interpolation method; all missing and non-best Level 1 data are filled in by temporal and spatial interpolation as part of the DPG processing. All data products are available online at the WEGN web portal within 2 h latency. Recently, based on the findings of O et al. (2016), the Level 2 processing has started to apply a bias correction factor for part of the rain data. More information on the WEGN data processing system and data products can be found in Kabas et al. (2011) and Kirchengast et al. (2014).
For the statistical comparisons and the study on rainfall events reported in Sect. 4.1 and 4.2, half-hourly WEGN gridded rainfall data are used. These are generated by summing up the basic (5 min) gridded data and are compared directly with IMERG rainfall estimates by averaging the WEGN grid points within each IMERG grid box. For Sect. 4.3, on interpreting temporal characteristics of the satellite estimates, WEGN gridded data with 5 min native resolution are used. For computing the area-averaged WEGN rainfall for each IMERG grid box, we simply take the arithmetic mean of all WEGN grid points that lie within the grid box. Furthermore, we use a threshold value of 0.05 mm 30 min⁻¹ to define rain/no rain for removing false alarms in half-hourly data. Figure 2 shows that the number of WEGN half-hourly rainfall values retained after such threshold-clipping is only significantly reduced for a small number of gauges (fewer than about 30 out of 150 gauges) and that more than about 65 gauges are not affected at all. This suggests that the chosen threshold is reasonable, leaving a large amount of reliable half-hourly data, and that the WEGN half-hourly data exceeding 0.05 mm are very unlikely to be false alarms from the gauges' technical limitations. We also note that WEGN is not a member of the GPCC network, so the WEGN gauge data are independent of the IMERG gauge adjustment process.
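The aggregation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the WEGN processing code: the function name `half_hourly_area_mean`, the list-of-lists input layout, and the choice to apply the 0.05 mm threshold to the area mean (rather than to individual gauges) are all hypothetical.

```python
def half_hourly_area_mean(fields_5min, threshold=0.05):
    """Aggregate 5 min gridded rainfall to half-hourly area means.

    fields_5min: list of 5 min fields; each field is a list of rainfall
    amounts (mm) at the WEGN grid points inside one IMERG grid box.
    Returns a list of (area_mean_mm_per_30min, is_rain) tuples.
    Assumption: the rain/no-rain threshold is applied to the area mean.
    """
    steps = 6  # six 5 min fields make up one half-hour
    result = []
    for start in range(0, len(fields_5min) - steps + 1, steps):
        window = fields_5min[start:start + steps]
        n_points = len(window[0])
        # Half-hourly total at each grid point.
        totals = [sum(field[p] for field in window) for p in range(n_points)]
        # Arithmetic mean over all grid points in the box.
        area_mean = sum(totals) / n_points
        result.append((area_mean, area_mean >= threshold))
    return result
```

For example, twelve 5 min fields yield two half-hourly values, each flagged as rain or no rain.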

Approach
We assess the performance of IMERG runs using both statistical and graphical methods. After inspecting some basic time series differences, we compare probability density functions (PDFs) and cumulative distribution functions (CDFs) of half-hourly IMERG estimates and WEGN data in terms of their distribution as a function of rain rate. The PDF of rain occurrence (PDFc) describes the percentages of rain occurrence across the predefined bins. The CDF of rain volume (CDFv), in turn, indicates the relative contribution of rain rate in each bin to the total rain volume (Chen et al., 2013; Kirstetter et al., 2013). The PDFs and CDFs are computed over a binning range up to 30 mm, with a 0.5 mm bin width. We also use scatter plots to visually evaluate how IMERG estimates are distributed compared to the WEGN data.
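As a rough sketch of how the two distributions can be computed with the stated binning (0.5 mm width, up to 30 mm), consider the following. The function name is hypothetical, and two details are assumptions not stated in the text: zero values are skipped, and values above the binning range are placed in the last bin.

```python
def pdf_c_and_cdf_v(rates, bin_width=0.5, max_rate=30.0):
    """Return (PDFc, CDFv) in percent for half-hourly rain amounts (mm).

    PDFc[k]: share of rain occurrences falling in bin k.
    CDFv[k]: share of total rain volume contributed by bins 0..k.
    """
    n_bins = int(round(max_rate / bin_width))
    counts = [0] * n_bins
    volumes = [0.0] * n_bins
    for rate in rates:
        if rate <= 0.0:
            continue  # assumption: no-rain values are excluded
        # Assumption: values beyond the range go into the last bin.
        index = min(int(rate / bin_width), n_bins - 1)
        counts[index] += 1
        volumes[index] += rate
    total_count = sum(counts)
    total_volume = sum(volumes)
    if total_count == 0:
        return [0.0] * n_bins, [0.0] * n_bins
    pdf_c = [100.0 * c / total_count for c in counts]
    cdf_v = []
    running = 0.0
    for volume in volumes:
        running += volume
        cdf_v.append(100.0 * running / total_volume)
    return pdf_c, cdf_v
```

A few rare but very large values barely change PDFc while shifting CDFv strongly, which is the behaviour the comparison is designed to expose.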
In addition, we adopt widely used statistics and contingency indices, including relative bias (RB), mean absolute error (MAE), root mean squared error (RMSE), Pearson correlation coefficient (r), Spearman's rank correlation coefficient (ρ), and probability of detection (POD), for quantifying differences in performance between the IMERG runs.These are used with definitions as follows.
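For reference, these metrics are commonly written as below, with X_i denoting the IMERG estimate and Y_i the WEGN value of data pair i, and n the number of pairs. The notation, the fractional (rather than percentage) form of RB, and the tie-free form of ρ are assumptions chosen to match the symbol definitions in the text; they are standard forms, not a verbatim reproduction of the equations used in the study.

```latex
\begin{align}
\mathrm{RB}   &= \frac{\sum_{i=1}^{n} (X_i - Y_i)}{\sum_{i=1}^{n} Y_i}, \\
\mathrm{MAE}  &= \frac{1}{n} \sum_{i=1}^{n} \left| X_i - Y_i \right|, \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - Y_i)^2}, \\
r             &= \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}, \\
\rho          &= 1 - \frac{6 \sum_{j=1}^{n} \left[ \mathrm{rank}_j(X) - \mathrm{rank}_j(Y) \right]^2}{n\,(n^2 - 1)}, \\
\mathrm{POD}  &= \frac{\mathrm{hits}}{\mathrm{hits} + \mathrm{misses}}.
\end{align}
```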
In these definitions, cov(X, Y) is the covariance between X and Y values, var(X) is the variance of X, and rank_j(X) is the rank position of X; "hits" means that both IMERG and WEGN data recorded rainfall (≥ 0.05 mm 30 min⁻¹), and "misses" refers to rainfall occurrence identified by WEGN data but missed by IMERG data. The POD ranges from 0 to 1, with a perfect score of 1 (and 0 if all events are missed).
Furthermore, we select two example rainfall events for case-based inspections of spatial patterns and time series, in order to visually explore some pronounced discrepancies, especially of IMERG-E and IMERG-L estimates, compared to WEGN data. The half-hourly WEGN gridded data from the whole network domain and corresponding time series of both IMERG estimates and WEGN data are used in this evaluation, including the consideration of data sources, i.e., PMW or IR observations.
Lastly, we employ the method of Villarini and Krajewski (2007) to provide an interpretation of IMERG rainfall estimates in terms of the gauge accumulation time and the offset, δ. WEGN 5 min gridded data are used as the basis for test-integrating these gauge data over accumulation times between 5 and 100 min. The offset is the time at which the accumulation of gauge data starts, relative to the nominal time of the satellite estimate, and accounts for time differences between instantaneous satellite estimates and actual rainfall on the ground surface. Consequently, we can identify the combination of accumulation time and δ that leads to a minimum RMSE and interpret it as the temporal resolution of the IMERG rainfall estimates.
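The accumulation-time and offset search can be sketched as a simple grid search over both parameters. This is an illustrative sketch, not the implementation of Villarini and Krajewski (2007): the function name, the input layout, and the decision to compare mean rain rates in mm h⁻¹ (so that gauge accumulations of different lengths remain comparable with the half-hourly satellite value) are assumptions.

```python
import math


def best_accumulation_and_offset(sat_rates, gauge_5min, accum_steps,
                                 offset_steps, sat_step=6, step_min=5):
    """Grid search for the gauge window that best matches the satellite.

    sat_rates[k]: satellite rain rate (mm/h) for the nominal window that
        starts at 5 min index k * sat_step.
    gauge_5min: gauge rainfall amounts (mm) per 5 min step.
    accum_steps, offset_steps: iterables of window lengths and start
        offsets, both counted in 5 min steps.
    Returns (rmse, accumulation_minutes, offset_minutes) of the minimum.
    """
    best = None
    for n_acc in accum_steps:
        hours = n_acc * step_min / 60.0
        for offset in offset_steps:
            squared_errors = []
            for k, sat in enumerate(sat_rates):
                start = k * sat_step + offset
                end = start + n_acc
                if start < 0 or end > len(gauge_5min):
                    continue  # window falls outside the gauge record
                gauge_rate = sum(gauge_5min[start:end]) / hours
                squared_errors.append((sat - gauge_rate) ** 2)
            if not squared_errors:
                continue
            rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
            candidate = (rmse, n_acc * step_min, offset * step_min)
            if best is None or candidate < best:
                best = candidate
    return best
```

With `accum_steps=range(1, 21)` the search spans accumulation times of 5 to 100 min, the range quoted above.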
When only the pairs for which both IMERG and WEGN data exceed the threshold value are investigated, we classify the data into low rain intensities (≤ 80th percentile) versus high rain intensities (> 80th percentile), according to the percentiles of WEGN rain rates, and also into warm season (April, May, and October) versus hot season (June–September), following the approach of Villarini and Krajewski (2007). According to temperature measurements collected by WEGN, the average 2 m air temperature of the study period (2014–2015) was 12.2 °C in the warm season and 18.6 °C in the hot season. We did not use data from the cold season, November to March, in order to guarantee the robustness of the WEGN data as ground reference, since most WEGN gauges are not heated and therefore do not capture snowfall events accurately (O et al., 2016).
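The partitioning above can be expressed compactly. The sketch below is illustrative only: the function names and the record layout `(month, imerg, wegn)` are hypothetical, and the linear-interpolation percentile is one common convention that may differ from the one used in the study.

```python
WARM_MONTHS = {4, 5, 10}    # April, May, October
HOT_MONTHS = {6, 7, 8, 9}   # June to September


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def partition_pairs(records, threshold=0.05, q=80.0):
    """Split (month, imerg, wegn) records by intensity and season.

    Only pairs where both values reach the rain threshold are kept.
    Returns (cut, groups), where cut is the q-th percentile of the
    retained WEGN values and groups maps labels to (imerg, wegn) lists.
    """
    groups = {"low": [], "high": [], "warm": [], "hot": []}
    rainy = [r for r in records if r[1] >= threshold and r[2] >= threshold]
    if not rainy:
        return None, groups
    cut = percentile([wegn for _, _, wegn in rainy], q)
    for month, imerg, wegn in rainy:
        groups["low" if wegn <= cut else "high"].append((imerg, wegn))
        if month in WARM_MONTHS:
            groups["warm"].append((imerg, wegn))
        elif month in HOT_MONTHS:
            groups["hot"].append((imerg, wegn))
    return cut, groups
```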

Statistical evaluation of IMERG rainfall estimates
Basic statistics of IMERG estimates and WEGN data are summarized in Table 1. All three IMERG estimates have a higher value for mean and maximum rain rates, and for standard deviation, compared to the rain rates shown by WEGN data. The percentage of no rain is also slightly larger in IMERG data, which is very likely related to limitations of satellite observations in detecting very low rain intensities (Kirstetter et al., 2012, 2013). Figure 3 shows the 24 h accumulated rainfall time series comparison (a 0.2 mm threshold is applied for the daily amounts). IMERG estimates and WEGN data show good overall agreement on the occurrence of most daily rainfall events at IMERG grid scale, although IMERG tends to overestimate high rain rates.
Figure 4 shows PDFc and CDFv of IMERG estimates versus those of WEGN data. The IMERG estimates are in good agreement with the WEGN data in terms of rain occurrence except for low rain rates (< 0.5 mm 30 min⁻¹). However, IMERG shows high rain rates exceeding the maximum value of WEGN data (15.3 mm 30 min⁻¹) and consequently yields relatively large differences in CDFv compared to WEGN data for the moderate to high rain rates. More specifically, rain rates less than 15 mm 30 min⁻¹ contribute essentially 100 % of the total rain volume for WEGN data, compared with about 95 % for IMERG-F and only about 75 % for the IMERG NRT estimates. This shows that the satellite estimates sometimes overestimate rainfall and produce very high values, which have, in spite of their low frequency, a significant impact on the total rain volume.
Figure 5 is similar to Fig. 4 but with the data pairs restricted to those whose IMERG and WEGN values are both higher than 0.05 mm 30 min⁻¹; i.e., both detect rain. Here, we also divided the entire data into low and high rainfall intensities, and into warm and hot seasons, as described in Sect. 3. Given that the disagreement at low rain rates in PDFc is reduced (see the entire-data panels), it is confirmed that the differences seen in Fig. 4 are due to the poor sensitivity (misses) of satellites to low rain rates rather than to some general biases in the estimation. Indeed, we recomputed the POD score using only values where WEGN is above 0.5 mm 30 min⁻¹ (i.e., disregarding low rain rates) and found POD scores of 0.70, 0.79, and 0.75 for IMERG-E, IMERG-L, and IMERG-F estimates, respectively. For the entire range of WEGN rain intensities above the 0.05 mm 30 min⁻¹ threshold, the POD scores are only 0.50, 0.57, and 0.53, respectively.
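The two POD variants compared above differ only in which WEGN values count as rain events. A minimal sketch, with a hypothetical function name and parameter names, assuming "hits" and "misses" as defined in Sect. 3:

```python
def probability_of_detection(imerg, wegn, wegn_min=0.05, rain_threshold=0.05):
    """POD = hits / (hits + misses) over paired half-hourly values.

    wegn_min: smallest WEGN value treated as a rain event; raising it
        (e.g. to 0.5) disregards low rain rates.
    rain_threshold: IMERG value from which rain counts as detected.
    Returns NaN when there is no WEGN rain event at all.
    """
    hits = 0
    misses = 0
    for satellite, gauge in zip(imerg, wegn):
        if gauge < wegn_min:
            continue  # not a reference rain event
        if satellite >= rain_threshold:
            hits += 1
        else:
            misses += 1
    events = hits + misses
    return hits / events if events else float("nan")
```

In a toy series where IMERG misses only the weakest gauge events, raising `wegn_min` from 0.05 to 0.5 increases the score, mirroring the direction of the change reported above.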
In the panels of low rain intensities (< 1.2 mm 30 min⁻¹, or the 80th percentile of WEGN data), IMERG CDFv still gets a contribution from rain rates greater than 1.2 mm 30 min⁻¹, even over 10 mm 30 min⁻¹, while the corresponding PDFc has a fairly good agreement with that of WEGN. This leads us to suspect that such big differences could be associated with a time lag between rain peaks of IMERG estimates and WEGN data, rather than a tendency of the satellite to constantly overestimate rainfall; this will be further investigated and illustrated in Sect. 4.2. For the high intensities, the IMERG runs tend to underestimate the rain rates.
Furthermore, the CDFv of the IMERG NRT estimates does not show physically plausible shapes, e.g., a sudden rise between 10 and 20 mm. As seen in the hot season, the comparison reveals a clear improvement in IMERG rainfall estimates by applying more retrieval or calibration processes to the satellite observations; CDFv moves closer to that of WEGN data, and the shapes also become gradually smoother from IMERG-E via IMERG-L to IMERG-F estimates. In general, it is concluded that IMERG-F estimates have the highest overall accuracy, followed by IMERG-L and IMERG-E estimates.
Figure 6 shows scatter plots of IMERG estimates versus WEGN data to enable a more quantitative understanding of the discrepancy between the data. Although it is a common practice to conduct regression analysis with scatter plots, we decided not to because highly skewed distributions of rain rates (outliers) seen in the CDFv (Fig. 4) can strongly affect the results. Therefore, we chose to examine distributions of IMERG estimates over nine predefined rain rate bins (each bin containing at least 30 data pairs); 25th, 50th (median), and 75th percentiles of IMERG estimates are analyzed for each bin, and the collective results are shown in Fig. 6.
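A sketch of this per-bin percentile analysis follows. The bin edges are left to the caller because the nine predefined bins are not listed in the text; binning by the WEGN value, the function name, and the "inclusive" quantile convention of Python's `statistics.quantiles` are assumptions made for illustration.

```python
from statistics import quantiles


def binned_quartiles(pairs, edges, min_pairs=30):
    """25th, 50th, and 75th percentiles of IMERG per WEGN rain rate bin.

    pairs: iterable of (imerg, wegn) half-hourly values.
    edges: increasing bin edges; bin k covers edges[k] <= wegn < edges[k+1].
    Bins holding fewer than min_pairs pairs are left out of the result.
    """
    pairs = list(pairs)
    result = {}
    for low, high in zip(edges[:-1], edges[1:]):
        in_bin = [imerg for imerg, wegn in pairs if low <= wegn < high]
        # statistics.quantiles needs at least two data points.
        if len(in_bin) >= max(min_pairs, 2):
            q25, q50, q75 = quantiles(in_bin, n=4, method="inclusive")
            result[(low, high)] = (q25, q50, q75)
    return result
```

Plotting the three percentile values of each bin against the bin centre gives percentile lines that can be compared with the one-to-one line.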
The IMERG runs show better performance (i.e., closer to the one-to-one line) in estimating moderate rain rates within about 0.3 to 3 mm 30 min⁻¹ but have a tendency to overestimate low rain rates and underestimate high rain rates. It is worth noting that the slopes of IMERG-F percentile lines (Fig. 6, top row) are consistent, with a relatively narrow spread, across the dataset partitioning, indicating that the biases in the IMERG-F estimates are relatively small and uniformly distributed. In contrast, the 75th percentile line of the IMERG NRT estimates is slightly off the 50th and 25th percentile lines, particularly in the hot season, which indicates that the distribution of IMERG NRT rainfall estimates in the bins is skewed toward low values.
Table 2 provides the statistical metrics computed for each of the two IMERG grid boxes (see Fig. 1). All metrics are improved in IMERG-F estimates, except the correlation coefficient (r), which may not be a proper metric to evaluate the accuracy of IMERG data due to some large outliers. Indeed, IMERG-F estimates show the highest Spearman's rank correlation coefficient (ρ), which is known to be much less sensitive to outliers (Legates and McCabe, 1999; Habib et al., 2001). The somewhat better performance in Grid 15.85 compared to Grid 15.95 may be attributed to an Austrian national station that is not part of the WEGN network but is located within the WEGN area (over Grid 15.85); its measurements are integrated into the GPCC gauge product and therefore influence IMERG over Grid 15.85.
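The different outlier sensitivity of r and ρ is easy to demonstrate on a toy series. The sketch below uses only the standard library; the function names are hypothetical, and ρ is computed as the Pearson correlation of average ranks, a common definition that also handles ties.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For `x = [1, 2, 3, 4, 5]` and `y = [1, 2, 3, 4, 100]`, the single extreme value lowers r to roughly 0.73, while ρ remains 1 because the ordering is unchanged.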

Analysis of example rainfall events
In this subsection, we focus on diagnosing the more detailed behavior of IMERG runs by selecting example rainfall events where the IMERG estimates show distinct differences from the WEGN data. Note that the WEGN can give very accurate information about the domain in terms of spatial and temporal rainfall variability, in spite of potential overall biases of up to about 10 % in the data. Figures 7 and 8 show the spatial distribution of two such rainfall events captured by the WEGN network and the corresponding time series of IMERG estimates and WEGN data.
Figure 7 shows a rainfall event in the warm season, on 30 May 2015. According to the spatial WEGN maps, the rain clouds arrived at Grid 15.85 first (around 21:00 UTC), and then drifted eastwards. Among the IMERG NRT runs, the IMERG-L run is better able to describe this time lag between the two grids. This improvement can be attributed to backward morphing, which is applied in the IMERG-L, but not yet available in IMERG-E. The IMERG-L run captures rainfall withdrawal at Grid 15.95 better as well. Nevertheless, all IMERG runs tend to overestimate rainfall, with a time difference of about 2 h earlier in starting time. This false alarm suggests that the PMW observations (19:30–20:30 UTC) made by the IMERG runs were likely combined with incorrect IR cloud information. However, despite the absence of available PMW observations during the actual rainfall, the overestimation in IMERG-F is much smaller, thanks to the adjustment by the gauge analysis.
Figure 8 shows a rainfall event in the hot season, on 8 July 2015. Here, again, the onset of rainfall in IMERG estimates is ahead of that of the WEGN data (see the shaded area around 16:00 UTC). It is interesting that IMERG NRT runs describe the first peak (13:30–14:00 UTC) at Grid 15.85 well, albeit with overestimation, but only the IMERG-E run captures the peak at Grid 15.95 with a half-hour time shift. We suspect that the satellite-observed rain was morphed more slowly than the actual cloud movement (from west to east), for example, because the cloud motion vectors derived from IR-based data do not always accurately reflect the actual cloud advection speed (Joyce et al., 2004; Joyce and Xie, 2011). Therefore, the peak still remains at Grid 15.95 in the IMERG-E estimates.
When it comes to the IMERG-L and IMERG-F estimates, we assume that the backward morphing identified the timing of the peak correctly. However, given that the morphing weights are inversely proportional to the time difference between the target data time and the PMW observation (i.e., higher weight is assigned for the time step when IMERG-E depicted the peak), the backward morphing significantly reduced the peak in the IMERG-E run (since it has a higher weight), whereas it only slightly increased the missing peak (since it has a lower weight). This implies a possibility of conflict between the forward and backward morphing that can lead to error in the rainfall estimates. In Fig. 8, both IMERG-E and IMERG-L overestimate rainfall during 16:00–22:00 UTC. This can be explained by differences in the number of PMW observations used in each IMERG run. The IMERG NRT runs could use four or fewer PMW observations during the period, all of which overestimated the rain rates (no difference in data values between IMERG-E and IMERG-L once the data are collected from the same PMW sensor), resulting in the overestimation after the forward morphing and then even more after the backward morphing. According to Zeweldi and Gebremichael (2009), evaporation below cloud base can introduce large positive bias by the CMORPH morphing method during warm and hot seasons.
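The effect of weights that are inversely proportional to the time distance from the nearest PMW observation can be illustrated with a simple linear blend. This is a deliberately simplified sketch with a hypothetical function name; it is not the Kalman-filter weighting used in IMERG, and the numbers are invented for illustration.

```python
def time_weighted_blend(forward_value, backward_value, dt_forward, dt_backward):
    """Blend forward- and backward-propagated rain rates at one time step.

    dt_forward: minutes since the PMW overpass that was propagated forward.
    dt_backward: minutes until the PMW overpass that was propagated backward.
    Each weight is inversely proportional to its time difference, so the
    estimate stemming from the closer overpass dominates.
    """
    weight_forward = 1.0 / dt_forward
    weight_backward = 1.0 / dt_backward
    total = weight_forward + weight_backward
    return (weight_forward * forward_value
            + weight_backward * backward_value) / total
```

With a forward-propagated peak of 10 mm h⁻¹ and a backward-propagated value of 0, the blend gives 7.5 mm h⁻¹ when the forward overpass is 30 min away and the backward one 90 min away, but only 2.5 mm h⁻¹ when the distances are swapped. A correctly timed peak that carries the lower weight is therefore only weakly restored.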
Conversely, the IMERG-F run received more PMW-based information over the same period (see 18:00–18:30 and 19:30–20:00 UTC) and the monthly gauge analysis. Thus, it shows better performance than the IMERG NRT runs. This demonstrates clearly the value of more PMW-based estimates in the morphing process (Joyce and Xie, 2011) as well as the ability of gauge adjustment to mitigate systematic biases (Gaona et al., 2016). From these two case studies, it appears that the gauges provide a greater improvement to IMERG Final estimates. One reason the PMW observations overestimate rainfall is likely the subgrid-scale rainfall variability. For instance, the IMERG runs may use satellite footprints over the northwestern corner of the grid cells, where rain is stronger (≈ 15 mm 30 min⁻¹ at 16:30–17:00 UTC), for their gridding process.
[...] accumulation starting between 90 and 30 min before the nominal time. Here, we use the same approach to provide an evaluation and interpretation of the temporal characteristics for the IMERG estimates in terms of rain gauge accumulation on the ground. The WEGN 5 min gridded data are used and integrated over accumulation times from 5 min (native sampling) to 100 min (twenty 5 min samples) for time offsets from −20 to +60 min (in 5 min steps). Figure 9 shows the resulting RMSE of IMERG rainfall estimates versus WEGN data as a function of the gauge accumulation time and the time offset, δ.
The minimum RMSE value for the IMERG-F estimates of the entire dataset (Fig. 9, top left) occurs at an accumulation time of about 25 min and a δ of about +40 min. This offset of 40 min exceeds the 30 min time resolution of IMERG, which means that the IMERG-F estimates are, on average, displaced by more than one time step. This suggests, for example, that IMERG-F rainfall estimates during 09:00–09:30 UTC can be considered as gauge measurements during 09:40–10:05 UTC. The positive offset is consistent with the early bias in rainfall onset found in Sect. 4.2. The hot season shows a shorter offset for the minimum RMSE compared to the warm season (Fig. 9, top middle and right), which agrees with the results of Villarini and Krajewski (2007).
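The worked example above (09:00–09:30 UTC mapping to 09:40–10:05 UTC) is plain time arithmetic: the gauge window starts at the nominal start time plus the offset and lasts for the accumulation time. A minimal sketch with a hypothetical function name, using the IMERG-F values found here as defaults:

```python
from datetime import datetime, timedelta


def equivalent_gauge_window(nominal_start, offset_min=40, accumulation_min=25):
    """Gauge accumulation window matching an IMERG half-hourly estimate.

    nominal_start: start of the IMERG half-hour (datetime, UTC).
    offset_min: offset of the gauge window relative to nominal_start.
    accumulation_min: length of the gauge accumulation window.
    Returns (window_start, window_end).
    """
    window_start = nominal_start + timedelta(minutes=offset_min)
    window_end = window_start + timedelta(minutes=accumulation_min)
    return window_start, window_end
```

Calling the function with 09:00 UTC on any day returns a window from 09:40 to 10:05 UTC on that day.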
Intercomparing the IMERG products for their common period in 2015 (Fig. 9, bottom row), it is visible that longer accumulation times are needed to minimize the RMSE of IMERG-E and IMERG-L rainfall estimates, while optimal δ values are obtained at around +20 min for both datasets. This analysis identifies possible sources of error that should be considered in the context of hydrological applications of IMERG data. For instance, biases (overestimation in this case) in IMERG rainfall estimates will inevitably propagate through hydrologic models, and consequently this would lead to larger errors in runoff. The magnitude of biases can be reduced when IMERG Final estimates are used. Time offset bias, however, remains relatively stable across all three IMERG runs, especially in the warm season. Therefore, comparison or adjustment of IMERG estimates using local ground reference (if available) in terms of biases, not only in amount of rainfall but also in its timing, should be considered as an approach to reach the required level of accuracy in rain data.
In general, the IMERG NRT estimates show higher RMSE values compared to the IMERG-F estimates, as expected. Also, they show relatively indistinct patterns and even multiple RMSE minima (in the case of IMERG-E). As such, this approach of interpreting the rainfall estimates may not be sufficiently constrained by the NRT estimates, due to the limited sample size from only 7 months of data and also due to larger errors. More years of data are needed before such an approach can provide a robust interpretation of the NRT estimates.

Conclusions
In this study, we evaluated half-hourly rainfall estimates from the IMERG-E, IMERG-L, and IMERG-F runs using gauge measurement data from the WEGN network in southeastern Austria for the period of April–October in 2014 and 2015. The dense WEGN gauge network provided a unique opportunity for a direct grid-to-grid comparison over two selected IMERG 0.1° × 0.1° grid boxes. This evaluation work provides valuable insights and input to improve satellite rainfall retrieval processes, to further intercompare data among satellite-based rainfall products, and to achieve a better product quality, in particular of IMERG NRT, for various data applications such as flood and landslide warning or agricultural drought forecasting and monitoring.
First, thorough statistical comparisons between IMERG estimates and WEGN data show the biases of IMERG both in rainfall occurrence and in intensity distributions. Nonetheless, we find that the IMERG-F run considerably outperforms the NRT runs. IMERG-E and IMERG-L runs overestimate low rain rates, leading to large discrepancies in accumulated rainfall, which result in a lower correlation with WEGN data in general. All three IMERG runs tend to underestimate high rain rates.
Second, the study of rainfall events selected to examine large IMERG-WEGN discrepancies reveals specific situations, e.g., a lack of PMW-based observations during short-term rainfall, when the IMERG runs can fail to describe rainfall features even qualitatively. Here, again, we find significantly smaller errors in the IMERG-F estimates, owing to the monthly gauge correction, compared to the IMERG NRT estimates.
Last, we calculated the RMSE of the half-hourly IMERG estimates against the WEGN ground-based rainfall data as a function of gauge accumulation time and time offset. The minimum RMSE found for the IMERG-F estimates suggests that these can be regarded as a 25 min accumulation with a +40 min time offset (preceding the time of the gauge data by this time span). For example, an IMERG-F estimate for 09:00-09:30 UTC can be interpreted as an accumulation over 09:40-10:05 UTC. Again, the results for the IMERG NRT estimates suggest significantly lower confidence, due both to insufficient sample size and to larger estimation errors.
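The RMSE search over accumulation time and time offset can be illustrated with a minimal sketch. It is not the processing code used in the study: the 5 min gauge resolution, the conversion of both series to mean rain rates (mm/h), the convention that the offset runs from the nominal start of the IMERG interval to the start of the gauge window, and all function names are assumptions made for illustration.

```python
import math

STEP_MIN = 5  # assumed time resolution of the gauge series (minutes)


def gauge_rate(gauge_mm, start_idx, n_steps):
    """Mean gauge rain rate (mm/h) over n_steps bins starting at start_idx.

    Returns None when the window falls outside the gauge series.
    """
    if start_idx < 0 or start_idx + n_steps > len(gauge_mm):
        return None
    total_mm = sum(gauge_mm[start_idx:start_idx + n_steps])
    return total_mm * 60.0 / (n_steps * STEP_MIN)


def rmse_surface(imerg_mm, gauge_mm, accum_minutes, offset_minutes):
    """RMSE (mm/h) for every (accumulation, offset) pair.

    imerg_mm[k]: half-hourly amount (mm) for the nominal interval
                 starting k * 30 min after the start of the series.
    gauge_mm[j]: gauge amount (mm) for the bin starting j * STEP_MIN min
                 after the start of the series.
    Offsets are measured from the nominal start of the IMERG interval
    to the start of the gauge window (an assumed convention).
    """
    surface = {}
    for accum in accum_minutes:
        n_steps = accum // STEP_MIN
        for offset in offset_minutes:
            sq_errors = []
            for k, amount in enumerate(imerg_mm):
                start_idx = (k * 30 + offset) // STEP_MIN
                ref = gauge_rate(gauge_mm, start_idx, n_steps)
                if ref is None:
                    continue  # window not covered by the gauge record
                sat = amount * 2.0  # mm per 30 min -> mm/h
                sq_errors.append((sat - ref) ** 2)
            if sq_errors:
                surface[(accum, offset)] = math.sqrt(
                    sum(sq_errors) / len(sq_errors)
                )
    return surface


def best_match(surface):
    """(accumulation, offset) pair with the smallest RMSE."""
    return min(surface, key=surface.get)
```

Under these conventions, a result of `(25, 40)` from `best_match` corresponds to the interpretation given above: the estimate nominally valid for 09:00-09:30 UTC is matched to the gauge window 09:40-10:05 UTC.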
Consequently, our analysis across the different runs of IMERG demonstrates the effects of the additional processes on the final rainfall estimates. While the better performance of the IMERG-F run is often attributed to the gauge adjustment procedure (Boushaki et al., 2009; Vila et al., 2009; Almazroui, 2011), we also identify the advantages of a greater number of PMW-based estimates. Conversely, the inclusion of forward and backward morphing in the IMERG-L run, with sparse PMW observations, provides only marginal benefits over the forward-only morphing in the IMERG-E run. In fact, our case study of example rainfall events illustrates the interesting possibility of cancellation in the backward and forward morphing estimates for the IMERG-L run, resulting in a performance poorer than in the IMERG-E run. These results for the performance of IMERG runs could be representative of other regions under similar conditions (e.g., midlatitude land areas). The study approach is, however, not easily applicable to different precipitation regimes. This is mainly due to the limited availability of independent ground reference data like WEGN. As a result, WEGN offers valuable information about the accuracy of IMERG estimates across its three different runs.
Further studies on detailed links between the errors in the final rainfall estimates and the upstream data sources (i.e., contribution of each PMW/IR sensor to biases in IMERG estimates) or retrieval processes, to alleviate those issues, will contribute to improvements in the performance of the IMERG-L run (e.g., by accounting for time-lagged peaks or improving the cloud motion vectors) and, consequently, the IMERG-F run. Meanwhile, addressing instantaneous satellite estimates made in the IMERG runs will help us to understand overestimation in the PMW estimates themselves.
Our future work on the evaluation of IMERG products will place emphasis on the IMERG-F data, in order to better understand the behavior of rainfall estimates under various conditions, such as different temporal accumulation, varying thresholds, or the inclusion of PMW/IR sources. Using the WEGN high-resolution data, we can also explore rainfall uncertainty and variability at an IMERG subpixel scale, another intriguing prospect. Additionally, it will be worthwhile to intercompare the version of IMERG-F data used here (V03) with the current version (V04) or the upcoming version (V05), to be released soon, in order to evaluate improvements of IMERG. IMERG V04 is the first version to use the GPM Core Observatory as a calibrator for the constellation satellite partners, so it is expected to provide more consistent quality among the PMW/IR estimates.

Figure 1. WEGN climate station network in the Feldbach region (black square), southeastern Austria. The inset plot shows an enlarged view of the number of WEGN rain gauges that are located within GPM IMERG 0.1° × 0.1° grid cells. The two red-framed grid boxes (within 46.9-47.0° N, 15.8-16.0° E), containing 40 and 39 WEGN gauges each, are selected for the study; for brevity, these are also termed "Grid 15.85" and "Grid 15.95" based on midlongitude.

Figure 2. WEGN network-averaged half-hourly rainfall amounts as a function of the number of WEGN gauges detecting rain (gauges with ≥ 0.1 mm rainfall are counted). The rain amounts for half-hourly time intervals over the study period (April-October in 2014-2015) are shown to check the reasonableness of a rain/no-rain threshold for the IMERG data evaluation; 0.05 mm is used for the study. The inset shows the full range of rainfall data.

Figure 4. Occurrence probability density function of rain rates (PDF_c, dashed) and cumulative distribution functions of rain volume (CDF_v, solid) for WEGN (gray) and IMERG (red). Comparisons are shown for IMERG-F over the full 2014-2015 period (a) and separately for IMERG-F (b), IMERG-L (c), and IMERG-E (d) over the 2015 period.

Figure 5. Same layout per panel as Fig. 4, but restricted to the data pairs for which IMERG and WEGN both detected rainfall (≥ 0.05 mm 30 min⁻¹). Comparisons are shown for IMERG-F (a-e), IMERG-L (f-j), and IMERG-E (k-o), using the entire data (a, f, k), the data divided into low and high rain intensities (b-c, g-h, l-m) based on the 80th percentile of WEGN data, and the data divided into two different seasons (d-e, i-j, n-o), warm (April, May, October) and hot (June to September).
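The data subsets described in the Fig. 5 caption can be sketched as follows. This is an illustrative reading of the caption rather than the code used in the study: computing the 80th percentile from the WEGN values of the matched rainy pairs, the linear-interpolation percentile, assigning values equal to the percentile to the high class, and the function names are all assumptions.

```python
from datetime import datetime
from typing import Dict, List, Tuple

RAIN_THRESHOLD_MM = 0.05   # rain/no-rain threshold per 30 min (from the text)
WARM_MONTHS = {4, 5, 10}   # April, May, October
HOT_MONTHS = {6, 7, 8, 9}  # June to September

Pair = Tuple[datetime, float, float]  # (timestamp, IMERG mm, WEGN mm)


def percentile(values: List[float], q: float) -> float:
    """Percentile (q in 0..100) of a non-empty list, linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def stratify(pairs: List[Pair]) -> Dict[str, List[Pair]]:
    """Split matched IMERG-WEGN pairs into the subsets named in the caption."""
    # Keep only pairs where both IMERG and WEGN detected rainfall.
    wet = [p for p in pairs
           if p[1] >= RAIN_THRESHOLD_MM and p[2] >= RAIN_THRESHOLD_MM]
    if not wet:
        return {"all": [], "low": [], "high": [], "warm": [], "hot": []}
    # Assumption: the percentile is taken over WEGN values of the wet pairs.
    p80 = percentile([p[2] for p in wet], 80)
    return {
        "all": wet,
        "low": [p for p in wet if p[2] < p80],
        "high": [p for p in wet if p[2] >= p80],
        "warm": [p for p in wet if p[0].month in WARM_MONTHS],
        "hot": [p for p in wet if p[0].month in HOT_MONTHS],
    }
```

Each subset can then be passed to the same distribution or validation statistics, so that the low/high and warm/hot panels are derived from one common set of rainy pairs.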

Figure 6. Scatter plots of half-hourly rainfall amounts for IMERG-F (a-c), IMERG-L (d-f), and IMERG-E (g-i) versus WEGN, for the entire dataset (a, d, g), warm season only (b, e, h), and hot season only (c, f, i). Overplotted are, for nine bins across the range with at least 30 IMERG-WEGN data pairs in each bin, the 50th percentile (black symbols and line) and the 25th and 75th percentiles (gray symbols and lines) of the IMERG estimates. Total number of data pairs (n) is indicated in the upper left of each panel. Note that a log-log scale is used.

Figure 7. Example rainfall event in the warm season that occurred on 30 May 2015. (a) Spatial rainfall pattern over the WEGN network for consecutive half-hourly periods over the two selected IMERG grid boxes (red outline; see Fig. 1). (b) Time series of IMERG-F, IMERG-L, IMERG-E estimates, and WEGN data in each grid box during the rainfall event, with the shaded areas highlighting the two hours illustrated by the map sequence in (a). Solid vertical lines indicate time steps where all three IMERG runs received PMW-based information for the rainfall retrieval at that time step, dotted vertical lines (not applicable for this event but for the event of Fig. 8 below) mean that either IMERG-E or both IMERG NRT runs did not receive PMW-based information at the time step, and no vertical line (the case for most time steps) implies that none of the IMERG runs received PMW-based information.

4.3 Evaluation of temporal matching of IMERG estimates
Villarini and Krajewski (2007) used contour diagrams of RMSE as a function of accumulation time and time offset to interpret TMPA 3-hourly rainfall estimates as a 100 min ac-

Figure 8. Same as Fig. 7, but for an example rainfall event in the hot season that occurred on 8 July 2015.

Table 1. Description of the data: mean, standard deviation (SD), maximum, number of rainfall amount values (≥ 0.05 mm 30 min⁻¹), and percentage of no-rain data (< 0.05 mm 30 min⁻¹). Half-hourly rainfall data for the period of April to October in 2014 and 2015 are used. IMERG-L and IMERG-E data are available from April 2015, i.e., used for the second year only.

Table 2. Validation statistics comparing the performance of IMERG runs at each grid box.