Monitoring small reservoirs storage from satellite remote sensing in inaccessible areas

In river basins with water storage facilities, the availability of regularly-updated information on reservoir level and capacity is of paramount importance for the effective management of those systems. Yet, for the vast majority of reservoirs around the world, storage levels are either not 5 measured or not readily available due to financial, political or legal considerations. This paper proposes a novel approach using Landsat imagery and Digital Elevation Models (DEM) to retrieve information on storage variations in any inaccessible region. Unlike existing approaches, the method does not 10 require any in situ measurement and is appropriate to monitor small, and often undocumented, irrigation reservoirs. It consists of three recovery steps: (i) a 2D dynamic classification of Landsat spectral bands information to quantify the surface area of water, (ii) a statistical correction of DEM data 15 to characterise the topography of each reservoir and (iii) a 3D reconstruction algorithm to correct for clouds and Landsat 7 Scan Line Corrector failure. The method is applied to quantify reservoir storage in the Yarmouk basin in Southern Syria, where ground monitoring is impeded by the ongoing 20 civil war. It is validated against available in situ measurements in neighbouring Jordanian reservoirs. Coefficients of determination range from 0.69 to 0.84, and the normalised root-mean-square error from 10 % to 16 % for storage estimations on six Jordanian reservoirs with maximal water sur25 face areas ranging from 0.59 km2 to 3.79 km2.


Introduction
Reservoirs are essential for the development and management of a river basin's water resources, no matter their size (Liebe et al., 2005;Leemhuis et al., 2009). By increasing the availability of water during low-flow periods (International Commission On Large Dams, 2016), dams often play a key role in water supply, irrigated agriculture, hydropower generation, navigation, cattle breeding, fisheries, etc.
Despite these valuable applications, there is a scarcity of monitoring data as many countries cannot financially afford to build gauging stations (Solander et al., 2016). And even when monitoring systems do exist, there may not be institutions to collect the data, or legal means to disseminate them, as they are often considered sensitive data (Alsdorf et al., 2007;Dombrowsky, 2007;Duan and Bastiaanssen, 2013). Yet this information is essential to conduct hydrological studies in committed basins, from defining reservoir operation rules in simulation models (Yoon and Beighley, 2015) to assessing the impact of multi-reservoir systems on downstream river discharge (Vörösmarty et al., 1997;Hanasaki et al., 2006;Döll et al., 2009).
In that context, remote sensing is a promising tool to overcome the difficulty in accessing reliable information on a reservoir. This technique has also been applied to characterize a range of continental water bodies such as large lakes (Birkett, 1995;Ponchaut and Cazenave, 1998;Mercier et al., 2002), paddy rice fields (Islam et al., 2010), or tidal floods (Yan et al., 2010). The general procedure to monitor storage consists in associating water surface elevation and area after evaluating them independently (e.g. Frappart et al., 2006). N. Avisse et al.: Monitoring small reservoirs' storage Satellite radar and laser altimetry are the predominant approaches to estimating the elevation of open water bodies (e.g. Morris and Gill, 1994;Crétaux and Birkett, 2006;Calmant et al., 2008;Gao et al., 2012;Wang et al., 2013), or their bathymetry (Arsen et al., 2014). Orbit repeat periods of radar altimeters such as Topex/Poseidon (T/P), GFO, Jason-1 and 2, or Envisat range from 10 to 35 days. They have a high vertical accuracy with root-mean-square errors of the order of centimetres to tens of centimetres, depending on the altimeter and the size of the water body (Calmant et al., 2008;Crétaux et al., 2016). However, the above-mentioned sensors are affected by important drawbacks, including nadir viewing, narrow swath, coarse cross-track spacing (a few hundred kilometres), long along-track path length (about 1 km), and large elevation differences around some water areas that impede their application to more than a few hundred large lakes and reservoirs on the planet (i.e. area > 100 km 2 and width > 500 m) (Crétaux and Birkett, 2006;Alsdorf et al., 2007;Gao et al., 2012). More recent satellites such as Cryostat-2 or Sentinel-3 present significant improvements in terms of along-track resolution (∼ 300 m). However, their respective inter-tracks of 7 and 52 km (Donlon et al., 2012;Crétaux et al., 2016) still place many reservoirs out of the trajectory of their nadir-viewing sensors onboard. The small inter-track of Cryosat is also realized at the expense of a long revisit cycle (369 days) that impedes any monitoring of small reservoirs on a monthly basis. Alternatively, the Geoscience Laser Altimeter System onboard the Ice, Cloud, and Elevation Satellite (ICESat/GLAS) measured land surface elevations between 2003 and 2009 with a much finer spatial resolution (footprint size between 50 and 105 m every 170 m along the track), a vertical accuracy close to 10 cm (Zhang et al., 2011;Duan and Bastiaanssen, 2013), and a finer crosstrack resolution (15 km maximum at the Equator, Zwally et al., 2002). There was however no continuous elevation retrieval: ICESat/GLAS gathered data only during designated campaigns, with a long ground-track repeat cycle for almost all of it (183 days). Furthermore, unlike radar altimeters that can be used under all weather conditions (Birkett and Beckley, 2010), laser measurements are affected by the presence of thin clouds (Duan and Bastiaanssen, 2013). Many existing studies consequently used ICESat/GLAS data to get a trend on pre-determined large lake variations over several years (e.g. Zhang et al., 2011;Duan and Bastiaanssen, 2013;Song et al., 2013), or to calibrate area-elevation relationships for a limited number of water bodies large enough for the satellite to take sufficient elevation measurements per track (Zhang et al., 2014).
Water surface areas are commonly determined from optical satellite imagery such as MODerate Resolution Imaging Spectroradiometer (MODIS) and Landsat products (Xiao et al., 2006;Gao et al., 2012), or synthetic aperture radar (SAR) sensors (e.g. RADARSAT, JERS-1, ERS, or Sentinel-1) (Annor et al., 2009;Duan and Bastiaanssen, 2013;Amitrano et al., 2014). The latter has however been less used due to the difficulty in getting consistent results, as the required condition of a significantly lower phase coherence of water areas than of the surrounding land surface is not always met, with orbital repeat cycles of more than a few days, or with wind or rain (Alsdorf et al., 2007;Eilander et al., 2014). Therefore, existing approaches have used either MODIS or Landsat, depending on their emphasis on spatial or temporal resolution (Solander et al., 2016;Zhang et al., 2016). Images acquired during the various Landsat missions have a much finer spatial resolution (30 m) than MODIS' (250 m for the red band, 500 m for infrared), but they are taken on a repeat cycle of 16 days compared to the daily MODIS products. The higher revisit frequency of MODIS satellites allows MODIS-based approaches to better address clouds and smoke artifacts in optical images. However, MODIS missions cover a much shorter period (July 2000 to present) than Landsat missions (July 1982 to present). The potential of the recent two Sentinel-2 satellites can also be mentioned for post-2015 studies. Launched in June 2015 (Sentinel-2A) and March 2017 (Sentinel-2B), they provide spectral bands at a resolution of 10 m for visible and NIR bands, and at 20 m for SWIR bands. They also have a repeat cycle of 5 days by combining the two (European Space Agency, 2013; Yang et al., 2017).
The common protocol to separate water areas from other land use categories is to apply a threshold to indices such as the Normalized Difference Vegetation Index (NDVI) (e.g. Frappart et al., 2006;Gao et al., 2012), or the Modified Normalized Difference Water Index (MNDWI) proposed by Xu (2006) (e.g. Crétaux et al., 2015Müller et al., 2016). But determining an adequate value for a multi-temporal analysis can be challenging because such a threshold is known to be case-dependent (Liu et al., 2012;Coltin et al., 2016). Furthermore, separating water from land or vegetation may be difficult due to subpixel land-cover components (Ji et al., 2009) or water quality that can vary throughout a water body (Gao et al., 2012). To address these issues, decision tree defined thresholds have successfully been applied with various vegetation indices (e.g. Xiao et al., 2006;Islam et al., 2010;Yan et al., 2010), but remain case-dependent. Coltin et al. (2016) then advocated the implementation of automatic thresholds as they developed a supervised learning approach to improve flood mapping. Other methods like unsupervised classification (Wang et al., 2008), or direct elevation-area relationships from a digital elevation model (DEM, Wang et al., 2005), have also been tested, but did not prove to be more precise. Gao et al. (2012) recently developed a method to combine both an index analysis and an unsupervised classification to improve the accuracy of the delineation of water areas. The approach was refined by Zhang et al. (2014), who enhanced the storage assessment with a novel surface area retrieval algorithm.
While promising, these approaches generally fail to systematically combine remote sensing surface area and elevation due to the different timing in orbital repeat cycles  of different satellites. Elevation-area relationships are then deduced from remote sensing data that are available at the same time (e.g. through linear or polynomial regressions, Gao et al., 2012;Duan and Bastiaanssen, 2013;Song et al., 2013), so that reservoir storage can be computed with either remote sensing elevation or area only. Even then, existing methods estimate storage in relative terms, either from the already known elevation, area and storage at capacity (Zhang et al., 2014), or from the lowest water level detected (Duan and Bastiaanssen, 2013). Furthermore, these approaches have only been applied to reservoirs larger than 100 km 2 , which are estimated to represent only 0.54 % of reservoirs larger than 0.1 km 2 in the world (Lehner et al., 2011). Studies that analysed small reservoirs delineated water surface with Landsat optical sensors (e.g. Liebe et al., 2005;Sawunyama et al., 2006;Rodrigues et al., 2012) or radar images to address the cloud cover issue (Annor et al., 2009;Liebe et al., 2009), and could only get an estimation of storage capacities by conducting bathymetrical  surveys. Due to their reliance on in situ observations, these methods are inapplicable to remote, ungauged, or conflicttorn areas. This paper introduces a new method to monitor reservoir storage based on remote sensing data exclusively. The method is applied to small reservoirs -capacities and water surface areas starting from 1 hm 3 (million cubic metres) and 0.5 km 2 respectively -in the Yarmouk River basin (YRB; see Fig. 1) in southern Syria during the ongoing civil war and the decade before it started. Its prediction performance is tested against available in situ observations of reservoir storage and elevation in neighbouring Jordan.
The document is organized as follows: Sect. 2 presents the method and algorithms developed for the monitoring of reservoir storage, Sect. 3 reviews results, error measurements, and sensitivity analysis, and Sect. 4 concludes the study.

Methodology
The procedure is based on two types of data: Landsat images for water area estimation, and DEM for topography. It works in three stages that are presented in the flowchart in Fig. 2. The idea behind the process is (i) to use Landsat bands to enhance the detection of water pixels, then (ii) to exploit this information to statistically correct the DEM vertical errors and characterize reservoir bathymetry, and (iii) to use the updated topography to reconstruct missing parts of Landsat images (e.g. pixels covered by clouds or not captured by the Landsat sensor).
N. Avisse et al.: Monitoring small reservoirs' storage 2.1 Two-dimensional dynamic classification and water body area retrieval Landsat images are chosen because they are freely available with a spatial resolution fine enough (30 m) to detect variations in the area of small reservoirs. The spatial resolution of MODIS images is indeed too coarse to assign to any small reservoir a proper range of area and elevation (1 km 2 is covered by 16 MODIS image 250 m pixels only). Thus, about 300 Landsat 4, 5, 7, and 8 images for each scene -index 173/37 above a part of the YRB, 174/38 above reservoirs in Jordan, and 174/37 above parts of both in the Worldwide Reference System (WRS; see the scene frames in Fig. 1) -are downloaded from the United States Geological Survey (USGS) EarthExplorer website (https://earthexplorer. usgs.gov/).

Fmask
We use the Fmask (Function of mask) algorithm (Zhu and Woodcock, 2012;Zhu et al., 2015) to discriminate cloud coverage from open water. The algorithm was originally designed to separate potential cloud pixels from clear sky pixels on Landsat images using empirical thresholds on the NDVI and the near-infrared band, with an overall accuracy of 96.41 % (Zhu and Woodcock, 2012). Fmask distinguishes land and water areas and produces a probability mask for clouds, which we use to manually remove images that are almost entirely covered by clouds or with obvious large errors in water body detection. After quality control, about 245 images remain per location. Most pixels classified as water by Fmask can reasonably be considered water due to the relatively selective thresholds used in the algorithm. Hence, at this stage, the uncertainty remains with regards to pixels hidden by clouds or cloud shadows, misclassified by Fmask as land or snow, or not captured by the Landsat sensor (e.g. "N/A" -not available -stripes caused by the Landsat 7 Scan Line Corrector -SLC -failure; see Fig. 4a and b). Our analysis reveals that, on average, 24.1 % of the reservoirs' pixels are misclassified as land, 8.1 % are covered with clouds or cloud shadows, and 8.6 % are in "N/A" areas (see Sect. 3.1).

Occurrence mask
We use the frequency with which pixels are classified as water to distinguish actual reservoirs from small pools or misclassified land, and to delimit them. For each Landsat scene, the ∼ 245 satellite images are superimposed to form an image where each pixel represents the number of times it has been covered by water (see Fig. 3). This occurrence mask (M occ ) is useful for filtering occasional Fmask classification errors, and for creating a water mask (M wat ): pixels with values greater than 5 in M occ are classified as water and kept in M wat , while those with lower values are considered  Figure 3. Image of the number of times each pixel has been covered by water (M occ ). The text in black indicates the identification number (for Syrian reservoirs) and the name of known reservoirs.
Coordinates are expressed in CRS WGS 84/UTM zone 36N.
misclassified land and removed from water bodies (i.e. hidden by M wat ). In practice, the threshold of 5 was empirically chosen after comparing detected water bodies with Google Earth high-resolution (∼ 1 m) imagery. The same threshold is applied to reservoirs located in the overlapping area of two Landsat images as it does not change their contours. Its small value is justified by the fact that most images with obvious mistakes have already been manually discarded at the previous step. Aside from sporadic large wadis (intermittent rivers) that are manually removed from the mask, final water bodies in M wat are deemed to be reservoirs. They are the ones depicted in blue and green dots on the map in Fig. 1.

Classification enhancement for each Landsat image
The detection of water bodies is enhanced using NDVI and MNDWI rasters computed from Landsat imagery. A low NDVI can be attributed to both water and bare land, and a low MNDWI value can denote either water or clouds. We combine these indices and leverage their complementary nature to detect open water.
To ensure more reliable and repeatable values for identical land use categories in different images, the two indices are computed from surface reflectance, which is estimated by applying the image-based atmospheric correction Dark Object Subtraction 1 (DOS1, Chavez, 1996) to top of atmosphere (TOA) reflectance. However, the DOS1 adjustment is not optimal because it is not based on actual atmospheric or cloud cover measurements. Moreover, the slight band variations between the various Landsat missions may affect NDVI and MNDWI, and may require different thresholds to detect water. Consequently, two supplementary water detection ad-  justments are performed through the method presented in the flowchart in Fig. 5 to define a MNDWI threshold adapted to each date and climatic condition (i.e. each time t over a given scene). A NDVI mask (M NDVI (t)) is first created to calibrate the MNDWI threshold, which is then used to build a MNDWI mask representing water areas (M MNDWI (t)).
1. The goal of M NDVI (t) is to find with the NDVI all pixels where there could potentially be water. Depending on the results of the Fmask classification in M wat , three situations can arise.
i. If water is already detected by Fmask in M wat reservoirs, M NDVI (t) is formed from those ones (see the green dots in the Fig. 4c example).
ii. Where water is not detected by Fmask, we impose a threshold on the NDVI. Pixels with a NDVI of less than −0.1 in M wat are used to form M NDVI (t). The lowest values are indeed generally typical of water. This condition has been added to take into account images where thin clouds cover reservoirs.
iii. In ∼ 1 % of all images, no pixel meets the previous conditions, even if water can be seen by eye on the original spectral bands. We have indeed noticed that for these few cases, Fmask associates water bodies with clouds. M NDVI (t) is thus built on pixels classified as cloud in M wat .
2. The detection of water bodies is further enhanced by imposing a threshold on MNDWI images. The threshold is defined automatically so as to optimally distinguish water bodies from clouds. For the two first situations, the threshold is set to the X MNDWI th percentile (P X MNDWI ) of MNDWI values in M NDVI (t). X MNDWI is set to the maximum, 100, in order to avoid over-constraining the classification. The sensitivity of the method to the choice of this parameter is presented in Sect. 3.2 below. For the third case, the threshold is set to the 70th percentile (P 70 ) of MNDWI values in M NDVI (t).
Finally, the water mask -or MNDWI mask -M MNDWI (t) is formed by including only areas with a MNDWI of less than the MNDWI threshold in M wat . This last step allows us to incorporate most water pixels left out by Fmask and undetectable with NDVI (see the red dots in Fig. 4d).
After removing water areas smaller than 20 pixels (20 × 900 m 2 ), considered noise, the classified images have three categories: (i) Water as identified by the protocol developed above, (ii) Land according to Fmask, and if not in the category Water, and (iii) Unknown, which includes all other pixels (see Fig. 4d).

Digital elevation models
Unlike most studies, the proposed method does not rely on satellite altimetry to assess water bodies' elevation, but on DEMs to get the topography. It is then required that reservoirs were almost empty or not yet built when the DEM satellites passed over them, for at least one of the two sources considered: ASTER GDEM v2 and SRTM-C/X. ASTER

Statistical analysis
Corrected elevation & filling curve

2-D classes in all images
Probability to be immersed, r p for each pixel p Elevation Figure 6. Procedure for the statistical correction of topography.
GDEM v2 data were acquired between 2001 and 2008, and SRTM-C/X data on 11-22 February 2000. All have a spatial resolution of 1 (approximately 30 m at the Equator), which we resample to match Landsat images. The large coverage of these datasets is chosen over the very low precision of the measures. They indeed cover almost all of the Earth's land surface (except for SRTM-X): from 83 • N to 83 • S for ASTER GDEM v2, and from 60 • N to 56 • S for SRTM-C; but the vertical relative precision is very low compared to satellite altimetry: objectives of 15 and 6 m for 90 % of SRTM-C and SRTM-X data respectively (German Aerospace Center, 2017;Rodriguez et al., 2005), and standard deviations estimated to 3.95 and 8.68 m for SRTM-C and ASTER data respectively (ASTER GDEM Validation Team, 2011).

Elevation-area relationship
To improve elevation assessment, DEMs are statistically corrected by using the information on water surface areas obtained from Landsat images. The protocol presented in Fig. 6 is implemented for each reservoir.
1. a water coverage quantile is computed at each pixel to determine the probability of it being immersed. With each pixel p is associated the ratio r p defined as where N p water is the number of times the given pixel p is counted as Water, and N p land is the number of times it is counted as Land. We ignore the images where the pixel p is classified as Unknown.
2. To illustrate the interest of this section, Fig. 7 shows both r p and the relative elevation in the Kudnah reservoir. We can see that the two do not always match as we would expect -i.e. the lowest pixels are not always the most frequently immersed, nor are the highest pixels the most rarely immersed. The immersion frequency (r p ) can actually be used to correct the elevation. The former, which is estimated from the results of the 2-D clas-sification enhancement, is indeed assumed to be more reliable than the original DEM. Hence, each pixel's elevation H p is put in relation with the area A p , defined as the cumulated area of all pixels q in the reservoir for which r q ≥ r p . The examples of Fig. 8 confirm the observations made in Fig. 7: pixels' elevations are not always correlated with the number of times they are classified as water. To a certain extent the difference was expected from the DEM's low vertical precision, but some "anomalies" concerning the most often immersed pixels (i.e. lowest A p ) can be recurrent from one reservoir to another, due to either a strong dispersion in elevation (see the SRTM-X data in Fig. 8b), or a flat elevation (see the SRTM-C data in Fig. 8c). We interpret this irregularity as arising from the presence of water where the satellite tried to evaluate elevation: in the case of SRTM-X, the measure over water is hampered for reasons inherent to the use of a SAR sensor; and in the case of SRTM-C, DEM pixels covered with water may have been filled during a post-treatment analysis. Either way, elevation cannot be retrieved from the given DEM for these reservoirs' most often immersed pixels.
3. To address the issue, a polynomial regression on observed land pixels (A > A i , with A i the area assumed to be immersed during the satellite elevation retrieval) is used to build a "corrected elevation"-area relationship (A → H c (A)) that best fits the data (in a least-squares sense). Values of H greater than the 80th percentile or lower than the 20th percentile are ignored to filter potential errors and smooth the data. This step is executed three times -one for each DEM -and the better quality dataset (i.e. the one with less dispersion and fewer "anomalies" as defined above) is kept. Examples are shown in Fig. 8.

4.
A filling curve -the volume-area relationship -is finally constructed using the outcomes of the previous step.
The regression relies on the assumption that elevation estimates are correct on average by considering many pixels. Indeed, the relative error in elevation approaches zero when the number of images taken into account grows. This property has already been used by LeFavour and Alsdorf (2005) for instance, in order to estimate the slope of the Amazon River. Parameters and results of the regression for reservoirs that fulfil the criteria mentioned at the beginning of this article -maximum storage and area larger than 1 hm 3 and 0.5 km 2 respectively -are summarized in Table 1.

Three-dimensional reconstruction through hidden areas
Retrieving missing parts of water bodies in the Unknown areas means dealing with Landsat drawbacks: (i) the 16-day repeat cycle making images regularly covered by clouds, and (ii) the failure of the Landsat 7 SLC that led to large data losses for the Enhanced Thematic Mapper Plus (ETM+) sensor after May 2003 (see the grey stripes in Fig. 4b). Zhang et al. (2014) developed an approach to improve quite significantly the estimation of a reservoir's water area. However, their method requires that only a small part of the reservoir is misclassified or hidden. This is not a problem if one works with MODIS images over very large reservoirs, but in our situation -Landsat images over small water bodies -the condition is rarely met.
We developed an alternative algorithm to use the information from each individual pixel.
1. As the area A p has been associated with each pixel p, and H c has been expressed in terms of A, a corrected elevation is associated with each pixel in a reservoir.

Each pixel in an Unknown area adjacent to water areas is set to Water if (i) the pixel is in M wat , and (ii) its corrected elevation H
p c is less than the X H c th percentile of the corrected elevation in all adjacent water bodies. This threshold is set to 98 to ignore the highest values of H c , in case they were associated with pixels misclassified as water. A sensitivity analysis has been conducted with regard to this threshold, and the results are available in Sect. 3.2 below.
This water body reconstruction technique relies on the fact that a pixel that is often immersed likely has an elevation lower than a pixel that is rarely immersed. This is a reasonable assumption due to the large number of images analysed. The blue dots in Fig. 9 show how the 3-D reconstruction complements the previous 2-D information retrieval. Finally, storage variations are obtained by combining final reconstructed areas with the previously determined filling curves.

Storage variations: validation and discussion
Storage variations estimated by remote sensing for all reservoirs that cannot be gauged in the YRB are displayed in Fig. 10. These reservoirs are located in Syria and in the Israel-controlled Golan Heights. By qualitatively comparing our results to those obtained by Müller et al. (2016) (monitoring of Syrian reservoirs using Landsat 7 datasets but before the 2-D and 3-D corrections), we can see more coherent storage variations through the presence of annual drawdownrefill cycles -particularly for Roum and Sahwat al-Khadr. This means that the 2-D enhancement and 3-D reconstruction steps have improved the detection of water and helped to overcome the low Landsat repeat cycle of 16 days.
Reservoirs managed by Jordan are used to validate the method by comparing our remote sensing estimates of elevation and storage with monthly in situ measurements con- Table 1. Parameters and results of the elevation-area regression. PR and LPR stand for polynomial regression and local polynomial regression respectively. R 2 is the coefficient of determination between the corrected elevation H c and the elevation H for pixels taken into account by the regression (red dots in Fig. 8).

Location
Reservoir DEM Visible area Regression  ducted by the Jordan Valley Authority (JVA). With the exception of the King Talal dam, our results seem to follow quite accurately the historical records (see Fig. 11). For some reservoirs (i.e. Karama and Tanour), the method seems to have difficulties in predicting the highest storages. Indeed, if the number of high-elevation pixels is small, the uncertainty in their corrected elevation (and thus the filling curve) can potentially affect the estimate of the maximum storage. This may be a limitation of the method. In addition, we can note that elevation H and volume V may vary a lot from month to month: up to 10 m or 15 hm 3 -i.e. 50 % of the maximal storage -for instance for the Mujib reservoir. Because no information is available regarding the data collection date, some of the differences between our estimates and measured data might then come from this lack of metadata.
With regard to the King Talal reservoir, we can see large errors in storage estimates (see Fig. 11). But they could have been expected at the end of the elevation-area relationship establishment step: the assumptions that were made to define H c were maybe not justified in this case. Indeed, 80 % of the reservoir maximal area was covered with water when the SRTM satellite passed over the dam, and the R 2 is only 0.29 for the regression applied to the remaining visible pixels (see Table 1). A small visible surface area does not necessar-Ja n -9 9 Ja n -0 0 Ja n -0 1 Ja n -0 2 Ja n -0 3 Ja n -0 4 Ja n -0 5 Ja n -0 6 Ja n -0 7 Ja n -0 8 Ja n -0 9 Ja n -1 0 Ja n -1 1 Ja n -1 2 Ja n -1 3 Ja n -1 4 Ja n -1 5 Ja n -1 6

Roum
Ja n -9 9 Ja n -0 0 Ja n -0 1 Ja n -0 2 Ja n -0 3 Ja n -0 4 Ja n -0 5 Ja n -0 6 Ja n -0 7 Ja n -0 8 Ja n -0 9 Ja n -1 0 Ja n -1 1 Ja n -1 2 Ja n -1 3 Ja n -1 4 Ja n -1 5 Ja n - ily lead to a low-quality elevation-area relationship -see the good estimates for the Kafrein reservoir, while 70 % of its maximal area was hidden when the SRTM satellite passed over it -but it certainly is a sign that results might be biased. Errors in the estimation of elevation and storage are evaluated in terms of the coefficient of determination (R 2 , Eq. 2) and the normalized root-mean-square error (NRMSE, Eq. 3): where Cov(RS, Hist) is the covariance between remote sensing (RS) estimates and JVA historical measurements, σ 2 the variance, and N the number of RS estimates during the period in which JVA measured storage or elevation. Results are presented in Table 2. The coefficient of determination for storage ranges from 0.69 to 0.84. These high values confirm an important correlation and the similar variation trends that can be seen between the method's estimates and JVA records (see Fig. 11). A few high NRMSE values for both V and H c though indicate that there is still some uncertainty with regard to the estimation of their absolute value at a given In order to better evaluate the proposed method compared to a basic fixed NDVI and near-infrared thresholds water area detection, we consider the results presented in Table 3:  ter bodies that is detected with the employment of a NDVIbased dynamic threshold for MNDWI is larger than 30 % for all Jordanian reservoirs, and can reach more than 50 % for the Tanour and Wala reservoirs. Similarly, the average additional part obtained through the 3-D reconstruction is larger than 3.9 % (Karama reservoir), and goes beyond 16 % for the more recent reservoirs Tanour, Wala, and Mujib, whose construction ended after 2002 -proportionally, more Landsat 7 images affected by "N/A" stripes were then used for them Table 3. Initial Fmask classification inside the final water areas ("Other" refers to clouds, cloud shadows, and snow), and stages' percentage changes that led to the classification as water ("2-D" for the 2-D classification enhancement, and "3-D" for the 3-D reconstruction). "N/A" means not available.

Reservoir
Fmask classification (%) Changes (%) than for older dams. In light of these large shares of hidden or undetected water areas, corrections were obviously essential to consistently monitor reservoir elevation and storage.

Sensitivity analysis
The two algorithms used to improve the estimation of reservoir area rely on one empirical threshold each: the classification enhancement is performed through the definition of a MNDWI percentile threshold (X MNDWI ) to build a mask dynamically adapted to each Landsat image, and the reconstruction is achieved with the choice of a percentile for H c values (X H c ), which is set to avoid water area overestimation.
The sensitivity of the whole method to these two parameters is tested in terms of the above defined indices: R 2 and NRMSE, which are averaged for all reservoirs in Jordan (King Talal excluded). The sensitivity analysis is conducted by making the percentile thresholds vary between 90 and 100 with a step of 1. Results are presented for both storage and elevation in Fig. 12.
The coefficient of determination reaches its maximum with X MNDWI values around 98 for storage and 93 or 95 for elevation. However, R 2 does not quantitatively assess the accuracy of the method, and as it remains fairly high (above 0.78 for storage, or 0.74 for elevation) in the whole 90-100 range for both parameters, it is not considered to select the threshold percentiles.
NRMSE decreases as X MNDWI and X H c increase. The method does not detect an excessive number of water pixels -see the retrieval over the large missing parts detailed in Table 3 -but rather obtains estimates for elevation and storage closer to the measurements conducted by JVA. Two conclusions can be drawn from these observations. First, the success in the 2-D enhancement means that there is enough information in Landsat bands to better detect water areas. And second, the precision of the 3-D reconstruction implies that enough Landsat images are available for most reservoirs to statistically improve the detection of water bodies when clouds or "N/A" stripes hide land. However, the H c upper limit for the reconstruction has a decreasing impact on NRMSE as the MNDWI threshold increases: fewer missing water pixels lead to fewer pixels available to "fill with water" during the subsequent reconstruction. For lower X MNDWI values, the decrease in NRMSE for high X H c values is clearer. It shows that the reconstruction algorithm addresses well the Fmask and dynamic threshold method limitations, even if it cannot entirely balance the errors. The fact that NRMSE is on average lower for maximal X H c values than for maximal X MNDWI values could however be expected as the reconstruction relies on the reservoir's elevation-area relationship, which is established from the elevation of the pixels that are detected in the first stage.
In the end, the percentiles that we chose in this study -respectively 100 and 98 for X MNDWI and X H c -enable a tradeoff between the options of lowering NRMSE for both storage and elevation. Also, with these percentiles, R 2 is still sufficiently high to ensure a strong correlation. It should be noted that the thresholds do not depend on the location, nor the date on which the Landsat images were taken. Therefore, the sensitivity analysis reveals that the highest values for both X MNDWI and X H c could be used to apply the method to any other region in the world.

Conclusions
Although information on small reservoirs' storage is crucial for water management in a river basin, it is most of the time not freely available in remote, ungauged, or conflicttorn areas. A remote sensing method is proposed in this paper to monitor small water bodies (capacities and water surface areas starting from 1 hm 3 and 0.5 km 2 respectively). The method is based only on DEMs for elevation, and Landsat satellite images for water surface area, to quantitatively estimate storage variations.
The method is applied to reservoirs in Syria and the Israelcontrolled Golan Heights in the Yarmouk River basin, and an uncertainty analysis is conducted with neighbouring Jordan reservoirs for which in situ measurements are available. The NRMSE is relatively low compared to the size of the studied reservoirs and the precision of the datasets that are used.
The main limitation of the approach is its inapplicability to reservoirs that were significantly "covered" with water when the DEM satellites passed over them. Fortunately, this infor-mation can be readily obtained from remote sensing data and used to determine the applicability of the method a priori.
For all "uncovered" small or large reservoirs, the uses of datasets available over the whole continental surface make this method a valuable complement to satellite altimetry to increase the number of reservoirs observable anywhere in the world. The thresholds dynamically defined for both the 2-D enhancement and the 3-D reconstruction also make the method potentially suitable to monitor reservoirs in truly inaccessible areas. Moreover, the precision of the filling curve and the 3-D reconstruction algorithm increases with the number of pixels taken into account. Applying the method to large "uncovered" reservoirs could then potentially lead to better results. The sensitivity analysis also shows that choosing maximum thresholds in both water area retrieval stages gives the best reservoir storage estimates.
The recent two Sentinel-2 satellites also promise a great improvement of the method for post-2015 studies, as they produce images with spatial and temporal resolutions finer than Landsat (up to 10 m and 5 days). Combining Landsat and Sentinel-2 satellites would then reduce the already short revisit cycle of water bodies and would provide near realtime updates on water body storage.
Furthermore, the algorithms used in the method automatically detect water bodies, define the water area retrieval parameters, build filling curves, and assess reservoir storage. Such algorithmic tools can then be dynamically updated with each new image from Sentinel-2 and Landsat satellites, giving the model the potential to learn by itself and correct previous storage estimates while generating new ones. This approach is somehow comparable to the continuous change detection proposed by Zhu and Woodcock (2014).
Code and data availability. The source code of the algorithm is available at https://drive.google.com/open?id= 0B54cRCK06X-9RUdqaTZmWkdsOXc. Underlying research data are not publicly accessible. Remote sensing data access for this study is explained in Sect. 2. JVA data records are not publicly available.
Competing interests. The authors declare that they have no conflict of interest.