Citizen science can provide spatially distributed data over large areas, including hydrological data. Stream levels are easier to measure than streamflow and are likely also easier for citizen scientists to observe. However, the challenge with crowd based stream level data is that observations are taken at irregular time intervals and with a limited vertical resolution. The latter is especially the case at sites where no staff gauge is available and relative stream levels are observed based on (in)visible features in the stream, such as rocks. In order to assess the potential value of crowd based stream level observations for model calibration, we pretended that stream level observations were available at a limited vertical resolution by transferring streamflow data to stream level classes. A bucket-type hydrological model was calibrated with these hypothetical stream level class data and subsequently evaluated on the observed streamflow records. Our results indicate that stream level data can result in good streamflow simulations, even with a reduced vertical resolution of the observations. Time series of only two stream level classes, e.g. above or below a rock in the stream, were already informative, especially when the class boundary was chosen towards the highest stream levels. There was some added value in using up to five stream level classes, but there was hardly any improvement in model performance when using more level classes. These results are encouraging for citizen science projects and provide a basis for designing observation systems that collect data that are as informative as possible for deriving model based streamflow time series for previously ungauged basins.
Streamflow data are crucial for water resource management decisions and the calibration of hydrological models. However, streamflow data are only available for a limited number of sites, and gauging stations are not always installed at representative locations. There is, for instance, a lack of streamflow gauges in small headwater streams (Kirchner, 2006) and in developing countries (Mulligan, 2013). Although technological developments provide the possibility to expand the measurement network, the reality is that, due to budget cuts, observation networks often shrink (Kundzewicz, 1997) rather than expand. Remote sensing images can be used to estimate stream levels or streamflow, particularly for wide lowland rivers (Smith, 1997; Milewski et al., 2009; Pavelsky, 2014; Van Dijk et al., 2016), but estimation of streamflow from satellite images is likely to remain problematic for small headwater streams.
Stream level data are easier to obtain than streamflow data because
they do not require any information on the rating curve. Seibert
and Vis (2016) tested whether stream level data can be used to constrain
a simple hydrological model. The results for
Even though the price of water level recorders has significantly
gone down in recent years and their datalogging capacity has
increased, it is not feasible to install a water level recorder in
every ungauged catchment. It is, therefore, useful to also consider
the use of other approaches to obtain water level data. Citizen
science is now more frequently used to obtain environmental data
over large areas (Savan et al., 2003; Bonney et al., 2009; Graham
et al., 2011; Fohringer et al., 2015; Huddart et al., 2016; Wiseman
and Bardsley, 2016). Little et al. (2016) gave citizen scientists
water level sounders to measure groundwater levels in private wells
and found that these measurements provided valuable data on
groundwater levels across a large area in Alberta, Canada, and that
the measurements were relatively accurate; the root mean square
error between citizen scientist observed water levels and pressure
transducer based water levels ranged between 3 and
11
Information from time lapse cameras or webcams can also be used to obtain information on stream water level classes. Pixel classification or image recognition to determine whether the water level is above or below a certain point can be used to determine the relative stream water level, even if no other information about the stream or the cross section is available. Several studies have shown that cameras can be used for accurate streamflow estimation (Muste et al., 2011; Tsubaki et al., 2011; Hilgersom and Luxemburg, 2012; Royem et al., 2012; Stumpf et al., 2016), but these studies used dedicated cameras that focused directly on the stream and often required information about the stream channel cross section. While these systems are promising, it is unlikely that many of the ungauged streams will be equipped with them. However, streams are often included in the pictures of existing webcams or time lapse cameras that were installed for other reasons, e.g. to show the snow conditions on a ski slope or to highlight the view from a hotel. The information from these webcams can be used to obtain information about the relative changes in the stream level or width, but this information might not be very precise because of the sub-optimal angle of the camera. It is, thus, more likely that these images can be used to obtain information about the relative stream level or stream width (class), rather than the actual water level. Remotely sensed data can also be used to rank stream levels or stream width. These data, however promising, have limitations regarding their accuracy and resolution (and will likely have them for the foreseeable future). Thus, for these measurements too, time series of level (or width) classes are more realistic than high-resolution time series of actual water levels.
For crowd based (or citizen science) observations, but also for data from webcams or satellites, the resolution of the stream level data will be significantly poorer than for data obtained by a dedicated water level sensor. To determine the effect of this loss of information, we tested the usefulness of these new types of stream level class data for constraining a simple bucket-type hydrological model. The aim was to provide a basis for designing citizen science projects that collect data that are as informative as possible and that can be used to derive model based streamflow time series. We pretended that stream level class observations were available continuously (daily), but only at a limited vertical resolution by transferring the streamflow data to stream level classes. We then tested how the number of stream level classes (i.e. the resolution) influenced the information content of the data with regard to constraining the model. Furthermore, we studied the effect of different locations of the class boundaries on model performance.
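The conversion of a streamflow series into stream level classes can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes that class boundaries are defined by the fraction of time the stream spends below each boundary, and that stream level increases monotonically with streamflow, so that classes of streamflow and classes of stream level coincide. The function names `quantile` and `to_level_classes` are hypothetical.

```python
from bisect import bisect_right


def quantile(sorted_values, fraction):
    """Linear-interpolation quantile of an already sorted list."""
    position = fraction * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def to_level_classes(streamflow, lower_fractions):
    """Map each streamflow value to a class index (0 = lowest class).

    lower_fractions holds the cumulative fraction of time spent below each
    class boundary in ascending order, e.g. [0.94] for two classes
    (assumed boundary definition, for illustration only).
    """
    ordered = sorted(streamflow)
    boundaries = [quantile(ordered, f) for f in lower_fractions]
    # bisect_right counts how many boundaries lie at or below the value
    return [bisect_right(boundaries, q) for q in streamflow]


# Synthetic example: 100 daily values, one boundary at the 94-6 % split
flows = list(range(1, 101))
classes = to_level_classes(flows, [0.94])
print(classes.count(0), classes.count(1))  # 94 6
```

With more entries in `lower_fractions`, the same function yields three, five or more classes, which is how the vertical resolution could be varied in such an experiment.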
Time series of the observed streamflow (blue) for the first year of
simulation (October 1982–September 1983) for catchment 002011460 (Back Creek
near Sunrise, VA), a medium sized catchment (235
This study largely followed the methodology of Seibert and Vis
(2016), who evaluated the value of water level time series for
model calibration for almost 600 catchments in the contiguous US
based on continuous, high-resolution stream level data. In this
study, the model was calibrated based on stream level class data
for a subset of these catchments. The 100 catchments used in this
study were chosen randomly from the catchments used by Seibert and
Vis (2016) and are spread across the contiguous US. The
hydrometric data for these 1 to 12 584
In order to determine how many stream level classes are needed for
model calibration, the daily average streamflow data were
converted into time series of
The HBV (Hydrologiska Byråns Vattenbalansavdelning) model
(Bergström, 1992; Lindström et al., 1997) was used in the
HBV-light software implementation (Seibert and Vis, 2012). The HBV
model is a frequently used bucket-type model and consists of
different routines representing snow, soil, groundwater and stream
routing processes. The HBV model, as it was applied here, has 14
free parameters, which are usually found by calibration or
regionalization. Elevation bands of 200
For each catchment the HBV model was calibrated for the period
1 October 1982–30 September 1996 using a genetic optimization
algorithm (Seibert, 2000). The data from the
1 January 1980–30 September 1982 period were used for warming up
the model. For model calibration, we maximized the Spearman rank
correlation coefficient (
For each catchment, we used 100 independent model calibration trials, resulting in 100 parameter sets (one for each model calibration). For each of these (100) calibration trials, a total of 3500 model runs were done to find the optimum parameter set with the genetic algorithm. The 100 calibration parameter sets for each catchment were validated by comparing the simulated streamflow to the observed streamflow data using the model efficiency (Nash and Sutcliffe, 1970). For each catchment, the median value of the model efficiency for the 100 parameter sets was used to represent the performance of the model for that catchment.
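The validation step described above can be illustrated with a short sketch. It computes the model efficiency of Nash and Sutcliffe (1970) for each simulated series and takes the median over all parameter sets of a catchment. The function names and the toy data are assumptions made for illustration; the sketch does not reproduce the HBV-light implementation.

```python
from statistics import median


def nash_sutcliffe(observed, simulated):
    """Model efficiency: 1 minus the ratio of the squared residuals to the variance of the observations."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - residual / variance


def catchment_performance(observed, simulations):
    """Median model efficiency over the simulations of all parameter sets."""
    return median(nash_sutcliffe(observed, sim) for sim in simulations)


# Toy data: three "parameter sets" instead of 100
observed = [1.0, 2.0, 3.0, 4.0]
simulations = [
    [1.0, 2.0, 3.0, 4.0],  # perfect simulation, efficiency 1
    [2.5, 2.5, 2.5, 2.5],  # mean of the observations, efficiency 0
    [1.0, 2.0, 3.0, 5.0],  # one value overestimated
]
print(catchment_performance(observed, simulations))
```

Using the median rather than the best of the calibration trials makes the catchment score less sensitive to a single lucky parameter set.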
Box plots of the difference in the median model efficiency and the
upper benchmark (
Different benchmarks were used to assess the performance of the models calibrated with the stream level class data: an upper benchmark that represents how good the model simulation would be if continuous streamflow data were available, and two lower benchmarks that represent a model simulation in the absence of any streamflow or stream level data.
For the upper benchmark (
In addition, the simulations based on the stream level class data
were compared to the simulations based on calibrations derived
from high-resolution stream level data (
For the first lower benchmark (
Median, maximum and minimum model efficiencies for the 100
catchments for model calibrations using different types of data and the two
lower benchmarks. Note that the difference in the median model efficiency for
the model calibrations with all streamflow data (
Difference in the median model validation result (model efficiency)
for the models calibrated using two water level classes
(
Not surprisingly, the model efficiency was lower for the models
calibrated with the stream level class data than for the models
calibrated with the streamflow data (Fig. 2 and Table 1). However,
the differences between the models calibrated with the
high-resolution stream level data and the models calibrated with
stream level class data were relatively small, as long as at least
five stream level classes were used for model calibration (compare
the results for
A more detailed analysis of the increase in model performance with an increasing number of water level classes suggests that for the wet catchments model performance increased only slightly when increasing the number of water level classes from two to five, but that for some of the dry catchments model performance increased significantly when using more than two water level classes (Fig. 3). In general, the increase in model performance with an increasing number of stream level classes was largest for the catchments for which the difference in model performance between the upper and lower benchmarks was largest (Fig. 3).
Difference in model validation results (model efficiency) for the
models calibrated with data from two (
Difference in median model validation results relative to the upper
benchmark (
Comparison of the performance of the models calibrated with stream level class data to the upper benchmark suggests that especially for the wet catchments the differences between traditional model calibration based on continuous streamflow data and the calibration based on the stream level class data were small (Fig. 4a and b). For the dry catchments, model calibration based on stream level class data led to larger errors in the simulated streamflow (Fig. 4a and b).
Comparison of the performance of the models calibrated with the
stream level class data to the lower benchmarks suggests that the
inclusion of stream level class data led to a huge improvement in
model performance for some of the dry catchments (Fig. 4c and
d). However, the differences in the median improvement in model efficiency when using
the data for two stream level classes compared to the lower
benchmark (
In order to determine the optimal location of the class
boundaries, we systematically varied them for the cases with two
and three stream level classes. The results show that model
performance generally improved when at least one class boundary
was located at high stream levels. For example, for the case with
two classes, the median model performance for the 100 catchments
was highest when the class boundary was chosen so that the stream
level was in the lower class for 94 % of the time and in the
upper class for 6 % of the time. The smallest median
difference between the model performance for two classes and the
upper benchmark occurred at the class boundary definition of
93–7 % (Fig. 5a). The variability in model performance also
decreased when the boundary was chosen at a higher stream level,
so that for fewer catchments the difference between the median
model performance (i.e. the median performance of the 100 calibration
parameter sets) and the upper benchmark was larger than 0.20
(
Median difference in model efficiency for models calibrated with
data for three water level classes and the upper benchmark (
The results of this study show that five stream level classes are almost as informative for model calibration as stream level data with a very high vertical resolution. This is good news for citizen science projects or webcam based analyses, as it is much easier to determine the stream level class when there are only a few classes than when there are many classes. The small difference between the performance of the models calibrated on data for a few stream level classes and the upper benchmark (Fig. 4a and b) suggests that the stream level class data from citizen science approaches or webcam images are most useful for model calibration for wet catchments and that stream level class data for these catchments can be used in combination with a model to obtain time series of streamflow. This is encouraging, as it is likely much harder for citizen scientists to estimate the streamflow than the stream level class, and this way the streamflow data that are needed for water management or flood or drought forecasting can be obtained from the stream level class data.
On the other hand, the large improvement of the models calibrated with stream level class data compared to the lower benchmark for some of the dry catchments (Fig. 4c and d) suggests that stream level class data may be especially useful in improving model performance in some dry catchments when no other streamflow or stream level data are available. For these catchments, the model performance of the lower benchmark (i.e. based on the random parameter sets) was very poor, while for the wet catchments the model performance of the lower benchmarks was already reasonably good (see the colour coding in Figs. 3 and 4). Thus the biggest gain in adding stream level class data was seen for some of the dry catchments, even though the absolute model performance was much poorer than for models calibrated on streamflow data. Seibert and Vis (2016) showed that model calibration based on high-resolution stream level data worked best for wet catchments, and that for dry catchments, additional data on the water balance were needed. Using such additional information may also improve model performance based on stream level class data for the dry catchments. What kind of additional information might be most useful in combination with stream level class data remains to be explored.
In practice, the boundaries between the different water level classes will be chosen based on features in the river or the stream bank that are easy to observe. The results from this study suggest that for most streams the optimal class boundaries should be located at the high flow levels, but not at the very highest flows. This high optimal class boundary is good news for model calibration based on opportunistic webcam images because high flows are usually easier to observe in these images than low flows: it may be difficult to see the water level at low flows when the camera does not focus directly on the stream. Citizen scientists, on the other hand, are perhaps more likely to go out and estimate stream levels during nice weather conditions and low flow periods. However, people also tend to look at rivers when the water level is particularly high. The water level is still in the highest class for a relatively long time (e.g. 6 % of the time or on average 22 days per year for the case with two water level classes for which the median model performance for the 100 catchments was highest), which suggests that there is ample time for citizen scientists to observe the water levels during the high water level period. These results thus suggest that citizen science projects should communicate to the participants that measurements during high water levels are important and worth collecting and transmitting.
Box plots of the average number of times per year that the water level switched from one class to another for different class definitions. The top row gives the number of catchments for which the number of water level class switches was highest at that class definition. As an example, 80–20 indicates that streamflow was in the lower stream level class for 80 % of the time and in the upper stream level class for 20 % of the time, and that for 26 of the 100 catchments this class boundary definition resulted in most class switches per year.
For the majority of the catchments, the optimal boundary between the water level classes is located at high stream levels; the reasons are related to the data, the model and the choice of the model evaluation criterion. The choice of a high water level class boundary helps to avoid the selection of a parameter set that leads to an overly flashy streamflow response because the water level is in the upper water level class for only a limited fraction of time. The information content of the water level class data, and thus its value for hydrological model calibration, is higher when we know that for some events the water level does not cross this boundary and for another set of events it does. If for every event the water level crosses the boundary because it is set at a low level, then it is not possible to distinguish between the responses of different events. Similarly, if the level is set too high, then the water level may cross the class boundary only a few times so that no distinction can be made for the response of the majority of the events. For the optimal boundary definition for the two classes at 94–6 % of the time, there were on average between 2.2 and 27.2 switches between the two water level classes per year (median: 14.4; 25th and 75th percentiles: 8.0 and 17, respectively; Fig. 7). One could also argue that the water level class data are most informative when the class boundaries are crossed as often as possible in the actual time series. For the majority of the catchments the water level class boundary was most often crossed if it was set so that the water level was in the lower class for 60–80 % of the time (Fig. 7).
For only 8 of the 100 catchments was the water level class boundary most frequently crossed if it was set at such a level that it was in the lower class for less than 40 % of the time; for 8 other catchments the water level class boundary was crossed most often if the boundary was defined such that the water level was in the lower class for more than 80 % of the time (Fig. 7).
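The number of class switches per year discussed above can be counted directly from a daily class series. The following is a minimal sketch under the assumption that the series is complete (one class value per day); the function name `switches_per_year` and the default year length are illustrative choices.

```python
def switches_per_year(classes, days_per_year=365.25):
    """Average number of changes between consecutive daily class values per year."""
    # A switch occurs whenever the class differs from that of the previous day
    switches = sum(1 for prev, curr in zip(classes, classes[1:]) if prev != curr)
    years = len(classes) / days_per_year
    return switches / years


# One synthetic year with a single 5-day high-water event:
# the level rises into the upper class once and falls back once
series = [0] * 10 + [1] * 5 + [0] * 350
print(switches_per_year(series, days_per_year=365))  # 2.0
```

Evaluating this count for a range of candidate boundaries would show at which class definition the boundary is crossed most often for a given catchment.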
Wani et al. (2017) used censored data in a formal Bayesian framework to simulate the combined sewer overflow in an urban catchment. Similar to the results for the two water level classes here, they show that binary data (i.e. a water level above or below a threshold) are very effective in reducing the parameter uncertainty in their rainfall–runoff model. They show that the location of the threshold matters and highlight the high information content in crossing the threshold, but also mention that it is difficult to determine the relation between the location of the threshold and the value of the data in reducing the parameter space because it depends on how close the system is to the threshold and how many times the threshold is exceeded.
The optimal location of the water level class boundaries is also
dependent on the model validation criterion that is used. We used
the model efficiency (
Because in real citizen science projects the boundaries will not be chosen based on optimality as discussed above, but will be chosen by citizens based on local conditions, such as identifiable features in the stream, the usefulness of citizen science based water level class data for the simulation of different aspects of the hydrograph will differ. However, the investigation of theoretically optimal class boundaries is still valuable for at least two reasons. First, these results can be used to provide guidance to citizen scientists on how to choose class boundaries, if at all possible. Second, such results can help to decide which citizen science based water level class data might be especially useful for the simulation of a certain aspect of the hydrograph.
A challenge with citizen science based stream level data is that observations are taken at irregular time intervals, with a limited vertical resolution, and may contain errors. In this study, we addressed the issue of the limited vertical resolution by assessing the value of stream level class data. More work is needed on the issue of irregular data to determine the number of observations that are needed and the best times of these observations. Model calibration using weekly stream level class data for the cases with two, three and five water level classes suggests that the deterioration in model performance when weekly data are used instead of daily data is very small. Previous studies on model calibration based on streamflow measurements have also suggested that continuous streamflow data are not needed and that only a few streamflow measurements, particularly during rainfall events, are already useful for constraining hydrological models because many of the streamflow measurements contain redundant information (Seibert and Beven, 2009; Rojas-Serna et al., 2016).
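A weekly data set can be derived from the daily class series by keeping every seventh value. The sketch below assumes regular weekly sampling on a fixed weekday, which is a simplification of real citizen science observations; the function name `subsample` and its parameters are hypothetical.

```python
def subsample(classes, step=7, offset=0):
    """Keep every step-th daily class value as (day index, class) pairs."""
    return [(day, classes[day]) for day in range(offset, len(classes), step)]


# Three weeks of daily classes with a short high-water event
daily = [0] * 8 + [1] * 3 + [0] * 10
weekly = subsample(daily, step=7)
print(weekly)  # [(0, 0), (7, 0), (14, 0)]
```

In this example the three-day event falls between two weekly observations and is missed entirely, which illustrates why the timing of observations relative to events is a question of its own.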
In this study, we pretended to have stream level class data by transforming the streamflow data to stream level classes (Fig. 1). These data, therefore, do not include any errors. In reality, citizen science data may contain errors and misclassifications of the stream level. The effects of data errors on model results need to be tested as well. However, in this respect, it has to be mentioned that several studies have shown that citizen science data can be quite accurate (Cohn, 2008; Lowry and Fienen, 2013; Tye et al., 2017), although not always (e.g. Savan et al., 2003), and that traditional streamflow data can also have significant uncertainties and may even contain disinformative data that affect model calibration (McMillan et al., 2010; Beven and Westerberg, 2011).
This study demonstrates that stream level class data can be useful for calibrating hydrological models in otherwise ungauged catchments. The results confirm the conclusions from a previous study (Seibert and Vis, 2016), but more importantly extend the findings towards the use of stream level class data for model calibration to cases where data are available at only a limited vertical resolution, such as in citizen science based observation approaches or webcam image analysis. The results show that a small number of stream level classes contain almost as much information for hydrological model calibration as high-resolution water level data. This is good news for citizen science approaches. We also found that class boundaries at high water levels result in the most informative water level class time series. While in practice the class boundaries are likely determined by the local situation (such as a rock that is covered by water at a certain level), the importance of high levels shows the value of motivating the public to also collect data during high flow situations.
More generally, this study demonstrates how hydrological modelling can be used to evaluate the potential value of certain types of data. Similar approaches can be used to evaluate how much the information content of stream level class data might decrease if observations are made at irregular times or with a certain amount of error. This information is crucial for the optimal design and implementation of citizen science based observation approaches.
The streamflow data used in this study were obtained from Newman et al. (2015). The DAYMET precipitation data (Thornton et al., 2012) were obtained from the Newman et al. (2015) dataset as well. The HBV model software (Seibert and Vis, 2012) is available from the authors on request.
The authors declare that they have no conflict of interest.
We thank Andy Newman and Martyn Clark for making the data used in this study available. The ScienceCloud provided by S3IT at the University of Zurich enabled us to run the computationally intensive simulations on virtual machines. The comments of the two reviewers helped to clarify the text.
Edited by: Stefan Uhlenbrook
Reviewed by: Wouter Buytaert and Fernando Nardi