A big challenge in constructing global hydrological models is the inclusion of anthropogenic impacts on the water cycle, such as those caused by dams. Dam operators make decisions based on experience and often uncertain information. In this study, information generally available to dam operators, like inflow into the reservoir and storage levels, was used to derive fuzzy rules describing the way a reservoir is operated. Using an artificial neural network capable of mimicking fuzzy logic, called the adaptive-network-based fuzzy inference system (ANFIS), fuzzy rules linking inflow and storage with reservoir release were determined for 11 reservoirs in central Asia, the US and Vietnam. By varying the input variables of the neural network, different configurations of fuzzy rules were created and tested. It was found that the release from relatively large reservoirs was significantly dependent on information concerning recent storage levels, while release from smaller reservoirs was more dependent on reservoir inflows. Subsequently, the derived rules were used to simulate reservoir release with an average Nash–Sutcliffe coefficient of 0.81.

An example showing the four steps of fuzzy reasoning.

The five layers of ANFIS for a network with two input variables and two membership functions per variable. Note that square nodes contain trainable parameters while circular nodes are fixed.

Over the last decades, major advances have been made regarding global
data availability. Low-resolution hydrologic states from remote
sensing and high-resolution parameter fields have become
available. Combined with the improvements in computational
capabilities and data storage, these advances have provided
hydrologists the opportunity to pursue the development of high-resolution
global hydrological models (GHMs) like, among others,
PCRGLOB-WB

As indicated by

Actual reservoir operation is an imprecise and vague undertaking,
since operators always face uncertainties about inflows, evaporation,
seepage losses and various water demands which need to be met. They
often base their decisions on experience and available information,
like reservoir storage and the previous period's inflow

Fuzzy logic, as introduced by

In this study, historical inflows, storage levels and releases are used to derive fuzzy rules that describe the release decisions of dam operators using artificial neural networks (ANNs). These rules can be used as the basis for a macro-scale reservoir algorithm. The validity of the derived rules is tested by using them to simulate each reservoir's release and comparing the simulated releases with the actual releases. In order to evaluate whether the rules are capable of improving upon the way reservoirs are currently modelled in GHMs, a quantitative comparison is made with a simulation-based reservoir algorithm. Additionally, the accuracies of simulated releases resulting from different configurations of the fuzzy rules are compared with one another in order to link the results to the impoundment ratios of the dams.

Many macro-scale algorithms, which cannot rely on detailed information
on reservoir operation policies used in small-scale models, have been
proposed in order to take reservoir release and storage in GHMs into
account

Recently, more data-driven simulation-based schemes have been proposed by

As a result of limitations of macro-scale algorithms, which are not
yet capable of fully mimicking the dynamics of regulated flows,
simulations with GHMs are still highly uncertain

Furthermore, the discussed simulation-based algorithms use reservoir
characteristics from databases like the aforementioned GRAND

Just like the aforementioned data-driven simulation-based schemes, the proposed method requires time series of observed data to calibrate, or train, the algorithm. Although this training can be computationally expensive, afterwards the simulated releases can be acquired easily. Moreover, the temporal resolution of the proposed method is flexible and dependent on the resolution of the provided time series.

A membership function with an indication of the physical meaning of its parameters.

Overview of all considered reservoirs; data from

To model a process, fuzzy logic uses rules of the form “IF

Fuzzy reasoning is the process in which fuzzy rules are used to
transform input into output and consists of four steps. (1) Firstly,
the input variables are fuzzified, (2) next the firing strength of
each rule is determined. (3) Thirdly, the consequence of each rule is
resolved and (4) finally the consequences are aggregated. In
Fig.

A big drawback of fuzzy logic is the need to establish the fuzzy rules. Manually transforming human knowledge or behaviour into a representative set of rules is a complicated task. As the number of input variables and membership functions increases, the total number of required rules quickly becomes very large.
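
The growth in the number of rules can be illustrated with a minimal Python sketch. It assumes that one rule exists for every combination of one linguistic label per input variable, which is consistent with the four rules of a network with two input variables and two membership functions each. The label names and the function name are hypothetical.

```python
from itertools import product


def enumerate_rules(labels_per_input):
    """Return every combination of one linguistic label per input variable."""
    return list(product(*labels_per_input))


# Hypothetical labels for two input variables (e.g. storage and inflow).
two_inputs = [("low", "high"), ("low", "high")]
rules = enumerate_rules(two_inputs)
print(len(rules))  # two inputs with two labels each give four rules

# The rule count grows exponentially with the number of input variables.
for n_inputs in (2, 4, 6):
    n_two_labels = len(enumerate_rules([("low", "high")] * n_inputs))
    n_three_labels = len(enumerate_rules([("low", "medium", "high")] * n_inputs))
    print(n_inputs, n_two_labels, n_three_labels)
```

Under this assumption, a network with n input variables and m membership functions per variable needs m to the power n rules, which is why defining them by hand quickly becomes impractical.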

ANFIS is a specific ANN that can deal with linguistic expressions used in
fuzzy logic. The network structure is capable of adjusting the shape of the
membership functions and of the consequence parameters that form the fuzzy
rules by minimizing the difference between output and provided targets. ANFIS
is a feed-forward neural network with five layers as seen in
Fig.

In the forward pass, the output of each layer for a given input is
calculated and the consequence parameters are adjusted with the LSE,
before the final output is generated. Each layer is discussed
individually below.

The first layer is called the membership layer; the input is put
through a membership function to determine its membership value:

The circular nodes in this layer are marked with

In the third layer, the firing strengths of all nodes are
normalized with respect to each other:

The fourth layer is called the implication layer. The
consequence of each rule is calculated as a linear combination of
the input variables, as described by

In the fifth layer all the incoming signals are summed to
compute the final output:
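
Taken together, the five layers amount to a single forward pass, which can be sketched in Python as follows. The sketch is illustrative only and rests on several assumptions: the membership function is taken to be a generalized bell function with parameters a, b and c, firing strengths are computed as products of membership values, and the rules are ordered as all combinations of membership functions. All function and parameter names are hypothetical.

```python
from itertools import product


def bell(x, a, b, c):
    """Generalized bell membership function (assumed form): width a, steepness b, centre c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def anfis_forward(inputs, premise, consequence):
    """Forward pass through the five layers.

    inputs      : list with one value per input variable
    premise     : premise[i][j] = (a, b, c) of membership function j of input i
    consequence : one tuple of linear coefficients per rule, the last entry being
                  the constant term; rules are ordered as itertools.product
    """
    # Layer 1: membership value of every input for each of its membership functions.
    memberships = [[bell(x, *params) for params in mfs]
                   for x, mfs in zip(inputs, premise)]
    # Layer 2: firing strength of each rule as the product of its membership values.
    strengths = []
    for combo in product(*memberships):
        w = 1.0
        for mu in combo:
            w *= mu
        strengths.append(w)
    # Layer 3: normalize the firing strengths with respect to each other.
    total = sum(strengths)
    normalised = [w / total for w in strengths]
    # Layer 4: consequence of each rule as a linear combination of the inputs,
    # weighted by the normalized firing strength.
    weighted = []
    for w_bar, coeffs in zip(normalised, consequence):
        f = sum(p * x for p, x in zip(coeffs[:-1], inputs)) + coeffs[-1]
        weighted.append(w_bar * f)
    # Layer 5: sum all incoming signals.
    return sum(weighted), normalised


# Hypothetical parameters for normalized storage and inflow, two functions each.
premise = [[(0.5, 2, 0.0), (0.5, 2, 1.0)],   # storage: "low", "high"
           [(0.5, 2, 0.0), (0.5, 2, 1.0)]]   # inflow:  "low", "high"
consequence = [(0.1, 0.2, 0.0), (0.1, 0.6, 0.0), (0.3, 0.2, 0.1), (0.3, 0.6, 0.1)]
release, w_bar = anfis_forward([0.7, 0.4], premise, consequence)
print(release, w_bar)
```

The normalized firing strengths are returned alongside the output because they are also needed when the consequence parameters are estimated.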

Before the final output is calculated, the consequence parameters need to be
updated. The final output can also be written as the following:

If

Equation (

This is an overdetermined problem which generally does not have an
exact solution. Therefore, a least square estimate is sought with
sequential formulas
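
A common form of such sequential formulas is the recursive least-squares update, sketched below in plain Python. It is offered as an assumption about the kind of update meant here and not as the exact formulation used in this study. The names `theta`, `S`, `a` and `b` are hypothetical: `a` is the row of regressors belonging to one sample (in ANFIS, the normalized firing strengths multiplied by the inputs and by one), and `b` is the target of that sample.

```python
def rls_update(theta, S, a, b):
    """One sequential least-squares step for a single sample.

    theta : current parameter estimate (list of length n)
    S     : current n x n covariance-like matrix (list of lists)
    a     : regressor row of the sample (list of length n)
    b     : target value of the sample
    """
    n = len(theta)
    Sa = [sum(S[i][k] * a[k] for k in range(n)) for i in range(n)]   # S a
    aS = [sum(a[k] * S[k][j] for k in range(n)) for j in range(n)]   # a^T S
    denom = 1.0 + sum(a[i] * Sa[i] for i in range(n))                # 1 + a^T S a
    # S_new = S - (S a a^T S) / (1 + a^T S a)
    S_new = [[S[i][j] - Sa[i] * aS[j] / denom for j in range(n)] for i in range(n)]
    # theta_new = theta + S_new a (b - a^T theta)
    error = b - sum(a[i] * theta[i] for i in range(n))
    gain = [sum(S_new[i][k] * a[k] for k in range(n)) for i in range(n)]
    theta_new = [theta[i] + gain[i] * error for i in range(n)]
    return theta_new, S_new


def rls_fit(rows, targets, gamma=1e6):
    """Process all samples once, starting from theta = 0 and S = gamma * I."""
    n = len(rows[0])
    theta = [0.0] * n
    S = [[gamma if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in zip(rows, targets):
        theta, S = rls_update(theta, S, a, b)
    return theta
```

Starting from a large `gamma` makes the result approach the ordinary least-squares solution, while each sample is processed only once and no matrix inversion is needed.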

So during every forward pass, the consequence parameters,

During the backward pass, the error associated with the sample under
consideration is propagated backward through the network in order to
acquire the gradient of the error with respect to each individual
premise parameter. So,

The derivative in Eqs. (

The first term on the right side of Eq. (

The final term of Eq. (16) is derived from Eq. (4) as

After the update of the premise parameters, the next sample is provided to the network and the forward pass starts again. When all samples have been passed through the network once, one epoch has passed and another epoch is started until the solution converges.

In summary, first the input part of a sample is used to activate the
network and, together with the target of the same sample, the
consequence parameters are updated using an LSE. Next, the output error
is calculated with Eq. (

Diagrams showing different sample set-ups. The black dots represent input parameters, while the blue dots show the target.

In order to determine whether ANFIS is capable of deriving a set of
useful fuzzy rules that captures the characteristics of how a dam is
operated, 11 reservoirs for which in situ measurements were readily
available have been investigated. Table

To train a network, the first 60 % of the dataset of each dam is used to train the parameters and the next 20 % is used to validate the solution. Finally, the remaining 20 % is used to test the solution. During an epoch, all samples in a training set are passed forward and backward through the network once. The training is stopped when for at least five consecutive epochs, the mean square error (MSE) of the simulation with respect to the validation set has increased, after which the configuration of the network with the lowest validation MSE is chosen.

At this point, the training set has been used to update the network parameters and the validation set has been used to select the state of the network for which the results matched best with data not present in the training set. Since the validation set has been used to select the best configuration of the network, a third and independent set is used to test the performance of the network. This third set is the test set.
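
The split and the stopping criterion can be sketched as follows. This is a minimal illustration under stated assumptions: the split is chronological, an "increase" is interpreted as a validation MSE higher than that of the previous epoch, and the three callables passed to the training loop (`train_epoch`, `validation_mse`, `get_state`) are hypothetical stand-ins for the actual network routines.

```python
def chronological_split(samples, train_pct=60, val_pct=20):
    """Split a time-ordered list into training, validation and test parts."""
    n = len(samples)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = samples[:n_train]
    validation = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, validation, test


def train_with_early_stopping(train_epoch, validation_mse, get_state,
                              patience=5, max_epochs=1000):
    """Train until the validation MSE has increased for `patience` consecutive epochs.

    Returns the stored network state with the lowest validation MSE and that MSE.
    """
    best_mse = float("inf")
    best_state = None
    previous_mse = float("inf")
    consecutive_increases = 0
    for _ in range(max_epochs):
        train_epoch()                      # one forward and backward pass over all samples
        mse = validation_mse()
        if mse < best_mse:                 # remember the best configuration so far
            best_mse, best_state = mse, get_state()
        if mse > previous_mse:
            consecutive_increases += 1
        else:
            consecutive_increases = 0
        previous_mse = mse
        if consecutive_increases >= patience:
            break
    return best_state, best_mse
```

The test set is deliberately absent from the training loop; it is only used once, after the best configuration has been selected.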

Initially, two variables will be used as input to train the network,
storage (

A somewhat more complicated sample is the following:

Additionally, since seasonality plays an important role in the
operation of reservoirs, a third input parameter (time of the year, ToY) will also be
considered. For example,

Finally, in order to use back propagation, initial values for the
parameters of the membership layer need to be set. These are set such
that for any input, the sum of the membership functions equals 1; an
example for an input parameter with two membership functions can be
seen in Fig.

Example showing the initial membership functions for a variable consisting of two membership functions.

In order to compare simulated releases with those made by an existing
macro-scale algorithm, the data used to train the networks has also
been applied to the algorithm proposed by

The monthly release for the remaining reservoirs is calculated as

To prevent reservoirs from overflowing, any excess storage that remains after the current month's release is released as well.

The test MSEs (

Simulating reservoir releases with a simple set-up as in
Eq. (

Because the membership functions of Andijan and Charvak show different
effects that the training can have on the membership functions and their
convergence curves show two extremes (very fast and very slow
convergence respectively), they are presented more in-depth below. The
inputs,

For Andijan, the validation set contains two very dry years with low
inflows and low storage levels, while the peak flows in the rest of
the dataset are of similar magnitude (see
Fig.

The storage level of Charvak reservoir reaches its maximum nearly
every year, while the inflow during several years is not more than
50 % of the inflow during wetter years. Nevertheless, even during
some of these drier years, it appears the reservoir is able to fill
completely (see Fig.

The shapes of the four membership functions of Andijan differ from
their initial shapes (see Fig.

The membership functions of Charvak for reservoir inflow have moved
slightly to the left and the steepness of the bell shapes has
increased for the low inflow membership function and decreased for the
other. There is a clear distinction between consequences for inflows
below and above 0.4 (see Fig.

The membership functions for other reservoirs have shapes similar to those
for Andijan and Charvak. Occasionally, multiple membership functions
dominate over the same part of the input domain, resulting in the
simultaneous activation of fuzzy rules. Sometimes both membership
functions become near zero for a part of the domain, like the storage
membership functions of Charvak, resulting in simultaneous activation
of two rules. The rule for low inflow and storage is most frequently
activated for the majority of reservoirs, followed by the rule for a
low inflow and a high storage. The rules with regard to high inflows
are used less frequently (see Fig.

The consequence parameters of rules associated with a low inflow and
storage, and a low inflow and high storage, are quite similar across
the different reservoirs (see Fig.

The test set for

The MSEs and NS coefficients for Bull Lake and Kayrakkum are better
than those of Chardara (see Table

Seminoe has the largest dataset and shows a problem similar to that of Bull
Lake. The network seems incapable of dealing with the very low flows
and the high peak flows, while the medium peaks are simulated quite
accurately (see Fig.

Finally, Tyuyamuyun performs very well, with a very accurate timing
and magnitude of peak and low flows (see Fig.

Graphs showing the

Graphs showing the

Results of Andijan Dam.

Results of Charvak Dam.

Bar graphs indicating how many of the rules available to a network
are used for

The MSEs for the networks of the 11 reservoirs trained with a sample
set-up as in Eq. (

By using a time range of two and no prediction horizon, as in
Eq. (

Adding more membership functions or input variables to the
configuration of the network increases the number of fuzzy rules. It
is clear that increasing the time range over which

Figure

Like Fig.

In Fig.

The consequence parameters of all reservoirs, separated per
rule in a box plot. The parameter “

Simulated and observed reservoir releases for nine reservoirs when simulated with a time range of one or two.

When adding a prediction horizon of 1 month to the network, the MSEs
range between

Matrix showing the average test MSEs of the 11 considered
reservoirs as the number of input variables and membership functions
increase.

Matrix showing the significance (one-sided Student's

Matrix showing the significance (one-sided Student's

A simple configuration of ANFIS, with a time range of one and no
prediction horizon, is capable of determining fuzzy rules that are
able to describe the release regime for most reservoirs with MSEs as
low as

The classifications made by the membership functions differ per reservoir. These differences can be explained by reservoir characteristics, such as maximum storage capacity, dead storage capacity, impoundment ratio or reservoir purpose. For example, a filling level of 60 % at the end of a dry season in a reservoir used for irrigation will be interpreted differently from a similar filling level in a reservoir mainly used for hydropower.

Besides the variety of physical properties of reservoirs causing
differences in how input parameters are classified, two phenomena that
are intrinsic to ANFIS seem to be especially relevant. As membership
functions move either left or right, it is possible that a
membership function becomes zero in the entire domain, rendering its
associated rules obsolete. That is, of the four rules incorporated in
the network, only two are left to be used. When this occurs for all
input variables, only one rule is left to be used, as is the case for
Kayrakkum (see Fig.

Secondly, the opposite can happen too. Instead of a membership
function moving away from the domain and giving hegemony to the other
membership function, two membership functions can also move towards
each other. When either the centres of the membership functions,
defined by

With simple set-ups resulting in a network with four fuzzy rules,
these two phenomena occur very infrequently; in most cases all four
available rules are used (see Fig.

The range of the consequence parameters (see Eq.

When the complexity of the network is increased, it appears that the
aforementioned phenomena of membership functions turning either zero
or one over the entire input domain occur more often. A network
trained with a sample set-up as in Eq. (

The explanation for this increase in performance regardless of the
decrease in rules used is twofold. The most obvious cause lies in the
formulation of the consequence of a fuzzy rule (see
Eq.

Additionally, there is simply more information available. Although a
four-rule network in this study can determine the release from a
reservoir based on the current storage and inflow, more complex
networks can also consider the storages and inflows further back in
time. Fig.

The impoundment ratios, defined as the yearly inflow
divided by the total storage capacity, of the reservoirs in the
GRAND

This greater value of storage information can be explained by
considering the reservoir's mean annual inflow divided by the storage
capacity, the impoundment ratio. With a value of 1.04, Toktogul
reservoir has the lowest impoundment ratio of the 11 reservoirs (see
Table

The 11 reservoirs all have ratios greater than 1, with an average
of 4.3. By splitting the considered reservoirs into two groups of
equal size, using the median of the 11 impoundment ratios (i.e. 3.97),
and testing the significance of increasing the complexity and addition
of more information to the network again for both groups, this can
indeed be observed (see Fig.

The distribution of the impoundment ratios of the reservoirs in the
GRAND database

For the case of adding a ToY parameter, see Fig.

Implementation of ANFIS-derived fuzzy rules into GHMs presents a challenge different from the ones posed by the more traditional simulation- and optimization-based algorithms, mainly because of the need to acquire relatively extensive data on inflows, storage levels and release flows for each reservoir.

Nevertheless, the advent and expected development of remote sensing
(RS) techniques to monitor water resources on a global scale is cause for optimism and the proposed methodology provides opportunities to
take full advantage of these developments. As shown by the Joint
Research Centre's Global Surface Water dataset

Subsequently, the inflows into a reservoir are needed to train a
network.

After determining a time series of inflows and storages, the release can be determined by applying a mass balance to the reservoir. These three steps of determining storage changes, inflows and releases could then be applied to reservoirs that are located furthest upstream in a basin first, working downstream from there. This way, using the trained networks of the upstream reservoirs, the inflow into the next reservoir could already include the anthropogenic effect on stream flow of the upstream reservoir, mitigating the accumulation of errors between cascading reservoirs along a major river.
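
The mass-balance step can be sketched as follows. The example assumes that evaporation, seepage and other losses are neglected, so that the release equals the inflow minus the change in storage over the time step. The function name and argument names are hypothetical.

```python
def releases_from_mass_balance(storages, inflows, seconds_per_step):
    """Estimate mean release rates from storage and inflow time series.

    storages         : storage volumes at the start of every step plus one final
                       value, so len(storages) == len(inflows) + 1 (in m3)
    inflows          : mean inflow rate during every step (in m3/s)
    seconds_per_step : length of a time step (in s)
    """
    releases = []
    for t, inflow in enumerate(inflows):
        delta_storage = storages[t + 1] - storages[t]
        # Release = inflow - storage change (losses are neglected by assumption).
        releases.append(inflow - delta_storage / seconds_per_step)
    return releases


# Small illustrative series: storage first rises, then falls.
example = releases_from_mass_balance([100.0, 150.0, 120.0], [10.0, 5.0], 10.0)
print(example)
```

Any losses that are neglected here end up in the estimated release, which is worth keeping in mind for reservoirs with high evaporation.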

Alternatively, the system-scale effects of cascading reservoirs can be dealt with by implementing a cluster of reservoirs as a single reservoir, represented by a single set of fuzzy rules. Fuzzy rules as described can represent these systems by defining the storage term as the sum of the individual reservoirs' storages, the inflow as the inflow into the most upstream reservoir, and the release as the release from the furthest downstream reservoir.
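
Such a lumped representation could be assembled as in the sketch below. The data layout (one dictionary per reservoir with `storage`, `inflow` and `release` lists of equal length, ordered from upstream to downstream) is a hypothetical choice made for illustration only.

```python
def aggregate_cascade(reservoirs):
    """Combine a cascade of reservoirs into one equivalent reservoir.

    reservoirs : list of dicts ordered from upstream to downstream, each with
                 "storage", "inflow" and "release" time series of equal length
    """
    n_steps = len(reservoirs[0]["inflow"])
    return {
        # Storage: sum of the storages of the individual reservoirs.
        "storage": [sum(r["storage"][t] for r in reservoirs) for t in range(n_steps)],
        # Inflow: inflow into the most upstream reservoir.
        "inflow": list(reservoirs[0]["inflow"]),
        # Release: release from the furthest downstream reservoir.
        "release": list(reservoirs[-1]["release"]),
    }


upstream = {"storage": [10.0, 12.0], "inflow": [5.0, 6.0], "release": [4.0, 4.0]}
downstream = {"storage": [20.0, 21.0], "inflow": [4.0, 4.0], "release": [3.0, 3.5]}
lumped = aggregate_cascade([upstream, downstream])
print(lumped)
```

The resulting series can then be used to train a single network in the same way as for an individual reservoir.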

Once the data required for the training of a network has been acquired, the actual training is a straightforward and easily automated process, resulting in a calibrated network that can in a computationally cheap way quantify release decisions based on the inputs.

Although all the variables associated with the fuzzy rules have a
physical basis, it is possible that a trained network releases more
water than is actually stored in its reservoir because the network
does not keep track of a mass balance. Since simulated peak releases
do not deviate much from the actual releases, see
Fig.

Just like the more traditional generic operating rules, the proposed method will suffer from errors in the reservoir inflows generated by the host model, errors due to the interdependence of cascading reservoirs and errors attributed to the non-stationarity of rule curves. As mentioned before, the errors in inflow are expected to be mitigated by the fuzzification, while the errors due to cascading could be restrained by incorporating the upstream anthropogenic effects of dams on inflows in the training set.

Regarding the non-stationarity of rule curves,

However, the inter-annual variability of flows also needs to be reflected in the time series. Choosing too short a time frame in order to avoid issues with the non-stationarity of rule curves, or applying too strong a forgetting factor, can obstruct this. Possibly, the return period of hydrological droughts can be a good point of reference.

It has been shown that by using fuzzy logic and ANFIS, operational rules of existing reservoirs can be derived without much prior knowledge of the reservoir. Their validity was tested by comparing actual and simulated releases with each other and by comparing the performance of the proposed method with a simulation-based algorithm. The rules can be incorporated into GHMs or more regional models struggling with reservoir outflow forecasting. After a network for a specific reservoir has been trained, the inflow calculated by the hydrological model can be combined with the release and an initial storage in order to calculate the storage for the next time step using a mass balance. Subsequently, the release can be predicted time steps ahead using the inflow and storage.
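
The coupling between a trained network and a host model described above can be sketched as a simple loop. Here `release_rule` is a hypothetical stand-in for the trained network, all quantities are expressed as volumes per time step, and the cap on the release is an added safeguard assumed for this illustration; it is not part of the trained rules themselves.

```python
def simulate_reservoir(initial_storage, inflows, release_rule):
    """Step a reservoir forward in time with a mass balance.

    initial_storage : storage at the start of the simulation
    inflows         : inflow volume for every time step (from the host model)
    release_rule    : callable (storage, inflow) -> release volume, standing in
                      for the trained network
    """
    storage = initial_storage
    storages, releases = [], []
    for inflow in inflows:
        release = release_rule(storage, inflow)
        # Assumed safeguard: never release a negative volume or more water
        # than is available in the reservoir during this step.
        release = max(0.0, min(release, storage + inflow))
        # Mass balance gives the storage for the next time step.
        storage = storage + inflow - release
        storages.append(storage)
        releases.append(release)
    return storages, releases


# Purely illustrative rule: release half of the current storage.
sim_storages, sim_releases = simulate_reservoir(100.0, [10.0, 10.0], lambda s, i: 0.5 * s)
print(sim_storages, sim_releases)
```

Because the storage is updated inside the loop, the simulated storage of one step becomes an input for the next, so errors in the release can accumulate over time.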

Although adding the ToY to the mix of input parameters does not seem to result in significant improvements in release prediction, adding other input parameters might. Many macro-scale reservoir modelling algorithms use downstream water demands as input, which is an important factor in reservoir operating decisions. Adding this parameter would allow the fuzzy rules to describe operating decisions more accurately, especially for irrigation reservoirs.

More research on the optimal set-up of fuzzy rules per reservoir type
is needed in order to get a better understanding of how the physical
properties of a reservoir affect the results. It has been shown that
set-ups with information on storage in previous months significantly
improve results for reservoirs with small impoundment ratios. Similar
tests should be done for different types of reservoirs, by splitting
the reservoirs into groups based on their primary purpose, uncertainty
of the available hydrological information or the local climate; this, however,
requires a larger set of reservoirs. As shown by

Besides the extension of the neural network with new or extra
parameters, the membership functions themselves also show room for
improvement. In some cases, the shapes of the trained membership
functions lead to the activation of multiple fuzzy rules for a single
sample. This is undesirable because it greatly undermines the basic
principle of fuzzy logic. Input is translated into linguistic labels
and processed by fuzzy rules which represent human behaviour and
knowledge. When samples are processed by multiple rules, the logical
interpretation of a network becomes much
harder.

A drawback of applying the proposed method, compared to other
macro-scale reservoir modelling algorithms, is the need to acquire
in situ time series, which is often problematic as a result of
multilateral mistrust

Data used can be found at

In Fig.

The storage has been fuzzified: it is assigned the membership function “high” and its associated membership value is 0.8. Similarly, the membership values for a “medium” and “high” inflow can be determined. They are 0.6 and 0.4 respectively.

Now the firing strengths, giving an indication of the relative
importance of each rule, need to be determined. This can be done in
many ways. In this example, the membership values are multiplied with
each other. For the first rule, the “high” storage has a membership
value of 0.8, while the “medium” inflow has a membership value
of 0.6. The firing strength of this rule is W1 = 0.48. In the same
manner, it follows that the firing strength of the second rule is

It is possible to describe the consequences of rules in many ways; in
this example and study, they are linear combinations of the input
variables as described by

Finally, the consequences can be aggregated by using a weighted
average to acquire the release:
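
The four steps of this example can be retraced with a short Python sketch. The membership values 0.8, 0.6 and 0.4 are taken from the example. The second firing strength is computed on the assumption that the second rule combines the “high” storage with the “high” inflow. The input values and the consequence coefficients are hypothetical and serve only to make the sketch runnable.

```python
def firing_strength(*membership_values):
    """Multiply the membership values of a rule to obtain its firing strength."""
    w = 1.0
    for mu in membership_values:
        w *= mu
    return w


def linear_consequence(storage, inflow, p, q, r):
    """Consequence of a rule as a linear combination of the input variables."""
    return p * storage + q * inflow + r


def aggregate(strengths, consequences):
    """Weighted average of the rule consequences."""
    return sum(w * f for w, f in zip(strengths, consequences)) / sum(strengths)


# Steps 1 and 2: membership values from the example and resulting firing strengths.
w1 = firing_strength(0.8, 0.6)   # "high" storage, "medium" inflow: about 0.48
w2 = firing_strength(0.8, 0.4)   # "high" storage, "high" inflow (assumed): about 0.32

# Step 3: consequences with hypothetical normalized inputs and coefficients.
storage, inflow = 0.9, 0.5
f1 = linear_consequence(storage, inflow, 0.2, 0.5, 0.0)
f2 = linear_consequence(storage, inflow, 0.3, 0.8, 0.1)

# Step 4: aggregate the consequences into a single release.
release = aggregate([w1, w2], [f1, f2])
print(w1, w2, release)
```

Since the aggregation is a weighted average, the resulting release always lies between the consequences of the individual rules.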

The authors declare that they have no conflict of interest.

Edited by: Albrecht Weerts
Reviewed by: two anonymous referees