Regression analyses
Complete database 0 and reduced databases 1 and 2
Regression analyses based on Eq. () were performed for database 0
and for each of the reduced databases 1 and 2.
Relating the Ks values to the ΘR values resulted
in a low correlation with an R2 of 0.43. A more structured
Ks - ΘR relation seems to arise for Ks values smaller
than 150 cm day-1 and ΘR smaller than 9 %. Consequently,
database 0 was reduced to database 2. Evaluated on database 2, the R2 of the
regression function that had been fitted to the complete database 0 increased
to 0.72. To obtain a function based on database 2 itself, new regression
analyses were conducted, leading to an R2 of 0.74. This function is shown in the first plot
of Fig. . A similar approach was applied to
evaluate Ks and ΘS; no significant correlations were
obtained. Because of the high correlations found for Ks - ΘR in
database 2, the reduction of database 0 was also applied for
ΘS. However, only the range of the Ks values was
reduced, leading to database 1. In contrast to Ks - ΘR, no
significant correlations were found between Ks and
ΘS based on the reduced database; see the second plot of
Fig. . Low correlations (R2 = 0.41) were found
for the parameter n when using database 0. An even lower fit (R2 = 0.25)
was obtained when reducing database 0 to database 1 as seen in the third plot
of Fig. . The analysis of Ks
vs. α shows correlations neither for database 0 nor for database 1
(fourth plot of Fig. ).
Generally, in some sections of the scatter diagrams the Ks values seem to be
more closely related to the parameters of the soil hydraulic functions than in
other sections. However, these relations are weak and too uncertain for
hydrological modelling purposes. Reducing database 0 to database 1 or
database 2 had a positive effect on the regression of ΘR only. Apparently, it
is not possible to obtain four individual regression functions, one for each
parameter.
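The reduction step and the resulting change in R2 can be sketched as follows. This is a minimal illustration only: the model form (linear in log10(Ks)), the function names and the synthetic data are assumptions and do not reproduce the regression function or the R2 values reported above. Only the thresholds of 150 cm day-1 and 9 % are taken from the text.

```python
import numpy as np

def r_squared(y, y_fit):
    """Coefficient of determination: R2 = 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_log_linear(ks, theta_r):
    """Fit theta_r = a + b * log10(Ks) (assumed model form); return (a, b, R2)."""
    x = np.log10(np.asarray(ks, dtype=float))
    theta_r = np.asarray(theta_r, dtype=float)
    b, a = np.polyfit(x, theta_r, 1)
    return a, b, r_squared(theta_r, a + b * x)

def reduce_database(ks, theta_r, ks_max=150.0, theta_r_max=9.0):
    """Keep samples with Ks < ks_max (cm/day) and theta_r < theta_r_max (%)."""
    ks = np.asarray(ks, dtype=float)
    theta_r = np.asarray(theta_r, dtype=float)
    keep = (ks < ks_max) & (theta_r < theta_r_max)
    return ks[keep], theta_r[keep]

# Synthetic stand-in for database 0 (NOT the Rosetta-generated data of the study).
rng = np.random.default_rng(0)
ks_all = 10.0 ** rng.uniform(0.0, 3.0, size=2000)             # Ks in cm/day
theta_r_all = 8.0 - 2.0 * np.log10(ks_all) + rng.normal(0.0, 0.5, size=ks_all.size)

ks_red, theta_r_red = reduce_database(ks_all, theta_r_all)
print("R2, full database:   ", round(fit_log_linear(ks_all, theta_r_all)[2], 2))
print("R2, reduced database:", round(fit_log_linear(ks_red, theta_r_red)[2], 2))
```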
Databases 3x, variant A: classification by soil map
Univariate regression analyses
Regression analyses based on Eq. () were performed for each of the
natural texture classes. Concerning ΘR, very high R2
between 0.88 and 0.99 were found for 7 out of the 10 texture classes with an
average R2 of 0.96. The other three classes reached correlations with
R2 lower than 0.5; therefore, these classes were not included in the following
analyses and applications. Generally, curves with an R2 lower than 0.5 are
not illustrated in the figures and tables. The regression curves of
ΘR are exponentially decreasing proportional to decreasing
Ks values, which physically makes sense. However, we have to keep
in mind that van Genuchten's ΘR has no clear physical
interpretation and other fitting models for the pF curve actually have no
residual water content (see e.g. ). The high correlations
between ΘR and Ks may have to be considered as a
kind of black box correlation that is valid for the Rosetta-fed van Genuchten
model only.
Concerning ΘS, high R2 between 0.68 and 0.93 were found for
five texture classes with an average R2 of 0.82. The behaviour of these
classes can be divided into two groups. Group one includes Lu and Ls, group
two includes Us, Sl and Su. The main textural difference between these two groups
is the higher clay fraction and lower sand fraction of group one compared to
group two, as seen in Table . This has an effect on
the slopes of the fitted regression models. Group one shows decreasing values
of ΘS with increasing Ks values, whereas group two behaves
the other way round. Assuming that higher sand fractions cause higher
Ks values, the grain size compositions of group one are shifted
towards the centre of the texture triangle. This may cause smaller
values of ΘS. On the other hand, moving away from the centre
of the texture triangle with higher fractions of sand (as for group two) may
have the opposite effect of increasing porosity. Both effects are conceivable;
however, we do not want to overinterpret the physical impact of van
Genuchten's ΘS.
Concerning α, high R2 values between 0.67 and 0.96 were found for
four texture classes with an average R2 of 0.75. As given in
Sect. , the parameter α is weakly related to the inverse
of the air entry suction (not to forget that van Genuchten curves have no
defined air entry value). In general, without focusing on van Genuchten's
model, the entry suction should be higher for fine-grained than for coarse-grained
soils. This means that the entry suction should rather decrease with
increasing Ks than increase. This connection cannot be found for
the texture class Lu. That is why this regression (Lu) is not considered in
the subsequent analysis.
Concerning n, very high R2 between 0.63 and 1.00 were found for seven texture
classes with an average R2 of 0.85. Especially for the two sandy texture
classes, highly accurate fits were obtained. Under the assumption of n being
related to the pore size distribution, many different pore sizes lead to low
values of n, whereas many pores with a similar size lead to high values of
n. In general, soils that are located near the borders of the texture
triangle tend to have a narrower pore size distribution than soils located
in the middle of the triangle. Taking into account that these soils (pure
sand, pure silt) may have higher Ks values compared to loamy soils,
increasing Ks may be related to increasing values of van Genuchten's n.
Again, we have to be careful not to overstretch connections of Rosetta-generated
VGP to measurable physical properties of soils.
All statistical quality values from the univariate regression analyses are
listed in Table . Additionally, p values are included.
Low p values indicate a correlation between Ks and the parameters
of the soil hydraulic functions. All p values of Table
are nearly 0, indicating that all shown correlations are significant.
Further, the square of rspear approximately equals R2 in most cases.
This seems to support R2 as a quality criterion for the regression analyses.
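As a minimal sketch of how rspear can be obtained, the Spearman correlation is the Pearson correlation of the ranks. The function names and the example values below are assumptions for illustration; the p values of the study are not reproduced here.

```python
import numpy as np

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranked = np.empty(values.size, dtype=float)
    ranked[order] = np.arange(1, values.size + 1)
    for v in np.unique(values):
        tie = values == v
        ranked[tie] = ranked[tie].mean()
    return ranked

def spearman_r(x, y):
    """Spearman correlation = Pearson correlation of the ranks of x and y."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

# Hypothetical, strictly decreasing relation between Ks and a parameter.
ks = [5.0, 20.0, 80.0, 300.0]
param = [7.5, 6.0, 3.2, 1.1]
r = spearman_r(ks, param)
print("rspear:", r, "squared:", r ** 2)
```

For a strictly monotonic relation the ranks are perfectly (anti-)correlated, so the squared rspear reaches 1 regardless of the shape of the curve.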
Obtained coefficients of determination (R2), Spearman correlation coefficients
(rspear) and corresponding p values (p) as well as the sample size (Samples)
for the regressions between the Ks values and the soil hydraulic
parameters for each texture class. Lu is silty loam, Ls is sandy loam,
Ut is clayey silt, Ul is loamy silt, Us is sandy silt, Sl is loamy sand,
Su is silty sand, S is sand, Ss is pure sand.
                        van Genuchten parameters
Texture      Statistic  ΘR    ΘS    n     α
Lu           R2         0.94  0.82  0.78  0.73
             rspear     0.97  0.91  0.86  0.88
             p          0.00  0.00  0.00  0.00
             Samples    13 829
Ls           R2         0.88  0.90  -     -
             rspear     0.94  0.95  -     -
             p          0.00  0.00  -     -
             Samples    50 648
Ut           R2         0.99  -     0.93  -
             rspear     1.00  -     0.96  -
             p          0.00  -     0.00  -
             Samples    6822
Ul           R2         0.98  -     0.63  -
             rspear     0.99  -     0.79  -
             p          0.00  -     0.00  -
             Samples    12 995
Us           R2         0.99  0.78  0.56  0.96
             rspear     1.00  0.89  0.74  0.98
             p          0.00  0.00  0.00  0.00
             Samples    3093
Sl           R2         0.92  0.68  0.88  0.67
             rspear     0.95  0.83  0.96  0.80
             p          0.00  0.00  0.00  0.00
             Samples    7202
Su           R2         0.99  0.93  0.76  0.63
             rspear     0.99  0.96  0.92  0.78
             p          0.00  0.00  0.00  0.00
             Samples    6364
S            R2         -     -     1.00  -
             rspear     -     -     1.00  -
             p          -     -     0.00  -
             Samples    1455
Ss           R2         -     -     0.98  -
             rspear     -     -     0.99  -
             p          -     -     0.00  -
             Samples    479
Mean R2                 0.96  0.82  0.85  0.75
Mean rspear             0.98  0.91  0.90  0.86
Subdivision of the soil texture by means of cluster analyses based
on 31 classes (blue coloured polygons). The classes were divided by similarity
of their soil hydraulic parameters (cf. ). The
subdivisions of the German soil classification system
(cf. ) are overlaid with white
lines.
Procedure to obtain van Genuchten (VG) parameters and the saturated
hydraulic conductivity (Ks) values based on soil map information.
The software Rosetta is based on neural network analyses and generates van
Genuchten parameters and Ks values out of soil texture
information.
Average coefficient of determination (R2) in dependency of the
number of classes used for the subdivisions based on soil hydraulic
properties by means of cluster analyses. The average R2 is calculated
out of the R2 of all classes for each case. For this calculation, only
classes with R2 > 0.5 were considered. In addition to that, the
range of R2 is shown. The range was calculated out of the maximum and minimum
R2 of the individual classes.
Scatterplots of the van Genuchten parameters (ΘR,
ΘS, n, α) as a function of the saturated hydraulic
conductivity (Ks). Shown are database 2
(ΘR - Ks) and database 1
(ΘS - Ks, n - Ks and
α - Ks). A regression function with a coefficient of
determination (R2) of 0.74 was fitted between ΘR and
Ks. Furthermore, a regression function with an R2 of 0.25 was
fitted between n and Ks. ΘS - Ks as
well as α - Ks showed no
correlation.
Scatterplots of the van Genuchten parameters (ΘR,
ΘS, n, α) in dependency of the saturated hydraulic
conductivity (Ks) for the texture class Su (silty sand) out of
database 3x (variant A). Regression functions were fitted for all
variants of VGP - Ks. R2 is the coefficient of
determination.
Multivariate regression analyses
Regression analyses based on Eq. () were performed for each
of the natural texture classes. We used log10(Ks) to fill the
matrix X. The matrix Y comprises ΘR,
ΘS, n and α. These more elaborate procedures, which
consider the correlations among the dependent variables, serve as references
for the previous results.
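The setup of X and Y can be sketched as follows, assuming an ordinary least-squares model that is linear in log10(Ks). This model form, the function names and the synthetic data are assumptions; the regression model referenced above may differ. Under this assumption, the coefficients of the multivariate fit coincide with those of column-wise univariate fits, which is a general property of ordinary least squares and not a statement about the procedure used in this study.

```python
import numpy as np

def fit_multivariate(ks, Y):
    """Least-squares fit of Y = X @ B with X = [1, log10(Ks)]; B has shape (2, 4)."""
    X = np.column_stack([np.ones(len(ks)), np.log10(ks)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B

def fit_univariate(ks, y):
    """Fit one parameter y = a + b * log10(Ks); return (a, b)."""
    slope, intercept = np.polyfit(np.log10(ks), y, 1)
    return intercept, slope

# Synthetic data for one texture class (illustration only, not Rosetta output).
rng = np.random.default_rng(1)
ks = 10.0 ** rng.uniform(1.0, 3.0, size=500)
x = np.log10(ks)
Y = np.column_stack([
    0.06 - 0.01 * x,   # theta_r
    0.40 + 0.01 * x,   # theta_s
    1.20 + 0.30 * x,   # n
    0.01 + 0.01 * x,   # alpha
]) + rng.normal(0.0, 0.005, size=(500, 4))

B = fit_multivariate(ks, Y)
for j, name in enumerate(["theta_r", "theta_s", "n", "alpha"]):
    a, b = fit_univariate(ks, Y[:, j])
    print(name, "multivariate:", B[:, j], "univariate:", (a, b))
```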
Both the shape of the obtained fits of the multivariate method and the R2
turned out to be very similar to those of the univariate method. The average
R2 both for the univariate and multivariate method was ∼ 0.835.
The shapes of the functions differ just slightly or are even identical.
Figure shows the univariate and multivariate
regression results for n based on the texture class Su. It can be seen that
both curves behave very similarly with small differences at high Ks values.
However, the R2 values are equal and neither fit can be identified as
better. All other comparisons between the regression results of the two
methods behave similarly to Fig. . The close agreement of
both methods' results speaks for the robustness of the less elaborate
univariate method. Based on this, the results of the univariate regression
analyses will be used for further applications.
Databases 3x, variant B: classification based on soil hydraulic properties
Results of the subdivision
Figure shows subdivisions of the soil texture based
on soil hydraulic properties by means of cluster analyses for a number of
31 classes. Results of showed that the subdivisions
based on soil hydraulic properties are similar to the US texture-based
classification, especially for coarse-textured soils (sands). These
similarities were not found for fine-textured soils. The results of our
subdivision based on soil hydraulic properties are unlike the texture-based
classification. However, this is not directly a contradiction to
. They used the US texture triangle for comparison and
we used the German classification. In addition to that, the rules and
conditions for the algorithm of the cluster analyses have a high influence on
the result.
Scatterplot of the van Genuchten parameter n in dependency of the
saturated hydraulic conductivity (Ks) for the texture class Su
(silty sand) out of database 3x (variant A). To compare the univariate
and multivariate regression, both functions are shown in the graph.
R2 is the coefficient of determination.
Impact on the pF and K(h) curves due to the univariate
regression functions out of database 3x (variant A).
pF is log10 of the absolute pressure head h. K(h) is the hydraulic
conductivity in dependency of pressure head. Θ = volumetric water
content. The minimum and maximum saturated hydraulic
conductivities (Ks) were given by Rosetta. The van Genuchten
parameters were changed in dependency of Ks by means of the
regression functions. (a) pF curves for the texture class S
(sand). (b, c) The same as shown in (a), but for the
texture classes Su (silty sand) and Lu (silty loam). (d) Hydraulic
conductivity curves for the texture class S. (e, f) The same as
shown in (d), but for the texture classes Su and
Lu.
Univariate regression analyses
In variant B, we concentrate on univariate regression analyses only. In
Fig. the average R2 are shown in dependency of the
number of classes used for the subdivisions. As previously, regression
results with R2 lower than 0.5 are not considered. The abscissa is limited
to a maximum of 200 classes. If more classes are used, the average R2 does
not increase significantly. The average R2 ranges therefore mainly
between 0.7 and 0.8. If we use 31 classes, which is the same number of subdivisions
as the texture-based classification of the German soil classification system,
the average R2 is 0.74 and 40 % of the regression results have coefficients
of determination higher than 0.5. The maximum is found for
2128 classes (R2 = 0.82, with 49 % of the regression results having R2 > 0.5).
The results of the regression analyses based on databases 3x (variant A)
yielded a total R2 of 0.88 by using nine natural texture classes and
67 % of the regression results had an R2 > 0.5. In addition, the
application of the univariate method is faster and less elaborate. For
those reasons, we will use the results of the regression analyses based on
databases 3x (variant A) for further applications.
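The two summary figures used in this comparison (average R2 of the fits with R2 > 0.5 and the share of such fits) can be computed as in the following sketch. The function name and the example values are hypothetical.

```python
import numpy as np

def summarise_r2(r2_values, threshold=0.5):
    """Return (mean R2 of fits above threshold, share of fits above threshold)."""
    r2 = np.asarray(r2_values, dtype=float)
    kept = r2[r2 > threshold]
    mean_r2 = float(kept.mean()) if kept.size else float("nan")
    return mean_r2, kept.size / r2.size

# Hypothetical R2 values of four regressions (one class, four parameters).
mean_r2, share = summarise_r2([0.9, 0.8, 0.3, 0.1])
print("average R2:", mean_r2, "share with R2 > 0.5:", share)
```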
Applications on soil hydraulic functions
Figure illustrates the impact of the regression results that
were obtained by the univariate method of databases 3x (variant A) on
van Genuchten's soil hydraulic functions for the texture classes S, Su and
Lu. These three texture classes are assumed to be representative of all
classes that were investigated. In addition, a wide range of Ks
values is covered. Ks values were selected ranging from the minimum
to the maximum values that were obtained out of database 3x (variant A).
The pF curves of the texture class S are shown in Fig. a. Van
Genuchten's n was computed out of the regression function. The pF curve of
the regression with the smallest Ks value has a clearly smoother
slope compared to the pF curve that was obtained for the largest
Ks value. The lower the Ks, the more the shape of
the pF curve shifts towards typical pF curves for sandy soils with a
fraction of silt. The curves for low Ks values tend to have a
higher usable field capacity, possibly leading to higher rates of
transpiration in hydrological modelling applications. The curves for the
unsaturated hydraulic conductivity K(h) of the texture class S are given in
Fig. d. The same parameters as for the pF curves were used.
Near saturation the curves of large Ks values are above the curves
of low Ks values. This relation changes after an intersection point
at pF of ∼ 2, caused by the variation of van Genuchten's n that is
directly connected to the parameter m. From the physical point of view, the
shapes of the curves can be described as reasonable. The curves with lower
Ks values have a higher fraction of small pores. These fractions of
small pores are able to transport water for a wider range of pF in contrast
to the curve parameterisations with high Ks values. This leads to
the intersection point that changes the dominating impact factor on the
conductivity curves: for pF < 2, the Ks value, which simply scales
the curve, is the dominating factor. For pF > 2, van Genuchten's m is the
dominating impact factor. However, after the intersection point K(h) is
already at very low values. Therefore, the variation of m for sandy soils
may have a small impact compared to the impact of variations of the Ks values.
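The interplay of Ks (scaling) and n (via m) can be illustrated with the widely used Mualem-van Genuchten formulation. This is a sketch under assumptions: m = 1 - 1/n, a pore-connectivity exponent of 0.5, and two hypothetical parameter sets that are not taken from the regression functions of this study.

```python
import numpy as np

def effective_saturation(h, alpha, n):
    """Se(h) = [1 + (alpha * |h|)^n]^(-m) with m = 1 - 1/n; h in cm."""
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * np.abs(h)) ** n) ** (-m)

def water_content(h, theta_r, theta_s, alpha, n):
    """Retention curve: theta(h) = theta_r + (theta_s - theta_r) * Se(h)."""
    return theta_r + (theta_s - theta_r) * effective_saturation(h, alpha, n)

def conductivity(h, ks, alpha, n, tau=0.5):
    """Mualem-van Genuchten K(h); Ks only scales the curve."""
    m = 1.0 - 1.0 / n
    se = effective_saturation(h, alpha, n)
    return ks * se ** tau * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2

# Two hypothetical parameterisations: low Ks with low n, high Ks with high n.
pf = np.linspace(0.0, 4.0, 401)
h = 10.0 ** pf                       # absolute pressure head in cm
k_low = conductivity(h, ks=100.0, alpha=0.03, n=2.0)
k_high = conductivity(h, ks=700.0, alpha=0.03, n=3.5)

# First pF at which the high-Ks curve drops below the low-Ks curve.
crossing = float(pf[np.argmax(k_high < k_low)])
print("curves intersect near pF", round(crossing, 2))
```

Near saturation the high-Ks curve lies above the low-Ks curve; beyond the intersection the larger n makes it fall off more steeply.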
Figure b shows the impact of the regression results on the
pF curves of the texture class Su. Similar to Fig. a, the curves
for low Ks values have a smoother slope. In addition to that, the
modifications of van Genuchten's α cause the water content to drop
at higher pF values for the curves of low Ks values compared to
the curves of high Ks values. This behaviour is typical for texture
classes that have a slightly larger fraction of fine pores than the standard
Su. The usable field capacity is more or less the same for all pF curves.
The impact on hydrological model applications might nevertheless be immense
depending on the method that reduces the potential evapotranspiration to the
actual one: methods based on the actual water content of the soil within the
root zone probably calculate higher rates of actual evapotranspiration using
the parameterisation based on low Ks values than using the ones of
higher Ks values. On the other hand, methods based on pF values
of the soil are expected to be less affected. The impacts on the conductivity
curves for the texture class Su are plotted in Fig. e. Here
again, an intersection point can be located (at a pF of ∼ 1.8). Above
this pressure head, the curves of high Ks values drop below the
curves of small Ks values. In contrast to the conductivity curves
of the texture class S, the values of K(h) at the intersection point (and
close below) are still high enough to enable a water movement that is not
negligible. For that reason, soil water simulations are influenced, especially
during dry seasons.
The pF curves for the texture class Lu are visualised in Fig. c.
Here, a shift on the ordinate can be observed: the
curves for low Ks values induce higher water contents than the
curves for high Ks values for the same pressure head. This is due
to the relation that was found for Lu of ΘR and
ΘS being inversely proportional to Ks. However, the
variations of n cause different slopes of the curves. The impact on the
reduction of the potential evapotranspiration is comparable to the impact
described for the texture class Su. The impact on K(h) is primarily driven by
the variations of the Ks values, as seen in Fig. f.
The intersection point is approximately at pF 4. At this high pF, K(h) has
dropped orders of magnitude below the saturated value.
It can be summarised that the modifications of the VGP caused by the
regression results of the databases 3x (variant A) lead to plausible
pF curves. Further, the impact on the conductivity functions near
saturation is primarily driven by the value of Ks. As the
Ks value works as a scaling factor for the conductivity curves,
this result is no surprise and not induced by the regression functions. For
medium and low saturations, however, the impact is dominated by the variations
of the parameterisations of the soil hydraulic functions that were produced
by the regression functions. Especially for the texture class Su (and similar
ones), the regression functions will have an impact on long-term
hydrological model applications. Taking the soil map of Lower Saxony for
instance, texture classes with compositions like Su, Sl or similar occupy
more than one-third of the total area. For many of the texture classes, all
four VGP could be fitted as a function of Ks. However, this did not
always work, as seen in Table . Therefore, the
correlation matrices of the VGP, generated within the regression analyses of
databases 3x (variant A), were examined in more detail. It turned
out that correlations were very low between VGP that are related to
Ks and VGP that are not related to Ks. These findings
indicate the admissibility of fitting fewer than four VGP as a function of Ks.
Generating subgrid spatial variability
Spatial resolutions of hydrological models mainly depend on the resolutions
of the input data of soil properties and land use, respectively. These input
data are often not equally resolved in space and time (e.g. the German ATKIS
database). If the model area is subdivided into polygons by the hydrological
model, the spatial resolution is unequally distributed and given
automatically by the input layers. If the model area is subdivided into
raster cells, the spatial resolution is equally distributed and depends both
on input layers and on the user's interests. For the latter type of model, the
spatial resolution may often induce a pseudo-accuracy, because the chosen
grid size can be much smaller than most of the subdivisions of the input
layers. In any case, the real spatial resolution of a hydrological model
that has to be considered for the process description is given by the spatial
resolutions of the input data. In most cases these spatial resolutions are
rather coarse, so that many processes are not directly resolved by the model.
To consider the spatial variability of soil water processes that are not
directly resolved by the hydrological model, the following procedure is
elaborated in order to generate parameterisations of soil hydraulic functions:
1. Acquisition of a soil map for the model area (or similar information).
In this study, a German soil map of Lower Saxony is used; see Fig. .
If not already included in the soil map, soil information has to be
transformed into texture information. This study used the German soil
classification system; see .
2. Obtaining texture classes out of the soil map. For example, Sl with 65 %
sand, 25 % silt and 10 % clay (see Table ).
3. Random generation of trios of numbers within a range of 0–100 with
the precondition that the sum of each trio has to be 100. The numbers of each
trio are assigned to be a percentage fraction of sand, silt and clay.
4. Consideration of a boundary in each direction (sand, silt, clay). This
study used a ±5 % boundary. For example, Sl with 65 ± 5 % sand,
25 ± 5 % silt and 10 ± 5 % clay. Categorisation of the
random number trios into the obtained boundaries.
5. Generation of VGP sets with the software Rosetta for the obtained texture
classes (categories).
6. Regression analyses between Ks values and all other VGP for
each texture class.
The total number of needed randomly generated numbers (point 3) may differ in
dependency of the texture classes that are going to be analysed. The Rosetta
underlying databases have more samples of sandy soils than of clayey soils
. Furthermore, some combinations in the texture
triangle are very rare in nature. To ensure that these disagreements do not
bias the regression results, only close-range (± boundary) near-natural
occurring texture classes that are obtained from soil maps should be
considered for the regression analyses (here: generation of database 3x
(variant A); see Sect. ). The boundary was assigned to be
±5 % in order to get a representative number of VGP sets for
each texture class. Other values for the boundary were tested: much
lower values (e.g. ±1 %) led to a very narrow range of the
Ks values. Much higher values for the boundary (e.g. ±10 %)
blurred the VGP sets of the texture classes (there was no difference
left between certain texture classes). Therefore, we recommend a value of
±5 % for the boundary.
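The random generation of texture trios and their categorisation within the ±5 % boundary can be sketched as follows. The sampling scheme (uniform on the texture triangle via a Dirichlet distribution), the function names and the number of draws are assumptions; the Sl centre of 65 % sand, 25 % silt and 10 % clay is the example given above. The subsequent Rosetta run is not part of this sketch.

```python
import numpy as np

def sample_textures(n_samples, rng):
    """Draw random (sand, silt, clay) percentages that sum to 100."""
    return 100.0 * rng.dirichlet([1.0, 1.0, 1.0], size=n_samples)

def within_boundary(textures, centre, boundary=5.0):
    """Keep trios whose sand, silt and clay all lie within centre +/- boundary."""
    centre = np.asarray(centre, dtype=float)
    keep = np.all(np.abs(textures - centre) <= boundary, axis=1)
    return textures[keep]

rng = np.random.default_rng(42)
textures = sample_textures(100_000, rng)

# Category for texture class Sl: 65 +/- 5 % sand, 25 +/- 5 % silt, 10 +/- 5 % clay.
sl = within_boundary(textures, centre=(65.0, 25.0, 10.0))
print(len(sl), "of", len(textures), "random trios fall into the Sl category")
```

Because only a small part of the texture triangle lies within the boundary of one class, the total number of random trios has to be chosen large enough to obtain a representative number of samples per class.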
At the next step, the obtained regression functions have to be applied in a
hydrological model. The following procedure is recommended:
1. Assumption of a lognormal distribution for the Ks values of
each texture class. The mean values are given by the Ks values that
were obtained with Rosetta at the centre of each texture class. The standard
deviations are given by the user.
2. Calculation of variations of the other VGP by using the regression functions
and the Ks distribution functions. The number of VGP sets is up to
the user. At least three sets should be used. We recommend five sets by using
the 10th, 30th, 50th, 70th and 90th percentiles of the Ks distribution
function. More sets are possible.
3. Run the model using in parallel the VGP sets that were obtained at the
previous point 2.
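Points 1 and 2 can be sketched as follows. The sketch assumes that the given mean and standard deviation are the arithmetic moments of Ks (the text does not state whether the Rosetta value is to be treated as mean or median), and the regression functions in the example are hypothetical placeholders, not the fitted functions of this study.

```python
import math
from statistics import NormalDist

def lognormal_percentiles(mean_ks, sd_ks, percentiles=(10, 30, 50, 70, 90)):
    """Ks values at the given percentiles of a lognormal distribution.

    Assumes mean_ks and sd_ks are the arithmetic mean and standard deviation.
    """
    sigma2 = math.log(1.0 + (sd_ks / mean_ks) ** 2)
    mu = math.log(mean_ks) - 0.5 * sigma2
    sigma = math.sqrt(sigma2)
    z = NormalDist()
    return [math.exp(mu + sigma * z.inv_cdf(p / 100.0)) for p in percentiles]

def vgp_sets(ks_values, regressions):
    """Evaluate each regression function (parameter name -> callable) per Ks value."""
    return [
        {"Ks": ks, **{name: f(ks) for name, f in regressions.items()}}
        for ks in ks_values
    ]

# Hypothetical regression functions for one texture class (illustration only).
regressions = {
    "n": lambda ks: 1.2 + 0.3 * math.log10(ks),
    "alpha": lambda ks: 0.01 + 0.01 * math.log10(ks),
}

ks_values = lognormal_percentiles(mean_ks=100.0, sd_ks=100.0)
for vgp in vgp_sets(ks_values, regressions):
    print(vgp)
```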
Due to the fact that standard deviations of the Ks values are in
most cases unknown for meso- and macroscale hydrological model applications,
this parameter should be assumed by the user. Note that this is the only
tuning parameter needed for the procedure presented in this study. The
standard deviations of Ks values at field scale may vary between
less than 50 % and several hundred percent and there seem to be no clear
correlations to the texture classes of the analysed soils; see
e.g. , , ,
, or . The range
of the standard deviation that should be used is indirectly given by the
minimum and the maximum Ks values that were obtained out of
database 3x (variant A). Assuming a specific standard deviation, the
10th and 90th percentiles of the resulting Ks distribution still have
to be within the range of Ks values given in database 3x
(variant A). If so, the hydrological model is ready to start the simulation.
If not, the regression function should either be restricted to the range of
Ks (this is recommended) or the standard deviation should be forced
to a maximum value by the model. After fulfilling this condition, the
hydrological model is ready to start. A possibility to effectively process
the VGP sets within the hydrological model is given in point 2 of the above
list. We recommend using at least three different VGP sets per soil to describe
the spatial variability. However, more sets can be used likewise. It is
possible to simulate the soil water movement for all VGP sets in parallel in one
simulation run of the hydrological model. Note that vertical information
about soil profiles, if provided by the soil map, can be handled with the
same procedure as described so far. Hence, the spatial variability of soil
hydraulic functions can either be described as horizontal (if just texture
classes without any vertical profile information are available) or
horizontal and vertical (if soil profile information is also available).
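The range condition on the 10th and 90th percentiles described above can be expressed as a simple check. In this sketch, restricting the regression function to the Ks range is interpreted as clamping Ks to the database range before the regression is evaluated; this interpretation, the function names and the numbers are assumptions.

```python
def percentiles_within_range(ks_p10, ks_p90, ks_min, ks_max):
    """True if the 10th and 90th Ks percentiles lie within the database range."""
    return ks_min <= ks_p10 and ks_p90 <= ks_max

def restrict_to_range(ks_values, ks_min, ks_max):
    """Clamp Ks values to [ks_min, ks_max] before evaluating the regression."""
    return [min(max(ks, ks_min), ks_max) for ks in ks_values]

# Hypothetical Ks range of one texture class and hypothetical percentiles (cm/day).
ks_min, ks_max = 10.0, 500.0
ks_percentiles = [5.0, 30.0, 70.0, 160.0, 900.0]

if not percentiles_within_range(ks_percentiles[0], ks_percentiles[-1], ks_min, ks_max):
    ks_percentiles = restrict_to_range(ks_percentiles, ks_min, ks_max)
print(ks_percentiles)
```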
These presented developments were implemented into the hydrological modelling
system PANTA RHEI
and were used successfully in many practical
applications and projects (e.g. ).
PANTA RHEI has been developed by the Department of Hydrology,
Water Management and Water Protection, Leichtweiss Institute for Hydraulic
Engineering and Water Resources, University of Braunschweig in cooperation
with the Institut für Wassermanagement IfW GmbH, Braunschweig
. It is a deterministic, semi-distributed,
physically based hydrological model for single events or long-term
simulations. The temporal discretisation is adaptive; for many applications
an hourly time step is used. The spatial discretisation is divided into three
levels: HRUs (hydrologic response units), subcatchments and gauged
catchments. Watersheds are the basis for the subcatchments, which contain
the HRUs. This spatial discretisation makes the model very flexible to
account for differences in scale of the input data, similar to the mHM model
of . A difference between our hydrological model PANTA
RHEI and many other models is the low number of model parameters that
are used for calibration. We work with catchment-based model parameters,
which have different effects on the subcatchment scale controlled by
physiographic characteristics. This leads to (only) 6–8 model parameters in
total to calibrate the model for an area of many hundred square kilometres.
Application of different van Genuchten parameter sets on the soil
model of the hydrological modelling system PANTA RHEI. The different
parameterisations (domains) are used in parallel at all spatial locations. The
domains are solved simultaneously and interact with each other. The
main input is given by the spatial precipitation (P), which was reduced in
advance by vegetational interception. Results of the soil model are the
direct runoff (Peff,D), the groundwater recharge (Peff,GW), which
leads to base flow in a long-term view, and the actual evapotranspiration (ET).
The structure of the soil model of PANTA RHEI is shown in Fig. .
Different parameterisations of VGP (e.g. 5) are
established by means of lognormal distributions of Ks. After the
sets of VGP are derived, we use all of them to parameterise the soil model.
As mentioned, we assume that one effective set of VGP cannot express subgrid
variability. Secondly, we assume that many different sets of VGP are able to
do so. That is why the soil model is parameterised many times, whereby the
structure and equations were not changed. These different models (domains)
operate individually. However, they are connected to each other. In summary,
it can be argued that we do not have multiple model scenarios – it is one
model with multiple parameterisations solved simultaneously. The impact of
the subgrid parameterisation of the soil hydraulic functions is dominated by
the variation of Ks in wet periods and by the variation of VGP in
dry periods. Furthermore, the parameterisations have a feedback on the
reduction of evapotranspiration that can be related to the pressure head of
the soil . The developed soil model is innovative
regarding concept, interfaces and parameterisation. The model structure
provides the required interfaces for calibrations made at runoff, soil
moisture and/or groundwater level. Therefore, the demand for an automated
optimisation procedure arises through the multi-variable examination of the
system and its new complexity. A pioneering lexicographical strategy of
optimisation was developed, using the model interfaces connected to modern
data types . To account for the impact of
the subgrid parameterisation, we compared breakthrough curves (1-D) with
different numbers of VGP sets and with different standard deviations of the
Ks distribution functions. We also compared spatially distributed simulation
results of the hydrological model for soil moisture with remotely sensed
satellite data (ERS1/2-ESCAT, MetOp-ASCAT, ENVISAT-ASAR). The simulated soil
water contents turned out to have high accordance with the satellite-based
soil moisture. In addition to that, the model was able to approximate the
dynamics of groundwater level with very high quality compared to measured
data . Another possibility to account for subgrid
variability is to analyse the standard deviation of soil moisture as a
function of the number of applied VGP sets. Further, the spatial soil
moisture patterns could be compared in dependence of the number of applied
VGP sets, similar to . We are working on a follow-up
paper focusing on the hydrological model and its calibration.