Remotely sensed soil moisture products are influenced by vegetation and how
it is accounted for in the retrieval, which is a potential source of
time-variable biases. To estimate such complex, time-variable error
structures from noisy data, we introduce a Bayesian extension to triple
collocation in which the systematic errors and noise terms are not constant
but vary with explanatory variables. We apply the technique to the
Soil Moisture Active Passive (SMAP) soil moisture product over croplands,
hypothesizing that errors in the vegetation correction during the retrieval
leave a characteristic fingerprint in the soil moisture time series. We find
that time-variable offsets and sensitivities are commonly associated with an
imperfect vegetation correction. The changes in sensitivity can be especially
large, with seasonal variations of up to 40 %. Variations of this size
impede the seasonal comparison of soil moisture dynamics and the detection of
extreme events. Also, estimates of vegetation–hydrology coupling can be
distorted, as the SMAP soil moisture has larger

Soil moisture products derived from satellite measurements are subject to
errors. These are not constant, but vary in space and time. For any given
location, they may depend on variable factors such as vegetation phenology,
atmospheric conditions or measurement characteristics like the incidence
angle

However, the time-variable error properties of soil moisture products are
poorly understood and rarely considered in practice

Here, we extend triple collocation to estimate non-constant error structures
(Sect.

We apply this procedure to estimate time-variable biases in the SMAP soil
moisture product that are associated with an imperfect vegetation correction
(Sect.

We focus on croplands, as they present a particular challenge to the
vegetation correction approach using an a priori

We hypothesize that seasonal changes in the error structure arise due to an
inaccurate vegetation correction in the retrieval, so that the biases
relative to the in situ data track the misspecification in the vegetation
optical depth

The two components of Bayesian triple collocation.

We now present a general overview of the approach (Fig.

Our approach has several characteristics that make it useful in a wide range
of applications. It is widely applicable, as no soil moisture product is
assumed to be free of errors. This is particularly critical for estimating
the noise magnitude and the sensitivity, which cannot be estimated
consistently by standard regression approaches when the reference product is
subject to errors
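The inconsistency of standard regression can be illustrated with a small simulation. The sketch below is not the Bayesian formulation introduced here; it uses the classical covariance-based triple collocation estimators under the usual assumptions (mutually independent errors, a reference with unit sensitivity). All numerical values and function names are hypothetical and chosen for illustration only. With these settings the least-squares slope is expected to be attenuated to roughly 0.51, whereas the simulated sensitivity is 0.7.

```python
import numpy as np


def simulate_products(n=50_000, seed=0):
    """Simulate three products observing the same (hypothetical) soil moisture."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.25, 0.05, n)                    # unobserved "truth"
    x = theta + rng.normal(0.0, 0.03, n)                 # noisy reference, unit sensitivity
    y = 0.02 + 0.7 * theta + rng.normal(0.0, 0.02, n)    # product of interest
    z = -0.01 + 1.2 * theta + rng.normal(0.0, 0.04, n)   # third product
    return x, y, z


def regression_sensitivity(x, y):
    """Ordinary least-squares slope of y on the noisy reference x."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


def tc_sensitivity(x, y, z):
    """Covariance-based triple collocation sensitivity of y relative to x."""
    c = np.cov(np.vstack([x, y, z]))
    return c[1, 2] / c[0, 2]


def tc_noise_variance(x, y, z):
    """Covariance-based triple collocation noise variance of y."""
    c = np.cov(np.vstack([x, y, z]))
    return c[1, 1] - c[0, 1] * c[1, 2] / c[0, 2]


x, y, z = simulate_products()
print("OLS slope (attenuated):", regression_sensitivity(x, y))
print("TC sensitivity:", tc_sensitivity(x, y, z))
print("TC noise variance of y:", tc_noise_variance(x, y, z))
```

The regression slope is biased towards zero because the noise in the reference inflates its variance, while the third product lets the triple collocation estimators cancel that noise term.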

The probability distribution comprises the observable products

Each product's error model is governed by a set of parameters

Our key extension compared to previous triple collocation studies is to make
the bias terms and the noise magnitude vary with time-dependent explanatory
variables,

The quasi-random errors are characterized by their variance and further
distributional assumptions. For the variance

We generally specify

The second piece of the probabilistic model concerns the soil moisture

One drawback of this model is that it cannot account for the autocorrelation
and seasonality of soil moisture. To test for the importance of temporal
characteristics, we also generalize the model by making

To complete the full probability distribution, one has to specify the prior
distributions of the parameters

For all products

The standard prior distribution for the soil moisture
parameters

The default model specification used in both the simulation study
and the SMAP case study, and the baseline configuration for the simulation
runs. The bias terms for the reference product

The Bayesian inference takes the observed products and explanatory variables
as input and outputs posterior probability distributions over the unknown
quantities (Fig.

We now study the applicability of Bayesian triple collocation using a
simulation study. We used three simulated products with realistic error
properties (Table

We first analysed the fidelity with which the error parameters could be
estimated. To this end, we simulated

The error parameters could be estimated with sufficient fidelity in the
simulation study (Fig.

Bayesian triple collocation yields a distribution of the parameters and thus
naturally provides uncertainty estimates. The posterior standard deviation

Simulation results illustrating the estimation fidelity.

To test the sensitivity of the estimates to model assumptions, we extended
the simulation study. The most critical aspect turned out to be the
specification of the bias terms: neglecting variable bias terms can impair
the overall estimation quality. Neglecting the complex error structure leads
to an overestimation of the error magnitude (Fig.

To test for additional model assumptions, we modified the model and the
forward simulations. The impact on the estimation accuracy was typically
limited (see the Supplement), so we only provide a short summary. First, the
model for the noise term

To estimate the biases of the SMAP soil moisture product, we used

The analyses focus on seven locations in North America, South America and
Europe with significant crop cover, due to the availability of high-quality
dense in situ networks (Table

To provide a better overview of the spatial patterns, we also used data from

For the third soil moisture product we used the MERRA2 reanalysis
(M2T1NXLND.5.12.4)

To quantify the error structure as a function of

Network sites from north to south, including their Koeppen–Geiger climate regime.

In our estimation we specified the error structure based on our hypothesized
impact of a vegetation misspecification. The

To compute the predicted biases in Fig.

The same overcompensation that increases the sensitivity also increases the
noise level, so that we also made the variance of

The same error structure was assumed for the reanalysis data. The inclusion
of a

To compare the estimated biases with the model predictions, we re-expressed

To test the robustness of the estimates, we varied the input data and the
model configuration. Instead of using the SMOS

To explore the relation between time-variable vegetation biases and estimates
of vegetation–water coupling, we analysed the coefficient of
determination

SMAP biases that track the vegetation misspecification

The varying sensitivity of SMAP is illustrated for the South Fork site,
Iowa (USA), in Fig.

Pronounced changes in the sensitivity are found for all network sites but one
(Fig.

Also, the additive biases track changes in

Time-variable biases over the network sites.

The time-average error properties

The time-variable biases are complemented by the time-average biases, which
are quite large at several network sites. The time-average sensitivity

Our sensitivity analyses focus on the reference

Robustness quantified by changes in the estimated

By contrast, the estimates can change substantially when

The estimates of the time-variable biases are reasonably robust to other
aspects of the model specification. The impact of replacing the MERRA2 with
the GLDAS2 soil moisture or dropping it altogether is also small
(Fig.

Across the sparse sites within the contiguous US we commonly find pronounced
time-variable biases (Fig.

Large biases over croplands and grasslands are also found when

To analyse the estimated noise level for all three products, we computed a
normalized version

Time-variable biases and

Time-variable biases and

Estimated noise level normalized to the in situ dynamic range. For
both the network and the sparse in situ sites, the distribution of the
posterior median values of

The observed coupling between vegetation and soil moisture anomalies is
larger when using SMAP than when using in situ soil moisture data
(Fig.

While the spatial patterns largely match those of the time-variable biases,
the link between them is not clear and not necessarily uniform across all
sites. The computation of anomalies largely removes seasonal offsets, which
constitute a major fraction of the estimated additive biases. However, it
does not remove higher-frequency variations or inter-annual differences,
although the record is too short to reliably study those. Neither can it
account for the changes in sensitivity, which are particularly large over
croplands. Finally, the in situ soil moisture anomalies, predominantly
derived from single probes, are subject to major uncertainties. All these
factors likely contribute to the elevated associations between the

By applying Bayesian triple collocation to the SMAP soil moisture product, we
detect time-variable biases. These time-variable biases track the
misspecification of the vegetation optical depth

A mechanistic interpretation of the inferred biases is complicated by a
number of poorly understood factors. First, the time-variable biases are
relative to the in situ data. The results over the sparse sites should hence
be interpreted with caution due to representativeness error, even if they are
similar to those at the dense high-quality network sites
(Fig.

While it is premature to attribute the inferred biases completely to an
imperfect vegetation correction, there are two lines of reasoning that
suggest that the inferred biases are not spurious. First, they are fairly
consistent across croplands, and also between sites with sparse and dense in
situ networks (Fig.

One further caveat is that time-average biases are also present (additive
bias:

We conclude that our key finding is the presence of sizeable time-variable
biases in the SMAP product. They are associated with, but likely not entirely
caused by, deviations of the a priori

The time-varying biases can have a negative impact in many applications. The
changing sensitivity impedes the seasonal comparison of soil moisture
dynamics, as the same SMAP-observed change corresponds to a wide range of
actual soil moisture changes depending on the season (e.g. Fig.

The spurious vegetation signal in the soil moisture data may distort
estimates of water–vegetation coupling. We find inflated values of

Soil moisture products can be subject to complex, time-variable errors, as revealed by our novel method for estimating such error structures from data. Other estimation procedures are conceivable, especially if high-quality in situ data are available, and should be explored in the future. Our Bayesian triple collocation approach is widely applicable because it yields consistent estimates of error magnitudes and biases even when no error-free reference data set is available. One product does, however, have to be assumed to be free of systematic error. The method is flexible, so that the error structure parameterization can be adapted to the problem at hand. We hope that this will enable the community to better characterize the uncertainties of remotely sensed soil moisture products. Knowledge of time-variable structural errors is key to improving the products, and it also helps to inform the application of these data sets in practice.
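How a sensitivity that varies with an explanatory variable can be recovered without an error-free reference may be sketched with a deliberately crude stand-in for the Bayesian model: classical covariance-based triple collocation applied separately within bins of the explanatory variable. The linear dependence of offset and sensitivity on the variable `v`, all coefficients and all function names are assumptions made for this illustration. The simulation also draws `v` independently of soil moisture, which will generally not hold for vegetation in practice.

```python
import numpy as np


def simulate(n=200_000, seed=2):
    """Simulate three products; the errors of y depend on a hypothetical variable v."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.25, 0.05, n)                    # unobserved soil moisture
    v = rng.uniform(-1.0, 1.0, n)                        # hypothetical explanatory variable
    x = theta + rng.normal(0.0, 0.03, n)                 # reference, no systematic error
    z = -0.01 + 1.2 * theta + rng.normal(0.0, 0.04, n)   # third product, constant biases
    offset = 0.01 + 0.02 * v                             # assumed time-variable offset
    sensitivity = 0.8 + 0.3 * v                          # assumed time-variable sensitivity
    y = offset + sensitivity * theta + rng.normal(0.0, 0.02, n)
    return x, y, z, v


def tc_sensitivity(x, y, z):
    """Covariance-based triple collocation sensitivity of y relative to x."""
    c = np.cov(np.vstack([x, y, z]))
    return c[1, 2] / c[0, 2]


def binned_sensitivity(x, y, z, v, edges):
    """Apply triple collocation separately within each bin [edges[i], edges[i+1])."""
    estimates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (v >= lo) & (v < hi)
        estimates.append(tc_sensitivity(x[mask], y[mask], z[mask]))
    return np.array(estimates)


x, y, z, v = simulate()
edges = np.array([-1.0, 0.0, 1.0])
print("sensitivity for v < 0 and v >= 0:", binned_sensitivity(x, y, z, v, edges))
```

Under these assumptions each bin returns the mean sensitivity within that bin (0.65 and 0.95 in expectation), so the change in sensitivity becomes visible. Binning discards information and needs many samples per bin, which is one motivation for a joint probabilistic model.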

The presented approach could be applied to a wide range of variables besides soil moisture, such as wind speed, land surface fluxes and leaf area index. The issue of non-constant error sources, be they associated with environmental conditions or varying observational parameters, likely pertains to many such variables. By shedding light on residual biases, our approach could in the future contribute to the development of improved retrieval approaches.

We developed a probabilistic approach for estimating complex error structures
to study time-variable biases in the SMAP soil moisture product. We
hypothesized that temporal changes in the error structure arise due to an
inaccurate vegetation correction in the retrieval, so that the biases
relative to the in situ data track the misspecification in the vegetation
optical depth

Sizeable temporal changes in the offset and the sensitivity were detected,
and they were particularly large over croplands (e.g. change in sensitivity

While the estimated time-variable biases track the

The time-variable biases impede the seasonal comparison of remotely sensed soil moisture values. In particular, extreme conditions like droughts may not be apparent in the SMAP data when the sensitivity happens to be small.
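The effect on drought detection can be made concrete with a toy calculation. The anomaly size, the detection threshold, the two sensitivity values and the function names below are hypothetical; the lower sensitivity of 0.6 is only loosely motivated by the seasonal variations of up to 40 % reported above.

```python
def observed_anomaly(true_anomaly, sensitivity):
    """Anomaly seen by a product that scales the true signal by `sensitivity`."""
    return sensitivity * true_anomaly


def flags_drought(anomaly, threshold=-0.06):
    """Flag a drought when the anomaly falls to or below a (hypothetical) threshold."""
    return anomaly <= threshold


true_anomaly = -0.08  # hypothetical dry anomaly in m3/m3

for season, sensitivity in [("high-sensitivity season", 1.0),
                            ("low-sensitivity season", 0.6)]:
    obs = observed_anomaly(true_anomaly, sensitivity)
    print(season, "observed anomaly:", obs, "drought flagged:", flags_drought(obs))
```

The same true anomaly is flagged in one season and missed in the other, purely because of the change in sensitivity.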

The presented estimation approach is widely applicable because it yields consistent estimates of error magnitude and biases even when no error-free reference data set is available. Further, it is flexible in that a wide range of different kinds of error structures can be estimated purely from observations.

Time-variable biases should thus be considered in future uncertainty analyses. Previous mission requirements have largely focused on the RMSE metric, which, however, cannot distinguish between such systematic errors and white noise. Because neglecting that distinction can easily give rise to misleading interpretations, it is important that time-variable biases be quantified. Robust estimation approaches like the one developed here can help to quantify and mitigate these biases, and thus to exploit the full potential of observational data sets.
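That the RMSE cannot separate systematic errors from white noise can be shown with a minimal simulation. In the sketch below, two hypothetical products have identical RMSE by construction: one is corrupted by white noise, the other by a purely seasonal offset. The time series, magnitudes and function names are assumptions for illustration; the lag-1 autocorrelation of the errors serves here as a simple diagnostic of their temporal structure.

```python
import numpy as np


def rmse(estimate, truth):
    """Root-mean-square error of an estimate with respect to the truth."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a series after removing its mean."""
    a = series - series.mean()
    return float(np.sum(a[1:] * a[:-1]) / np.sum(a * a))


def make_example(n_years=3, noise_sd=0.04, seed=1):
    """Return a truth series and two products with the same RMSE."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_years * 365)
    truth = 0.25 + 0.05 * np.sin(2 * np.pi * t / 365) + rng.normal(0.0, 0.02, t.size)
    white = truth + rng.normal(0.0, noise_sd, t.size)    # white-noise errors only
    season = np.cos(2 * np.pi * t / 365)
    scale = rmse(white, truth) / np.sqrt(np.mean(season ** 2))
    biased = truth + scale * season                      # seasonal offset, no noise
    return truth, white, biased


truth, white, biased = make_example()
print("RMSE white noise:", rmse(white, truth))
print("RMSE seasonal bias:", rmse(biased, truth))
print("lag-1 autocorrelation of errors:",
      lag1_autocorrelation(white - truth), lag1_autocorrelation(biased - truth))
```

The two RMSE values agree, yet the errors of the second product are almost perfectly autocorrelated and would alias into any seasonal analysis.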

The remotely sensed and reanalysis data are available at
the URLs provided in the references (registration generally required). The
sparse in situ data can be found at

The supplement related to this article is available online at:

SZ, AB and AC devised the study; SZ developed the technique and performed the data analysis; MC, JMF, HM, PS, MT, AB provided data and expertise on the core validation sites; SZ was the lead author, with inputs from all co-authors.

The authors declare that they have no conflict of interest.

The authors are grateful to Kaighin McColl, Dara Entekhabi and his group, and the SMAP team for comments and suggestions. They thank Alexandra Konings and Wade Crow for insightful reviews. Simon Zwieback acknowledges support from the Swiss National Science Foundation (P2EZP2_168789). The support of the Canadian Space Agency, Environment and Climate Change Canada and the Natural Sciences and Engineering Research Council of Canada for the Kenaston network is acknowledged. A contribution to this work was made at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The University of Salamanca team's involvement in this study was supported by the Spanish Ministry of Economy and Competitiveness with the project PROMISES: ESP2015-67549-C3 and the European Regional Development Fund (ERDF).

Edited by: Graham Jewitt
Reviewed by: Alexandra Konings and Wade Crow