The COsmic-ray Soil Moisture Interaction Code ( COSMIC ) for use in data assimilation

Soil moisture status in land surface models (LSMs) can be updated by assimilating cosmic-ray neutron intensity measured in air above the surface. This requires a fast and accurate model to calculate the neutron intensity from the profiles of soil moisture modeled by the LSM. The existing Monte Carlo N-Particle eXtended (MCNPX) model is sufficiently accurate but too slow to be practical in the context of data assimilation. Consequently an alternative and efficient model is needed which can be calibrated accurately to reproduce the calculations made by MCNPX and used to substitute for MCNPX during data assimilation. This paper describes the construction and calibration of such a model, COsmic-ray Soil Moisture Interaction Code (COSMIC), which is simple, physically based and analytic, and which, because it runs at least 50 000 times faster than MCNPX, is appropriate in data assimilation applications. The model includes simple descriptions of (a) degradation of the incoming high-energy neutron flux with soil depth, (b) creation of fast neutrons at each depth in the soil, and (c) scattering of the resulting fast neutrons before they reach the soil surface, all of which processes may have parameterized dependency on the chemistry and moisture content of the soil. The site-to-site variability in the parameters used in COSMIC is explored for 42 sample sites in the COsmic-ray Soil Moisture Observing System (COSMOS), and the comparative performance of COSMIC relative to MCNPX when applied to represent interactions between cosmic-ray neutrons and moist soil is explored. At an example site in Arizona, fast-neutron counts calculated by COSMIC from the average soil moisture profile given by an independent network of point measurements in the COSMOS probe footprint are similar to the fast-neutron intensity measured by the COSMOS probe. It was demonstrated that, when used within a data assimilation framework to assimilate COSMOS probe counts into the Noah land surface model at the Santa Rita Experimental Range field site, the calibrated COSMIC model provided an effective mechanism for translating model-calculated soil moisture profiles into aboveground fast-neutron count when applied with two radically different approaches used to remove the bias between data and model.


Introduction
Until recently area-average soil moisture at the hectometer horizontal scale has been difficult and costly to measure because of the need to take many point samples, but with the advent of the cosmic-ray method (Zreda et al., 2008(Zreda et al., , 2012Desilets et al., 2010) it is now feasible with a single instrument. However, a complicating aspect of measuring soil moisture using this method is that the volume of soil measured in the vertical varies with soil moisture content (Franz et al., 2012a).
One potentially important use of area-average soil moisture measured with the cosmic-ray method is through data assimilation methods to update the value of soil moisture states represented in the LSMs which are used to describe surface-atmosphere exchanges in meteorological and hydrological models. Typically such LSMs calculate (among many other things) time-varying estimates of soil moisture content in discrete layers of soil defined within the vertical soil profile. In order to make use of the area-average soil moisture provided by the cosmic-ray method, it is necessary to a. diagnose if there is a discrepancy in the modeled soil moisture status from the aboveground measured fastneutron count; and b. interpret knowledge of the extent of any discrepancy back into the LSM with weighting between layers reflecting their relative influence on the aboveground measured fast-neutron count.
This requires the availability and use of an accurate model to interpret the modeled soil moisture profiles in terms of the aboveground fast-neutron count.
In principle the required model needed to make such interpretation exists, specifically the Monte Carlo N-Particle eXtended (MCNPX: Pelowitz, 2005) neutron transport code, which was much used in establishing the cosmic-ray method (Zreda et al., 2008(Zreda et al., , 2012Desilets et al., 2010) currently being deployed in the COsmic-ray Soil Moisture Observing System (COSMOS: http://cosmos.hwr.arizona.edu/; Shuttleworth et al., 2010;Zreda et al., 2011Zreda et al., , 2012. Given the specified chemistry of the atmosphere and soil (including the amount of hydrogen present as water in the system), the MC-NPX code uses knowledge of nuclear collisions and libraries of nuclear properties for these constituents to track the life history of individual, randomly generated, incoming cosmic rays and their collision products through the atmosphere and in the soil. The code then counts the resulting fast neutrons (we use those in the range 10 eV to 100 eV) that enter a defined detector volume above the ground. In principle the MC-NPX code could be used in data assimilation applications to define (a) and (b) in the last paragraph. However, although accurate, the MCNPX code uses the time-consuming Monte Carlo computational method, and this means its use in data assimilation applications is impractical. Therefore an alternative model is needed which can efficiently reproduce the belowground physics, the resulting aboveground count rate and the belowground vertical source distribution of fast neutrons simulated by MCNPX. This paper describes the construction and calibration of such a model, the COsmic-ray Soil Moisture Interaction Code (COSMIC), which is simple, physically based and analytic, and which runs much faster than MCNPX because the nuclear processes and collision cross sections that are explicitly represented in MCNPX are re-captured in parameters that have dependency on the site-specific soil properties. These parameters are calibrated using multi-parameter optimization techniques against MCNPX calculations for a suite of hypothetical soil moisture profiles.

Physical processes represented in COSMIC
The COSMIC model assumes there are three dominant processes involved in generating the fast neutrons detected above moist soil (see Fig. 1). It is first assumed that there is an exponential reduction with depth in the number of the  high-energy neutrons that are available to create fast neutrons at any level in the soil. Calculations made with MC-NPX indicate that assuming such an exponential reduction in neutron flux is appropriate. There is reduction due to interaction both with the (dry) soil and with the water that is present in the soil. The exponential reduction therefore depends on two length constants L 1 and L 2 , in units of g per cm 2 , corresponding to interaction with the soil and the water (hydrogen), respectively. The mass of water includes both lattice water, i.e., that which is in the mineral grains and bound chemically with soil and considered fixed in time, and the pore water which is available to support transpiration or drainage and which consequently changes with time. Thus, the number of high-energy neutrons available at depth z in the soil is given by where N 0 he is the number of high-energy neutrons at the soil surface, m s (z) and m w (z) are respectively the integrated mass per unit area of dry soil and water (in g cm −2 ) between the depth z and the soil surface, and L 1 and L 2 (in g per unit area) are respectively determined by the chemistry of the soil and its total water content, including any chemically bound lattice water.
Second, it is assumed that at each depth z the number of fast neutrons created in the soil is proportional to the product of the number of high-energy neutrons available at that depth with the local density of dry soil per unit soil volume and the local density of soil water per unit soil volume at that depth, assuming the relative efficiency of creation of fast neutrons by soil is a factor α of the efficiency of their creation by water. Consequently, the number of fast neutrons created in the soil in the plane at level z is given by Hydrol. Earth Syst. Sci., 17, 3205-3217, 2013 www.hydrol-earth-syst-sci.net/17/3205/2013/ where C is a (unitless) "fast-neutron creation" constant for pure water, ρ s (z) is the local bulk density of dry soil and ρ w (z) the total soil water density, including lattice water. It is assumed that the direction in which the fast neutrons are generated at level z is isotropic, i.e., that they leave with equal probability in all directions.
Finally it is assumed that the fraction of fast neutrons originating in the soil in the plane at level z that are detected above the ground are reduced exponentially by an amount related to the distance traveled between the point of origin in this plane and the detector at the surface. There is then little reduction in the neutron count in the air between the soil surface and the fast-neutron detector mounted just a few meters above the surface. The reduction in fast neutrons in the moist soil is assumed to follow a functional form similar to that in Eq. (1), i.e., an exponential reduction, as for highenergy neutrons, but with different length constants L 3 and L 4 , in units of g cm −2 , corresponding to attenuation by soil and by (total) soil water, respectively. However, because the direction in which fast neutrons are generated at level z is assumed to be isotropic, fast neutrons reaching the surface will travel further if they do not originate directly below the detector, rather from a point that is more distant in the horizontal plane at level z. To allow for this it is necessary to calculate the integrated average of the attenuation for all points in this plane to the detector, with the attenuation distance being inversely proportional to cos (θ), where θ is the angle between the vertical below the detector and the line between the detector and each point in the plane; see Fig. 2. Consequently, the integrated average attenuation of the fast neutrons generated at level z before they reach the detector is given by the function A(z): which, because there is assumed symmetry around the vertical through the detector, reduces to The value of A(z) can be found numerically, but for efficiency it could also be adequately calculated using the approach described in Appendix A.
Combining the representations of the three physical processes considered in COSMIC described above, the analytic function describing N COSMOS , the number of fast neutrons reaching the COSMOS probe at a near-surface measurement point is The source volume element of fast neutrons created in the plane at depth z in the soil which may reach the measurement point P, but whose number is attenuated by an exponential factor with length constants L3 and L4 (in g per unit area), these being respectively determined by the chemistry of the soil and by the total water content of the soil, including lattice water.

Fig. 2.
The source volume element of fast neutrons created in the plane at depth z in the soil which may reach the measurement point P but whose number is attenuated by an exponential factor with length constants L 3 and L 4 (in g per unit area) -these being respectively determined by the chemistry of the soil and by the total water content of the soil, including lattice water.
Note that in Eq. (5), the product of the two constants (CN 0 ) that appears in Eq.
(2) has been replaced by a single constant, N, because the values of C and N 0 cannot be separately determined from a comparison between calculations made using COSMIC and MCNPX.

Determining the parameters to be used in COSMIC
To determine the values of the (in some cases site-specific) parameters to be used in COSMIC, at 42 selected sites in the COSMOS network (see Fig. 3) for which the required data were available at the time of this analysis, simulations using COSMIC were calibrated against equivalent calculations made with the MCNPX model. The MCNPX calculations were made using the site-specific COSMOS probe calibration based on gravimetric samples (see, for example, Franz et al., 2013a, b), corrected for the effect of atmospheric humidity (see Rosolem et al., 2013), and with site-specific bulk density of the soil, soil chemistry and lattice water content (see Table 2 in Zreda et al., 2012, for values).
Because L 2 and L 4 relate to attenuation by water alone, their values are independent of the soil chemistry of the site and they can be determined by substituting pure water for dry soil in MCNPX and COSMIC calculations. A simulation with MCNPX was made with pure water substituting for soil, and an exponential function then fitted to the calculated reduction in high-energy neutrons with depth calculated by MCNPX for pure water to determine L 2 . The original San Pedro site was then selected for determining L 4 and   the required value of the parameter N first defined at this site. This was accomplished by first optimizing the values of all remaining four COSMIC parameters (N, α, L 3 , L 4 ) at this site, with L 2 given as previously discussed and L 1 computed directly from MCNPX, in a similar manner to that described below. Once N is determined, COSMIC is configured to simulate pure water, and the parameter L 4 is fine-tuned to match the same neutron count obtained directly from MCNPX at the San Pedro site (after appropriate scaling using the F term described in the last paragraph of this section and shown in Table 1). Notice that for pure water simulations, the terms associated with parameters α, L 1 , and L 3 no longer appear in Eq. (5). Based on these pure water simulation comparisons, the values of L 2 and L 4 were set to 129.1 and 3.16 g cm −2 at all COSMOS sites.
The value of L 1 is easily determined for each site by running MCNPX with dry soil that has the site-specific soil chemistry and then fitting an exponential function to the calculated exponential reduction in high-energy neutrons with depth simulated by MCNPX (analogous to the method used to determined L 2 described above). Although the value of L 1 may depend on the soil chemistry present, our simulations with MCNPX at the 42 COSMOS sites considered in this study suggest that L 1 is only weakly related to soil chemistry, with site-to-site variability around the mean value for all sites being just ∼ 1 %. On this basis, adopting a fixed value equal to 162.0 g cm −2 irrespective of site is a reasonable assumption.
Data from individual sites in the COSMOS network are corrected for site to site differences in elevation and cutoff rigidity but local variability remains, likely associated with site-to-site differences in soil chemistry or vegetation cover. Individual site calibration of sensors is therefore required to allow for the fact that the observed neutron flux intensity at calibration does not necessarily equal the neutron flux intensity calculated by MCNPX when run with the soil chemistry and water content observed at calibration; see the final paragraph in Sect. 4. The values of the site-specific constants N, α and L 3 at all sites were then determined using multiparameter optimization techniques against calculations made using MCNPX. At each site calculations of the aboveground fast-neutron count are made using MCNPX for the 22 hypothetical profiles of volumetric water content illustrated in Fig. 4, i.e., for 10 profiles with different uniform volumetric water content, and 12 with different linear gradients of volumetric water content to a depth of 1 m and with uniform volumetric water content below 1 m. One criterion used in parameter optimization to define the preferred values of N, α, and L 3 is the weighted mean absolute error (MAE) between the aboveground fast-neutron counts calculated using the COSMIC model and the equivalent counts calculated by MCNPX with the same profiles. In each case, the weighted MAE is calculated based on the individual differences between the COSMIC neutron flux and MCNPX neutron flux for each profile, in absolute terms, and weighted by the probability density function of soil moisture historically observed at each site, with the most such commonly observed soil moisture values weighted to be twice as important as the least commonly observed value. The second criterion used in the optimization was that the cumulative contribution to aboveground fast neutrons as a function of depth given by the COS-MIC model matches that calculated by MCNPX as reported Table 1. Site-specific values of latitude and longitude; ρ s (g cm −3 ), θ lattice (m 3 m −3 ) and F ; and the parameters N, α (cm 3 g −1 ) and L 3 (g cm −2 ) obtained by calibrating the COSMIC model against MCNPX at the 42 COSMOS sites shown in Fig. 3 with L 1 = 162.0 g cm −2 , L 2 = 129.1 g cm −2 , and L 4 = 3.16 g cm −2 .

Latitude
Longitude by Zreda et al. (2008); i.e., at the site the cumulative contribution has a 2-e folding depth of around 0.76 m for a prescribed uniform volumetric water content of 0 %, and around 0.12 m for a prescribed uniform volumetric water content of 40 %, with zero lattice water content in both cases.
The multi-algorithm genetically adaptive multi-objective (AMALGAM) method (Vrugt and Robinson, 2007) was used to solve this multi-criteria minimization problem. AMAL-GAM contains highly desirable features for model optimization which facilitate parameter convergence, such as the use  of multi-operator search and self-adaptive offspring creation, as well as the implementation of population-based elitism search. The initial parent population of size n is generated using Latin hypercube sampling (McKay et al. 1979). The fast non-dominated sorting algorithm approach (Deb et al., 2002) is used to assign the Pareto-rank for multiple criteria. Subsequent generation of the offspring (with the same size n) occurs with the use of k operators. The approach adopted in this study, which is similar to that presented by Rosolem et al. (2012), uses a population of size n = 100, and number of operators (search strategies) k = 4, and set the maximum number of generations, s = 1000, so that the total number of simulations (s × n) is 100 000.
This multi-parameter optimization was made at all 42 sites considered in this study to obtain the site-specific preferred values of N, α, and L 3 when the values of L 1 , L 2 and L 4 are specified to be 162.0, 129.1 and 3.16 g cm −2 , respectively. The resulting optimal parameters are given in Table 1 (the factor F given in column four of this table is discussed and used later in Sect. 5). Figure 5 summarizes the overall results of the multi-parameter optimization procedure, given the value of the difference between the simulated neutron count given by COSMIC (with optimized parameters) and the equivalent neutron count scaled from MCNPX, normalized by the MCNPX count (represented by colors for each site and each hypothetical soil moisture profile). Because MCNPX is a Monte Carlo model, the neutron count given by MCNPX is subject to random sampling errors of the order 1 %, and this contributes to some of the normalized differences illustrated in Fig. 5. For a substantial majority of the sites and hypothetical soil moisture profiles the normalized difference between the COSMIC-and MCNPX-simulated neutron counts is within the range 2-3 %, and when averaged over all sites the normalized difference is much less than this (Fig. 5, bottom row). This range in normalized difference is comparable to the measurement uncertainty in the COSMOS probe and the sampling error in the soil moisture field at probe calibration, including for the drier soil profiles for which the differences are greatest.

Correlations and dependencies of optimized parameters
It is of interest to investigate the extent to which the sitespecific optimized values of N, α and L 3 are correlated with each other and with the site-specific values of ρ s , the average bulk density for the soil in g cm −3 , and θ lattice , the lattice water content of the soil in m 3 m −3 . In practice, there is no evidence of correlation between the site-specific value of the parameter N and the site-specific values of ρ s , α and L 3 : linear correlation of these three parameters with N gives R 2 values of 0.01, 0.19, and 0.01, respectively. There is also no evidence of correlation between the site-specific optimized values of α and N with θ lattice at each site (R 2 = 0.04 and 0.06, respectively), and little evidence of correlation of L 3 with θ lattice (R 2 = 0.30). However as Fig. 6 shows, the sitespecific values of L 3 and α both exhibit evidence of correlation with ρ s , the bulk density for the soil at each site, and the site-specific values of L 3 and α are also mutually correlated. Arguably L 3 and α are indeed both independently correlated with ρ s , but the possibility exists that one of the parameters (likely L 3 ) is correlated, and the apparent correlation of the second parameter (α) is because the process of optimization is not able to clearly separate these two variables because  their influence on N COSMOS calculated by Eqs. (3) and (5) is to change its value in opposite directions.
It is worth noting that, in physical terms, a strong correlation between L 3 and ρ s implies the attenuation of fast neutron by (dry) soil is not well described as an exponential decay with a simple single length constant that is independent of the density of soil as assumed in COSMIC. Instead the effective value of the length constant appears to be a near-linear function of soil density. Similarly a (true) correlation between α and ρ s implies that the creation of fast neutron from high neutrons is not perfectly described as a linear function of the local density of dry soil; i.e., in Eq. (2) the product [αρ s ] becomes [0.404(ρ s ) − 0.101(ρ s ) 2 ]. It is possible that the observed correlations of L 3 and α with ρ s may be useful for COSMOS sites where a multi-parameter optimization against MCNPX is not feasible because approximate estimates of L 3 and α might then be made from measured value of ρ s using the following equations: The marked variability in the site-specific optimized values of the parameter N must reflect substantial variability in one or both of the component constants C or N 0 he . However, there should be limited variability in N 0 he because the site-specific neutron calculations given by MCNPX against which calibration was made were corrected for local station effects using a scaling factor to account for differences in cosmic ray intensity as a result of the elevation/cutoff rigidity of the site where the probe is located (for details see Desilets and Zreda, 2003). The contributing variability is therefore presumably primarily associated with the effective value of C. This siteto-site variability is intrinsic to the COSMOS array (rather than a feature associated with the COSMIC model) and is present in the site-specific factor F (given in column 4 of Table 1). F is the ratio between the number of counts observed during COSMOS probe calibration at a specific site and the calculated neutron flux intensity given by MCNPX when run with the soil chemistry and water content (including lattice water) observed at each probe site during calibration. (Note: the factor 10 14 in F arises because MCNPX actually calculates neutron fluence, the time integration of neutron flux, rather than neutron count rate directly.) Figure 7 shows the strong interrelationship between the COSMIC parameter N found by multi-parameter optimization and the factor F : N = −24.46 + 63.16 × 10 −14 F.
The origin of the real site-to-site variability in F across the COSMOS array is currently under investigation. It is possible there is some remnant contribution to variability in F associated with the location and altitude of the probe although the neutron count rates were corrected for these (Desilets and Zreda, 2003). It is also possible that differences in the ambient water vapor content of the air during probe calibration may make some contribution to the variability in F at the level of a few percent (for details, see Rosolem et al., 2013). Otherwise the variability in F is presumably associated with site-to-site differences in soil chemistry or more likely vegetation cover (Franz et al., 2013a, b).

Application of the COSMOS probe at the Santa Rita study site
We tested COSMIC using soil moisture data from a COS-MOS probe and from a distributed sensors network at the Santa Rita Experimental Range field site in southern Arizona. A total of 180 time domain transmissivity (TDT) sensors (Fig. 8a) were installed (Franz et al., 2012b) in 18-paired profiles at 10, 20, 30, 50 and 70 cm within the footprint of the COSMOS probe (Fig. 8b). Figure 8c shows a comparison between the fast-neutron count observed by the COS-MOS probe and that calculated from the area-average soil moisture as measured with TDT sensors using MCNPX and COSMIC. Overall the COSMIC-derived fast-neutron intensity compares quite well with measurements from the COS-MOS probe, and (as should be expected) it compares extremely well with the fast-neutron intensity computed using MCNPX. In some cases, the after-rainfall response is slower than the COSMOS probe because the area-average . Relationship between the COSMIC parameter N found by multi-parameter ization and the factor F, this being the ratio between the number of counts observed the COSMOS probe calibration at a specific site and the calculated neutron flux ity given by MCNPX when run with the soil chemistry and water content (including water) observed at each probe site during calibration. Fig. 7. Relationship between the COSMIC parameter N found by multi-parameter optimization and the factor F , this being the ratio between the number of counts observed during the COSMOS probe calibration at a specific site and the calculated neutron flux intensity given by MCNPX when run with the soil chemistry and water content (including lattice water) observed at each probe site during calibration. soil moisture calculated from TDT point sensors does not sample the near-surface soil moisture above 10 cm depth and, as a result, does not recognize the faster rate of drying of surface soil moisture. Consequently, when the area-average profile measured by the TDT probes is used in the COSMIC model to calculate the COSMOS probe count, the estimated COSMOS count is underestimated.
As previously stated, the primary purpose of the COSMIC model is to facilitate use of observed COSMOS probe counts into LSMs through ensemble data assimilation methods. We foresee two broad data assimilation applications using COS-MIC, specifically to provide i. the best estimate of the rate of change in the areaaverage soil moisture profile when this is being calculated by a prescribed (but perhaps imperfect, e.g., biased) LSM, to obtain improvement in the calculated moisture loss from the surface to the atmosphere, in a Numerical Weather Prediction model for example. Arguably in this application the data assimilation process primarily needs to correct for weaknesses in the highfrequency dynamics of the soil moisture profile calculated by the model rather than its absolute value; and ii. the best estimate of the (albeit LSM-calculated) areaaverage profile of soil moisture at a COSMOS probe site, this as a basis for investigating and building models  of the relationship between area-average soil moisture and area-average hydro-ecological behavior at the site for example. In this application the data assimilation process primarily needs to correct for weaknesses in the absolute value of the model-calculated profile.
It is not the purpose of this paper to consider detailed aspects of the assimilation of COSMOS probe counts into LSMs at many sites and to investigate the validity of particular LSMs; these details will form the subject of future papers. Meanwhile we illustrate the fact that the COSMIC model can be used in the two applications described above by providing an overview of studies in which COSMOS probe data was assimilated into the Noah Land Surface Model (see Koren et al., 1999;Chen and Dudhia, 2001;Ek et al., 2003) at the Santa Rita Range field site (see Kurc and Benton, 2010;and Cavanaugh et al., 2011) for a period during the North American Monsoon when there were rainstorms that generated rapid changes in soil moisture. Ancillary near-surface hourly measurements of meteorological variables available at this site were used to provide the Noah forcing. Noah represents soil moisture in four layers (0.0-0.1 m; 0.1-0.4 m, 0.4-1.0 m, and 1.0-2.0 m) by calculating the input of water at the surface and the movement of water between layers and loss by transpiration from the upper three layers. The data assimilation used only the COSMOS data (i.e., hourly neutron counts) to update the values of soil moisture in each layer. The observational uncertainty in the COSMOS counts is well defined by Poisson statistics and equal to the square root of the sensor hourly count (Zreda et al., 2008), but, given the typical number of counts from an individual COS-MOS probe, this Poisson distribution of the errors can be adequately approximated by a Gaussian distribution. In each of the example cases discussed below the data assimilation is carried out within the National Center for Atmospheric Research (NCAR) Data Assimilation Research Testbed (DART) framework (Anderson et al., 2009), this being a community facility for ensemble data assimilation. The Bayesian framework employed in DART combines the probability distribution of the prior ensemble with the observation likelihood (data distribution) to compute an updated ensemble estimate (posterior distribution) and increments to the prior ensemble. Increments for each component of the prior state vector are computed by linear regression from the increments calculated in observation space. We use the ensemble adjustment Kalman filter (EAKF) discussed in Anderson (2001) applied hourly. The updated ensemble is obtained by shifting the prior ensemble to have the same mean as the continuous posterior distribution, and the posterior ensemble standard deviation is kept the same as the continuous posterior by linearly contracting the ensemble members around the mean. In this application we used 40 ensemble members with both the meteorological forcing and soil moisture initial conditions perturbed following standard procedures as described in the literature (see Table 2). The soil moisture initial conditions are perturbed around a reference value determined by the COSMOS sensor with an initial assumed uniform profile (the conversion from neutron counts to integrated soil moisture is achieved by applying Eq. A1 in Desilets et al., 2010). Sequential data assimilation was applied via the EAKF to neutron counts, and the soil moisture state variables in Noah updated appropriately every time a new (hourly) observation was available. Draper et al. (2011) state that when applying data assimilation methods, a primary goal is to address the cause of bias between the data and model rather than to rely on data assimilation to correct it, while Yilmaz and Crow (2013) also emphasize that biases should be removed prior to assimilating data. There are several ways to remove such bias (through a priori scaling approaches or through a bias estimation module, for example); in the context of a paper whose primary purpose is to describe the formulation and calibration of the COSMIC model, we follow Kumar et al. (2012) and choose to demonstrate application of COSMIC when using two radically different alternate approaches for removing relative bias, i.e. first by assuming the bias is solely in the data and "modifying the data to match the model", and second by assuming the bias is in the model and "recalibrating the model to match the data".
In fact there is a large systematic bias between soil moisture calculated by the Noah LSM and the value deduced from COSMOS observations at the Santa Rita field site. This Table 2. List of meteorological forcing variables applied to the Noah model and perturbed during ensemble data assimilation together with the nature of the perturbation applied to them. The perturbation distribution was either log-normal (i.e., multiplying the reference variable) or normal (i.e., adding to or subtracting from a reference value). The magnitude of perturbations used in the DART framework is based on a literature review of several studies including Zhou et al. (2006), Zhang et al. (2010), Reichle et al. (2002Reichle et al. ( , 2007Reichle et al. ( , 2008, Walker and Houser (2004), Sabater et al. (2007), Kumar et al. (2012), Dunne and Entekhabi (2005) is clearly apparent in the inset graph in the top panel of Fig. 9, which shows that the cumulative distribution function (CDF) of neutron counts computed by COSMIC using soil moisture profiles from an offline simulation of Noah LSM (NOAH-COSMIC, shown in black) has systematically lower values than those observed by both the COSMOS sensor (COSMOS-real, shown in blue) and the counts computed with the average soil moisture profile from the TDT network (TDT-derived, shown in purple). Although it is clear that in this particular case the source of bias originates from the inability of the model to accurately represent reality, nonetheless we proceed to demonstrate use of COSMIC when used in the "modifying the data to match the model" approach and apply CDF-matching (Reichle and Koster, 2004;Drusch et al., 2005) to scale the COSMOS observations (COSMOSscaled, in green) to match the CDF obtained from Noah LSM offline simulation. Figure 9a shows the time series of the resulting scaled version of the observed neutron count (green) together with the neutron count (calculated by COSMIC) from the soil moisture profiles simulated by the Noah model when running open loop (black) and with the assimilation of COSMOS data (red). Similarly, Fig. 9b shows the depthaverage soil moisture for the Noah model when running open loop (black) and with the assimilation of COSMOS data (red), together with the area-average soil moisture measured by the TDT network (purple). To enhance consistency between these three depth averages they are all weighted by the relative contribution to the aboveground fast-neutron flux for each level (calculated by COSMIC).
To demonstrate use of COSMIC when used in the "recalibrating the model to match the data" approach, we next   Panel (b) shows the depthaverage soil moisture for the Noah model when running open loop (black) and with the assimilation of COSMOS data (red) together with the area-average soil moisture measured by the TDT network (purple). To enhance consistency between the three depth averages in (b) they are all weighted by the relative contribution to the aboveground fast-neutron flux for each level (calculated by COSMIC). sought to eliminate the systematic bias by improving the performance of Noah LSM via a priori parameter calibration. When doing this we again employed the AMALGAM method (see Sect. 3) with n = 100, k = 4, and s = 200 to constrain 10 parameters used in Noah (and each individual layer soil moisture initial condition) which were selected based on a preliminary sensitivity analysis. We found that the values of all ten parameters were changed by calibration to some extent, but four model parameters changed significantly, namely FXEXP, REFKDT, SMCREF, and DKSAT, which control bare-soil evaporation, surface infiltration, onset of transpiration stress due to soil water content, and soil hydraulic conductivity, respectively. This multi-objective optimization was performed on the individual components of the mean squared error (Gupta et al., 2009;Rosolem et al., 2012) between observed neutron counts and neutron counts computed via COSMIC from model-derived soil moisture profiles. The recalibrated version of the Noah model was then used in an experiment in which the observed (unscaled) neutron counts were assimilated. Figure 10a shows the time series of the observed neutron count (green) together with the neutron count (calculated by COSMIC) from the soil  (black) and with the assimilation of COSMOS data (red), together with the area-average soil moisture measured by the TDT network (purple). To enhance consistency between these three depth averages they are all weighted by the relative contribution to the above ground fast neutron flux for each level (calculated by COSMIC).  (black) and with the assimilation of COSMOS data (red), together with the area-average soil moisture measured by the TDT network (purple). To enhance consistency between these three depth averages they are all weighted by the relative contribution to the aboveground fast-neutron flux for each level (calculated by COSMIC). moisture profiles simulated by the recalibrated Noah model when running open loop (black) and with the assimilation of COSMOS data (red). Figure 10b shows the depth-average soil moisture for the Noah model when running open loop (black) and with the assimilation of COSMOS data (red), together with the area-average soil moisture measured by the TDT network (purple). Again, to enhance consistency between these three depth averages they are all weighted by the relative contribution to the aboveground fast-neutron flux for each level (calculated by COSMIC).
In both of the very different data assimilation demonstrations just described, the COSMIC model provided an effective mechanism for translating model-calculated soil moisture profiles into aboveground fast-neutron count when applied using EAKF-based assimilation within the DART framework. The resulting improvements in model performance are illustrated in Figs. 9 and 10 and documented in Table 3. Arguably the two different approaches for removing bias between data and model just demonstrated (i.e., "modifying the data to match the model" and "recalibrating the model to match the data") might respectively be considered appropriate for use in the COSMOS probe data assimilation applications (i) and (ii) (see above). The results in Table 3 clearly demonstrate that there is an improvement in the statistical metrics when neutron counts are assimilated relative to the open-loop case either when the observations are scaled or the Noah model calibrated. However, Table 3 also suggests that the calibration of Noah successfully removed most   Table 3 when other metrics are analyzed and when the integrated soil moisture is compared with the average soil moisture measured by the TDT network.

Summary and conclusions
This study showed that COSMIC, a simple, physically based analytic model, can substitute for the time-consuming MC-NPX model in data assimilation applications, and that COS-MIC can be calibrated by multi-parameter optimization at 42 COSMOS sites to provide calculated neutron fluxes which are within a few percent of those given by the MCNPX model. The parameters α and L 3 are correlated with ρ s , the bulk density for the soil at each site, and consequently are mutually correlated. This correlation with ρ s might provide an approximate estimate of their value if parameter optimization against MCNPX model is not feasible. The value of N , the third optimized parameter in COSMIC, is very strongly related to F , i.e., to the ratio between the number of counts observed during COSMOS probe calibration at a specific site and the calculated neutron fluence given by MCNPX when run with the soil chemistry and water content (including lattice water) observed at each probe site during calibration. The origin of this real site-to-site variability in F across the COSMOS sensor array, which is presumably mainly associated with site-to-site differences in soil chemistry or more likely vegetation cover, is currently under investigation.
It was demonstrated at the Santa Rita Experimental Range field site that the aboveground neutron count rates calculated by COSMIC from an area-average soil moisture profile independently measured using TDT sensors agreed well with observed neutron count rates measured by the COSMOS probe at this site. It was further demonstrated that when the calibrated COSMIC model was applied at this site, it provided an effective mechanism for translating model-calculated soil moisture profiles into aboveground fast-neutron count when used within a data assimilation framework to assimilate COSMOS probe counts into the Noah model with two radically different approaches used to remove the bias between observations and model. The COSMIC model is freely available for download at the COSMOS website (http://cosmos. hwr.arizona.edu).