Evaluating topographic wetness indices across central New York agricultural landscapes

Accurately predicting soil moisture patterns in the landscape is a persistent challenge. In humid regions, topographic wetness indices (TWIs) are widely used to approximate relative soil moisture patterns. However, there are many ways to calculate TWIs and very few field studies have evaluated the different approaches – especially in the US. We calculated TWIs using over 400 unique formulations that considered different digital elevation model (DEM) resolutions (cell size), vertical precision of DEM, flow direction and slope algorithms, smoothing via low-pass filtering, and the inclusion of relevant soil properties. We correlated each TWI with observed patterns of soil moisture at five agricultural fields in central NY, USA, with each field visited five to eight times between August and November 2012. Using a mixed effects modeling approach, we were able to identify optimal TWI formulations applicable to moderate relief agricultural settings that may provide guidance for practitioners and future studies. Overall, TWIs were moderately well correlated with observed soil moisture patterns; in the best case the relationship between TWI and soil moisture had an averageR2 and Spearman correlation value of 0.61 and 0.78, respectively. In all cases, fine-scale (3 m) lidar-derived DEMs worked better than USGS 10 m DEMs and, in general, including soil properties improved correlations.


Introduction
Soil moisture is a key variable controlling a host of important hydrological and biogeochemical processes and, thus, imposes a considerable ecohydrological fingerprint on Printer-friendly Version

Interactive Discussion
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | proposed techniques to better describe and predict the spatial distribution of soil water (e.g., Zhao et al., 1980;Jackson, 1993;Larson et al., 2008;Mallick et al., 2009;Sayde et al., 2010). Perhaps the two most common approaches involve: (i) often complex, distributed watershed models that numerically simulate the physical processes governing soil water dynamics or (ii) more simple terrain-based indices based on topography and 5 sometimes soil properties.
The detailed numerical approach is typically incorporated into distributed hydrologic modeling frameworks and has been shown to provide reasonable simulations of soil moisture patterns (Frankenberger et al., 1999;Motovilov et al., 1999;Mehta et al., 2004;Cuo et al., 2006). However, such models often require extensive data input and 10 calibration, are generally prohibitively complex for conservation planners to use (Lane et al., 2006;White et al., 2010) and frequently suffer from equifinality issues (Beven, 2006).
Terrain indices offer a simpler alternative that, due to their parsimonious formulation and moderate parameterization requirements, can be efficiently applied at larger spatial 15 scales while maintaining a relatively fine spatial resolution. Such indices are typically applied via their cumulative distribution functions, which afford the estimation of total contributing area, as well as the spatial distribution of saturation deficit (or soil moisture) (Western et al., 1999). This facilitates both continuous-and event-based hydrologic predictions as well as targeted environmental management decisions. Although terrain 20 indices can include primary terrain attributes such as curvature, slope or aspect; here we focus on so-called compound terrain derivatives that synthesize several primary indices as they are generally better correlated with observed soil moisture patterns (Moore et al., 1988(Moore et al., , 1991Western et al., 1999).
The most well-known and widely applied compound terrain derivative in hydrology Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | the TWI concept has been integrated into many popular hydrologic models (e.g., TOP-MODEL, Beven and Kirby, 1979;VSLF, Schneiderman et al., 2007;SWAT-VSA, Easton et al., 2008) and pollution risk indices (Agnew et al., 2006;Reaney et al., 2011;Marjerison et al., 2011;Buchanan et al., 2013). Despite its wide application, large-scale corroboration of TWI-based predictions of landscape-scale soil moisture patterns using 5 actual field observations are the exception rather than the rule. Indeed, most previous empirical validation efforts have focused on collecting high density field observations over very small spatial scales -typically individual hillslopes, fields or plots. Interestingly, these studies have found a wide variety of correlation strengths -with R 2 values ranging from 0-0.89 (Burt and Butcher, 1985;Moore et al., 1988;Ladson and Moore, 10 1992; Jordan, 1994;Schmidt and Persson, 2003;Western et al., 2004;Tague et al., 2010), and Spearman correlation coefficients between −0.13 and 0.65 (Nyberg, 1996;Hellstrand, 2012). Though some of this variability is undoubtedly attributable to differences in the physiography, geology, climate, and vegetation of the respective study areas, the fundamental reasons for these discrepancies are generally unresolved. 15 In this study we are focusing on agricultural landscapes in the Northeastern US, where several researchers have concluded that variable source area (VSA) plays a central role in agricultural NPS pollution (e.g., Rossing and Walter, 1995;Frankenberger, 1996;Gburek and Sharpley, 1998;Frankenberger et al., 1999;Walter et al., 2000Walter et al., , 2001Gburek et al., 2002;Czymmek et al., 2003;Agnew et al., 2006;Qiu et al., 20 2007;Qiu, 2010;Margerison et al., 2011), which is the leading source of regional freshwater impairment (USEPA, 2009). Risks of VSA storm runoff generation are closely correlated with soil moisture (or soil moisture deficit) (e.g., Walter et al., 2000;Agnew et al., 2006;Lyon et al., 2006a, b;Shaw and Walter, 2009;Cheng et al., 2013). Therefore, identifying effective methods for predicting patterns of soil moisture is important 25 for developing strategies that incorporate VSA hydrology into NPS agricultural pollution mitigation strategies. TWIs are a potentially useful tool for doing this, but it is not clear from previous studies how best to calculate it or which data should be used. Until recently, the latter issue was essentially moot because there were few options. Introduction In recent years, advances in geographic information systems (GIS) and an increase in the availability of high resolution light detection and ranging (LiDAR) data have resulted in detailed and potentially more realistic representations of surface topography, which is the primary data used to calculate a TWI. To some extent, discrepancies in TWI-soil moisture correlations of the previously mentioned studies may be due to variations in the accuracy of the underlying DEM data. Only a few studies have specifically examined the advantages of LiDAR-derived TWIs relative to other less precise DEM sources, such as the standard USGS 10 m DEMs (e.g., Tenenbaum et al., 2006;Murphy and Ogilvie, 2009).
In addition to vertical DEM precision and accuracy, researchers have demonstrated 10 that the TWI is sensitive to many other factors including: DEM cell size, flow direction algorithm, slope algorithm and the inclusion of relevant soil properties. For example, in two boreal forest sites in Sweden, Sørensen et al. (2006) demonstrated that correlations between the TWI and soil moisture, groundwater depth, soil pH and plant species richness varied with the choice of slope and contributing area algorithm. Sim- 15 ilarly, Güntner et al. (2004) found that the relationship between TWIs and mapped areas of saturation in two catchments in southwestern Germany were dependent on how slope, contributing area, soil properties and climate were incorporated into the TWI. Remarkably, despite the clear sensitivity of the TWI to these factors, no study has conducted a comprehensive and systematic evaluation in the US. 20 By correlating TWI maps with observed patterns of soil moisture at numerous agricultural fields in central NY, this study addresses three key research questions: (i) does the TWI provide reasonable estimates of soil water distribution in Northeast US agricultural landscapes, (ii) does that relationship hold across multiple field sites that possess moderately different topographic, land management and soils characteristics and (iii) given Introduction

Study area
Soil moisture measurements were made at five agricultural field sites located in four different catchments in south-central New York, USA (Fig. 1). The sites are characterized by moderate slopes (4.8-6.6 %) and agricultural land uses (i.e., typically soybean, 5 grass, corn, and fallow; Table 1). Soil types across the field sites were predominantly channery silt loams derived from siltstone, sandstone, shale, and limestone (i.e., Lanford, Eria and Erie-Ellery channery silt loams) and underlain by a shallow frangipan restrictive layer (average depth ∼ 0.4 m). Due to the low permeability of the shallow restrictive layer, soil moisture in the upper soil layer is a key variable influencing runoff 10 generation, which is primarily a saturation-excess process in the study region (Walter et al., 2003;Easton et al., 2008).

Field data
Volumetric soil moisture readings in the upper 12 cm were collected with Time Domain Reflectometry (TDR) probes across a gradient of TWI values at each site. A minimum 15 of three TDR readings were recorded at each sampling point and used to calculate the average point VWC for each date. All sampling points were located with GPS units (horizontal accuracy ∼ 3 m). Field sites were sampled from mid-August 2012 to the end of November 2012 (Table 2). For storms greater than 6 mm, a minimum of 24 h elapsed before collecting TDR measurements in order to allow for gravity-driven redis-20 tribution of soil moisture. All VWC measurements were normalized by the average field soil moisture for each sampling date. Consequently, all soil moisture values represent a relative measure of wetness. Gravimetric soil moisture measurements, made on soil cores taken from each site were used to calibrate the TDR probe. The soil cores were collected across a range 25 of wetness conditions. A calibration curve, which related TDR period and gravimetric 14046

TWI modifications and analysis methods
We examined how the strength of the correlations between soil moisture and TWI were influenced by different combinations of the following factors: (i) inclusion of soil prop-5 erties, (ii) vertical accuracy of the DEM source data, (iii) cell size of the base-DEM, (iv) slope algorithm, (v) contributing area algorithm and (vi) smoothing of the final TWI. All TWI values were extracted from the sample point coordinate data using bilinear interpolation from the four nearest grid cells. Bilinear interpolation provided a more representative estimate of the point TWI value given the 3 m horizontal accuracy of our GPS 10 units. The overall analyses resulted in 432 unique TWI formulations. The various parameter combinations used to construct the TWIs are discussed more explicitly below. All terrain analyses were conducted with SAGA-GIS and automated via the RSAGA package in R (Brenning, 2007). 15 Two different methods for calculating the TWI were compared: the original topographic index (TI) proposed by Beven and Kirkby (1979) and the soil-topographic wetness index (STI), which extends the purely topography-based TWI by accounting for spatial variation in hydrologically relevant soil properties (Beven, 1986 where α is the upslope contributing area per unit contour length and β is the slope (m m −1 ). The STI is expressed as (Walter et al., 2002;Lyon et al., 2004):

TWI form: STI vs. TI
where T is the soil transmissivity (m 2 d −1 ) computed as the product of the average saturated hydraulic conductivity (m day −1 ) and the depth to restrictive layer (m); note: 5 this is somewhat different from the STI originally proposed by Beven (1986), which assumed an exponential decrease in hydraulic conductivity with depth and the saturated hydraulic conductivity at the bottom of the soil is approximately zero. However, this way of calculating the STI has been used in several regional modeling studies and been shown to work reasonably well in the Northeast US (e.g., Agnew et al., 2006;Lyon et al., 2006a, b;Schneiderman et al., 2007;Easton et al., 2008). Soil properties were derived from the USDA-NRCS Soil Survey Geographic (SSURGO) database using the Soil Data Viewer application (USDA-NRCS, 2009).

Source data: USGS vs. LIDAR
TWIs were calculated based on publicly available United States Geological Survey

15
(USGS) DEMs as well as high resolution LiDAR data. The USGS DEMs were obtained from the National Elevation Dataset (http://viewer.nationalmap.gov/viewer/) at a 1/3 arc-second (∼ 10 m) resolution. Although the National Elevation Dataset does include 1/9 arc-second (∼ 3 m) DEMs, they were not available for our study region. The USGS DEMs, typically derived from any of four production methods (i.e., electronic im-20 age correlation, manual profiling on stereoplotters, contour-to-grid interpolation or an improved contour-to-grid interpolation known as "LineTrace+"), possess considerably less vertical accuracy than LiDAR DEMs ( the resulting grid resolution would exceed the scale at which the original source data were derived -thereby implying an erroneous degree of accuracy in the underlying elevation data. Consequently, all USGS DEMs were evaluated at the original 10 m resolution. The LiDAR DEMs were generated at 3 m and 10 m resolutions from point cloud data of filtered ground shots (average point spacing ∼ 0.67 m) via natural neighbor 5 interpolation.

Cell size: 3 m vs. 10 m
Previous studies have demonstrated that terrain derivatives (e.g., slope and contributing area) and, thus, TWIs can be substantially affected by the cell resolution of the base DEM (Hasan et al., 2012;Sørensen and Seibert, 2007). Here we investigate two commonly used DEM resolutions, which are particularly relevant for high resolution distributed hydrologic and water quality modeling: 3 m vs. 10 m. The LiDAR data were interpolated to both 10 m and 3 m DEMs. The overall parameter set for this group includes: (i) 10 m LiDAR TWIs and (ii) 3 m LiDAR TWIs. 15 Four methods for calculating slope were compared: (i) maximum triangle slope (MTS; Tarboton, 1997), (ii) least squares fitted plane (LSFP; Horn, 1981), (iii) 2nd degree polynomial (SDP; Zevenbergen and Thorne, 1987), and (iv) the downslope index (DSI; Hjerdt et al., 2004). In contrast to MTS, LSFP and SDP, which are considered "local slope" algorithms, because they only consider the cell of interest and its neighbors, DSI 20 is defined as the slope to the closest point that is d meters below the cell of interest.

Slope calculation: local slope vs. downslope index
The DSI provides a potential improvement to the local-slope methods because it can account for downslope controls on local soil moisture conditions and thereby relaxes the assumption of parallelism between surface topography and groundwater tables, i.e., the kinematic approximation of water

Flow accumulation algorithm
Perhaps the parameter that influences TWI values the most is the contributing area (a) which can be calculated with a variety of different algorithms. Here, we compare six different flow direction algorithms, which, broadly speaking, can be divided into 10 two main categories: single flow direction (SFD) and multiple flow direction (MFD). The principal difference between the single vs. multiple flow direction groups lies in how flow is apportioned to downslope cells. As the name implies, single flow direction algorithms assign all flow to a single downslope cell, whereas MFD algorithms allows flow to be split among multiple cells. 15 By far, the most commonly used algorithm currently is the D8 form proposed by O' Callaghan and Mark (1984) and coded as the default routine into the hydrologic toolsets of most popular GIS platforms (e.g., ArcGIS, MapWindow, QGIS). The D8 algorithm is simple and computationally efficient. D8 apportions all flow into a single downslope cell determined by the steepest gradient among eight cardinal and intercar-20 dinal directions. This may oversimplify actual flow paths, especially in convex terrain and flow-divergence areas, which may lead to incorrect representations of contributing area and flow pathways (overly straight and parallel). D8 is also very sensitive to minor elevation differences in adjacent cells and this can be exacerbated at high DEM resolutions (Park et al., 2009;Erskine et al., 2006). 25 The second SFD algorithm tested in this study is the randomized single-flow direction method (Rho8) proposed by Fairfield and Leymarie (1991). The method results in stochastic flowpath delineations through the incorporation of a uniformly distributed 14050 Introduction

Tables Figures
Back Close

Full Screen / Esc
Printer-friendly Version

Interactive Discussion
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | random variable into the calculation of slope gradient. This alleviates the overly linear and parallel flow line issues of D8, but flow is still apportioned into a single downslope cell. Moreover, the results are not reproducible and not always physically-based due to the random factor. The multiple flow direction (MD) approach of Freeman (1991) addresses the short-5 comings of SFD approaches by allowing for flow divergence into adjacent downslope cells as a proportion of the slope gradient. The MD method results in smoother, seemingly more physically realistic flow pathways and flow accumulation patterns relative to SFD algorithms -especially in steeper terrain. The biggest drawback is that in valley bottoms and other low-lying areas, flow dispersion can be unrealistic (Costa-Cabral 10 and Burges, 1994; Tarboton, 1997). The D∞ method of Tarboton (1997) helps reduce the excessive flow dispersion issues of the MD by calculating slope as a function of eight triangular facets, where flow is apportioned to the two downslope cells nearest to the steepest direction weighted as a function of their distance from this direction. Although D∞ does afford multi-cell 15 flow divergence, its restriction to only two downslope cells may become a limitation on convex hillslopes where dispersion is unrealistically confined. The braunschweiger relief model (BR, Bauer et al., 1985) also allows flow dispersion to multiple, adjacent downslope cells, but restricts dispersion to only three cellsthereby limiting the degree of divergence, but allowing more than D∞. The proportion 20 of flow allotted to each cell is determined by iteratively categorizing the slope direction as defined by an upslope polygon. The upslope polygon is solved for until the source cell is reached. Flow direction is then computed as a function of slope gradient and aspect of the four neighboring pixels (Park et al., 2009).
The final flow direction algorithm we evaluate is multiple triangular flow direction 25 (MD∞). First proposed by Seibert and McGlynn (2007), MD∞ extends the D∞ approach by allowing flow dispersion into more than two downslope cells. MD∞ attempts to strike a balance between the potentially excessive flow dispersion of MD and the restrictive flow dispersion of D∞ -especially on convex slopes. 10,2013 Evaluating topographic wetness indices across central New York The parameter set for flow accumulation calculation includes six flow direction algorithms: (i) D8, (ii) Rho8, (iii) BR, (iv) D∞, (v) MD and (vi) MD∞.

Smoothing: filtered vs. unfiltered
High resolution DEMs tend to result in high local variations in TWI values, which may translate to unrealistically irregular predictions of soil moisture and water table depths 5 (Hjerdt et al., 2004;Lanni et al., 2011). Low-pass digital filtering helps to smooth out anomalous local variations by averaging across a user-defined search window. Also, by averaging across non-local grid cells, filtering can potentially incorporate downslope influences, as well as "smear" the results of SFD algorithms, which may result in an intermediate level of flow dispersion. For this study, we applied a 3 × 3 pixel low-pass 10 mean filter to the TWI maps. Thus, the parameter set for this section include: (i) filtered TWIs and (ii) unfiltered TWIs.

TWI performance criteria and statistical methods
A mixed effects modeling analysis was used to identify the optimal TWI formulation for the USGS and LiDAR datasets. Subsequently, the optimal models were validated 15 against our observed data by calculating Spearman rank correlation coefficients (r s ) and coefficients of determination (R 2 ). Both sets of analyses are discussed in detail below.

Mixed effects modeling
To control for the lack of independence among sampling points and field sites (i.e., re-20 peated measures) we applied a linear mixed effect model structure with sampling date and point I.D. as random effects. Fixed effects included the main effect, TWI form, as well as field site and sampling date. The resulting optimal model was validated to verify that the underlying statistical assumptions were not violated; homogeneity of variance was evaluated by plotting residual vs. fitted values, independence was examined by HESSD 10,2013 Evaluating topographic wetness indices across central New York plotting residuals vs. each explanatory variable, and normality of residuals was evaluated by plotting theoretical quantiles vs. standardized residuals (Q-Q plots). We also evaluated the degree of spatial autocorrelation amongst soil moisture measurements via variogram analysis and no consistent trends were observed (data not shown). We attribute the lack of significant spatial autocorrelation to the fact that our field sampling 5 protocol was conducted using a cluster approach as opposed to linear transects or equal-interval sampling grids. The relative performance of the 432 different TWIs were evaluated by comparing Akaike Information Criterion (AIC) values derived from the mixed effects models. The AIC is a goodness-fit-index that provides a measure of the relative as opposed to abso-10 lute fit. Thus, the AIC is intended to facilitate model comparisons from the same dataset and aids in the selection of optimal models, with lower AIC values indicating a better fitting model (Akaike, 1973(Akaike, , 1974. The relative performance of the models in the six different TWI parameter-groups listed above (i.e., source data, TI form, cell size, slope algorithm, flow direction algo-15 rithm and smoothing) were evaluated by two methods: (i) comparing the mean, median and overall probability distribution of AIC values via violin plots (see Hintze and Nelson, 1998 for a detailed description of these plots) and (ii) by pairwise comparison of TWIs that share the exact same parameter values in all respects except for, of course, the particular TWI parameter in question (see Table 3 for an example). 20 The following generally accepted guidelines when comparing AICs were adopted for this study (Burnham and Anderson, 2002): models with AIC values within 2 of each other were not considered significantly different; AIC values within only 3-7 units of each other were considered moderately different; AIC values > 10 were considered significantly different from each other. 25 To facilitate the identification of the overall best fitting TWI from the entire set of models, we also calculated delta AICs (∆AIC), Akaike weights (AICw i ) and evidence ratios (E-ratio). The ∆AIC is simply the difference between the i-th model and the optimal model, calculated as follows: HESSD 10,2013 Evaluating topographic wetness indices across central New York where AIC i is the AIC value for the ith model and AIC opt is the AIC value of the best model (minimum AIC value). Akaike weights provide an effective way to interpret the ∆AIC values by comparing the ratio of each model to the best model relative to the entire set of candidate models as follows: given a set of K models being evaluated. Evidence ratios provide a more concise way to quantify the weight of evidence in support of one model over another and are calculated simply as the ratio of Akaike weights (AICw opt /AICw j ), where AICw opt is the estimated best model in the dataset, and j indexes the remaining models in the set. An 10 evidence ratio less than or equal to three relative to another model suggests equivalence between the models (Burnham and Anderson, 2002). All statistical analyses were conducted using the "lme4" package within the R statistical programming environment (Bates et al., 2011; R Core Team, 2011). 15 To validate the optimal models identified via the AIC analysis, we calculated r s and R 2 values, which were averaged across field sites and sampling dates as a means for controlling for the lack of independence among soil moisture measurements (albeit more crudely than the mixed effects models). The average r s and R 2 values not only help to evaluate the accuracy of the optimal TWIs, but also facilitate inter-study comparisons 20 as most previous research assessed the strength of correlation between soil moisture patterns (either observed or model generated) and various TWI formulations via these two metrics.

Source data: USGS vs. LIDAR
A comparison of the means and overall distributions of AIC values reveals that LIDARbased TWIs consistently provide a better fit to the patterns of observed soil moisture than USGS-based TWIs across the full range of parameter combinations (Fig. 2). Mean 5 AIC values differ by more than 170, with no overlap in distribution. The AIC distribution of the LiDAR dataset is substantially greater than the USGS indicating that the different parameter combinations had a greater influence on the performance of LiDAR TWIs. The majority of other researchers have evaluated the effect of vertical DEM accuracy on terrain indices by either: (i) comparing TWIs with different vertical information contents to each other via spatial statistics (distribution functions, spatial pattern analysis; Sørensen and Seibert, 2007;Vaze et al., 2010) or (ii) by calculating topographic attributes from DEMs of varying information content and evaluating their effect on hydrologic and water quality model predictions at the basin outlet (Zhang and Montgomery, 1994;Grabs et al., 2009;Kenward et al., 2000). In general, these studies have found that higher quality vertical information results in appreciable improvements in the representation of topographic surfaces, more accurate delineations of hydrologically relevant parameters and more appropriate model outputs especially regarding spatially distributed information. To our knowledge, only two other studies, Tenenbaum et al. (2006) and Murphy and Ogilvie (2009) Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | eations were necessary, but not in a forested catchment where a coarser, photogrammatic DEM better captured the more generalized soil moisture patterns. Despite some variability in results, the AIC values in this study indicate that the higher data storage costs and lower computation efficiency associated with higher resolution LiDAR data is justified by the substantial improvement in predictive ability. However, we 5 recognize that such datasets are not always readily obtainable. Consequently, hereafter, we will analyze the LiDAR and USGS datasets separately to facilitate identification of the optimal TWI for both DEM source types.

TWI form: STI vs. TI
The distribution, mean and median of TWI formulations that incorporated SSURGO 10 soils data correlated with soil moisture patterns better than those that did not include both the LiDAR and USGS datasets (Fig. 3). The average pairwise difference in AIC values was 31 and 23 for the LiDAR and USGS datasets, respectively. This suggests substantial improvement in predictive accuracy due to the inclusion of soils data regardless of DEM source. The pairwise comparison found only four moderate (∆AIC < 10) 15 exceptions to this rule and only in the LiDAR dataset (Table 4). In other words, the STI fit the empirical dataset better than the TI in 428 out of 432 cases. Also, it should be noted that these four exceptions were well outside the lower quartile range of the STI group and so were not among the better forms of STI. Furthermore, all exceptions used a 10 m cell size and BR flow accumulation, which as we show later, performed 20 relatively poorly and could therefore, be a result of a chance combination of factors.
Few studies have directly examined the benefits of including soils data in the TWI formulation via comparison with empirical data. Güntner et al. (2004) compared the ability of STIs and TIs to predict the aerial extent of saturation as defined by pedological and geobotanical mapping criteria. The incorporation of soils data was found to improve 25 index performance only when transmissivity values were calibrated. However, the authors acknowledge that the soils data used in their study were only "rough estimates" and thus, their results are not necessarily comparable with ours. Overall, our results suggest that despite the fact that SSURGO soils data are generated at coarse scales, they provide useful, hydrologically relevant information that helps to improve LiDAR-and USGS-based terrain indices at the scale of individual farm fields.

Cell size: 3 m vs. 10 m 5
The means and overall distributions of AIC values suggest that higher resolution 3 m TWIs provide a far better fit to observed patterns of soil moisture relative to 10 m LiDAR-based TWIs (Fig. 4). Additionally, the 3 m TWIs outperformed the 10 m across all pairwise comparisons, with an average pairwise AIC difference of over 55. The better correlations to soil moisture with the 3 m TWIs indicate that the added data stor-10 age and associated computation costs of the 3 m dataset may be warranted. This is somewhat in contrast with the findings of Zhang and Montgomery (1994) who argue that terrain derivatives computed from 10 m DEMs provide a reasonable compromise between complexity and accuracy. However, their study was conducted almost two decades ago, when data storage limitations and processing rates were more of a con- 15 cern. Further, their study evaluated the appropriateness of DEM resolution via comparisons of frequency distributions and TOPMODEL predictions, but no field data were used. Sørensen and Seibert (2007), on the other hand, found substantial differences between 5 m and 10 m TWI grids -though they do not necessarily recommend one over the other and instead point out that the appropriate resolution may be dependent 20 on the particular terrain feature or hydrologic characteristic in question.
We attribute the consistently better performance of the 3 m-based TWI-values primarily to more accurate and discrete delineations of flow pathways which, based on our field observations were quite small (often on the order of 1-4 m). According to our field observations, these micro-topographical features exert a considerable influence 25 on downslope soil moisture distribution. However, these features occur at scales much finer than 10 m-based TWIs and are therefore often not captured appropriately. Note, however, that Wolock and Price (1994)  better represented by coarser DEMs as they result in smoother, more realistic predictions, though it is not obvious that this is strongly comparable to soil moisture.

LiDAR
The least squares fitted plane (LSFP), maximum triangular slope (MTS) and 2nd de-5 gree polynomial (SDP) methods, as well as the downslope index with the distance parameter set to 2 m (DSI-2m) outperformed the downslope index set to 5 and 10 m (DSI-5m, DSI-10m, respectively) across the LiDAR dataset (including both 3 and 10 m cell sizes; Fig. 5). Also, although the means and distributions of the best performing slope algorithms were similar, the maximum triangular slope resulted in the best-fitting 10 TWI and did moderately better than the next best method with a mean pairwise AIC difference of 4. There were, however, 35 cases where the DSI-2m fitted the LiDAR data better (see Buchanan, 2013 for a table of the specific TWIs) and the DSI-2m possessed a lower mean AIC value. This suggests there may be little difference between the two methods. The generally better performance of the local slope algorithms for the 15 LiDAR dataset is consistent with Günter et al. (2004) and Sørensen et al. (2006) who found that the local slope achieved a higher correlation with observed patterns of soil moisture, wetness degree and groundwater depth than the DSI.

USGS
In contrast to the LiDAR dataset, the downslope index with a d parameter set to 5 m re-20 sulted in the best-fitting USGS TWIs, while the all three local slope algorithms resulted in the worst performing TWIs (Fig. 5). The mean pairwise difference in AIC values for the top two groups was 9, indicating a moderate advantage to using the DSI-5m vs. DSI-2m. Additionally, there were no exceptions to this across the pairwise comparisons. 10,2013 Evaluating topographic wetness indices across central New York The likely explanation for the stronger performance of the non-local downslope index in the USGS dataset is that USGS DEMs provide a more generalized representation of the actual topography and therefore emphasize coarser-scale terrain characteristics such as the transition from hilltop to valley bottom. Conversely, LiDAR DEMs capture more subtle micro-topographical features such as small surface depressions and 5 drainages that exert a much smaller influence on upslope drainage conditions. Thus, the DSI is likely be overly sensitive to the highly varied terrain surfaces of LiDAR DEMs and may therefore, lead to erroneous soil moisture predictions upslope of these small features. The DSI is probably more applicable to coarser USGS DEMs that better capture large-scale surface forms which are likely to exert an appreciable effect on upslope 10 drainage. This has important implications for non-local slope algorithms -suggesting the need for an additional parameter that adjusts for the scale of topographic features, such that larger hillslope transitions are emphasized while very small topography is de-emphasized.

LiDAR
The mean and overall distributions of AIC values suggest that the multiple flow direction algorithms fit patterns of observed soil moisture much better than single flow direction when using 3 and 10 m LiDAR-derived TWIs (Fig. 6). The average AIC difference in the SFD vs. MFD groups was roughly 27, which highlights the rather substantial ad-20 vantage of using MFD. Even so, there was very little difference in the performance level amongst the MFD groups when using LiDAR data (i.e., the means of the top three MFD groups were essentially equal). The only real exception was the BR algorithm of Bauer et al. (1985), which performed considerably worse than the other MFD formulations. The BR method results in similar index values to D∞, MD and MD∞ in upland ar- 25 eas, but much lower values in drainages and low-lying areas (data not shown). These small drainages and subtle convergent zones were important features in the sites used HESSD 10,2013 Evaluating topographic wetness indices across central New York in this study. Their omission by the BR algorithm is likely the root cause of its poor performance relative to the other MFD algorithms. Overall, the MD∞ of Seibert and McGlynn (2007) achieved the lowest AIC value, suggesting it produced the best-fitting model. However, the mean pairwise difference between MD∞ and the next best group was < 2 AICs and there were over 25 pairwise exceptions (see Buchanan, 2013 for  table of the specific TWIs). Similar to our findings, Güntner et al. (2004), Sørensen et al. (2006), and Park et al. (2009) showed that MFD algorithms resulted in considerably higher correlations with observed soil moisture patterns than SFD. Nevertheless, Park et al. (2009) and Erskine et al. (2006) showed that the relative differences between the SFD and MFD groups were inversely related to cell size, which indicates an interaction between our analysis of cell size and flow contributing algorithms. In particular, Park et al. (2009) found that beyond 20 m cell sizes the performance of single and multiple flow direction algorithms tended to converge. To investigate this potential interaction, we plotted the AIC distributions for each flow accumulation scheme for each cell size in the LiDAR 15 dataset (Fig. 7). It is evident from Fig. 7, that the multi-flow direction algorithms provide better model fits across the range of tested cell sizes. However, the AIC difference between the means of the SFD and MFD groups declines from 35 to 18 when going from a 3 m to 10 m grid size, corroborating the idea that MFD performance declines inversely with cell size. As Erskine et al. (2006) points out, single-and multiple-direction 20 algorithms are most similar in flow convergence zones (e.g., valley bottoms) and as cell size increases, the "percentage of the total drainage area classified in the lower region [convergent areas] increases" -yielding more and more similarity in the represented topography with increasing cell size. Interestingly, Sørensen et al. (2006), Endreny and Wood (2003) and Güntner et al. (2004) used raster DEMs with grid sizes greater than the 20 m similarity threshold identified by Park et al. (2009) and yet still found substantial differences between SFD and MFD algorithms. This discrepancy may be explained by differing vertical accuracies in the base DEMs between studies or differences in the HESSD 10,2013 Evaluating topographic wetness indices across central New York topography of their unique study sites. Regardless, the lack of inter-study agreement warrants further, more systematic investigation.

USGS
Similarity in the means of the single-and multiple-direction flow accumulation methods suggests the choice of algorithms is not as consequential when using coarser USGS 5 DEMs (Fig. 8). Interestingly, however, when examining the best performing TWIs from each flow direction type, a trend appears that is the reverse of the LiDAR dataset. Namely, that single-as opposed to multiple-direction formulations performed systematically better. It is important to note, however, that the average pairwise AIC difference in the top two groups with the lowest AIC values (i.e., D8 and Rho8) was only 3.5 and, additionally, there were 14 minor to significant exceptions where another flow accumulation algorithm outperformed the D8 in pair-wise comparisons (see Buchanan, 2013  for table of the specific TWIs). Importantly, the SFD algorithms achieved their best fit to the empirical data only when the TWIs were smoothed via low-pass filtering (Fig. 9). When the TWIs remained 15 un-filtered, MD achieved the best AIC ranking. By smoothing the SFD-based indices, filtering effectively introduces flow dispersion, suggesting that an intermediate level of dispersion may be desirable when using lower resolution USGS DEMs. Indeed, numerous other studies conducted using coarse elevation models, have concluded that an intermediate approach between the SFD and MFD methods achieved the most realis-20 tic flow distribution patterns (i.e., Holmgren, 1994;Tarboton, 1997;Endreny and Wood, 2003;Güntner et al., 2004). The implications of the filtering effects are discussed in more detail in the following section. 10,2013 Evaluating topographic wetness indices across central New York

LiDAR
The means and overall frequency distributions of the filtered vs. unfiltered TWIs in the LiDAR dataset indicate little advantage to either method (Fig. 10). Even so, unfiltered LiDAR TWIs did achieve the lowest AIC values and the mean pairwise difference 5 between the filtered and unfiltered groups was 7 suggesting a moderate benefit to unsmoothed LiDAR-based TWIs. Although this finding is not strongly supported by our data, it is in direct contrast to Lanni et al. (2011) who demonstrated that their dynamic topographic index, calculated using high resolution (2 m) LiDAR DEMs, performed better when smoothed via a 3 × 3 low-pass filter. The lack of agreement between our studies may be due to the fact that we employed clustered empirical field data whereas Lanni et al. (2011)  (2011) study, which emphasize coarser-scale soil moisture dynamics, were likely better represented by TWIs that incorporated non-local topographic information (i.e. are filtered).

USGS
Unlike the LiDAR dataset, filtering the USGS-TWIs resulted in a substantial improve-25 ment in model fit (Fig. 10) Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | filtered vs. unfiltered TWIs was over 17, suggesting substantial improvement in predictive accuracy due to the smoothing of predicted wetness surfaces. Filtering the USGS TWIs likely improves their predictive ability for the same reasons that the downslope index did -because it helps to account for downslope controls on local drainage status which are more appropriately captured by coarser DEMs. In other words, filtering 5 averages out the effects of local anomalies and also incorporates a measure of nonlocal topographic effects, which results in a smoother, more contiguous, more realistic surface at larger hillslope scales.

Best overall model
The top ten best performing TWI formulations for both the LIDAR and USGS TWIs 10 are presented in Table 8. Model 1a possessed the lowest AIC value and the highest AIC weight (5917 and 0.39, respectively), indicating that it was the best model among the set of tested models in the LiDAR Dataset (Table 8). However, models 2a-4a all possess evidence ratios of three or less which provides little evidence that model 1a is in fact notably better than models 2a-4a. From this we can conclude that when 15 dealing with LiDAR data, the best TWI formulation will: (i) incorporate soils data, (ii) be interpolated to fine grid resolutions, less than 10 m, (iii) utilize a local slope algorithm such as LSFP, SDP or MTS as opposed to the DSI, (iv) employ a multiple flow direction algorithm such as MD∞ or D∞ and (v) remain unfiltered. When dealing with coarser USGS-based TWIs, our results suggest that Models 1b 20 and 2b are roughly equal in terms of their fit to observed moisture patterns (i.e., evidence ratios ≤ 3) (Table 8). Thus, the optimal parameter set when using USGS TWIs will: (i) incorporate soils data, (ii) utilize a slope algorithm that accounts for downslope controls such as DSI set to an intermediate d parameter (e.g., 5 m), (iii) employ a single flow direction algorithm and (iv) importantly, be smoothed via low-pass filtering. 25 For comparative purposes we have also included a plot of the best performing Li-DAR and USGS TWIs vs. observed soil moisture for each sample date, along with the associated R 2 value (Figs. 11 and 12 dates for the LiDAR dataset was roughly 0.61, i.e., approximately 61 % of the variation in soil moisture was explained by this TWI. However, the best-performing USGS TWI only explained roughly 32 % of the soil moisture variation on average. Higher R 2 values for both the USGS and LiDAR datasets were achieved when the soil moisture readings were binned, according to their TWI values, into integer categories or wetness classes 5 similar to Schneiderman et al. (2007) and Easton et al. (2008). Binning the TDR readings has the effect of averaging over larger spatial scales, which helps to reduce the effect of anomalous TDR readings, thus improving soil moisture-TWI correlations. For example, by binning the TDR readings into TWI-integer classes, the mean R 2 values increased substantially from 0.61 to 0.79 for the LiDAR TWIs and from 0.32 to 0.72 for the USGS TWIs. The majority of studies conducted prior to the year 2000 found coefficients of determination that seldom exceeded 0.5, and typically ranged from 0-0.4 (e.g., Burt and Butcher, 1985;Moore et al., 1988;Ladson and Moore, 1992;Jordan, 1994;Western et al., 1999Western et al., , 2004Tague et al., 2010). Notably, much like the USGS DEMs used in this 15 research, these older studies generally used DEMs derived from lower-quality elevation data. The fact that the range of R 2 values from our USGS TWI is consistent with those of the older studies implies that elevation accuracy may have played a strong role in limiting predictive ability.
Taking advantage of the availability of higher quality elevation data, several more  Sulebak et al. (2000), and Schmidt and Persson (2003) used high resolution TWIs (< 10 m) derived from high resolution elevation source data (not necessarily LiDAR), 25 and found R 2 values ranging from 0.51 to 0.87. The average Spearman coefficients corroborate the R 2 and AIC analyses (Fig. 13a). The upper range of Spearman values observed in this study (i.e., 0.7-0.78) were comparable with those of Sørensen et al. (2006) van Meerveld andMcDonnell (2006) andCantón et al. (2004). Note, both the Spearman vs. AIC analyses generally indicated the similar performance for the TWIs, e.g., high Spearman coefficients were strongly correlated with low AICs (Fig. 13b). Accordance of Spearman and AIC values lends credence to our findings and statistical methods and helps to facilitate comparisons with other studies that employed the Spearman 5 metric.
Despite the fact that we were able to demonstrate good correlations between LiDARderived TWIs and observed soil water patterns, which were consistent with those of other more recent research, on average 40 % of the variation remained unexplained. This is, perhaps, unsurprising considering that these simple indices are overlooking 10 several other well-proven factors that influence the spatial distribution of soil water. The effect of ET on soil moisture is particularly influential and varies based on vegetation type, aspect, and solar radiation; to name a few factors that are not included in the TWI indices. Another factor that may account for the discrepancies between TWIs and measured soil moisture is the inherently different scales between the base data used 15 to generate TWIs and the scale at which the TDR probe measures soil moisture, even with multiple measurements to characterize a sampling point.
Moreover, soil moisture dynamics are known to change not only through space, but also through time. Nevertheless, a core assumption of the TWIs examined here, and in most other research, is that of steady-state, wherein time-dependent storage terms 20 are neglected. As pointed out by Barling et al. (1994), rainstorms will rarely be of sufficient depth or duration to achieve steady-state subsurface flow. To address this issue, several researchers have explored more dynamic topographic indices which relax the steady-state assumption (e.g., Barling et al., 1994;Wilson et al., 2005). Although these may offer some improvements in terms of physical realism over the standard TWI, the 25 dynamic and quasi-dynamic indices have yet to be widely adopted or well-tested beyond their original papers. Additionally, these more advanced conceptualizations require considerably more input data and are sufficiently complex to start blurring the line between what constitutes a distributed hydrological model and a wetness index. 10,2013 Evaluating topographic wetness indices across central New York The simplicity of the standard TWI is really at the heart of its popularity and yet this simplicity also leads to variable results when trying to represent dynamic processes via a static index.

Conclusions
We identified some notable differences among different formulations of TWIs and their 5 correlation to spatial patterns of soil moisture in agricultural settings in central, NY. Most importantly, we found that some TWI-forms correlate relatively well with soil moisture. Our principal findings include: -LiDAR-derived TWIs achieved good correlations with observed patterns of soil moisture in fields in northeastern US.

10
-LiDAR-derived TWIs achieved appreciably better correlations than USGS-based TWIs. Thus, when given a choice between using LiDAR or USGS DEMs for constructing TWI maps, we recommend the former.
-TWIs that include soil transmissivity (STI) work better than the simpler TI (we used the SSURGO dataset and calculated transmissivity as the product of the 15 soil depth to a restrictive layer and average saturated hydraulic conductivity of that soil).
-The optimal formulation for a LiDAR TWI will: -Use a fine resolution DEM (we used 3 m).
-Use the Multiple Triangular Flow Direction algorithm (Seibert and McGlynn, 2007) to compute flow accumulation values. -Not apply a low-pass smoothing filter (we used a 3 × 3 low-pass filter).

HESSD
-The optimal formulation for a USGS TWI will: -Use the Downslope Index (Hjerdt et al., 2004) with a d parameter set to 5 m to compute slope (Tarboton, 1997).
-Use the D8 Flow Direction algorithm (O'Callaghan and Mark, 1984) to com-5 pute flow accumulation values.
Despite the encouraging LiDAR-based TWI-soil moisture correlations observed in this study, on average, roughly 40 % of the variation in soil moisture remained unexplained by the TWI. This is perhaps unsurprising considering we were attempting to describe an inherently dynamic process with a static index. Future studies may want to evaluate the cost-effectiveness (in terms of complexity and computational efficiency) of other TWI formulations, which either relax steady-state assumptions (e.g., Lanni et al., 2011), incorporate a measure of spatio-temporal variations in evapotranspiration (Ludwig and Mauser, 2000) or account for other terrain attributes such as aspect 15 or time-variable channel initiation thresholds (Xiande et al., 2002;Gomez-Plaze et al., 2001;Kim and Lee, 2004;respectively