Statistical analysis and modelling of surface runoff from arable fields in central Europe

Abstract. Surface runoff generation on arable fields is an important driver of flooding, on-site and off-site damages by erosion, and of nutrient and agrochemical transport. In general, three different processes generate surface runoff (Hortonian runoff, saturation excess runoff, and return of subsurface flow). Despite the developments in our understanding of these processes, it remains difficult to predict which processes govern runoff generation during the course of an event or throughout the year, when soil and vegetation on arable land pass through many states. We analysed the results from 317 rainfall simulations on 209 soils from different landscapes with a resolution of 14 286 runoff measurements to determine temporal and spatial differences in variables governing surface runoff, and to derive and test a statistical model of surface runoff generation independent of an a priori selection of modelled process types. Measured runoff was related to 20 time-invariant soil properties, three variable soil properties, four rain properties, three land use properties and many derived variables describing interactions and curvilinear behaviour. In an iterative multiple regression procedure, six of these properties/variables best described initial abstraction and the hydrograph. To estimate initial abstraction, the percentages of stone cover above 10 % and of sand content in the bulk soil were needed, while the hydrograph could be predicted best from rain depth exceeding initial abstraction, rainfall intensity, soil organic carbon content, and time since last tillage. Combining the multiple regressions to estimate initial abstraction and surface runoff allowed modelling of event-specific hydrographs without an a priori assumption of the underlying process. The statistical model described the measured data well and performed equally well during validation.
In both cases, the model explained 71 % and 58 % of the variability in accumulated runoff volume and instantaneous runoff rate, respectively (RMSE: 5.2 mm and 0.23 mm min⁻¹), while the RMSE of runoff volume predicted by the curve number model was 50 % higher (7.7 mm). Stone cover, if it exceeded 10 %, was most important for the initial abstraction, while time since tillage was most important for the hydrograph. Time since tillage is taken into account neither in typical lumped hydrological models (e.g. the SCS curve number approach) nor in more mechanistic models using Horton, Green and Ampt, or Philip type approaches to address infiltration, although tillage affects many physical and biological soil properties that subsequently and gradually change again. This finding should foster a discussion regarding our ability to predict surface runoff from arable land, which seemed to be dominated by agricultural operations that introduce man-made seasonality in soil hydraulic properties.
… EUROSTAT, 2012). Runoff generation is the driver of on-site and off-site damages by erosion processes and of nutrient and agrochemical transport (e.g. Haygarth et al., 2006) into open water bodies, especially during local floods (e.g. Evrard et al., 2008). Thus, surface runoff generation on arable land is important for hydrological modelling, especially when water quality is considered.

P. Fiener et al.: Surface runoff from arable fields in central Europe
In general, it is acknowledged that three mechanisms generate surface runoff: (i) unsaturated surface runoff (Hortonian-type runoff), (ii) saturation-excess surface runoff, and (iii) return of subsurface storm flow; the last is in some cases detectable even at the plot scale but becomes increasingly important when moving from the plot to the catchment scale and from the event to longer time scales. Not all excess water generated by these mechanisms contributes to surface runoff, because some is stored on the surface as depression storage (infiltrating after rain events) and detention storage (partly running off after events) (Mohamoud et al., 1990). At the catchment scale, surface runoff partly re-infiltrates along its pathway to the stream network (run-on infiltration; e.g. Nahar et al., 2008). Many models are available to address one or more of these mechanisms. These range from relatively simple approaches that lump all processes operating along the flow path on a daily time scale (e.g. the SCS curve number; Mockus, 1972) to more mechanistic approaches on much shorter time scales (minutes) that address a specific process creating excess water, such as models of the Green and Ampt (1911), Philip (1969) or Horton (1940) type. The mechanistic models may then be applied in a spatially distributed context that includes further processes occurring during runoff accumulation (for an extensive model overview see, e.g., Borah and Bera, 2003; Migliaccio and Srivastava, 2007; or the various results from the "distributed model intercomparison project", Smith et al., 2004). Small-watershed-scale models dealing with surface runoff and soil erosion from arable land often rely on Hortonian-type surface runoff generation approaches (Assouline and Mualem, 2006; Fiener et al., 2008), assuming that surface sealing during heavy rainfall events dominates runoff generation on partly bare soils.
Larger-scale models typically use Green and Ampt or Philip approaches, assuming that infiltration is governed by a propagating wetting front that depends on soil properties within the soil column (e.g. Kale and Sahoo, 2011; Klar et al., 2008). However, as the processes dominating infiltration and surface runoff generation may vary inter- and intra-annually (e.g. Vivoni et al., 2007) and even within an event (e.g. Silburn and Connolly, 1995), it is important to address potential switches between runoff generation mechanisms in advanced modelling approaches (e.g. Tian et al., 2012).
Despite improvements in modelling approaches that address different mechanisms of surface runoff generation simultaneously (e.g. the THREW model; Li et al., 2012), it remains challenging to account for the specific temporal and spatial variability of soil and crop characteristics in agricultural landscapes (Fiener et al., 2011a; Green et al., 2003), which may affect infiltration. This challenge results from agronomic decisions dominating the soil-vegetation system by influencing (i) the seasonal variability of soil properties and surface roughness depending on tillage operations and (ii) the associated seasonality of plant growth. The first relates to the mostly texture-based, static estimates of important soil variables, e.g. porosity, used in many modelling approaches. The second is associated with the seasonality of plant and residue cover, which potentially protects the soils from crusting (for a review see Fiener et al., 2011a). Despite the developments in our understanding of individual processes in specific cases, it remains difficult to predict which processes govern runoff generation while soil and vegetation pass through many states during a crop rotation.
The major objectives of this study were (i) to statistically analyse 317 hydrographs from rainfall simulations carried out on different arable soils covering many landscapes and crops, to determine temporal and spatial differences in variables governing surface runoff during rainfall events, and (ii) to derive and test a statistical model of surface runoff generation independent of an a priori selection of modelled processes. This model should operate on the plot scale (1 to 10 m²) and the event scale, with a resolution of minutes to obtain hydrographs, but it should take into account the variation of driving variables that occurs on the scale of crop rotations and catchments. To become operational, e.g. when implementing the plot approach in a distributed event model, it has to rely on variables that usually are available or can be made available at these temporal and spatial scales. This also requires choosing a statistical model rather than a process model, because it would be impossible at these scales to identify the underlying processes. For instance, return flow had been identified on some plots by the use of tracers despite a plot length of only 4.5 m (Haider, 1994), while this information was missing for most other plots because no tracers had been analysed, and it would also be missing in an application case.

Rainfall simulations and range of examined conditions
We used rainfall simulations carried out on 209 plots located in central Europe. These plots covered a broad variety of locations, soils (developed from loess, sand dunes, moraines, Tertiary and Mesozoic sediments and basement rocks) and soil properties (Table 1), as well as a broad variety of crops (long-term bare fallow, different small-grain crops, maize and sugar beet) in different development stages (Table 1). Slope, plot length and plot width varied from 1.6 % to 23.6 %, from 4 to 22 m, and from 1 to 2 m, respectively. Fiener et al. (2011b) have shown that this data set covers most independent variables sufficiently to represent arable landscapes in a humid, temperate climate. This is especially true for rain properties, for soil properties and for the distribution over seasons. However, the variation in plot dimensions is rather limited, with a strong collinearity between width and length of the plots; both restrictions are typical of rainfall simulation experiments. The simulations were performed by five different research groups using different types and set-ups of Veejet nozzle rainfall simulators. Rainfall intensities varied between 31 and 99 mm h⁻¹, specific kinetic energy varied from 12 to 20 J m⁻² mm⁻¹, and total rainfall duration varied between 590 and 6180 s. Time to runoff was recorded, and plot discharge was measured (approx. every minute) by collecting runoff with calibrated buckets at the lower end of the plots, which were equipped with flow collection gutters (Fiener et al., 2011b).
Table 1 notes: (a) bulk soil; (b) w/w indicates that the soil fractions are calculated relative to the total mass of the soil (kg kg⁻¹); (c) fine-earth fraction; (d) according to Sinowski et al. (1995).
The data of the different research groups carrying out the simulations had been intensively quality-checked and homogenised into one consistent data set that is freely available (Seibert et al., 2011). Details on the locations, the types of rainfall simulators, plot treatments (e.g. fixed plots vs. moving plots), and measurement conditions used by the different groups are given by Fiener et al. (2011b).
From the overall data set (Seibert et al., 2011) we chose the 317 simulations in which no artificial pre-wetting of the soils had been applied through preceding simulations (dry runs). Excluding runoff measurements during afterflow (runoff after the end of simulated rainfall), this resulted in 14 286 runoff measurements (on average 47 measurements per simulation) used for further analysis.

Statistical analysis and model development
The selection of any infiltration model makes a fundamental assumption about the underlying runoff generation processes (e.g. crusting vs. infiltration front propagation vs. dominance of preferential flow). Following two different and widely used approaches, we fitted Horton-type and Green-Ampt-type equations to the hydrographs. Both infiltration equations were flexible enough to be meaningfully fitted to our data despite their contrasting mechanistic justification. Preliminary results showed that both approaches resulted in nearly identical hydrograph shapes and similar efficiencies (R² was usually above 0.95). The root mean squared error (RMSE), which was below 0.1 mm min⁻¹ for both types of equation, equalled the unexplained variance in a geostatistical analysis (Fiener et al., 2011b) that does not force any theoretical equation through the data and thus yields the best possible fit. It is important to note that this apparently small error only quantifies the random error of multiple runoff rate measurements within an event. Many errors in the infiltration rate apply to all measurements within an event (e.g. errors in plot size or rain intensity; for more details see Fiener et al., 2011b) and can cause large errors in the parameters of the infiltration equations despite a close fit. Consequently, we were not able to decide which process governed runoff generation.
Furthermore, we encountered the problem of equifinality (Beven and Binley, 1992): many parameter combinations gave statistically similar results for the same hydrograph and the same infiltration equation (e.g. the RMSE changed only between 0.032 and 0.035 mm min⁻¹ for the same hydrograph, while the initial infiltration rate of the Horton model changed by a factor of three and the decay constant by a factor of ten).
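To make the fitting problem concrete, the following Python sketch computes infiltration-excess runoff at 1-min steps from the classical Horton decay of infiltration capacity, f(t) = f_c + (f_0 - f_c) exp(-k t), for a constant rain intensity. It is an illustration only, not the authors' fitting code (their analyses were run in R): it assumes that capacity decays with time since the start of rain (no time-compression correction), and all parameter values are hypothetical. Comparing hydrographs from two different (f_0, k) pairs is how equifinality shows up in practice.

```python
import math


def horton_capacity(t_min: float, f0: float, fc: float, k: float) -> float:
    """Horton infiltration capacity (mm/min) t_min minutes after the start of rain."""
    return fc + (f0 - fc) * math.exp(-k * t_min)


def horton_runoff_rate(rain: float, t_min: float, f0: float, fc: float, k: float) -> float:
    """Infiltration-excess runoff rate (mm/min): rain intensity minus capacity, never negative."""
    return max(0.0, rain - horton_capacity(t_min, f0, fc, k))


def horton_hydrograph(rain: float, duration_min: int, f0: float, fc: float, k: float) -> list:
    """Runoff rates at 1-min steps for a constant rain intensity (mm/min)."""
    return [horton_runoff_rate(rain, t, f0, fc, k) for t in range(duration_min + 1)]


def rmse(a, b) -> float:
    """Root mean squared difference between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    # Hypothetical parameter sets; 60 mm/h of rain = 1 mm/min for 60 min.
    q1 = horton_hydrograph(1.0, 60, f0=2.0, fc=0.2, k=0.10)
    q2 = horton_hydrograph(1.0, 60, f0=3.0, fc=0.2, k=0.13)
    # A small RMSE between hydrographs produced by clearly different
    # (f0, k) pairs is what equifinality looks like.
    print(round(rmse(q1, q2), 3))
```

Fitting would wrap `horton_hydrograph` in a least-squares search over (f_0, f_c, k); the sketch stops at the forward model because the point here is only that different parameters can produce near-identical curves.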
Since both approaches yielded identical results and we did not want to decide a priori on a specific modelling philosophy, we followed a different, purely statistical approach to estimate surface runoff generation from rainfall plots. We focused on four support points of the hydrographs: initial abstraction, defined as the rain depth until runoff starts, and accumulated runoff after 20, 30 and 40 mm of rain (P_a, Q_P20, Q_P30 and Q_P40, respectively; Table 2). Support points for lower or higher rain depths were not used at this stage, because they narrowed the data set to subsets with very early runoff or with high applied rain depths and thus reduced the available range of soils, rains and land uses. For the four selected support points, multiple regressions using soil, rain and land use variables (Table 1) were developed independently following an iterative approach (e.g. Crawley, 2009) that took likely interactions between variables and curvilinear behaviour into account. Given that many variables correlate (e.g. texture classes but also variables obtained by data transformation) and thus also correlate similarly with the support points, we chose among similarly efficient variables those that were widely available (e.g. avoiding unusual texture classes), that were meaningful and consistent with current knowledge (e.g. avoiding very narrow texture classes), and that did not produce unrealistic behaviour when extended beyond the range covered by measurements (e.g. avoiding transformations that became very steep beyond the measured range). Further, we avoided over-parameterisation by calculating the Bayesian information criterion (BIC; Kuha, 2004).
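The role of the BIC in guarding against over-parameterisation can be sketched as follows. This Python example (the study itself used R) fits ordinary least squares with an intercept and computes BIC = n ln(RSS/n) + k ln(n), the Gaussian-likelihood form up to an additive constant, where k counts the fitted coefficients. The data are synthetic and hypothetical; the variable with the lower BIC would be retained.

```python
import numpy as np


def ols_bic(X: np.ndarray, y: np.ndarray):
    """Least-squares fit with intercept; returns (coefficients, BIC).

    BIC = n*ln(RSS/n) + k*ln(n), the Gaussian-likelihood form up to an
    additive constant; k counts all fitted coefficients incl. the intercept.
    """
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    n, k = A.shape
    return beta, n * np.log(rss / n) + k * np.log(n)


# Synthetic example: y truly depends on both x1 and x2.
rng = np.random.default_rng(42)
x1, x2 = rng.normal(size=(2, 200))
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.1, size=200)

_, bic_small = ols_bic(x1[:, None], y)
beta_full, bic_full = ols_bic(np.column_stack([x1, x2]), y)
print(bic_small, bic_full)  # the candidate with the lower BIC is preferred
```

Because the constant term is dropped, only differences in BIC between candidate models fitted to the same data are meaningful, which is exactly how the criterion is used in a stepwise selection.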
Because some variables were not available for the entire data set (Table 1), such a variable could not be included in the equation developed in any of the successive steps, as neither deletion nor imputation of the missing data seemed appropriate. To examine whether such a variable would have had explanatory power, we calculated the residuals between the prediction developed from the entire data set and the measured runoff of the respective subset of data (Framstad et al., 1985). These residuals were then correlated with the omitted variable to examine whether it could improve the prediction. For example, soil moisture at the very surface or in the plough horizon may affect initial abstraction, but these variables were not available for all hydrographs. Hence, we developed a prediction equation for initial abstraction without considering soil moisture, calculated the residuals of this equation for those hydrographs where soil moisture was available, and correlated these residuals with soil moisture to examine whether it could explain some of the unexplained variation. None of the other (incomplete) variables had explanatory power, and hence it was not necessary to consider them in estimating surface runoff.
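The residual check for incompletely recorded variables can be sketched as below: fit the regression on all rows without the candidate, then correlate the residuals with the candidate only where it was measured. This is a Python sketch on synthetic data; the names `moisture` and `unrelated` are hypothetical stand-ins, and missing values are marked as NaN.

```python
import numpy as np


def residual_check(X: np.ndarray, y: np.ndarray, z: np.ndarray) -> float:
    """Fit y ~ X on all rows, then return the Pearson correlation between the
    residuals and a candidate variable z on the rows where z is available
    (NaN marks missing values)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ok = ~np.isnan(z)
    return float(np.corrcoef(resid[ok], z[ok])[0, 1])


rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
moisture = rng.normal(size=n)    # hypothetical omitted variable that matters
unrelated = rng.normal(size=n)   # hypothetical variable without any effect
y = 1.0 + 2.0 * x + 0.5 * moisture + rng.normal(scale=0.1, size=n)

# Pretend both candidates were recorded for only about half of the hydrographs.
missing = rng.random(n) < 0.5
moisture_obs = np.where(missing, np.nan, moisture)
unrelated_obs = np.where(missing, np.nan, unrelated)

print(residual_check(x, y, moisture_obs))   # large: moisture would add information
print(residual_check(x, y, unrelated_obs))  # near zero: no explanatory power
```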
The selected support points could be predicted using the same soil properties (indicating that dominant influences did not change during the different rainfall events), while only the calibration parameters changed depending on rain depth. Hence, the equations of the selected support points were combined in the next step into one equation, in which the parameterisation depended on rain depth. This equation was then finally fitted to all 14 286 runoff measurements of the 317 hydrographs (approximately 1 min time steps).
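How an accumulated-volume equation turns into a minute-by-minute hydrograph can be sketched as follows: rain depth P grows linearly with time for a constant intensity, runoff is zero until P exceeds the initial abstraction P_a, accumulated runoff is a function of the excess depth, and rates follow by differencing. In this Python sketch, `q_of_excess` is a hypothetical stand-in for the fitted equation, whose rain-depth-dependent parameters are not reproduced here; the linear placeholder in the usage line is made up for illustration.

```python
from typing import Callable, List, Tuple


def hydrograph_from_volume(
    intensity_mm_h: float,
    duration_min: int,
    p_a: float,
    q_of_excess: Callable[[float], float],
) -> Tuple[List[float], List[float]]:
    """Return accumulated runoff (mm) at each minute and runoff rates (mm/min).

    Rain depth P grows linearly with time for a constant intensity; runoff is
    zero until P exceeds the initial abstraction p_a.
    """
    rain_per_min = intensity_mm_h / 60.0
    volumes = [q_of_excess(max(0.0, rain_per_min * t - p_a)) for t in range(duration_min + 1)]
    rates = [v1 - v0 for v0, v1 in zip(volumes, volumes[1:])]
    return volumes, rates


# Hypothetical placeholder: half of the rain exceeding P_a runs off.
volumes, rates = hydrograph_from_volume(60.0, 30, p_a=10.0, q_of_excess=lambda e: 0.5 * e)
print(volumes[-1])  # 10.0 mm after 30 mm of rain
```

Differencing accumulated volumes guarantees that the rates integrate exactly to the predicted volume, which keeps the two quantities evaluated later (volume and rate) consistent with each other.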

Model and validation
To examine whether the final equation for predicting runoff generation would be transferable to other areas, we used a tenfold stratified cross validation, which is regarded as the best approach for model selection (Kohavi, 1995). To this end, we randomly chose 90 % of the 317 hydrographs; random selection sufficiently stratified the data, so that the subsets contained about the same proportion of labels as the original data set. For this subset of hydrographs we determined the equation parameters, while the remaining 10 % of hydrographs were used for model validation. This procedure was repeated ten times, ensuring that every hydrograph was used once for validation. The ten folds yielded a family of similar equations that satisfactorily predicted the validation data (see Results). Finally, we compared the quality of our predictions with predictions derived following the classical curve number (CN) approach (Mockus, 1972), which is the most prominent statistical approach to estimate surface runoff. The hydrological soil groups of the CN approach were assigned based on the soil descriptions (rather than solely on the topsoil properties recorded in the database). For fallow, row crops and small grains, a low runoff disposition was always assumed. Furthermore, CNs were estimated with an alternative approach following Auerswald and Haider (1996), using the soil cover of row crops and small grains, respectively. This second CN approach was developed using a subset of the data set used in this study (Auerswald and Haider, 1996).
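The fold assignment described above can be sketched in Python as follows. Whole hydrographs, not single 1-min measurements, are assigned to folds, so all measurements of one event stay together and every hydrograph is validated exactly once. As in the text, the assignment is random rather than explicitly stratified by label; the integer IDs and the seed are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def tenfold_splits(hydrograph_ids: Sequence[int], k: int = 10, seed: int = 1
                   ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (calibration_ids, validation_ids) for k folds.

    Whole hydrographs are assigned to folds, so all measurements of one event
    stay together, and every hydrograph is used exactly once for validation.
    """
    ids = sorted(set(hydrograph_ids))
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for validation in folds:
        held_out = set(validation)
        calibration = [h for h in ids if h not in held_out]
        yield calibration, sorted(validation)


splits = list(tenfold_splits(range(1, 318)))  # 317 hypothetical hydrograph IDs
print([len(v) for _, v in splits])  # [32, 32, 32, 32, 32, 32, 32, 31, 31, 31]
```

Grouping by hydrograph matters because the 1-min measurements within one event are strongly autocorrelated; splitting them across calibration and validation would make validation scores look better than they are.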
All statistical analyses were carried out using GNU R version 2.14.0 (R Development Core Team, 2011). Besides R² and RMSE, we also used the Nash-Sutcliffe efficiency (NSE; Nash and Sutcliffe, 1970) as a goodness-of-fit measure.
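For reference, the three goodness-of-fit measures can be computed as in the following Python sketch. NSE = 1 - Σ(o - s)² / Σ(o - ō)², and RMSE is the square root of the mean squared difference; R² is taken here as the squared Pearson correlation, which is an assumption, since it is sometimes defined as 1 - SS_res/SS_tot instead. The example values are made up.

```python
import numpy as np


def goodness_of_fit(observed, simulated):
    """Return (R2, RMSE, NSE) for paired observed/simulated values.

    R2 is taken here as the squared Pearson correlation (an assumption; it is
    sometimes defined as 1 - SS_res/SS_tot instead).
    """
    o = np.asarray(observed, dtype=float)
    s = np.asarray(simulated, dtype=float)
    r2 = float(np.corrcoef(o, s)[0, 1] ** 2)
    rmse = float(np.sqrt(np.mean((o - s) ** 2)))
    nse = float(1.0 - np.sum((o - s) ** 2) / np.sum((o - o.mean()) ** 2))
    return r2, rmse, nse


# Made-up example: perfectly correlated but biased by +1,
# so R2 = 1 while RMSE and NSE reveal the bias.
print(goodness_of_fit([1, 2, 3, 4], [2, 3, 4, 5]))
```

The example shows why NSE is reported alongside R²: a correlation-based R² is blind to systematic offsets, whereas NSE penalises them.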

Support points
The initial abstraction P_a ranged from 0.7 to 62 mm for the 317 hydrographs, but only two variables contributed to the explanation of this variation. These were the total stone cover exceeding 10 %, F_stone>10% (range 0 to 25 %), calculated as F_stone>10% = max(0, F_stone - 10), and the sand content (0.063 to 2 mm) of the bulk soil, C_Sa_tot (range 2 to 87 %). With increasing stone cover, time to runoff (and hence initial abstraction) increased, while increasing sand content promoted earlier runoff (Eq. 1). Equation (1) explained 53 % of the variation (RMSE 6 mm) of P_a, while F_stone>10% and C_Sa_tot explained 37 and 10 mm of the variation, respectively. The RMSE was rather large (and R² low), indicating that initial abstraction was strongly influenced by factors that could not be captured by the available variables. Remarkably, rain intensity, which ranged from 29 to 99 mm h⁻¹, had no influence on initial abstraction (R² = 0.0002), although it dominated the time to runoff, because initial abstraction was reached earlier with increasing rain intensity. Also, soil moisture in the surface soil (top 0.03 m; range 2 to 26 w/w-%) or in the plough layer (range 8 to 40 w/w-%), both of which may especially influence early runoff, did not improve the prediction of P_a. Q_P20, Q_P30 and Q_P40 were all best explained by the same variables, namely rain intensity, time since tillage and organic carbon content. This led to equations of the type of Eq. (2), where Q_P is the accumulated runoff volume (mm) from the start of rain to rain depth P (mm); p is rain intensity (mm h⁻¹); t_sT is time since tillage (d); C_SOC is soil organic carbon content (%); and f, g, h, k and l are empirical parameters that vary with rain depth P. In general, the higher p was, the more runoff was observed after a given rain depth, because the time available for infiltration decreased.
The strongest influence, however, was exerted by t_sT, which usually is not considered in hydrological modelling. Runoff decreased with increasing t_sT. For example, runoff after 30 mm of rain was on average 20 mm if the rainfall occurred less than an hour after tillage, but less than 5 mm if it occurred more than 100 days after tillage. This effect was particularly pronounced for short t_sT (from a few hours to single days), although it lasted for more than 200 days. This strongly decreasing effect made it necessary to use the logarithm of t_sT and a second term, (ln t_sT)⁴, in Eq. (2), which partly compensates the term ln(t_sT) at high t_sT. Increasing C_SOC also decreased runoff, and again this effect was sub-proportional. Despite the large number of available explanatory variables (Table 1) and the large number of measurements, no further variable improved the runoff prediction. This was especially true for soil physical properties that are commonly assumed to influence runoff (e.g. texture variables, porosity and moisture).
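The derived predictors named above can be written down directly. The Python sketch below builds the stone-cover threshold term F_stone>10% = max(0, F_stone - 10) and the two time-since-tillage terms ln(t_sT) and (ln t_sT)⁴, with t_sT in days. It only constructs regressors: the fitted coefficients f to l of Eq. (2) are not reproduced, C_SOC is left out because its exact transformation is not stated here, and the helper and key names are hypothetical.

```python
import math


def derived_predictors(stone_cover_pct: float, t_since_tillage_d: float) -> dict:
    """Hypothetical helper building the transformed regressors named in the text.

    stone_cover_pct   -- total stone cover F_stone (%)
    t_since_tillage_d -- time since last tillage t_sT (days, must be > 0)
    """
    f_stone_gt10 = max(0.0, stone_cover_pct - 10.0)  # active only above 10 % cover
    ln_t = math.log(t_since_tillage_d)               # negative for t_sT < 1 day
    return {"F_stone_gt10": f_stone_gt10, "ln_tsT": ln_t, "ln_tsT_pow4": ln_t ** 4}


# One hour vs. 100 days after tillage, at the data-set mean stone cover of 6.6 %.
print(derived_predictors(6.6, 1 / 24))
print(derived_predictors(6.6, 100.0))
```

Note that the quartic term is even in ln(t_sT), so it grows both for very short and for very long times since tillage; in a fitted equation its coefficient determines how strongly it damps the logarithmic term at long t_sT.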

Hydrograph prediction
Given the identical behaviour of all support points, the parameters of Eq.
Combining Eqs. (1) and (3) allowed the computation of hydrographs for all 317 events. The calculated hydrographs explained 72 % of the variability of the measured accumulated runoff volumes (RMSE 5.2 mm; NSE 0.71), compared with 58 % of the variation in instantaneous runoff rates (RMSE 0.23 mm min⁻¹; NSE 0.56). The error distributions (Fig. 1) showed a pronounced excess kurtosis: for most hydrographs the errors were less than half the RMSE, while a few hydrographs were poorly predicted. We checked these hydrographs and the corresponding experimental descriptions but found no anomalies that could explain their behaviour. It is important to note that the RMSEs also include sampling errors associated with field measurements and inconsistencies among the research groups that contributed to the combined data set. The measured instantaneous runoff rates per minute are subject to random errors that level out when rates over a longer period are combined into the accumulated runoff volume, whereas systematic errors (bias) of the rate measurements also affect runoff volume. The difference in performance between rates and volumes was thus due to random error. The random error in measured runoff rates along a single hydrograph was typically ±0.1 mm min⁻¹ (Fig. 2), or about half of the overall RMSE. No model can capture such random errors, nor the biases, which are even more difficult to identify (e.g. errors in plot size determination). It is hence unlikely that another equation could explain the hydrographs better. Examples of measured and predicted hydrographs, selected to be close to the mean RMSE, are given in Fig. 2. They show rainfall simulations on a long-term bare fallow soil in seedbed conditions that was rained on six times during three years. Among the six hydrographs, Fig. 2f … of t_sT, as this hydrograph was obtained only one hour after tillage, while the other hydrographs were obtained 3 to 5 days after tillage. Despite near-constant soil, plot and rain properties for some of the other hydrographs (e.g. D and E, except that more rain was applied in the case of E), there were differences for which no explanation exists and which hence cannot be captured by the model. Nevertheless, the model with only five variables explained all hydrographs reasonably well, even though three variables (F_stone>10%, C_Sa_tot and C_SOC) were held constant because they were determined only once on this plot.
Fig. 3. Modelled accumulated runoff volumes (Q_P20 to Q_P60) for different rainfall depths (20 to 60 mm) and varying total sand content C_Sa_tot, stone cover F_stone, time since tillage t_sT, soil organic carbon content C_SOC, and rainfall intensity p as used in Eqs. (1) and (3); all variables except the one varied were kept constant at their mean value (for values see Fig. 4).
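The link between a heavy-tailed (leptokurtic) error distribution and errors that are mostly well below the RMSE can be checked numerically. The Python sketch below computes the RMSE, the median absolute error and the sample excess kurtosis (0 for a normal distribution); the residuals are hypothetical, mimicking many small errors plus a few poorly predicted values, and are not taken from the data set.

```python
import numpy as np


def error_summary(residuals):
    """Return (RMSE, median absolute error, excess kurtosis) of residuals."""
    e = np.asarray(residuals, dtype=float)
    rmse = float(np.sqrt(np.mean(e ** 2)))
    median_abs = float(np.median(np.abs(e)))
    c = e - e.mean()
    # Moment-based sample excess kurtosis; 0 for a normal distribution.
    excess_kurtosis = float(np.mean(c ** 4) / np.mean(c ** 2) ** 2 - 3.0)
    return rmse, median_abs, excess_kurtosis


# Hypothetical residuals: 96 small errors (+/-0.1) and 4 large ones (+/-2.0).
residuals = [0.1, -0.1] * 48 + [2.0, -2.0] * 2
rmse, median_abs, kurt = error_summary(residuals)
print(f"RMSE={rmse:.3f}  median|e|={median_abs:.3f}  excess kurtosis={kurt:.1f}")
```

The few large errors dominate the squared-error mean, so the RMSE is inflated relative to the typical error; this is why a median-based summary complements the RMSE when the error distribution is strongly leptokurtic.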
The sensitivities of the variables within the complete model were analysed by changing the value of each variable within its measured range (Table 1), while rainfall depth increased from 0 to 60 mm and the other variables were held constant at their mean values (Figs. 3 and 4). With increasing sand content, runoff started earlier (Fig. 4), but the effect was small and most prominent for small sand contents (approximately 0 to 10 %; Fig. 3). Stone cover had a much larger effect (Figs. 3 and 4). Increasing stone cover increasingly retarded runoff, but this became effective only above a threshold of 10 % stones (Fig. 3). Consequently, stone cover can be neglected for many soils, as the average stone cover in our data set was 6.6 %. Importantly, sand content and stone cover influenced the whole hydrograph (Fig. 4) beyond the start of runoff, because Eq. (1) was needed to calculate Eq. (3). With increasing rainfall intensity, instantaneous runoff rates and accumulated volumes increased as predicted by Eq. (3). This also influenced the start of runoff. Runoff started slightly later with decreasing rain intensity (Fig. 4), even though intensity was not part of Eq. (1). This is because the influence of intensity on initial abstraction was rather weak compared with the random scatter of initial abstraction. Using all runoff measurements, as in Eq. (3), instead of only one data point per event (initial abstraction) reduced the random scatter, so that this influence became visible in the final Eq. (3). Thus, Eq. (1) alone was not sufficient to calculate the start of runoff and was used as an intermediate step in the development of Eq. (3). The same was true for all other variables that additionally entered Eq. (3).
… and maximum (thin line) values of total sand content C_Sa_tot, stone cover F_stone, time since tillage t_sT, soil organic carbon content C_SOC, and rainfall intensity p. Numbers denote the minimum, mean and maximum of each variable. All variables were kept constant at their mean value except the one varied. For F_stone, the minimum and the mean result in the same hydrograph, as stone cover becomes active only for F_stone>10%.
Table 3. Calibration and validation results for accumulated runoff volumes (Q) and instantaneous runoff rates (q) for all 317 hydrographs used in a tenfold cross validation. Goodness-of-fit parameters were calculated at the full model/data resolution of 1 min; NSE is the Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970), R² is the coefficient of determination, RMSE is the root mean square error, and n is the number of single measurements used for calibration and validation.
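The one-at-a-time sensitivity analysis described above follows a simple pattern: evaluate the model at the minimum, mean and maximum of one variable while holding all others at their means. The Python sketch below shows that pattern with a hypothetical toy model and made-up ranges, not the fitted runoff equations; in the study each evaluation produced a whole hydrograph over 0 to 60 mm of rain, whereas here the model returns a single number for brevity.

```python
from typing import Callable, Dict, List, Tuple


def one_at_a_time(model: Callable[[Dict[str, float]], float],
                  ranges: Dict[str, Tuple[float, float, float]]) -> Dict[str, List[float]]:
    """Evaluate model at the (min, mean, max) of each variable while all other
    variables are held at their means."""
    means = {name: r[1] for name, r in ranges.items()}
    return {name: [model({**means, name: v}) for v in r] for name, r in ranges.items()}


def toy(v: Dict[str, float]) -> float:
    """Hypothetical toy model, NOT the fitted runoff equations."""
    return 2.0 * v["a"] - v["b"]


print(one_at_a_time(toy, {"a": (0.0, 1.0, 2.0), "b": (0.0, 5.0, 10.0)}))
# {'a': [-5.0, -3.0, -1.0], 'b': [2.0, -3.0, -8.0]}
```

The middle value is the same for every variable (the model at all means), so the spread between the first and last entries directly compares the influence of each variable over its measured range, which is how Figs. 3 and 4 are read.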
The influence of C_SOC was of similar strength to that of rainfall intensity. Instantaneous runoff rates and accumulated volumes decreased with increasing C_SOC (Fig. 3), and runoff started later (Fig. 4). The effect of time since tillage, t_sT, was about 30 % stronger than those of C_SOC and rainfall intensity (compare the final ranges of runoff volume and rate), but this resulted from the very short t_sT (minimum: 1 h) that were possible with small plots and artificial rainfall and that are unlikely on larger fields, where tillage takes considerably longer than 1 h. Considering the range of times relevant for whole fields, the influence of t_sT was similar in strength to the other influences. The change during the first 12 days after tillage was about the same as the change during the following 215 days (Fig. 4).
Modelled vs. measured initial abstraction P_a and accumulated runoff after 20, 30 and 40 mm of rainfall (Q_P20, Q_P30 and Q_P40, respectively); data shown combine all validation results of the tenfold cross validation; root mean square errors are 7.0, 3.5, 5.3 and 6.9 mm for P_a, Q_P20, Q_P30 and Q_P40, respectively.

Model validation
The restricted data sets of the folds created during cross validation led to models similar to those using the full data set. The prediction quality did not differ between the calibration and validation data sets for either runoff volume or rate (Table 3), indicating that all models were equally suitable for predictions. The models explained the validation data with an NSE between 0.55 and 0.71 (Table 3, Fig. 5). Runoff volume was again modelled more accurately than runoff rate. Runoff varied between 0 and 59 mm and could be predicted with an RMSE of 5.2 mm. However, the models performed somewhat worse for initial abstraction, since, as mentioned earlier, P_a is strongly influenced by factors that could not be captured with the available variables. In general, prediction quality increased with rainfall volume and hence surface runoff volume (Fig. 5).
Using the CN approach according to Mockus (1972) increased the RMSE of runoff volume by about 50 % (RMSE = 7.7 mm). The same was true when using the CN approach by Auerswald and Haider (1996) (RMSE = 7.9 mm).
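For reference, the curve number runoff equation in its common SI form computes event runoff Q (mm) from rain depth P (mm) via the potential retention S = 25400/CN - 254 and an initial abstraction I_a = 0.2 S, with Q = (P - I_a)² / (P - I_a + S) for P > I_a and Q = 0 otherwise. The Python sketch below implements that standard textbook form; how CN values were assigned from soil descriptions or from soil cover (Auerswald and Haider, 1996) is not reproduced, and the CN in the usage line is hypothetical.

```python
def scs_cn_runoff(p_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Event runoff depth Q (mm) from rain depth P (mm) with the SCS curve number method."""
    if not 0 < cn <= 100:
        raise ValueError("CN must be in (0, 100]")
    s = 25400.0 / cn - 254.0   # potential maximum retention (mm)
    ia = ia_ratio * s          # initial abstraction (mm)
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


print(round(scs_cn_runoff(50.0, 80.0), 2))  # 13.8
```

In contrast to the statistical model above, the CN equation has no term for rain intensity or time since tillage; seasonal changes can only enter through the choice of CN.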

Initial abstraction
In general, initial abstraction showed substantially more random (unexplained) variability than subsequent runoff rates, indicating that these measurements are more prone to uncertainty. The high variability of initial abstraction under more or less identical plot conditions could have resulted from small random differences; e.g. compaction at the downslope end of the plot will encourage early runoff, while small depressions at the outlet will increase detention storage and hence delay first runoff. Such random differences are likely given that most of the plots were situated in ordinary farmed fields. Subjective decisions by the technical staff carrying out the rainfall simulations are also necessary when recording the first runoff (whether it starts with the first single drop or the first continuous flow). These decisions will differ among the research groups contributing the data, among persons within a group and even for the same person during different measuring campaigns. Hence, when initial abstraction was analysed without consideration of the following runoff measurements, it was best explained by the combination of only two soil properties, namely F_stone>10% and sand content (Eq. 1), despite its large variability (Table 2). However, all other variables that influenced the hydrograph also affected initial abstraction (Fig. 4), because (at the plot scale) abstraction must become larger the slower the hydrograph rises. The effect of F_stone>10% most probably resulted from the macropore space under stones created during tillage, which can store runoff. The threshold indicated that small stone contents, which usually are also associated with small and rounded stones, did not exhibit this effect. In this case, the small stones can be expected to be embedded within the soil matrix and may even decrease infiltration rates (Wilcox et al., 1988).
This threshold agrees with the calculation of soil erodibility in the revised universal soil loss equation, which also uses a threshold of 10 % for the consideration of stones (Roemkens et al., 1997). Also  suggested this threshold. In general, the importance of the variable F_stone>10% is in line with the findings of Poesen et al. (1990), who indicated that stones not fully embedded in the surface soil layer typically lead to preferential infiltration of runoff under these stones, and with Tromble (1976), who found a positive relation between infiltration and stone cover after ploughing rangeland. Even though the influence of stones on initial abstraction was large, it applied only to a small number of soils: only 36 % of our soils had a stone cover just above the threshold, and only 16 % had a stone cover above 15 %. For the USA, it was estimated that stones need to be considered in the calculation of soil erodibility on 16 % of the land area (Roemkens et al., 1997). Similar percentages may hence be found in many temperate areas of the world, while in other areas, such as the Mediterranean, stony soils may occupy much larger areas (60 % according to Poesen and Lavee, 1994) and cause the low erosion rates there (Cerdan et al., 2011). The influence of sand content was opposite to what might be expected, e.g. from the influence of texture in the SCS CN model. Our model nevertheless agrees in general with the CN model's assessment of coarse-textured soils, because stones had a much larger influence than sand and because the CN model does not explicitly distinguish between the effects of stones and sand. There is little systematic research on the effect of sand on runoff, which impedes the interpretation of this result. It is remarkable, however, that sand only promoted early runoff but not later runoff (Figs. 3, 4).
The increasing sand content likely decreased aggregate stability (Boix-Fayos et al., 2001) and increased slaking forces because the surface of sandy soils is usually dry (Auerswald, 1995). Both promote the breakdown of aggregates and thus accelerate sealing and decrease depression storage on the soil surface (Mohamoud et al., 1990).
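The two predictors of initial abstraction can be illustrated with a small regression sketch. Eq. 1 itself is not reproduced in this section, so the linear form below is an assumption, as is the reading of F_stone>10% as the excess of stone cover over 10 % (an alternative reading would keep the full cover above the threshold). The function names are hypothetical, and the least-squares fit uses NumPy's `numpy.linalg.lstsq`.

```python
import numpy as np

def stone_excess(stone_cover_pct):
    """Hypothetical F_stone>10%: stone cover above the 10 % threshold.

    Assumption: F_stone>10% = max(0, stone cover - 10), so stone covers
    at or below the threshold contribute nothing to initial abstraction.
    """
    return np.maximum(0.0, np.asarray(stone_cover_pct, dtype=float) - 10.0)

def fit_initial_abstraction(stone_cover_pct, sand_pct, ia_mm):
    """Ordinary least-squares fit of Ia = b0 + b1 * F_stone>10% + b2 * sand.

    The linear form is an illustrative assumption, not the published Eq. 1.
    Returns the coefficient vector [b0, b1, b2].
    """
    design = np.column_stack([
        np.ones(len(ia_mm)),            # intercept
        stone_excess(stone_cover_pct),  # thresholded stone cover
        np.asarray(sand_pct, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(design, np.asarray(ia_mm, dtype=float), rcond=None)
    return coef
```

The threshold transform lets the regression stay linear while reproducing the observation that stone covers below 10 % show no effect; in the study's data, the fitted sand term would be expected to reduce initial abstraction, since sand promoted early runoff.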

Hydrograph shape
Despite the large variation in the data set, the hydrographs could be predicted surprisingly well by an interaction of simple rain, soil and land-use variables: rain depth exceeding initial abstraction, rain intensity, soil organic carbon content and time since tillage. The importance of rain depth exceeding initial abstraction and of rain intensity is obvious, and both variables also feature in many other surface runoff estimates (e.g. Appels et al., 2011).
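The first hydrograph predictor, rain depth exceeding initial abstraction (P_r), can be computed for each time step of a simulation run. This is a minimal sketch assuming constant rain intensity from the start of the run, as in a typical rainfall simulation; the regression coefficients that combine P_r with rain intensity, C_SOC and t_sT are not given in this section and are therefore not sketched. The function name `rain_excess` is hypothetical.

```python
import numpy as np

def rain_excess(times_min, intensity_mm_min, ia_mm):
    """Rain depth exceeding initial abstraction, P_r (mm), at each time.

    Assumes a constant rain intensity p (mm/min) starting at t = 0, so the
    cumulative rain depth is P(t) = p * t and P_r(t) = max(0, P(t) - Ia).
    """
    t = np.asarray(times_min, dtype=float)
    cumulative_rain = intensity_mm_min * t
    return np.maximum(0.0, cumulative_rain - ia_mm)

# Example: 1 mm/min rain, 8 mm initial abstraction
print(rain_excess([0, 5, 10, 20], 1.0, 8.0))
```

Because P_r stays at zero until Ia is exhausted, a model built on it delays the onset of runoff without assuming whether that delay stems from infiltration excess or from storage filling.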
The influence of C_SOC on hydraulic parameters (e.g. Rajkai et al., 2004; Scheinost et al., 1997) and erosion (Guerra, 1994) has been shown in several studies. Its influence on the hydrograph likely results from (i) greater aggregate stability (Auerswald, 1995; Tisdall and Oades, 1982), (ii) greater unsaturated hydraulic conductivity, and (iii) higher biological activity (e.g. Anderson and Domsch, 1989; Weigand et al., 1995), especially by earthworms creating more voids for runoff intake. It is important to note that the soils for which these relationships have been specifically quantified by Weigand et al. (1995) and Auerswald et al. (1995) comprise a large portion of the present data set. It is thus likely that biological activity, earthworm abundance and the cross-sectional area of biopores, which were available for these soils, would have been good predictors for the entire data set had they been available for all runs. However, because these variables are usually not available for prediction, C_SOC is preferable even though it may influence infiltration only indirectly via aggregate stability and biopore cross-sectional area. The importance of t_sT is more difficult to interpret because this variable is rarely analysed in relation to runoff generation; it is included neither in the lumped CN model nor in any of the mechanistic models used to predict runoff generation. This is despite the fact that many publications compare different tillage treatments (e.g. Auerswald et al., 1994; Silburn and Connolly, 1995; Choudhary et al., 1995) and thus acknowledge the prominent impact of tillage on runoff. However, these comparisons are usually made between treatments, whereas changes over time are hardly considered, although tillage affects many physical and biological soil properties, which then gradually change until the next tillage (Caron et al., 1992; Dexter et al., 1998; Franzluebbers et al., 1995; Zobeck and Onstad, 1987).
Surface runoff decreased with increasing t_sT, whereas the opposite might be expected from the typically observed decrease in porosity following a number of drying-wetting cycles after tillage (Ahuja et al., 2006; Franzluebbers et al., 1995; Onstad, 1984) and from the decrease in detention and depression storage as random roughness declines with consecutive rainfalls (Zobeck and Onstad, 1987). Several processes likely contribute at different time scales, as t_sT covered nearly four orders of magnitude (1 h to 227 days; Table 1). (i) In the short term (several hours after tillage), the fast drying of freshly tilled soil can increase infiltration capacity and stabilise aggregates (Crouch and Novruzi, 1989; Gollany et al., 1991). The latter reduces the crusting potential of the soil and promotes infiltration. (ii) Within several days after tillage, the aggregates age-harden due to drying (cycles) and biological activity. Biological activity produces binding substances, including hyphae, that form more and closer bonds between soil particles, causing cementing substances to precipitate at newly formed particle contacts (Dexter et al., 1988; Kemper and Rosenau, 1984; Schweikle et al., 1974). All of these mid-term processes of soil structure stabilisation potentially prevent soil crusting, which matters most shortly after tillage because the soil is not yet fully covered by growing crops. (iii) In the long run (weeks to months), t_sT is probably also a proxy for the development of plant cover, including changes in tilth underneath the cover and the development of connected biopores reaching the soil surface, even though none of the four cover variables (Table 1) entered any equation. These interpretations have to remain speculative given the little attention t_sT has previously received in runoff studies. To our knowledge, this parameter has only been analysed with respect to aggregate stability and soil erosion, where it can have a large effect (e.g. Auerswald, 1993; Auerswald et al., 1994; Caron et al., 1992; Shainberg et al., 1996), but not with respect to runoff generation. This information is typically not reported in publications, which may explain the often large differences in runoff between studies as well as some of the unexplained scatter within individual studies, given the large changes that can occur at short t_sT. More attention should be paid to variables related to tillage practices, given that seedbed conditions, which fall into this range of t_sT, are often analysed.
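The range of t_sT reported in Table 1 (1 h to 227 days) can be converted into orders of magnitude as a quick arithmetic check of the "nearly four orders of magnitude" statement:

```latex
\log_{10}\!\left(\frac{227\,\mathrm{d} \times 24\,\mathrm{h\,d^{-1}}}{1\,\mathrm{h}}\right)
  = \log_{10}(5448) \approx 3.74
```

A span this wide is why processes acting over hours (drying of the fresh tilth), days (age hardening) and weeks to months (plant cover and biopore development) can all be reflected in the single variable t_sT.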
It is remarkable that the CN approach of Auerswald and Haider (1996) did not perform better than the original version by Mockus (1972), although Auerswald and Haider (1996) had used a subset of our data to develop their equation, which predicts CN from soil cover. Within their subset, soil cover changed mainly due to early plant growth, and hence it had a statistical power similar to that of t_sT. For the entire data set, t_sT was superior to soil cover because it also described the changes immediately after tillage, before the onset of plant growth. Additionally, t_sT can serve as an indicator of long-term changes, whereas soil cover usually approaches its final value about two months after seeding.
It is debatable whether any empirical or mechanistic approach to modelling surface runoff generation can be reliably transferred to other sites, given the multitude of conceivable influences. Because our data set covers a large range of rainfall, topography, soil and land-use properties (Table 1), the validation results are encouraging for our statistical approach. The overall RMSE of accumulated runoff volume and instantaneous runoff rate (5.2 mm and 0.23 mm min−1, respectively) probably cannot be lowered markedly by any other model predicting rain excess and runoff generation, because such differences already existed in the data measured on replicated plots (Fig. 2). These differences must be caused either by systematic measuring errors, such as an incorrect rain intensity, or by properties that were not measured and thus would not be available to other types of models either (e.g. antecedent sealing, biopore density and biopore connectivity).
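The two goodness-of-fit measures used throughout, RMSE and the share of explained variability, can be sketched as follows. The exact definition of explained variability used in the study is not given in this section; the sketch assumes the common form 1 − SSE/SST. Function names are hypothetical.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (e.g. mm)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((o - p) ** 2)))

def explained_variance(observed, predicted):
    """Share of variability explained, assumed here as 1 - SSE / SST."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    sse = np.sum((o - p) ** 2)           # residual sum of squares
    sst = np.sum((o - o.mean()) ** 2)    # total sum of squares
    return float(1.0 - sse / sst)
```

Because RMSE carries the units of the data, the comparison with the CN model is direct: 7.7 mm / 5.2 mm is about 1.48, which is the roughly 50 % increase in RMSE reported for the CN approach.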

Conclusions
The large data set of 317 rainfall simulations (14 286 runoff measurements) represented a wide range of arable soils and crops. Runoff measurements were related to 20 time-invariant soil properties, three variable soil properties, four rain properties, three land-use properties and derived variables. In an iterative multiple regression procedure, six of these properties/variables best described initial abstraction and the hydrograph. The fraction of stone cover above 10 % (F_stone>10%) and the total sand content of the fine-earth fraction (C_Sa_tot) were needed to estimate initial abstraction, while the hydrograph could be predicted from rain depth exceeding initial abstraction (P_r), rainfall intensity (p), soil organic carbon content (C_SOC) and time since last tillage (t_sT). The resulting model predicted event hydrographs without a priori assumptions about the underlying process (e.g. Hortonian vs. saturation-excess runoff generation). Validating this approach with a family of models created by ten-fold cross validation showed that these models explained 72 % of the variability in runoff volume and 58 % of the variability in runoff rate (RMSE: 5.2 mm and 0.23 mm min−1, respectively) for both the training and the validation data. The model outperformed the CN approach, and its implementation in spatially distributed and temporally continuous models that capture agricultural management thus seems promising.
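The ten-fold cross validation described above can be sketched for a generic least-squares model. This section does not state whether folds were formed from individual measurements or from whole rainfall-simulation runs; the sketch assumes the latter, using a hypothetical group label per measurement so that all measurements of one run are held out together. The model here is a plain linear regression, not the study's fitted equations, and the function name is hypothetical.

```python
import numpy as np

def grouped_k_fold_rmse(x, y, groups, k=10, seed=0):
    """Mean training and validation RMSE of a least-squares model, k-fold CV.

    Folds are built from whole groups (assumed: one group per simulation
    run). Each fold is held out once, giving a family of k fitted models.
    """
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    if k > len(unique_groups):
        raise ValueError("k must not exceed the number of groups")
    rng = np.random.default_rng(seed)
    fold_groups = np.array_split(rng.permutation(unique_groups), k)

    train_rmse, valid_rmse = [], []
    for held_out in fold_groups:
        valid = np.isin(groups, held_out)  # rows of the held-out runs
        train = ~valid
        coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        train_rmse.append(np.sqrt(np.mean((design[train] @ coef - y[train]) ** 2)))
        valid_rmse.append(np.sqrt(np.mean((design[valid] @ coef - y[valid]) ** 2)))
    return float(np.mean(train_rmse)), float(np.mean(valid_rmse))
```

Comparing the two returned values mirrors the check reported above: similar training and validation errors indicate that the fitted relationships are not specific to the runs used for fitting. Grouping by run avoids optimistic validation scores that could arise because successive measurements within one hydrograph are strongly correlated.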
Stone cover was most important for initial abstraction, while t_sT was most important for the hydrograph. Neither variable is taken into account in typical lumped hydrological models (e.g. the CN approach) or in more mechanistic models using Horton, Green and Ampt, or Philip type approaches to describe infiltration. This finding should foster a discussion regarding our ability to accurately model surface runoff from arable land, which appeared to be dominated by agricultural operations that introduce a man-made seasonality to soil hydraulic properties.