Benchmark levels for the consumptive water footprint of crop production for different environmental conditions: a case study for winter wheat in China

Abstract. Meeting growing food demands while simultaneously shrinking the water footprint (WF) of agricultural production is one of the greatest societal challenges. Benchmarks for the WF of crop production can serve as a reference and be helpful in setting WF reduction targets. The consumptive WF of crops, the consumption of rainwater stored in the soil (green WF), and the consumption of irrigation water (blue WF) over the crop growing period varies spatially and temporally depending on environmental factors like climate and soil. The study explores which environmental factors should be distinguished when determining benchmark levels for the consumptive WF of crops. Hereto we determine benchmark levels for the consumptive WF of winter wheat production in China for all separate years in the period 1961–2008, for rain-fed vs. irrigated croplands, for wet vs. dry years, for warm vs. cold years, for four different soil classes, and for two different climate zones. We simulate consumptive WFs of winter wheat production with the crop water productivity model AquaCrop at a 5 by 5 arcmin resolution, accounting for water stress only. The results show that (i) benchmark levels determined for individual years for the country as a whole remain within a range of ±20 % around long-term mean levels over 1961–2008, (ii) the WF benchmarks for irrigated winter wheat are 8–10 % larger than those for rain-fed winter wheat, (iii) WF benchmarks for wet years are 1–3 % smaller than for dry years, (iv) WF benchmarks for warm years are 7–8 % smaller than for cold years, (v) WF benchmarks differ by about 10–12 % across different soil texture classes, and (vi) WF benchmarks for the humid zone are 26–31 % smaller than for the arid zone, which has relatively higher reference evapotranspiration in general and lower yields in rain-fed fields. 
We conclude that when determining benchmark levels for the consumptive WF of a crop, it is useful to primarily distinguish between different climate zones. If actual consumptive WFs of winter wheat throughout China were reduced to the benchmark levels set by the best 25 % of Chinese winter wheat production (1224 m3 t−1 for arid areas and 841 m3 t−1 for humid areas), the water saving in an average year would be 53 % of the current water consumption at winter wheat fields in China. The majority of the yield increase and associated improvement in water productivity can be achieved in southern China.


Introduction
Half of the large river basins in the world face severe blue water scarcity for at least one month a year (Hoekstra et al., 2012). Agriculture is the largest consumer of water in the world and therefore responsible for a large part of the water scarcity in the world. Still, global food demand continues to increase, due to growing populations and changing diets. Meeting growing food demands and simultaneously reducing the water footprint (WF) of agricultural production is therefore one of the greatest societal challenges of our time (Foley et al., 2011; Hoekstra and Wiedmann, 2014). In crop production, individual farmers generally aim to maximize their economic return by raising their productivity per unit of input such as capital, labour, land, and fertilizer. When water is scarce, raising production per unit of water (i.e. increasing water productivity in terms of t m−3 or reducing the WF in m3 t−1) is a key challenge in order to save water and achieve sustainable water use at catchment level. Even when water is not scarce, it makes sense to have a reasonable level of water productivity, i.e. a good amount of "crop per drop". Farmers, however, generally lack incentives for saving water, since they pay little for their water use compared to other input factors, even under conditions of high water scarcity. In order to provide producers with an incentive to reduce the WF of their products to reasonable levels, Hoekstra (2013) has proposed to develop WF benchmarks, which can be used by governments, farmers and customers (crop traders and retailers) for setting WF reduction targets. Setting WF benchmarks for different products, particularly water-intensive products like crops, is fundamental for wise water allocation and fair sharing of water resources among different sectors and users (Hoekstra, 2013).
WF benchmarks of crop production could be global, but would preferably be context-specific, given the fact that the WF of growing a crop varies as a function of environmental factors such as climate and soil (Mekonnen and Hoekstra, 2011; Siebert and Döll, 2010; Tuninetti et al., 2015).
The WF of a crop is determined by both environmental conditions (e.g. climate, soil texture, CO2 concentration in the air) that cannot be controlled by humans and managerial factors (e.g. application of fertilizers and pesticides, irrigation technology and strategy, mulching practice) (Zwart et al., 2010; Mekonnen and Hoekstra, 2011; Brauman et al., 2013). Benchmarks for the WF of growing a crop can, for example, be set by looking at what WF level is not exceeded by the best 20-25 % of the total production in an area. Alternatively, benchmarks can be determined by estimating the WF associated with the best available technology and management practice (Hoekstra, 2013). Mekonnen and Hoekstra (2014) followed the first approach and developed global benchmarks for both the consumptive (green plus blue) WF and the degradative (grey) WF for a large number of crops, based on estimated WF values for 1996-2005 at a spatial resolution of 5 by 5 arcmin. Chukalla et al. (2015) followed the second approach and explored reduction potentials of consumptive WFs for a few crops by applying different types of alternative irrigation techniques and strategies and different types of alternative mulching practices. They found that the highest reduction (∼ 29 %) in the consumptive WF of a crop could be achieved when applying drip or subsurface drip irrigation in combination with deficit irrigation and synthetic mulching.
Research in developing benchmark levels for the consumptive WF of crop production is still in its infancy. An important question that has been insufficiently addressed is which environmental factors should play a role when developing WF benchmarks. It is useful to have one global benchmark for the consumptive WF per crop, as a global reference, like the ones developed by Mekonnen and Hoekstra (2014), but it remains unclear whether it is reasonable to expect the same water productivity under different environmental conditions. In their global analysis, Mekonnen and Hoekstra (2014) found that a crop in a temperate climate generally has a smaller WF than the same crop in a tropical climate, but this can still be due to other factors (e.g. better management practices in temperate climates), so that this is not a sufficient finding to diversify benchmark levels based on the distinction between temperate and tropical. Besides, even though Mekonnen and Hoekstra (2014) found a difference between different climates, for each crop considered it was found that the 10 % best global production (i.e. with smallest WFs) was always at least partly in the tropics as well. In other words, a WF benchmark developed in the temperate part of the world still offers a reference value that can be achieved in the tropics as well. Next to climate, soil also affects evapotranspiration and yield and thus the WF of a crop. Tolk and Howell (2012), for example, analyse the variation of consumptive WFs of sunflower in relation to different types of soils. There has not yet been, though, a systematic study looking at how environmental factors influence the consumptive WFs of crops and to what extent it makes sense to diversify WF benchmark levels based on specific environmental factors.
The current study aims to contribute to this discussion through an explorative study for winter wheat in China. We explore which environmental factors should be distinguished when determining benchmark levels for the consumptive WF of crops. We subsequently determine benchmark levels for the consumptive WF of winter wheat production in China for all separate years in the period 1961-2008, for rain-fed vs. irrigated croplands, for wet vs. dry years, for warm vs. cold years, for four different soil classes, and for two different climate zones. Winter wheat in China accounts for 95 % of total wheat production in China, which is the world's biggest wheat producer (FAO, 2014). Winter wheat covers 96 % of China's harvested wheat area and is grown across China's different climate zones (NBSC, 2013). In order to avoid interference from managerial factors that cause differences in evapotranspiration and yield, we simulate WFs by means of FAO's water productivity model AquaCrop (Raes et al., 2009; Steduto et al., 2009), at a resolution of 5 by 5 arcmin, considering only water stress and not taking into account other stresses such as those from soil fertility, salinity, frost, or pests and diseases.
2 Method and data
2.1 Estimating consumptive WF of growing a crop
The consumptive (green and blue) WF of growing a crop (m3 t−1) equals the total actual evapotranspiration (ET, m3 ha−1) over the cropping period divided by the crop yield (Y, t ha−1). In the current study, the ET and Y of growing winter wheat in China were simulated on a daily basis, at 5 by 5 arcmin resolution, with FAO's crop water productivity model AquaCrop (Raes et al., 2009; Steduto et al., 2009), run for the whole period 1961-2008. Compared to other crop growth models, AquaCrop has a significantly smaller number of parameters and strikes a better balance between simplicity, accuracy, and robustness (Steduto et al., 2007; Confalonieri et al., 2016). The model performance in simulating crop growth and water use has been well tested for a variety of crop types under diverse environmental conditions (e.g. Kumar et al., 2014; Jin et al., 2014; Abedinpour et al., 2012; Mkhabela and Bullock, 2012; Andarzian et al., 2011; Stricevic et al., 2011; Heng et al., 2009; Farahani et al., 2009; García-Vila et al., 2009). AquaCrop has been applied in WF accounting at field (Chukalla et al., 2015), river basin (Zhuo et al., 2016a), and national level (Zhuo et al., 2016b) at high spatial resolution.
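The definition above can be illustrated with a short calculation. This is a minimal sketch, not the study's code: the function name `consumptive_wf` and the input values are hypothetical, and it assumes seasonal ET is given in mm, so that the conversion 1 mm over 1 ha = 10 m3 applies.

```python
def consumptive_wf(et_mm: float, yield_t_per_ha: float) -> float:
    """Consumptive WF (m3 t-1) from seasonal ET (mm) and crop yield (t ha-1)."""
    if yield_t_per_ha <= 0:
        raise ValueError("yield must be positive")
    et_m3_per_ha = et_mm * 10.0  # 1 mm of water over 1 ha equals 10 m3
    return et_m3_per_ha / yield_t_per_ha


# Hypothetical grid cell: 450 mm of ET over the cropping period, 5 t ha-1 yield.
wf = consumptive_wf(et_mm=450.0, yield_t_per_ha=5.0)
print(f"{wf:.0f} m3 t-1")
```

The example makes visible that the WF falls either when ET falls or when yield rises.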
AquaCrop simulates water-driven crop water productivity with a dynamic daily soil water balance:

$$S_{[t]} = S_{[t-1]} + PR_{[t]} + IRR_{[t]} + CR_{[t]} - ET_{[t]} - RO_{[t]} - DP_{[t]},$$

where S[t] (mm) refers to the soil water content at the end of day t, PR[t] (mm) the precipitation on day t, IRR[t] (mm) the irrigation water applied on day t, CR[t] (mm) the capillary rise from groundwater, ET[t] (mm) daily actual evapotranspiration, RO[t] (mm) daily surface runoff and DP[t] (mm) deep percolation. CR[t] is assumed to be zero because the groundwater depth is considered to be much larger than 1 m (Allen et al., 1998). The green and blue WFs are determined by green and blue ET over the cropping period, respectively, divided by Y. Following Chukalla et al. (2015) and Zhuo et al. (2016a, b), the daily green and blue ET (mm) were separated by tracking the daily incoming and outgoing green and blue water fluxes at the boundaries of the root zone:

$$S_{\mathrm{green}[t]} = S_{\mathrm{green}[t-1]} + \left(PR_{[t]} + IRR_{[t]} - RO_{[t]}\right) \frac{PR_{[t]}}{PR_{[t]} + IRR_{[t]}} - \left(DP_{[t]} + ET_{[t]}\right) \frac{S_{\mathrm{green}[t-1]}}{S_{[t-1]}},$$

$$S_{\mathrm{blue}[t]} = S_{\mathrm{blue}[t-1]} + \left(PR_{[t]} + IRR_{[t]} - RO_{[t]}\right) \frac{IRR_{[t]}}{PR_{[t]} + IRR_{[t]}} - \left(DP_{[t]} + ET_{[t]}\right) \frac{S_{\mathrm{blue}[t-1]}}{S_{[t-1]}},$$

where S green and S blue refer to the green and blue soil water content, respectively. The initial soil moisture at the start of the growing period is assumed to be green water. The contribution of precipitation (green water) and irrigation (blue water) to surface runoff was calculated based on the respective shares of precipitation and irrigation in the total green plus blue water inflow. The green and blue components in DP and ET were calculated per day based on the fractions of green and blue water in the total soil water content at the end of the previous day. Y was determined by multiplying the above-ground biomass (B) and the harvest index (HI, %). HI was adjusted to water and temperature stress depending on timing and extent of the stress by an adjustment factor (f HI) from the reference harvest index (HI 0) (Raes et al., 2011):

$$HI = f_{HI} \times HI_0.$$

Only water stress is considered in modelling, which is determined by the water availability in the root zone, thus leaving out the effects of non-environmental factors (e.g. technology, fertilization) on crop growth.
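The daily balance and the green/blue partitioning described above can be sketched as a single daily update step. This is an illustrative sketch only, not AquaCrop itself: the names `SoilWater` and `daily_step` and all input values are hypothetical, capillary rise is taken as zero as stated in the text, and the fluxes ET, RO and DP are assumed to be supplied by the crop model.

```python
from dataclasses import dataclass


@dataclass
class SoilWater:
    green: float  # green soil water content in the root zone (mm)
    blue: float   # blue soil water content in the root zone (mm)

    @property
    def total(self) -> float:
        return self.green + self.blue


def daily_step(s: SoilWater, pr: float, irr: float, et: float, ro: float, dp: float):
    """Advance the green/blue soil water stores by one day (all fluxes in mm)."""
    inflow = pr + irr
    # Runoff is attributed to green and blue water in proportion to PR and IRR.
    green_in_share = pr / inflow if inflow > 0 else 0.0
    blue_in_share = irr / inflow if inflow > 0 else 0.0
    # DP and ET are split by the green/blue fractions at the end of the previous day.
    green_frac = s.green / s.total if s.total > 0 else 0.0
    blue_frac = s.blue / s.total if s.total > 0 else 0.0
    net_in = inflow - ro
    outflow = dp + et
    new_state = SoilWater(
        green=s.green + net_in * green_in_share - outflow * green_frac,
        blue=s.blue + net_in * blue_in_share - outflow * blue_frac,
    )
    return new_state, et * green_frac, et * blue_frac


# Hypothetical day: initial soil water is all green, as assumed in the text.
state = SoilWater(green=100.0, blue=0.0)
state, et_green, et_blue = daily_step(state, pr=10.0, irr=10.0, et=4.0, ro=2.0, dp=0.0)
print(state.green, state.blue, et_green, et_blue)
```

Summing the daily green and blue ET over the cropping period and dividing each by the yield would then give the green and blue WF, respectively.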
For irrigated fields, we assume that the applied irrigation volumes are equal to the net irrigation requirement. We used the same input crop parameters, including a fixed crop calendar, reference harvest index, and maximum root depth as calibrated for China's winter wheat, as in Zhuo et al. (2016b). We simulated winter wheat production per grid cell over the years based on the irrigated and rain-fed harvested areas around the year 2000, as obtained from Portmann et al. (2010) (Fig. 1), so that the simulations are not affected by changes in where and how much wheat is grown. Data on monthly precipitation, reference evapotranspiration (ET0), and temperature at 30 arcmin resolution were taken from the CRU-TS 3.10 dataset (Harris et al., 2014). Soil texture data were obtained from Dijkshoorn et al. (2008). For hydraulic characteristics for each type of soil, the indicative values provided by AquaCrop were used. Data on total soil water capacity were obtained from Batjes (2012).

Benchmarking consumptive WF of growing a crop
Following Mekonnen and Hoekstra (2014), benchmark levels for the consumptive WF of crop production were determined by ranking the grid-level WF values from the smallest to the largest against the corresponding cumulative percentage of total crop production. As in the earlier study, we did not distinguish between green and blue WF benchmarks for two reasons. Firstly, the ratio of green to blue WF of a crop heavily depends on local green water resources availability, which is defined by the climate of a certain time in a certain location. Location-specific blue WF benchmarks can be developed as a function of the overall consumptive WF benchmarks and local green water availability. Secondly, the purpose of the current study is to find out to which environmental factor the consumptive WF benchmark is most sensitive.
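The ranking procedure described above can be sketched as a production-weighted percentile. This is a minimal sketch under stated assumptions: the function name `wf_benchmark` and the four grid cells are hypothetical, and the benchmark is taken as the WF of the first grid cell at which cumulative production reaches the chosen percentile, without interpolation.

```python
def wf_benchmark(wf_values, production, percentile):
    """WF level (m3 t-1) not exceeded by the best `percentile` % of production."""
    if not wf_values or len(wf_values) != len(production):
        raise ValueError("need equally long, non-empty inputs")
    total = sum(production)
    threshold = total * percentile / 100.0
    cumulative = 0.0
    # Rank grid cells from the smallest to the largest WF.
    for wf, prod in sorted(zip(wf_values, production)):
        cumulative += prod
        if cumulative >= threshold:
            return wf
    return max(wf_values)


# Hypothetical grid cells: WF in m3 t-1 and production in t.
wf_cells = [700.0, 900.0, 800.0, 1200.0]
prod_cells = [10.0, 40.0, 20.0, 30.0]
for p in (10, 20, 25):
    print(p, wf_benchmark(wf_cells, prod_cells, p))
```

Because cells are weighted by production rather than counted, a few large producers with small WFs can set the benchmark on their own.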
In order to analyse differences in consumptive WFs in relatively dry vs. relatively wet years, we evenly grouped the 48 considered years (1961-2008) into relatively dry, average and relatively wet years. We ranked the years based on the annual precipitation over the cropping area of winter wheat in China (Fig. 2a), classifying the 16 years with the lowest precipitation into the group of dry years and the 16 years with the highest precipitation into the group of wet years, with the other 16 years remaining for the group of average years. The average annual precipitation levels of the relatively dry, average and relatively wet years are 760, 799, and 850 mm yr−1, respectively.
We also grouped the years considered into relatively cold, average and relatively warm years based on annual mean temperature (Fig. 2b) and into years with relatively low, average and high ET0 (Fig. 2c). The average annual mean temperatures of the relatively cold, average and warm years are 10.7, 11.2, and 11.8 °C, respectively. The average annual ET0 values in the three categories of years are 874, 896, and 927 mm yr−1.
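The grouping of years into three equally sized classes can be sketched as follows. The function name `group_years` is hypothetical and the precipitation series is synthetic, generated only to make the example runnable; it is not the CRU-TS data used in the study. The same function would apply to annual mean temperature or ET0.

```python
def group_years(annual_values):
    """Split years into three equally sized groups by ascending annual value."""
    ranked = sorted(annual_values, key=annual_values.get)
    n = len(ranked) // 3
    return {
        "low": sorted(ranked[:n]),
        "average": sorted(ranked[n:len(ranked) - n]),
        "high": sorted(ranked[len(ranked) - n:]),
    }


# Synthetic annual precipitation (mm yr-1) for 1961-2008, for illustration only.
precip = {year: 700.0 + 5.0 * ((year * 37) % 48) for year in range(1961, 2009)}
groups = group_years(precip)
print({name: len(years) for name, years in groups.items()})
```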
For determining WF benchmarks for different soil texture classes, the soil types in the USDA (US Department of Agriculture) soil texture triangle were grouped into four soil classes (Raes et al., 2011): sandy soils, loamy soils, sandy clayey soils, and silty clayey soils. Each soil class has different ranges of field capacity, permanent wilting point and saturated water content (Table 1). The difference between soil water content and permanent wilting point defines the total available soil water content in the root zone. Given a certain soil water content, a soil with a higher field capacity has less deep percolation. With the same water input from precipitation or irrigation and the same soil water content, soils with a smaller saturated soil water content will generate more surface runoff (Raes et al., 2011). Figure 3 shows the spatial distribution of the four soil classes across mainland China.
For determining WF benchmarks for different climate zones, we classify climate based on UNEP's aridity index (AI) Thomas, 1997, 1992). The AI is an indicator of dryness, defined as the ratio of precipitation to reference evapotranspiration, with five levels of aridity: hyper-arid (AI < 0.05), arid (0.05 < AI < 0.2), semi-arid (0.2 < AI < 0.5), dry sub-humid (0.5 < AI < 0.65), and humid (AI > 0.65). To determine the geographic spread of the five climate zones in China we used the data on annual precipitation and ET 0 averaged over the period 1961-2008 at 30 by 30 arcmin resolution (Harris et al., 2014) (Fig. 4). In the current study, we group the five climate zones into two broad zones: the arid to semi-arid (Arid) zone (AI < 0.5) and the humid to semi-humid (Humid) zone (AI > 0.5).
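The classification above can be sketched in a few lines. The function names and the two example grid cells are hypothetical. One assumption is made explicit in the code: the thresholds in the text are strict inequalities on both sides, so a value lying exactly on a boundary is assigned here to the wetter class.

```python
def aridity_index(precip_mm: float, et0_mm: float) -> float:
    """Aridity index: long-term annual precipitation divided by annual ET0."""
    if et0_mm <= 0:
        raise ValueError("ET0 must be positive")
    return precip_mm / et0_mm


def aridity_class(ai: float) -> str:
    """Five aridity levels; boundary values go to the wetter class (assumption)."""
    if ai < 0.05:
        return "hyper-arid"
    if ai < 0.2:
        return "arid"
    if ai < 0.5:
        return "semi-arid"
    if ai < 0.65:
        return "dry sub-humid"
    return "humid"


def broad_zone(ai: float) -> str:
    """Two broad zones used in the study: Arid (AI < 0.5) and Humid otherwise."""
    return "Arid" if ai < 0.5 else "Humid"


# Hypothetical grid cells: (annual precipitation, annual ET0) in mm yr-1.
for p, et0 in [(400.0, 1000.0), (850.0, 900.0)]:
    ai = aridity_index(p, et0)
    print(round(ai, 2), aridity_class(ai), broad_zone(ai))
```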
3 Results
3.1 Benchmark levels for the consumptive WF as determined for different years and for rain-fed and irrigated croplands separately
We calculated the benchmark levels at different production percentiles for the consumptive WF of winter wheat (m3 t−1) for the country as a whole, year by year, for the period 1961-2008. The results are summarized in Fig. 5. The benchmarks, determined per year and per production percentile, generally vary within ±20 % of the long-term mean value over the period 1961-2008. We find that the best 10 % of winter wheat production in China (with smallest WFs) has a maximum long-term average consumptive WF of 777 m3 t−1, which is larger than the maximum consumptive WF of the best 10 % of wheat production globally (592 m3 t−1) that was reported by Mekonnen and Hoekstra (2014). We note here that the figures are not fully comparable, because Mekonnen and Hoekstra (2014) consider total wheat (both spring and winter wheat), use another model, and consider another period. We find that the best 20 % of winter wheat production in China has a maximum long-term average consumptive WF of 825 m3 t−1, which is smaller than the reported maximum consumptive WF of the best 20 % of wheat production globally (992 m3 t−1). Finally, we find that the best 25 % of winter wheat production in China has a maximum long-term average consumptive WF of 849 m3 t−1, which is again smaller than the maximum consumptive WF of the best 25 % of wheat production globally (1069 m3 t−1).
The national average consumptive WF of rain-fed winter wheat (1120 m3 t−1) is larger than the national average consumptive WF of irrigated winter wheat (1075 m3 t−1). However, the benchmark levels determined by the best 10, 20, and 25 % of production for rain-fed winter wheat are lower than for irrigated winter wheat. The reason is that the yields in rain-fed production are generally higher than the yields in irrigated production at the same benchmark percentile. The highest rain-fed yields occur in the southern wet area with sufficient precipitation over the cropping period, so that little water stress results in high rain-fed yields. The WF benchmarks for irrigated winter wheat are 8 % (for the 10th production percentile) to 10 % (for the 25th production percentile) higher than for rain-fed winter wheat.

Benchmark levels for the consumptive WF for dry vs. wet years
In a relatively dry or wet year, when considering winter wheat areas in China as a whole, we do not find typically different consumptive WFs in winter wheat production (Table 2). The WF benchmarks are consistently higher in dry than in wet years (1-3 %), but the differences between benchmark levels for the consumptive WF for dry vs. wet years are small compared to the variations within the dry and wet year categories (±11-14 %).

Benchmark levels for the consumptive WF for warm vs. cold years
Overall, considering irrigated and rain-fed croplands together, WF benchmarks for relatively warm years are 7-8 % smaller than for relatively cold years, which is not much when seen in the context of fluctuations in the WFs within the three temperature categories (Table 3). In irrigated areas, WF benchmarks for warm years are 11 % smaller, on average, than for cold years. In rain-fed areas, WF benchmarks for warm years are smaller than for cold years as well, but WF benchmarks in average years are not in between the WF benchmarks found for cold and warm years but higher than both. The lower values in cold years relate to lower ET, while the lower values in warm years relate to higher yields.
The findings for the different ET0 classes are similar to those for the different temperature classes (Table 4). Overall, considering irrigated and rain-fed croplands together, WF benchmarks for years with high ET0 are on average 5 % smaller than for years with average ET0 and only 2 % smaller than for years with low ET0. Again, differences between consumptive WFs for years with relatively low or high ET0 are small when seen in the context of fluctuations in the WFs within the three ET0 categories (±3-6 %).

Benchmark levels for the consumptive WF for different soil classes
Table 5 shows the consumptive WFs of winter wheat at different production percentiles in four soil classes in China. The simulated winter wheat production in sandy clayey soils accounts for 60 % of the national total, followed by the production in sandy soils (24 %), silty clayey soils (8 %) and loamy soils (8 %) on average over the studied period. No consistent trends can be observed when we compare the benchmarks across the different soil classes. Overall, when we take irrigated and rain-fed fields together, the WF benchmarks for sandy soils are 10-12 % lower than the WF benchmarks for loamy soils.
More specifically, we find that the WF benchmarks for irrigated winter wheat in sandy soils are about 15 % smaller than the WF benchmarks for the other three soil classes, due to relatively low ET. Without water stress, as is the case in the irrigated croplands, soil evaporation from sandy soils is less than from the other soil types because of the fast percolation of water below the root zone in the sandy soils, causing lower ET over the cropping period (Asseng et al., 2001). At rain-fed fields with limited water availability, crop yields are mainly affected by the soil water holding capacity. Therefore, consumptive WFs in sandy soils are larger than in the other three soils, due to the smaller crop yield in case of poorer water holding capacity. The observed differences in WFs of winter wheat in different soil classes agree with the experimental observations by Tolk and Howell (2012) for the case of irrigated sunflower in a semiarid environment as well as with the fieldwork-based simulations by Asseng et al. (2001) for irrigated and rain-fed wheat in the Mediterranean climatic region of Western Australia.

Benchmark levels for the consumptive WF for different climate zones
Consumptive WFs of winter wheat at different production percentiles in arid and humid zones in China are shown in Table 6. Significant differences between the benchmarks for different climate zones can be observed. Overall, considering irrigated and rain-fed croplands together, WF benchmarks for the humid zone are 26 % (for the 10th production percentile) to 31 % (for the 25th production percentile) smaller than for the arid zone. The WF benchmarks for winter wheat in China as a whole (when we take the arid and humid zones together) are close to the benchmarks for the humid zone, caused by the fact that most (96 % on average over the study period) of the simulated winter wheat production in China occurs in the humid zone.
In the irrigated areas, WF benchmarks for the humid zone are 26-30 % smaller than for the arid zone; in the rain-fed areas, they are 29-43 % smaller. The relatively large WFs in rain-fed fields in the arid zone logically follow from the water stress and resultant low yields. For the irrigated fields, the larger WFs in the arid zone are caused by the relatively high ET0 and ET. The results confirm the findings from previous studies that the WF of crops, especially rain-fed crops, is negatively correlated with precipitation and positively correlated with ET0 (Zwart et al., 2010; Zhuo et al., 2014). The differences between the WF benchmarks for irrigated and rain-fed winter wheat are 7-9 % in the humid zone and 3-11 % in the arid zone. Figure 6 shows, for both the humid and arid part of China and for the various winter wheat production areas, whether they contribute to the best 10 % of national winter wheat production in that climate zone (in the sense of having smallest WFs), to the next best 10 %, to the best 5 % after that, or to the worst 75 % (with WFs beyond the 25th percentile benchmark). Within the arid zone, consumptive WFs below the 25th percentile benchmark level were mostly located in Xinjiang province, with relatively high irrigation density (∼ 98 % of the harvested area). In the humid zone, consumptive WFs below the 25th percentile benchmark level were concentrated in the southwest, where ET0 is smaller than in other places (Fig. 4b).
Figure 6. Simulated consumptive water footprints (WFs) of winter wheat, categorized into four classes (the best 10 % of production, the next best 10 %, the second next best 5 %, and the worst 75 % of production), accounting for different benchmark levels for humid vs. arid parts of China, for the year 2005 (climatic average year).
3.6 Water saving potential by reducing WFs to selected benchmark levels
The WF benchmarks for different climate zones differ much more significantly (26-31 %) than for different soils (10-12 %). WF benchmarks differ even less if we compare irrigated vs. rain-fed fields (8-10 %), warm vs. cold years (7-8 %), or wet vs. dry years (1-3 %). Therefore, when determining benchmark levels for the consumptive WF of a crop, it seems most useful to primarily distinguish between different climate zones, at least in the case of winter wheat in China. In this section, we analyse the potential water saving if actual consumptive WFs of winter wheat throughout China were reduced to the climate-specific benchmark levels set by the best 10 % of Chinese winter wheat production (1042 m3 t−1 for arid areas and 776 m3 t−1 for humid areas), the best 20 % of Chinese winter wheat production (1170 m3 t−1 for arid areas and 819 m3 t−1 for humid areas), or the best 25 % of Chinese winter wheat production (1224 m3 t−1 for arid areas and 841 m3 t−1 for humid areas).
Taking the estimated actual consumptive WFs of winter wheat in 2005, an average climatic year, as calibrated by the provincial statistics on yield of winter wheat (NBSC, 2013), we find that consumptive WFs in 75 % of the planted grids in arid zones and in 96 % of the planted grids in humid zones are over the 25th percentile benchmarks. This is largely due to low actual vs. potential yields. Figure 7 shows differences between actual provincial yields of winter wheat and the simulated yield potentials from the current study (assuming no crop stresses except water stress in rain-fed areas). The largest yield gaps occur in the southern provinces in the humid zone. The largest yield gap was observed in Fujian province. South China has 81 % of national blue water resources (Jiang, 2015). However, the risk of water shortage is increasing in the wet south with the operation of the South-to-North Water Transfer Project and the increasing competition for water resources between different sectors. Therefore, reducing WFs down to benchmark levels is as important for the relatively wet south of China as it is for the drier north. Table 7 shows the (green plus blue) water saving that would be achieved if actual consumptive WFs of winter wheat everywhere in China were reduced to the climate-differentiated WF benchmark levels set by the 10th, 20th and 25th percentiles of production, in an average year (2005). We find that if in both the arid and humid zones the actual consumptive WFs were reduced to the respective 25th percentile benchmark level, the water saving in an average year would be 53 % of the current water consumption at winter wheat fields in China, which is 201 billion m3 yr−1 in absolute terms. We further find that the water saving potential in the arid zone is substantially higher than in the humid zone.
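The water saving estimate can be illustrated with a small calculation. This is a sketch under a simplifying assumption that is not spelled out in the text: production per grid cell is held constant, and every cell whose WF exceeds the benchmark of its climate zone is brought down to that benchmark. The function name `water_saving` and the three grid cells are hypothetical; only the two 25th percentile benchmark values are taken from the study.

```python
# 25th percentile benchmarks from the study (m3 t-1), per broad climate zone.
BENCHMARK_25 = {"Arid": 1224.0, "Humid": 841.0}


def water_saving(cells, benchmark):
    """Return (saving in m3, saving as a fraction of current consumption).

    `cells` is an iterable of (zone, actual WF in m3 t-1, production in t).
    Production is assumed unchanged (simplifying assumption).
    """
    current = 0.0
    saved = 0.0
    for zone, wf, production in cells:
        current += wf * production
        # Cells already at or below the benchmark contribute no saving.
        saved += max(wf - benchmark[zone], 0.0) * production
    if current <= 0:
        raise ValueError("no water consumption in the given cells")
    return saved, saved / current


# Hypothetical grid cells, for illustration only.
cells = [("Arid", 2000.0, 100.0), ("Humid", 1500.0, 1000.0), ("Humid", 800.0, 500.0)]
saved_m3, saved_fraction = water_saving(cells, BENCHMARK_25)
print(f"{saved_m3:.0f} m3 saved, {100 * saved_fraction:.1f} % of current consumption")
```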

Discussion
The consumptive WF of a crop in m3 t−1 most strongly depends on the crop yield in t ha−1 and much less on the evapotranspiration from the crop over the growing period in m3 ha−1 (Tuninetti et al., 2015; Mekonnen and Hoekstra, 2011). The simulated consumptive WFs of winter wheat in China have been based on modelling under a hypothetical condition without effects of managerial factors on crop growth. For evaluating our simulations of crop growth, we compared the simulated averaged yields of winter wheat of Chinese provinces for 1961-1990 to the corresponding agro-climatic attainable yields at different agricultural input levels in the GAEZ database (FAO/IIASA, 2011) (Fig. 8). The GAEZ agro-climatic attainable yields account for different levels of yield constraints from four factors in addition to water stress: (i) pest, disease, and weed damage on plant growth, (ii) direct and indirect climatic damages on quality of produce, (iii) efficiency of farming operations, and (iv) frost hazards. Current simulated yields of irrigated winter wheat are closest to the agro-climatically attainable yields with intermediate input levels and the yields of rain-fed winter wheat are closest to the agro-climatically attainable yields with high input levels. The simulated national average yield in the current study (6.5 t ha−1) is 23 % higher than the attainable wheat yield for China in the year 2000 (5.3 t ha−1) estimated by Mueller et al. (2012).
The study shows that climate is the primary factor to be considered when setting consumptive WF benchmarks. This finding is probably not very sensitive to the model used; the precise WF benchmark figures found per climate zone, however, will be more sensitive to the model used. Subsequent studies, comparing WF benchmark estimates per climate zone using different models, are necessary to quantify the uncertainty in the WF benchmarks presented in this study.
Figure 8. Comparison between the simulated yield of winter wheat and the agro-climatically attainable yield according to FAO/IIASA (2011) at provincial level in China. Averaged over the period 1961-1990.
Further research could also explore whether crop varieties used should play a role when developing WF benchmarks, given the fact that some crop varieties may inherently be more productive than others. On the other hand, one could also consider that choosing a productive crop variety is part of the managerial choices. Since crop variety is not a given environmental condition but a choice, one could argue that accepting a less strict WF reference level for a less productive crop variety cannot be justified.
An important remaining research question is also how combinations of specific techniques and practices can actually lead to the WF reductions that will be necessary in different locations if the Chinese government were to adopt certain WF benchmarks as targets to achieve greater water productivity. Suppose, for example, that two WF benchmarks for winter wheat were adopted in China: 1224 m 3 t −1 for arid areas and 841 m 3 t −1 for humid areas. Although the simulations suggest that these levels are feasible throughout the arid and humid zone, respectively, whatever the type of soil, whether fields are rain-fed or irrigated, whether it is a cold or warm year, and whether it is a dry or wet year, in some places it will be harder and more would need to be done than in other places.
We studied benchmarks for combined green and blue WFs and did not look at each colour separately. For rain-fed lands, the benchmark levels presented in this study are obviously green WF benchmarks. For irrigated lands, the presented benchmark levels for overall consumptive WFs would need further specification into green and blue. Further research would need to be done to translate a certain benchmark level for the overall consumptive WF of a crop into a specific blue WF benchmark level per specific location as a function of the amount of rain per location, recognizing that the blue ratio in the WF will need to be larger if less green water is available.

Conclusions
Based on the case of winter wheat in China we find that (i) benchmark levels for the consumptive WF, determined for individual years for the country as a whole, remain within a range of ±20 % around long-term mean levels over 1961-2008; (ii) the WF benchmarks for irrigated winter wheat are 8-10 % larger than those for rain-fed winter wheat; (iii) WF benchmarks for wet years are on average 1-3 % smaller than for dry years; (iv) WF benchmarks for warm years are on average 7-8 % smaller than for cold years; (v) WF benchmarks differ by about 10-12 % across different soil texture classes; and (vi) WF benchmarks for the humid zone are 26-31 % smaller than for the arid zone, which has relatively higher ET0 in general and lower yields in rain-fed fields. Therefore, we conclude that when determining benchmark levels for the consumptive WF of a crop, it is useful to primarily distinguish between different climate zones. We estimated that if, in both the arid and humid zones, the actual consumptive WFs were reduced to climate-specific benchmark levels set by the 25th percentile of production, the water saving in an average year would be 53 % of the current water consumption at winter wheat fields in China, with the greatest relative savings in the arid zone.

Data availability
Data used in this paper is available upon request to the corresponding author.
Author contributions. Arjen Y. Hoekstra, La Zhuo, and Mesfin M. Mekonnen designed the study. La Zhuo carried it out. La Zhuo prepared the manuscript with contributions from all coauthors.