Case-based formalization and reasoning method for knowledge in digital terrain analysis: illustrated by determining the catchment area threshold for extracting drainage networks

C.-Z. Qin1,2,*, X.-W. Wu1,3, J.-C. Jiang4, A-X. Zhu1,2,5,6
[1]{State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China}
[2]{Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China}
[3]{College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China}
[4]{Smart City Research Center, Hangzhou Dianzi University, Hangzhou, 310012, China}
[5]{Department of Geography, University of Wisconsin-Madison, Madison, WI 53706, USA}
[6]{Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing 210023, China}
Correspondence to: C.-Z. Qin (qincz@lreis.ac.cn)

Abstract
Application of digital terrain analysis (DTA), which is typically a modeling process involving workflow building, relies heavily on DTA domain knowledge of the match between the algorithm (and its parameter settings) and the application context (including the target task, the terrain in the study area, the DEM resolution, etc.), which is referred to as application-context knowledge. However, existing DTA-assisted tools often cannot use application-context knowledge because this type of DTA knowledge has not been formalized to be available for inference in these tools. This situation makes the DTA workflow-building process difficult for users, especially non-expert users. This paper proposes a case-based [...]


Introduction
Digital terrain analysis (DTA) is a useful approach because it can handle the complexity of GIS spatial analysis and has been widely used in geography and related fields (Wilson, 2012).
More and more users, including many with little knowledge of DTA, are becoming involved in DTA applications. Use of DTA is typically a non-trivial workflow-building process consisting of organizing the various DTA tasks and specifying the algorithm (including parameter settings) for each task (Hengl and Reuter, 2009). This workflow-building process relies heavily on knowledge of the match between DTA algorithm specifications and the particular application context. However, current DTA-assisted tools (e.g., ArcGIS, GRASS, SAGA, White Box, TauDEM) provide very limited support during the DTA application modeling process (Qin et al., 2011). It is therefore difficult for users, especially those with little knowledge of DTA, to use DTA correctly and effectively.
Knowledge used during DTA workflow building can be classified into three types (Qin et al., 2011): 1) task knowledge, which describes the relationship between DTA tasks and their input/output; 2) algorithm knowledge, which is the meta-data of a DTA algorithm (including its parameters); and 3) the so-called application-context knowledge, which consists of how to specify the suitable algorithm and its parameter settings for a DTA task according to the application context (such as application goals, study area characteristics, and DEM resolution) (Qin et al., 2013). This knowledge is called application-matching knowledge in Lu et al. (2012).
Among the three types of DTA knowledge, both task knowledge and algorithm knowledge have been formalized by means of rules or semantic networks (Russell and Norvig, 2009) and hence can be used in existing DTA-assisted tools (e.g., ModelBuilder in ArcGIS). However, application-context knowledge, which is crucial for building a suitable DTA model for a specific application, is more difficult for a user to acquire than the other two types of knowledge. Currently, there is no well-established formalization method for this knowledge by which DTA tools can provide more effective assistance to DTA applications. This situation exists mainly because this type of DTA knowledge is largely inaccurate and non-systematic, and often exists only in documents for specific case studies (DTA application instances) or even just in the experience of domain experts.
To solve this problem, this paper proposes a case-based formalization for DTA case studies involving DTA application-context knowledge and a corresponding case-based reasoning method. A DTA-assisted tool can then use this type of knowledge to reduce the difficulty of DTA application modeling.

Basic idea
Cases are a commonly used way of formalizing non-systematic knowledge in artificial intelligence. A case is a record of an existing problem-solving instance and its contextual information, which has two requisite parts: the problem and the solution (Kaster et al., 2005).
The problem describes the application purpose of the case and its contextual information. The solution is a set of methods (including their parameter settings) for achieving this purpose.
Note that the case is not the same as the concept of a prototype (Minda and Smith, 2001), which can also use existing instances to describe empirical knowledge and has been applied in the geographical domain (e.g., Qi et al., 2006; Qin et al., 2009). The prototype highlights the representativeness of the instances, whereas the case does not. Currently, most DTA application-context knowledge is empirical knowledge that often exists in application instances and is difficult to formalize as explicit rules or mathematical equations. In this situation, the case is a suitable way to formalize DTA application-context knowledge (Lu et al., 2012).
Case-based reasoning (CBR) (Schank, 1983) is a method of solving problems by referring the solution of a new problem to the solutions of existing similar cases (Aamodt et al., 1994; Watson and Marir, 1994). Compared with traditional rule-based knowledge representation and reasoning methods, the case-based method can simplify knowledge acquisition into case acquisition, with no need for an explicit expression model of domain knowledge (Watson and Marir, 1994). Therefore, the case-based method is suitable for application domains that lack a systematic expression of empirical domain knowledge. A case-based reasoning method could be designed to use DTA application cases to reduce the difficulty of DTA application modeling for users.

Methodology
According to the basic idea presented above, a case-based formalization methodology is designed for DTA application instances containing application-context knowledge and the corresponding inferences (Fig. 1). Case formalization and the corresponding case-based reasoning method are the two main stages in the methodology.

Case formalization
Case formalization is the process of extracting and describing each individual case in a formal way, so that the case can be retrieved by a corresponding case-based reasoning method.
Among the parts of a case, the case problem consists of a set of factors describing the contextual information associated with the case. This set of factors is quantified using a set of quantitative attributes that are directly involved in case-based reasoning. It is of crucial importance to design and quantify these factors properly for case-based reasoning. The solution part of a case, which records the candidate problem-solving result of the case-based reasoning, does not need to participate in the reasoning procedure. The case output is an optional part of the description that is used to record the status of factors describing the case problem after the case occurred (Kolodner, 1993). Therefore, the key to designing a case-based formalization of DTA application-context knowledge is how to choose and quantify a set of factors influencing DTA algorithm selection and parameter setting to describe the case problem appropriately.
According to the characteristics of DTA application modeling, the case problem can be described based on three groups of factors that influence DTA algorithm selection and parameter setting (Table 1): application purpose, data characteristics, and study area characteristics. For example, a single flow-direction algorithm (e.g., the classic D8 algorithm) is suitable for deriving flow accumulation from an SRTM DEM (with a resolution of 90 m) for drainage network extraction in high-relief areas, whereas a multiple flow-direction algorithm should be used with a 10-m DEM created from a contour map for estimating detailed spatial [...] based reasoning by calculating the similarity between this new application problem and the problem part of each case in the case base. The solution of the case with the highest similarity is reused for the new DTA application problem. Note that in the conceptual framework of a case-based reasoning method, the solution of the retrieved case with the highest similarity might be further revised to adapt to the new application problem, and the final solution for the new application problem is then retained in the case base (Watson and Marir, 1994). However, the method developed in this preliminary study currently considers neither the revision nor the retention process.
Calculating the similarity between a new DTA application problem in case format and the problem part of each case in the DTA case base consists of the following two steps:
Step 1. Calculate the similarity of each individual attribute between the new application problem and the problem description of an existing case. As usual, the range of the similarity value is [0, 1]; the larger the value, the more similar are the two cases. As mentioned above, the attributes used to formalize the problem part of a DTA application case may have different value types, such as enumeration type (e.g., application purpose), single-value type (e.g., spatial resolution and area), or even a frequency distribution (e.g., hypsometric curve). For each attribute, a similarity function should be designed correspondingly to quantify the deviation on this attribute between the new application problem and an existing case. The design is generated in an empirical way and should match the domain knowledge.
Step 2. Synthesize the similarity values for every individual attribute to calculate the overall similarity between the new application problem and the problem description of an existing case. In the geographical domain, a minimum operator based on the limiting factor principle is often used to synthesize similarity values on multiple attributes (Qin et al., 2009).

Design of a detailed method
In this section, the methodology presented in the previous section is concretized by designing a detailed case-based formalization method for DTA application instances containing application-context knowledge and the corresponding inferences. The key issue in method design is designing a set of quantitative attributes describing the case problem and the similarity function on each individual attribute. Because the gridded DEM is widely used in practical applications, this method is designed mainly for grid-based DTA, although the methodology is applicable to both grid- and vector-based DTA.

Selection of attributes
The set of quantitative attributes should be designed to effectively reflect the contextual information related to DTA application modeling, and be fit for the case-based reasoning to follow. The purpose of a DTA application case is naturally described by an enumeration-type attribute, i.e., the name of the target task. Here, cell size has been chosen as the attribute to quantify the data characteristics of a DTA application case; other potential factors (such as type of data source) for describing data characteristics are not currently considered.
To describe the study area characteristics of a DTA application case, the area and the terrain condition of the case are considered in the current method. Like cell size, area is an attribute with a single numeric value. Terrain condition is an important and comprehensive factor indicating the difference in study area characteristics between a new DTA application problem and an existing case.
In this study, the three following aspects were designed to describe the terrain condition factor empirically: 1) Relief. The relief attribute is a commonly used value to describe the overall terrain condition of a study area, whether it is steep or gently sloping.
2) Slope distribution. The slope distribution provides information on the proportions of different intensities of local relief in the area, which cannot be described by the overall relief of the area and is useful for judging the reasonableness of a DTA algorithm selection and its parameter settings. To describe the slope distribution in a study area in detail, we quantified it by a relief-slope frequency distribution. For this purpose, the slope gradient was divided into seven grades: 0°-3°, 3°-8°, 8°-15°, 15°-25°, 25°-35°, 35°-45°, and 45°-90° (Tang et al., 2006). The relief of the study area was divided into ten levels with equal step. The relief-slope frequency distribution obtained in this way is a two-dimensional table with 10 levels × 7 grades of data items. Considering the influence of DEM resolution on the slope gradient calculation (Chang et al., 1991; Grohmann, 2015), a relief-slope cumulative frequency distribution was used here instead of the relief-slope frequency distribution to provide a quantitative description that mitigates the DEM resolution effect. The relief-slope cumulative frequency in each relief level is calculated by accumulating the number of cells within each slope gradient grade from low to high grade in this relief level. Note that the 10-level division of elevation considers only the relative relationship among the elevation levels inside the study area. The elevation step of a level therefore differs from one study area to another, and the absolute relief of the study area is ignored for this attribute. This proposed design appears to be not only a convenient way to automate similarity calculations in case-based reasoning, but also reasonable because the separate relief attribute already reflects the relief information throughout the study area.
Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2015-539, 2016. Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: 19 January 2016. © Author(s) 2016. CC-BY 3.0 License.
3) Landscape development stage for the study area, which can provide information on the geomorphic processes (mainly hydrological erosion process) affecting terrain conditions in a study area (often a watershed). This information is useful for judging the reasonableness of a choice of DTA algorithm and its parameter settings related to hydrological and erosion processes. In this study, the hypsometric curve (Strahler, 1952), which is normally used to analyze the landscape development stage of river basins, was used as an attribute to describe this aspect.
In the proposed method, location is not used as a study area characteristic. This decision was made because the influence of the study area location in DTA application-context knowledge could be reflected by the terrain condition of the study area, which directly impacts the choice of DTA algorithm and parameter settings and has already been considered in the method. For similar reasons and for the sake of brevity, in the proposed method, environmental conditions other than terrain condition are not considered.
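The two distribution-type terrain attributes described above can be derived from per-cell elevation and slope values. The following is only a minimal sketch, not the authors' program: the function names, the flat-list inputs, the normalization of cell counts by the total number of cells, and the use of the elevation-relief ratio as an approximation of the hypsometric integral are all assumptions made for illustration.

```python
from bisect import bisect_right

# Upper bounds (degrees) of the first six slope grades; the seventh is 45-90.
SLOPE_BREAKS = [3, 8, 15, 25, 35, 45]
N_LEVELS = 10  # equal-step elevation levels inside the study area


def relief_slope_cumulative_frequency(elevations, slopes):
    """Return a 10 x 7 table: rows are elevation levels, columns slope grades."""
    z_min, z_max = min(elevations), max(elevations)
    step = (z_max - z_min) / N_LEVELS or 1.0  # guard for a perfectly flat area
    counts = [[0] * 7 for _ in range(N_LEVELS)]
    for z, s in zip(elevations, slopes):
        level = min(int((z - z_min) / step), N_LEVELS - 1)  # top edge joins last level
        grade = bisect_right(SLOPE_BREAKS, s)  # a boundary slope goes to the higher grade
        counts[level][grade] += 1
    total = len(elevations)  # assumption: frequencies are relative to all cells
    table = []
    for row in counts:
        running, cumulative = 0, []
        for c in row:  # accumulate from the lowest to the highest slope grade
            running += c
            cumulative.append(running / total)
        table.append(cumulative)
    return table


def hypsometric_integral(elevations):
    """Approximate HI by the elevation-relief ratio (mean - min) / (max - min)."""
    z_min, z_max = min(elevations), max(elevations)
    if z_max == z_min:
        raise ValueError("hypsometric integral is undefined for a flat area")
    return (sum(elevations) / len(elevations) - z_min) / (z_max - z_min)
```

In practice the elevation and slope lists would come from the DEM cells of the study area; a raster library would normally replace the plain lists used here.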
Table 2 lists the attributes used to formalize a case problem in this method.
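As an illustration of how a formalized case might be stored, the sketch below groups the attributes named in this section into a problem part and a solution part. The class names, field names, and units are hypothetical choices made for this example; the source specifies only which attributes exist, not how they are encoded.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CaseProblem:
    """Problem part of a case: the attributes used in similarity reasoning."""
    task_name: str                            # enumeration type
    cell_size: float                          # DEM cell size (assumed in metres)
    area: float                               # study area size (unit is an assumption)
    relief: float                             # overall relief (assumed in metres)
    relief_slope_cumfreq: List[List[float]]   # 10 levels x 7 slope grades
    hypsometric_integral: float               # summary of the hypsometric curve


@dataclass
class Case:
    """A DTA application case: problem, solution, and optional output."""
    name: str
    problem: CaseProblem
    solution: Dict[str, float]                # e.g. {"ca_threshold_km2": ...}
    output: Optional[Dict[str, float]] = None  # optional, not used in reasoning


# Illustrative values only; they do not come from the case base of the paper.
example = Case(
    name="ExampleBasin",
    problem=CaseProblem(
        task_name="drainage network extraction",
        cell_size=90.0,
        area=1200.0,
        relief=850.0,
        relief_slope_cumfreq=[[0.0] * 7 for _ in range(10)],
        hypsometric_integral=0.45,
    ),
    solution={"ca_threshold_km2": 5.0},
)
```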

Similarity function on each individual attribute
The design of the similarity function for an individual attribute should be compatible with the value type of the attribute and in accord with domain knowledge regarding the level of similarity due to the difference in the attribute value between the new application problem and an existing case. For an attribute of the enumeration type, its similarity value between a new application problem and an existing case can be calculated by a Boolean function (Fig. 2a).
When the attribute values are matched, the similarity value is 1, otherwise it is 0.
For an attribute of the single numeric value type, two commonly used kinds of basic similarity function are considered in this study: the linear function and the bell-shaped function (Fig. 2).
Both kinds of similarity function accord with common sense in that the similarity is 1 for the minimum difference (i.e., zero) of attribute value, and the greater the difference in attribute value, the lower is the similarity. With the linear function, the similarity value is set to 0 or 1 when the absolute difference of the attribute between a new application problem and an existing case reaches its maximum or minimum value, respectively. The similarity can be calculated for other difference values by linear interpolation (Fig. 2b). The similarity function based on a linear function fits the situation in which the maximum difference in attribute values can be preset.
The bell-shaped function suits attributes for which the maximum difference in attribute values is not easy to preset, because it does not require one. A simplified version of the commonly used bell-shaped function (Shi et al., 2005; Qin et al., 2009; Fig. 2c) is given by Eq. (1), where S is the similarity between a new application problem and an existing case; v_new and v_case are the attribute values of the new application problem and the existing case, respectively; and w is the shape-adjusting parameter of the function. When the difference between v_new and v_case is equal to w, the similarity S = 0.5 (Fig. 2). Some sort of numerical transformation on the attribute value could be necessary for the similarity calculation to yield a reasonable reflection of the similarity level due to differences in the attribute.
For an attribute of more complex type (such as a frequency distribution), a quantitative index should be designed to quantify the difference in an attribute between a new application problem and an existing case. Then the similarity on this attribute can be calculated based on this index, similarly to the single numeric-value type.
Based on these kinds of basic similarity function, similarity functions for each individual attribute used for case-based reasoning in this paper were designed as shown in Table 2. The following discussion introduces them one by one.
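The Boolean and bell-shaped functions can be sketched as follows. The exact form of Eq. (1) is not reproduced in this text, so the bell-shaped function below is an assumed form, S = 0.5^((|d|/w)^r), chosen only because it satisfies the two stated properties (S = 1 at zero difference and S = 0.5 when the difference equals w). The exponent `r` and the function names are hypothetical.

```python
def boolean_similarity(value_new, value_case):
    """Similarity for enumeration-type attributes: 1 on a match, otherwise 0."""
    return 1.0 if value_new == value_case else 0.0


def bell_similarity(value_new, value_case, w, r=2.0):
    """Assumed bell-shaped form: equals 0.5 when |difference| == w.

    w is the shape-adjusting parameter; r is a hypothetical exponent that
    controls how steeply similarity falls off (not taken from the source).
    """
    if w <= 0:
        raise ValueError("w must be positive")
    difference = abs(value_new - value_case)
    return 0.5 ** ((difference / w) ** r)
```

Any other exponent `r > 0` preserves the half-similarity point at a difference of `w`, so the sketch does not depend on a particular choice of `r`.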

Name of target task
The name of the target task is an attribute of the enumeration type. The similarity value for this attribute between a new application problem and an existing case can be calculated by a Boolean function. When the names of two target tasks match, the similarity value is 1, otherwise it is 0.

Cell size
Note that the difference in magnitude of cell size can better reflect the level of similarity between DTA applications than the numerical difference in cell size. The greater the difference in magnitude, the lower is the similarity. According to this knowledge, a base-10 logarithmic transformation was applied to the cell size during the similarity calculations.
Because it is not easy to preset the maximum of the attribute value after logarithmic transformation, the bell-shaped function based on Eq. (1) was used to calculate similarity for cell size. Furthermore, the shape-adjusting parameter w in Eq. (1) is set to 0.5, which means that the similarity in cell size between a new application problem and an existing case will decrease to 0.5 when their difference in cell size reaches one order of magnitude (e.g., 1 m vs. 10 m, or vice versa). The similarity function used in the proposed method for cell size is shown in Table 2.

Area
Like cell size, area is also an attribute of the single numeric value type. The greater the difference in magnitude between two areas, the lower is their similarity on area. Similarly to the design for the cell size attribute, a base-10 logarithmic transformation is applied to the area attribute and then the similarity function for this attribute is designed based on the bell-shaped function. The shape-adjusting parameter w in Eq. (1) has been set to 1.5 for the area attribute by trial and error (see Table 2).
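The log-transformed similarity for cell size and area might be sketched as below, using the parameter values stated in the text (0.5 for cell size, 1.5 for area). The bell-shaped form and its exponent `r` are assumptions, since Eq. (1) is not reproduced here; the numerical values produced are therefore illustrative only and the half-similarity point depends on the actual form of Eq. (1).

```python
import math


def log_bell_similarity(value_new, value_case, w, r=2.0):
    """Bell-shaped similarity on base-10 log-transformed positive values.

    Assumed form 0.5 ** ((|d| / w) ** r), with d the difference of the
    log10 values; r is a hypothetical exponent, not taken from the source.
    """
    if value_new <= 0 or value_case <= 0:
        raise ValueError("values must be positive for a log transform")
    d = abs(math.log10(value_new) - math.log10(value_case))
    return 0.5 ** ((d / w) ** r)


def cell_size_similarity(cell_new, cell_case):
    return log_bell_similarity(cell_new, cell_case, w=0.5)


def area_similarity(area_new, area_case):
    return log_bell_similarity(area_new, area_case, w=1.5)
```

Because the larger parameter makes the curve wider, the same ratio between two values lowers the area similarity less than the cell size similarity.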

Relief
The greater the difference in relief value between a new application problem and an existing case, the lower is the similarity. The maximum difference in relief values between two DTA application areas can be preset due to the geometric nature of the Earth. Hence, the similarity function for the relief attribute was designed as a linear function using the absolute difference between the relief of the new DTA application problem and that of the existing case. Corresponding to a zero similarity value, the maximum difference between two relief values is the larger of the relief differences between the new application problem and each of two extreme cases (a flat area with zero relief, and an area with relief from the 8848 m of Mount Everest to sea level). The similarity function used in this method for the relief attribute is shown in Table 2.
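A minimal sketch of the linear similarity for relief is given below. It assumes, following the description of the two extreme cases in the text (a flat area with zero relief and a relief of 8848 m), that the maximum difference is taken relative to the new application problem. The function and constant names are hypothetical.

```python
MAX_RELIEF_M = 8848.0  # relief from the summit of Mount Everest to sea level


def relief_similarity(relief_new, relief_case):
    """Linear similarity: 1 at zero difference, 0 at the maximum difference."""
    # Largest possible difference from the new problem to either extreme case.
    max_difference = max(relief_new - 0.0, MAX_RELIEF_M - relief_new)
    similarity = 1.0 - abs(relief_new - relief_case) / max_difference
    return max(0.0, similarity)  # clamp in case relief_case lies outside [0, 8848]
```

Note that under this reading the function is not symmetric: the maximum difference depends on the relief of the new application problem, not on that of the existing case.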

Relief-slope cumulative frequency distribution (describing the slope distribution)
The relief-slope cumulative frequency distribution is a two-dimensional table with 10 levels × 7 grades of data items. This two-dimensional table can be viewed as a DEM having a volume with a constant projected area. The greater the overlap in volume between the distribution of a new application problem and that of an existing case, the higher is the similarity. Therefore, the similarity function for the relief-slope cumulative frequency distribution was designed as the ratio of the intersection volume to the union volume between two distributions (Table 2).
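One plausible reading of the intersection-to-union volume ratio is sketched below: because both tables share the same projected area, the intersection volume is taken as the sum of cell-wise minima and the union volume as the sum of cell-wise maxima. This interpretation and the function name are assumptions; the source gives the definition only in Table 2.

```python
def distribution_similarity(table_new, table_case):
    """Ratio of intersection volume to union volume of two equal-shape tables."""
    intersection = 0.0
    union = 0.0
    for row_new, row_case in zip(table_new, table_case):
        for a, b in zip(row_new, row_case):
            intersection += min(a, b)  # volume shared by both distributions
            union += max(a, b)         # volume covered by either distribution
    # Two all-zero tables are identical, so treat them as fully similar.
    return intersection / union if union > 0 else 1.0
```

The result is 1 for identical tables and decreases towards 0 as the overlap between the two distributions shrinks.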

Hypsometric curve (describing the landscape development stage)
The hypsometric curve is often summarized as a single numeric value, the hypsometric integral (HI, with a value range of [0, 1]), which can be used to classify landscape development into three stages: youth (HI > 0.6), maturity (0.35 < HI < 0.6), and old age (HI < 0.35) (Strahler, 1952). The HI was used to design a similarity function for the hypsometric curve between a new application problem and an existing case, which is a linear function using the absolute difference of their HI values. When the absolute difference in HI is 0, the corresponding similarity is 1. The similarity is 0 for the maximum possible deviation from the HI of the new application problem (see Table 2).
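The HI-based similarity and the three-stage classification can be sketched as follows. The maximum possible deviation is taken here as the larger of the distances from the new problem's HI to the two ends of the [0, 1] range, which is one reading of the description above. Assigning the boundary values 0.35 and 0.6 to the maturity stage is an assumption, since the text leaves them unassigned, and the function names are hypothetical.

```python
def landscape_stage(hi):
    """Classify landscape development stage from the hypsometric integral."""
    if hi > 0.6:
        return "youth"
    if hi < 0.35:
        return "old age"
    return "maturity"  # assumption: boundary values 0.35 and 0.6 fall here


def hi_similarity(hi_new, hi_case):
    """Linear similarity on HI: 1 at zero difference, 0 at the largest deviation."""
    if not (0.0 <= hi_new <= 1.0 and 0.0 <= hi_case <= 1.0):
        raise ValueError("HI values must lie in [0, 1]")
    max_deviation = max(hi_new, 1.0 - hi_new)  # always at least 0.5
    return 1.0 - abs(hi_new - hi_case) / max_deviation
```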
The overall similarity between a new application problem and an existing case is calculated as the minimum of all similarity values for every individual attribute between the new application problem and the existing case.
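The two reasoning steps (per-attribute similarity, then synthesis with the minimum operator and retrieval of the most similar case) can be sketched as below. The dictionary layout of cases, the function names, and the toy similarity functions and values in the usage example are hypothetical and serve only to show the control flow.

```python
def overall_similarity(new_problem, case_problem, similarity_functions):
    """Minimum of the per-attribute similarities (limiting factor principle)."""
    return min(
        func(new_problem[name], case_problem[name])
        for name, func in similarity_functions.items()
    )


def retrieve_most_similar(new_problem, case_base, similarity_functions):
    """Return (case, similarity) for the most similar case in a non-empty base."""
    scored = [
        (overall_similarity(new_problem, case["problem"], similarity_functions), case)
        for case in case_base
    ]
    best_score, best_case = max(scored, key=lambda pair: pair[0])
    return best_case, best_score


# Usage with toy attributes and illustrative values only.
toy_functions = {
    "task": lambda a, b: 1.0 if a == b else 0.0,
    "relief": lambda a, b: max(0.0, 1.0 - abs(a - b) / 1000.0),
}
toy_base = [
    {"name": "A", "problem": {"task": "drainage", "relief": 700.0}, "solution": 2.0},
    {"name": "B", "problem": {"task": "drainage", "relief": 550.0}, "solution": 1.0},
    {"name": "C", "problem": {"task": "other", "relief": 500.0}, "solution": 9.0},
]
toy_case, toy_score = retrieve_most_similar(
    {"task": "drainage", "relief": 500.0}, toy_base, toy_functions
)
```

The solution of the retrieved case is then reused directly, since the method described here includes neither a revision nor a retention step.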

Experimental design
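The extraction of a drainage network, one of the most important DTA applications, was taken as an example to evaluate the proposed method. The general workflow of river network extraction based on a gridded DEM includes the following three DTA tasks in sequence: 1) preparing a DEM by filling in the artificial pits and removing absolutely flat areas; 2) using a flow direction algorithm to derive the spatial distribution of the catchment area (CA); and 3) setting a CA threshold to extract the drainage network from the spatial distribution of the CA. In this DTA workflow, proper selection of the DTA algorithms (such as the DEM preparation algorithm and the flow direction algorithm) and of parameter values (e.g., the CA threshold) is based on DTA application-context matching knowledge. In many geographical information systems (such as ArcGIS), the DTA algorithm used for drainage network extraction has often been set to a default selection (e.g., the D8 algorithm as the default flow direction algorithm) in such a way that the user cannot choose the DTA algorithm. The CA threshold is an empirical parameter which varies with the study area characteristics and affects the extraction results directly. Current DTA-related tools often leave the choice of CA threshold for drainage network extraction to the user. However, it is difficult for users, especially non-expert users, to determine the appropriate threshold for their applications.
Therefore, this experiment was designed to focus on using the proposed method to determine the CA threshold for drainage network extraction. This means that the cases used in this experiment all have the same target task name, i.e., drainage network extraction. The core of the solution part of the cases is the parameter value, i.e., the CA threshold. Although this experiment is somewhat simplified, we believe that it can evaluate the proposed method as effectively as an experiment with a more complex design.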

Preparation of a case base
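The case base prepared for this experiment includes 124 cases of drainage network extraction (Fig. 3). Each case originated from an article related to the target task that was recently published in mainstream journals of related domains (such as Water Resources Research, Hydrology and Earth System Sciences, Hydrological Processes, Computers & Geosciences, Advances in Water Resources; see the Appendix document for the list of the articles used for cases). These articles are assumed to provide good solutions for their specific study areas based on experts' experience and knowledge of the target task.
Each case was manually prepared from a journal article. The main work involved in preparing the case problem was extracting each attribute of the study area, whereas the work involved in preparing the case solution consisted of extracting the CA threshold used in the article. Normally, the cell size used is clearly stated in the article and can be filled in as the corresponding case attribute. However, this is often not true for other attributes. Therefore, an automatic program was applied to a free DEM dataset of the study area (mainly an SRTM DEM with a resolution of 90 m and an ASTER GDEM with a resolution of 30 m) to derive the other attributes (such as area, relief, relief-slope cumulative frequency distribution, and hypsometric curve) for each case. For the solution part of each case, the CA threshold given explicitly in each article was recorded directly. If the CA threshold was shown only implicitly in the drainage network figure in an article, it was determined by trial and error, based on visual comparison between the drainage network given in the article and those extracted from the DEMs used to prepare the other attributes of this case.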

Evaluation method
Among the 124 cases in the case base, 50 randomly selected cases were used as independent evaluation cases, which were assumed to be new application problems without a solution and were solved by the proposed reasoning method. The other 74 cases were set aside as the case base to be used by the proposed case-based reasoning method.
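The random split into evaluation cases and a reasoning case base could be reproduced along the following lines. The function name and the fixed seed are assumptions introduced for repeatability; the source does not state how the random selection was carried out.

```python
import random


def split_case_base(cases, n_evaluation=50, seed=0):
    """Randomly split cases into (evaluation cases, remaining case base)."""
    if n_evaluation > len(cases):
        raise ValueError("not enough cases for the requested evaluation set")
    rng = random.Random(seed)  # assumption: a fixed seed, for a repeatable split
    shuffled = list(cases)     # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_evaluation], shuffled[n_evaluation:]
```

With 124 cases and the default arguments, this yields 50 evaluation cases and a 74-case base with no overlap between the two.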
To perform a quantitative evaluation of the results from the proposed method on the 50 evaluation cases, an index was used, specifically the relative error of river density (E):
$$E = \frac{|D_{s} - D_{o}|}{D_{o}}$$
where D_o and D_s are the river density values of a new application problem (i.e., an evaluation case), obtained respectively from the original CA threshold and the CA threshold solution obtained from the 74-case base by the proposed reasoning method. E is the relative error in river density for the evaluation case. The smaller the value of E, the more reasonable is the result obtained for the evaluation case using the proposed method. Four levels of E were established empirically to reflect the reasonableness level: reasonable (E ∈ [0, 0.1]), acceptable (E ∈ (0.1, 0.25]), questionable (E ∈ (0.25, 0.5]), and unreasonable (E ∈ (0.5, +∞)). Representative cases were also selected to discuss the reasonableness of their similarity results obtained using the proposed method. The relationship between E and the similarity value of the solution case to the evaluation case was also analyzed to discuss the performance of the proposed method.
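The evaluation index and its four reasonableness levels can be sketched as below. The formula assumes the usual definition of relative error, with the river density obtained from the original CA threshold as the reference; the function names are hypothetical.

```python
def relative_error(density_original, density_solved):
    """Relative error of river density, with the original result as reference."""
    if density_original <= 0:
        raise ValueError("the reference river density must be positive")
    return abs(density_solved - density_original) / density_original


def reasonableness_level(e):
    """Map a relative error E to one of the four empirical levels."""
    if e <= 0.1:
        return "reasonable"
    if e <= 0.25:
        return "acceptable"
    if e <= 0.5:
        return "questionable"
    return "unreasonable"
```

For instance, the E values of 0.07, 0.24, 0.4, and 0.63 reported for individual evaluation cases map to the reasonable, acceptable, questionable, and unreasonable levels respectively.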

Experimental results and discussion
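Table 3 lists the results of 50 evaluation cases solved by the proposed method using the case base presented in the previous section. The similarities between every evaluation case and its most similar case as reasoned by the proposed method were found in this experiment to lie within a value range from 0.47 to 0.9. The higher the similarity, the lower is the uncertainty of the result from the proposed method.
According to the relative error of river density (E), the counts of evaluation cases with reasonable, acceptable, questionable, and unreasonable results are 26, 16, 3, and 5 respectively (Table 3). Thus 42 of the 50 evaluation cases (84%) reached the reasonable or acceptable level, which shows that the proposed method performs satisfactorily. Taking the results on two evaluation cases, Godavari [1053] (the "[1053]" means that the original CA threshold recorded in the Godavari case was 1053 km²) and Burdekin [502] ("[502]" defined similarly) as examples, their most similar cases in the case base as reasoned by the proposed method were KrishnaRiver [908.08] and MahanadiRiver [891] respectively. The CA threshold values from the solution of the most similar cases (908.08 km² and 891 km²) were applied respectively to the Godavari and Burdekin evaluation cases. The extracted drainage networks have a spatial distribution close to that of the networks extracted with the original CA thresholds of the evaluation cases (Fig. 4). Their values of relative error of river density are 0.07 (reasonable level) and 0.24 (acceptable level) respectively.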
The evaluation results with questionable and unreasonable levels also have lower similarities.
This means that there is no case in the current case base that has an application context highly similar to that of the evaluation case. Hence, the solution from the proposed method has higher uncertainty and might lead to questionable or even unreasonable application results for new application problems. Taking the result for the YbbsRiver [1.01] evaluation case (E = 0.4; questionable) as an example, the similarities between this evaluation case and other cases in the case base depend mostly on the similarities on the cell size attribute during the case-based reasoning process proposed in this paper (Table 4). Because the cell size of the YbbsRiver case is 10 m, which differs from the cell size (30 m or 90 m) of most other cases in the case base, the overall similarities between this evaluation case and these cases in the case base are mainly limited by the individual similarity on cell size when synthesizing the similarities on individual attributes by the proposed method. Furthermore, Table 4 shows that the CA threshold values of the cases with the top 10 highest similarity values to the YbbsRiver evaluation case would make the E value of the application result for the evaluation case questionable or even unreasonable (E: 0.33-21.73). The solution selected by the proposed method achieved a relatively better application result.
As for the reasoning results on the Kasilian [0.08] evaluation case (E = 0.63; unreasonable) using the proposed method, no individual attribute has a controlling effect on the overall similarity between the Kasilian evaluation case and the other cases in the case base (Table 5).
The CA threshold values of the cases with the top 10 highest similarity values to the Kasilian evaluation case would almost always lead to an unreasonable E value of the application result for the evaluation case (E: 0.48-0.92). The similarities between this evaluation case and the cases in the case base are lower (Table 5). This problem could be mitigated by extending the case base to contain cases with more combinations of data characteristics and study area characteristics.
The distribution of the similarity results of the evaluation cases from the proposed method among the reasonableness levels of the drainage network results using the solved CA thresholds was also analyzed (Table 6). All solution cases with higher similarity (above 0.7) to the evaluation cases produced reasonable or acceptable drainage network results, whereas solution cases with lower similarity (below 0.7) often produced questionable or unreasonable drainage network results. This shows the effectiveness with which similarity reflects uncertainty in the proposed method.

Summary
Although DTA application-context knowledge is of key importance in building an appropriate DTA application, currently this type of knowledge has not been formalized to be available for DTA-assisted tools to relieve the modeling burden of DTA users (especially non-expert users).
This paper has proposed a case-based methodology for formalizing DTA application-context knowledge and corresponding case-based reasoning. A detailed method based on this methodology has been developed. Taking drainage network extraction from a gridded DEM as an application example, 124 cases (50 for evaluation and 74 for reasoning) of drainage network extraction from peer-reviewed journal articles were used to evaluate the performance of the proposed method. Preliminary evaluation results show the reasonableness of the proposed case-based method.
Additional research is needed to enhance the proposed method. Currently the proposed methodology is implemented only as a preliminary method in this paper. The design for the individual attributes and their quantification in each case could be improved to describe the application-context knowledge in a more adaptive way for various DTA application targets. Another possible improvement to the method would be to revise the solution part of the case as suggested by case-based reasoning before applying the solution to the new application problem. The possibility of synthesizing the solutions of the cases in the base with higher similarity to build a solution to the new application problem could also be explored.
Automatic or semi-automatic methods of creating cases are needed to speed up the expansion of the case base (not only for the current target task, but also for other DTA application tasks).
An expanded case base containing as many cases as possible with more combinations of all kinds of characteristics would improve the application effectiveness of the proposed method.
The size of the case base also matters when evaluating the effectiveness of the case-based reasoning method and its successive versions. However, current cases used in the experiment were mainly manually prepared from journal articles, except for certain attribute calculations (e.g., relief, hypsometric curve), for which an automatic computer program was used. This inefficient way of preparing cases needs to be improved through automatic or semi-automatic case-extraction methods.
Corresponding to a zero similarity value, the maximum difference between two relief values is the larger of the relief differences between the new application problem values and each of two extreme cases (a flat area with zero relief, and an area with relief from the 8848 m of 10 Hydrol.Earth Syst.Sci.Discuss., doi:10.5194/hess-2015-539,2016   Manuscript under review for journal Hydrol.Earth Syst.Sci.Published: 19 January 2016 c Author(s) 2016.CC-BY 3.0 License.
as an example to evaluate the proposed method. The general workflow of river network extraction based on a gridded DEM includes the following three DTA tasks in sequence: 1) preparing a DEM by filling in the artificial pits and removing absolutely flat areas; 2) using a flow direction algorithm to derive the spatial distribution of the catchment area (CA); and 3) setting a CA threshold to extract the drainage network from the spatial distribution of the CA. In this DTA workflow, proper selection of the DTA algorithms (such as the DEM preparation algorithm and the flow direction algorithm) and of parameter values (e.g., the CA threshold) is based on DTA application-context matching knowledge. In many geographical information systems (such as ArcGIS), the DTA algorithm used for drainage network extraction is often fixed to a default (e.g., the D8 algorithm as the default flow direction algorithm), so that the user cannot choose the DTA algorithm. The CA threshold is an empirical parameter that varies with the study area characteristics and directly affects the extraction results. Current DTA-related tools often leave the choice of CA threshold for

published in mainstream journals of related domains (such as Water Resources Research, Hydrology and Earth System Sciences, Hydrological Processes, Computers & Geosciences, Advances in Water Resources; see the Appendix document for the list of the articles used for cases). These articles are assumed to provide good solutions for their specific study areas based on experts' experience and knowledge of the target task. Each case was manually prepared from a journal article. The main work involved in preparing the case problem was extracting each attribute of the study area, whereas the work involved in preparing the case solution consisted of extracting the CA threshold used in the article. Normally, the cell size used is clearly stated in the article and can be filled in as the corresponding case attribute. However, this is often not true for the other attributes. Therefore, an automatic program was applied to a free DEM dataset of the study area (mainly an SRTM DEM with a resolution of 90 m and an ASTER GDEM with a resolution of 30 m) to derive the other attributes (such as area, relief, relief-slope cumulative frequency distribution, and hypsometric curve) for each case. For the solution part of each case, the CA threshold given explicitly in each article was recorded directly. If the CA threshold was shown only implicitly in the drainage network figure in an article, it was determined based on visual comparison between the drainage network given in the article and those extracted from the DEMs used to

the results on two evaluation cases, Godavari [1053] (the "[1053]" means that the original CA threshold recorded in the Godavari case was 1053 km²) and Burdekin [502] ("[502]" defined similarly) as examples, their most similar cases in the case base as reasoned by the proposed method were KrishnaRiver [908.08] and MahanadiRiver [891], respectively. The CA threshold values from the solutions of the most similar cases (908.08 km² and 891 km²) were applied to the Godavari and Burdekin evaluation cases, respectively. The extracted drainage networks have a spatial distribution close to that of the networks extracted with the original CA thresholds.
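The reuse step described here, taking the CA threshold from the most similar case and applying it to the new study area, can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the `Case` class, the function names, the case names, and all numeric values in the example are hypothetical, and the overall similarity of each case is assumed to have been computed already. The catchment-area grid is assumed to be expressed as a number of upstream cells, so the threshold in km² is converted to a cell count using the DEM cell size.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    ca_threshold_km2: float  # solution part of the case
    similarity: float        # overall similarity to the new problem (assumed precomputed)


def most_similar_case(cases):
    """Return the case with the highest overall similarity."""
    return max(cases, key=lambda c: c.similarity)


def threshold_in_cells(ca_threshold_km2, cell_size_m):
    """Convert a CA threshold from km^2 to a number of square DEM cells."""
    return ca_threshold_km2 * 1e6 / (cell_size_m ** 2)


def extract_network(ca_cells, ca_threshold_km2, cell_size_m):
    """Mark cells whose catchment area reaches the threshold as channel cells."""
    limit = threshold_in_cells(ca_threshold_km2, cell_size_m)
    return [[value >= limit for value in row] for row in ca_cells]


# Hypothetical case base: names, thresholds, and similarities are made up
cases = [Case("case_a", 900.0, 0.82), Case("case_b", 50.0, 0.41)]
best = most_similar_case(cases)

# Toy catchment-area grid (upstream cell counts) on a 90 m DEM
ca_cells = [[1, 5, 120_000], [2, 60_000, 200_000]]
network = extract_network(ca_cells, best.ca_threshold_km2, cell_size_m=90.0)
```

With a 90 m cell size, a 900 km² threshold corresponds to roughly 111,111 cells, so only the two cells with counts of 120,000 and 200,000 are marked as channel cells in this toy grid.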

Figure 1. Structure of the case-based formalization and reasoning method for DTA application-context knowledge.

Figure 3. Spatial distribution of the cases used in this study (the box in the map shows an example of a formalized case).

Table 2. Attributes used in this study to formalize the case problem and the corresponding similarity functions for case-based reasoning using DTA application-context knowledge. Note: each similarity function gives the similarity (value range: [0, 1]) of an individual attribute between a new application problem and the i-th case; its inputs are, for the new application problem and the i-th case respectively, the DEM resolutions (m), the areas (km²), the relief (m), the histograms of the relief-slope cumulative frequency distributions, and the hypsometric integrals.

Table 5. Top 10 similarity values between the Kasilian evaluation case and existing cases as