
<strong>PROOF</strong> <strong>COVER</strong> <strong>SHEET</strong><br />

Journal acronym: TGIS<br />

Author(s): Shiuan Wan, Tsu-Chiang Lei and Tein-Yin Chou<br />

Article title: A landslide expert system: image classification through integration of<br />

data mining approaches for multi-category analysis<br />

Article no: 613397<br />

Enclosures: 1) Query sheet<br />

2) Article proofs<br />

Dear Author,<br />

1. Please check these proofs carefully. It is the responsibility of the corresponding<br />

author to check these and approve or amend them. A second proof is not normally<br />

provided. Taylor & Francis cannot be held responsible for uncorrected errors, even if<br />

introduced during the production process. Once your corrections have been added to the<br />

article, it will be considered ready for publication.<br />

For detailed guidance on how to check your proofs, please see<br />

http://journalauthors.tandf.co.uk/production/checkingproofs.asp.<br />

2. Please review the table of contributors below and confirm that the first and last<br />

names are structured correctly and that the authors are listed in the correct order of<br />

contribution. This check is to ensure that your name will appear correctly online and<br />

when the article is indexed.<br />

Sequence Prefix Given name(s) Surname Suffix<br />

1 Shiuan Wan<br />

2 Tsu-Chiang Lei<br />

3 Tein-Yin Chou


Queries are marked in the margins of the proofs. Unless advised otherwise, submit all<br />

corrections and answers to the queries using the CATS online correction form, and then<br />

press the “Submit All Corrections” button.<br />

AUTHOR QUERIES<br />

General query: You have warranted that you have secured the necessary written<br />

permission from the appropriate copyright owner for the reproduction of any text,<br />

illustration, or other material in your article. (Please see<br />

http://journalauthors.tandf.co.uk/preparation/permission.asp.) Please check that any<br />

required acknowledgements have been included to reflect this.<br />

AQ1 Please provide the department name for the author “Chou”, if applicable.<br />

AQ2 Please provide the expansion for SPOT, if applicable.<br />

AQ3 Please check whether the year is correct as inserted for the author “Wan” in the sentence “Wan (2009) used a ...”.<br />

AQ4 Please check whether the edits to the sentence “In the past, in this ...” are correct.<br />

AQ5 The citation ‘Lee and Vachtsevanos, 2002’ has not been included in the reference list. Please check.<br />

AQ6 The sense of the sentence “Then, the modification of Thematic Map ...” is not clear. Please consider rephrasing for clarity.<br />

AQ7 The citation ‘Wan and Yen 2006’ has not been included in the reference list. Please check.<br />

AQ8 Please check whether any term is missing after the phrase “given in” in the sentence “The method for extracting ...”.<br />

AQ9 Please provide the expansion for ERDAS, if applicable.<br />

AQ10 The sense of the sentence “Different atmospheric condition can result in different qualities of Figure” is not clear. Please check.<br />

AQ11 The reference citation “Sinha and Laplante 2004” is not included in the reference list. Please provide.<br />

AQ12 Both the terms “k-means” and “K-means” are present in the text. As such, the term “k-means” has been changed to “K-means” throughout. Please check whether this is OK.<br />

AQ13 Please check whether the temporary citation of Table 6 is correct as inserted here.<br />

AQ14 As there are three authors, the acknowledgement section has been changed<br />

accordingly. Please check whether this is OK.<br />

AQ15 Please provide the citation for the reference “Baeza and Corominas, 2001”.


AQ16 Please provide the place and date of the proceedings, publisher name and<br />

location, and editor group for the reference “Deogun et al. 1994”, if<br />

applicable.<br />

AQ17 Please provide the place and date of the proceedings, publisher details and<br />

editor group for the reference “Katzberg and Ziarko 1993”, if applicable.<br />

AQ18 Please provide the access date for the reference “Lee and Choi 2004a”.<br />

AQ19 Please provide the access date for the reference “Lin 2008”.<br />

AQ20 Please provide the access date for the reference “Lin et al. 2007”.<br />

AQ21 Please provide the citation for the reference “Maleta et al. 2005”.<br />

AQ22 Please provide the publisher name and location and the date of the proceeding<br />

for the reference “Nguyen and Skowron 1995”.<br />

AQ23 Please provide the publisher name and location for the reference “Nguyen and<br />

Nguyen 1998b”.<br />

AQ24 Please provide the access date for the reference “RSES 2.2 User’s Guide,<br />

2005”.<br />

AQ25 Table 3a and b has been set as Tables 3 and 4, respectively, and the subsequent<br />

tables renumbered. Please check whether this is OK.


International Journal of Geographical Information Science<br />

Vol. 00, No. 00, Xxxx 2011, 1–24<br />

A landslide expert system: image classification through integration of<br />

data mining approaches for multi-category analysis<br />

Shiuan Wan a *, Tsu-Chiang Lei b and Tein-Yin Chou c<br />


a Department of Information Management, Ling Tung University, Taichung, Taiwan; b Department of<br />

Urban Planning and Spatial Information, Feng Chung University, Taichung, Taiwan; c GIS Center,<br />

Feng Chia University, Taichung, Taiwan<br />

(Received 16 February 2011; final version received 4 August 2011)<br />


Remote Sensing (RS) data can assist in the classification of landscapes to identify landslides. Recognizing the relationship between landform/landscape and landslide areas is, however, complex. Soil properties and geomorphological and groundwater conditions govern the instability of slopes. A previous study by Wan (2009; A spatial decision support system for extracting the core factors and thresholds for landslide susceptibility map. Engineering Geology, 108, 237–251) used the maximum-likelihood classifier to classify multi-category landslide image data. Unfortunately, that classification did not consider the geomorphologic condition. Accordingly, a Landslide Expert System was developed to address these problems. The system uses multi-date SPOT image data to develop the landslide database. The threshold slope at which terrain becomes vulnerable to landslides is obtained by the K-means method. Then, an innovative Data Mining technique, Discrete Rough Sets (DRS), is applied to obtain the core variables and their relevant thresholds. Finally, the Expert Knowledge Translation Platform (EKTP) is used to create the rules for classification. This study used a new approach called ‘Rough Set Tree’ to demonstrate the performance of the approach. The classification of landslide-vulnerable areas, bare land, rock, streams, and water-body is greatly improved.

Keywords: expert system; landslide; data mining<br />


1. Introduction<br />

In general, landslides are dramatic events in the progressive degradation of a slope. They are usually driven by rainfall or gravitational forces, but may be triggered by tectonic movement (Mayoraz et al. 1996, Floris et al. 2004). The tendency of a slope to move is described as the instability of the slope, whereas failure refers to the actual mass movement. Failure may occur along well-defined planes with large catastrophic displacements or by slow movement. Although slopes are likely to degrade over time, movement may still be triggered into more dramatic failure by a variety of natural or manmade activities. That is, a landslide occurs when the shear stresses within the slope exceed the shear strength of the soil. Therefore, geomorphologic investigations are crucial for obtaining information about the stability of topographic features in a particular study area. Vegetation can greatly influence the surface runoff and reduce the pore water pressure and water content of the soil (Huete 1988). Simple numerical vegetation indices can be used to describe the vegetation condition on a slope. In this study, vegetation conditions and slope are two important variables for generating a landslide susceptibility map.

*Corresponding author. Email: shiuan123@mail.ltu.edu.tw

ISSN 1365-8816 print/ISSN 1362-3087 online

© 2011 Taylor & Francis

DOI: 10.1080/13658816.2011.613397

http://www.informaworld.com

To observe the vegetation condition and slope instability, satellite Remote Sensing (RS) images and Digital Elevation Model (DEM) data are used. The capability of RS image data to provide information about vegetation conditions over extensive areas is well accepted (Lin et al. 2006). DEMs are widely used to study the morphological characteristics of the landform (Lin et al. 2007, Lin 2008).

Although the DEM and RS data are important for mapping landslides, an efficient decision system is necessary to model the simulated occurrence of a landslide. Unfortunately, few decision systems integrate RS and DEM data. Landslide hazard mapping is often performed by intersecting hillslope instability factors and vegetation conditions, with the result usually managed as a Thematic Map (Bannari et al. 1995, Mayoraz et al. 1996). There are two kinds of decision systems: (1) a decision support system (DSS) and (2) an expert system (ES). Decision systems are now widely applied in many fields.

A fundamental question arises: how can these decision systems be developed efficiently? For instance, an effective DSS for landslide risk (Wan 2009) usually requires (1) an integration system to collect the environmental data (by either RS or DEM); (2) a data-driven method to search for the most representative study dataset (such as a cross-validation method); and (3) a good classifier. Wan (2009) used a maximum-likelihood classifier (MLC). The classification accuracy was about 77.8%, but improvements were needed to discriminate between different categories. However, resolving the difficulties of classifying various categories involves a great deal of expert knowledge (Ziarko 1991, Zerger 2002, Wan et al. 2009, Pradhan and Lee 2010, Wan et al. 2010a). The objective of this study is to develop an ES to enhance the accuracy. The work addresses the following issues: (1) A slope threshold is found to distinguish many of the ‘easily confused’ categories, such as bare land and channels. For instance, rock, streams, and landslide are easily confused categories in image classification. (2) A translation scheme is generated to transfer the incorrectly classified samples into correct categories. This experience helped in constructing an ES that effectively enhances the accuracy of prediction.

To sum up, enhancing the prediction accuracy of the classification of landslide risk involves (1) the selection of a classifier and (2) a translation platform created from expert knowledge. The categories discriminated by the classifier in this study include water-body, stream, grassland, timberland, landslide area, bare land (rock), and potential landslide area (sensitivity area). In the past, in this sort of classification application, statistical models from a set of given data were generally used. In typical studies, a learning algorithm passively accepts randomly selected training examples. Providing labeled examples is costly in terms of human time and effort (Lee and Choi 2004). Selected training examples of this type may be quite biased. For example, some researchers describe a Support Vector Machine (SVM) for the prediction of landslides or debris flows (Yesilnacar and Topal 2005, Wan and Lei 2009, Yilmaz 2009) which was trained by a few selected training examples. The classification outcomes were found to remain the same if all data except the support vectors were omitted from the training set. Thus, only a few examples define the separating surface and all the rest are redundant to the classifier. Given that experience, we planned to develop an ES which integrates a Data Mining approach with an effective translation strategy. Specifically, the relationship between the complexity of landform categories and landslide occurrence is a difficult problem; therefore, we developed an Expert Knowledge Translation Platform (EKTP) to improve the performance of landslide occurrence classification. A similar concept is presented in Ahlqvist (2005).


In recent years, Data Mining in the geosciences (Lei et al. 2008, Wan et al. 2010b) has identified new approaches. Also, the development of DSS and ES provides pathways to integrate different sources of data to enhance the quality of classification. The main problems that can be approached using Rough Set theory include rough classification (Deogun et al. 1994, Słowiński et al. 1994), reduction of an information system (Katzberg and Ziarko 1993), discovery of data dependencies (Ziarko 1991, Ahlqvist et al. 2000, 2003), and other new Data Mining applications (Deogun et al. 1994). Accordingly, we consider Rough Set theory (Pawlak 1982, 1991) as the engine of a DSS or ES to handle the vagueness and uncertainty of data (Pawlak et al. 1995, Lee and Vachtsevanos 2002).

The advantage of Discrete Rough Set (DRS) over Rough Set is that it can handle continuous data and transform them into a discrete dataset (Nguyen and Skowron 1995, Nguyen and Nguyen 1998a, 1998b). Not much work has been done in the feature selection field of image classification using this technique (Leung et al. 2007). Such research can offer three crucial advantages:

(1) Dimensional reduction: dimension reduction is the process of reducing the number of random variables under consideration (Kirshnaiash and Kanal 1982).

(2) Thresholds: the image is classified using the cutting points of the core attributes.

(3) DRS provides ancillary criterion rules for each of the attributes obtained from the image datasets.

As mentioned above, this is completely different from the traditional deterministic or statistical methods that require predetermined ‘weights’ (Van Westen et al. 2003, Lee et al. 2004) and assume distributions (such as uniform or normal distributions) in the independent variables. Taking advantage of these benefits, our study integrated the DRS in the following process:

(1) We study the soil properties, slope instability, and vegetation conditions. We initially use K-means to find the threshold of slope instability from the DEM data. Then, we use the DRS to generate significant image features of vegetation conditions from the original RS image data with ancillary vegetation indicators.

(2) We also discover the image classification tree-rules through DRS, providing crucial information for decision making.

(3) Based on the observations of the changes of categories (such as water-body, timberland, sensitive area, rock, stream, and landslide) before/after a landslide, we develop an ancillary tool, EKTP, to improve the efficiency of multi-category classification.

A review of the literature suggests that it is important to test the Rough Set tree as a tool to enhance the classification efforts (Ananthanarayana et al. 2003, Wei et al. 2005). Hence, our research has two parts: (1) A series of DEM data are used to find the threshold of slope for landslide occurrence. (2) The DRS method is applied to tackle the RS data. Another important contribution of this study is to generate a Rough Set tree to describe the occurrence of a landslide in our study area. In the process of constructing a tree, the criteria for selecting test attributes influence the classification accuracy of the tree. In this study, the degree of dependency of condition attributes (input variables) on decision attributes (output decisions) is found. DRS theory is used as a heuristic way to select the attribute that will accurately separate the samples into individual classes (Pawlak 1982, 1991, Pawlak et al. 1995, Walaczak and Massart 1999, Goh and Law 2003). Figure 1 shows the steps of the DRS and how the Thematic Map is generated. We used three original image bands and four items of ancillary information as the study material. Then, we selected 45 samples of image data to study the relations between landslide occurrences and nonoccurrences. The DRS classifier (developed with the Rough Set Exploration System (RSES)) is used to obtain the core factors and thresholds (through dimensional reduction). Then, the spatial knowledge database is generated. Next, we study the samples and use the thresholds of the selected samples to create the EKTP by the slopes. The first classification is done by DRS, which yields the initial Thematic Map. Then, the Thematic Map is modified on the basis of the EKTP to enhance the classification accuracy. Finally, the multi-category classification is presented by a Rough Set-based tree structure.

[Figure 1 flowchart. Box labels: Seven bands of RS image; Sampling; Discrete rough set; Thresholds of slope; Core factor with thresholds; Knowledge database; Arc-Gis program; Theme map; Accuracy rate.]

Figure 1. The steps for Discrete Rough Set + Theme Map.


2. Study area, scenario, and materials<br />

2.1. Study area<br />

In general, the analyses of landslide occurrences usually rely on inventory maps. Unfortunately, the study area lacks historical landslide data, and no inventory map exists. In this research, the study area is Shei-Pa National Park (see Figure 2a), and the data collected belong to the period after the Chi-Chi earthquake. Accordingly, it offers a good case study of fracture geology, with potentially complicated geology and fault crossover. The entire area is surrounded by giant woods with beautiful natural scenery. The study area is in the central part of Taiwan (E 121:13:39, N 24:09:57). The Shei-Pa National Park (area: 75,000 ha) is situated 40–80 km from the Chelung Pu fault, which transmitted tremendous energy to the central part of Taiwan during this event. Utilizing DEMs, SPOT image data, field investigation, and the attribute data from GIS (morphology, geology, landslide, slope, soil type, and so on), we built a new database for observing landslide events.

To derive the geomorphological data, we relied on a well-developed DEM database. This was used to construct a series of knowledge rules for landslide occurrence. The DEM was a subset clipped from the DEM database of Taiwan by the Center for Space and Remote Sensing Research, National Central University. The DEM database of Taiwan had a resolution of 40 m × 40 m. This study made use of two different SPOT image scenes (2006/7/29 and 2006/10/20). The size of each pixel is 20 m × 20 m, resampled to match the resolution of the DEM data.

Figure 2. The (a) location, (b) 2006/07/29, (c) 2006/10/20 remote sensing images, and (d) 3D-DEM model image data.



2.2. Scenario<br />

In our study scenario, a catastrophic earthquake occurred on 21 September 1999 in Central Taiwan. The seismic magnitude reached 7.3 on the Richter scale at the center. The Chi-Chi earthquake triggered many landslides and dammed lakes, and a large number of people were killed. The damage caused by this earthquake was the result of numerous issues including urban and rural development, land utilization, and soil and water conservation. The average rainfall is usually about 3000–4200 mm/year. Two years later, typhoon Toraji brought heavy rainfall (about 1750 mm in 3 days), causing more landslides. A large number of these landslides have been mapped from SPOT images, and some of them have been verified by a detailed field investigation. Figure 2b and c present the two different scenarios of the monitoring time. The key point of using these images is to observe the changes in the landscape, which are a good resource for landslide analysis.

2.3. Spatial database and expert’s experiences<br />

The environmental landslide database included two groups of indicators: (1) slope and elevation and (2) the spectrum of image data and vegetation indicators. In the first part, the slope data are calculated from DEM data. Applying the DEM data, the elevations of observation spots were obtained. Then, the slopes of those spots were calculated: (1) the average elevation is about 1200–1800 m and (2) the average slope is 23.78° with a standard deviation of 4.7°. The instability of the slope is usually governed by soil type and the slope of the landforms (Wan et al. 2008). Figure 2d shows one of the profiles of the 3D-DEM model of the Shei-Pa National Park. The vegetation indices were derived from our SPOT image dataset.

2.4. Type of landslide<br />

The landslides triggered by the Chi-Chi earthquake can be classified into four broad categories: (1) ubiquitous, relatively shallow slides on very steep slopes underlain by stiff soils and jointed rock; (2) rock falls; (3) sparse deep-seated failures; and (4) rare, very large coherent deep-seated landslides (Khazai and Sitar 2003). In our study, the initial material for analysis of landslide samples included geology data, a soil distribution map, and an elevation map. The soil type and soil depth were measured, and then the engineering analysis of stability took into account the factors of slope and elevation (see next section). However, slopes with different soil-type conditions (such as permeability) behave differently in response to rainfall or earthquake excitations (Khazai and Sitar 2003, Hong et al. 2005). In this research, the study area is assumed to be similar with regard to slope failures (see Wan et al. 2010b for the detailed process of categorizing the soil type). Thus, it is presumed that the soil type and soil depth are similar for each analyzed sample. Hence, the stability problems of slope failures can be analyzed rationally.

3. Research methods<br />

3.1. Applying K-means to search the threshold of slope instability<br />

K-means is an iterative clustering algorithm in which items (or samples) are moved among sets of clusters until the desired sets are reached (Wan and Yen 2006). The cluster center is defined as the mean value of each cluster. When implementing K-means, the first step is to assign the number of clusters and the initial value for each cluster center. Then, each item (or sample) is assigned to the cluster that has the closest center, and the new mean value of each cluster is calculated as its new center. This step is repeated until the convergence criteria are met. The algorithm is inherently iterative. The performance of K-means depends on the initial positions of the cluster centers, which makes it advisable either to employ proper initial clusters or to allow more iterations (Rand 1971, Darken and Moody 1990).
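The assign-and-update iteration described above can be sketched in a few lines. The following is a minimal one-dimensional sketch, assuming two clusters of slope values and taking the midpoint between the two final cluster centers as the threshold. The function names `kmeans_1d` and `slope_threshold`, the sample slopes, and the midpoint rule are illustrative assumptions, not the procedure or data used in the study.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Plain K-means on scalar values, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the closest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center becomes the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # convergence: centers stopped moving
            break
        centers = new_centers
    return centers, clusters


def slope_threshold(slopes, initial_centers):
    """Midpoint between the two sorted cluster centers (an assumed rule)."""
    centers, _ = kmeans_1d(slopes, initial_centers)
    low, high = sorted(centers)
    return (low + high) / 2


# Hypothetical slope angles in degrees, not data from the study.
slopes = [10.0, 12.0, 14.0, 30.0, 32.0, 34.0]
threshold = slope_threshold(slopes, initial_centers=[10.0, 34.0])
```

Because the result depends on the initial centers, as the text notes, a practical run would typically try several initializations and compare the resulting thresholds.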

3.2. Prominent classifier-DRS method<br />

This study focuses on a new Data Mining technique, Rough Set. The difference between conventional Rough Set and DRS is that data in conventional Rough Set must be predetermined in different groups/classes, whereas DRS can handle continuous data and transform them into a discrete dataset. Furthermore, the DRS provides classification rules from the image data. This renders the knowledge database efficient for maximum separation among the categories in image classification.

3.2.1. Rough Sets theory and discretization process (Nguyen and Skowron 1995)<br />

There are three stages involved in DRS. In the first stage, the ‘Information Table’ must be developed for the description of the characteristic attributes (inputs). In this table, a relation in a multi-attribute set is displayed. Then, all the attributes must be clustered into appropriate classes to construct a ‘Decision Attribute.’ The final step is to obtain the Cores and Reducts of the data attributes. Reducts and Cores are two fundamental concepts related to attribute reduction. Reducts are the minimal subsets of attributes that discriminate the same equivalence classes of the relation as the entire set of attributes. The Core is the common part of all Reducts.

3.2.1.1. Rough Set theory for discretizing the attribute values. Rough Set theory, first described by Pawlak (1982), is a formal approximation of a crisp set (i.e., conventional set) in terms of a pair of sets which give the lower and upper approximation of the original set.

Step 1: Create information system table (I)<br />

Let $I = (U, A)$ be an information system (attribute-value system), where $U$ is a nonempty finite set of objects (the universe) and $A$ is a nonempty, finite set of attributes such that $a : U \to V_a$ for every $a \in A$. $V_a$ is the set of values that attribute $a$ may take. With any $P \subseteq A$, there is an associated equivalence relation $\mathrm{IND}(P)$:

$$\mathrm{IND}(P) = \{ (x, y) \in U^2 \mid \forall a \in P,\ a(x) = a(y) \} \tag{1}$$

The partition of $U$ generated by $\mathrm{IND}(P)$ is denoted $U/\mathrm{IND}(P)$ and can be calculated as follows:

$$U/\mathrm{IND}(P) = \otimes \{ U/\mathrm{IND}(\{a\}) \mid a \in P \} \tag{2}$$

where $U$ is the universe of the dataset and the symbol ‘/’ denotes cutting $U$ into various subsets. The symbol ‘$\otimes$’ means the intersection of the subsets; that is, $A \otimes B = \{ X \cap Y \mid \forall X \in A,\ \forall Y \in B,\ X \cap Y \neq \varnothing \}$. If $(x, y) \in \mathrm{IND}(P)$, then $x$ and $y$ are indiscernible by attributes from $P$. These indistinguishable sets of objects therefore define an equivalence or indiscernibility relation, referred to as the $P$-indiscernibility relation. The equivalence classes of the $P$-indiscernibility relation are denoted $[x]_P$. Please refer to Stefanowski (1998) for more details.
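To make the indiscernibility relation concrete, the sketch below computes the partition of a small universe for a chosen attribute subset: objects that share the same values on every attribute in the subset fall into the same equivalence class. The function name `partition`, the attribute names, and the four-object table are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def partition(table, attributes):
    """Return U/IND(P): groups of object ids that agree on every attribute in P."""
    classes = defaultdict(set)
    for obj, row in table.items():
        # Objects with the same value signature on P are indiscernible.
        signature = tuple(row[a] for a in attributes)
        classes[signature].add(obj)
    return sorted(classes.values(), key=lambda c: sorted(c))


# Hypothetical information table: object id -> attribute values.
table = {
    "x1": {"slope": "steep", "vegetation": "sparse"},
    "x2": {"slope": "steep", "vegetation": "sparse"},
    "x3": {"slope": "steep", "vegetation": "dense"},
    "x4": {"slope": "gentle", "vegetation": "dense"},
}

by_slope = partition(table, ["slope"])
by_both = partition(table, ["slope", "vegetation"])
```

With only `slope`, objects x1, x2, and x3 are indiscernible; adding `vegetation` separates x3 from x1 and x2, which is the refinement that the product of single-attribute partitions expresses.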

Step 2: Find the maximum sum of the row and minimum sum of column on I<br />

The set of attributes which is common to all Reducts is called the Core: the Core is the set of attributes which is possessed by every legitimate Reduct, and therefore consists of attributes which cannot be removed from the information system without causing collapse of the equivalence-class structure. The Core may be thought of as the set of necessary attributes.
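The definitions of Reduct and Core can be illustrated with a brute-force search over attribute subsets. This is a sketch for very small tables only, under the assumption that a Reduct is a minimal subset inducing the same partition as the full attribute set; the function names and the toy rows are hypothetical, and this is not the algorithm implemented in RSES.

```python
from itertools import combinations


def signature_partition(rows, attributes):
    """Partition row indices by their values on the given attributes."""
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(tuple(row[a] for a in attributes), set()).add(i)
    return {frozenset(g) for g in groups.values()}


def reducts(rows, attributes):
    """All minimal attribute subsets that induce the same partition as the full set."""
    full = signature_partition(rows, attributes)
    found = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            # Skip supersets of a reduct already found: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if signature_partition(rows, subset) == full:
                found.append(subset)
    return found


def core(rows, attributes):
    """Attributes common to every reduct."""
    return set.intersection(*(set(r) for r in reducts(rows, attributes)))


# Hypothetical discretized table in which attributes "b" and "c" carry the same information.
rows = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 1},
]
found_reducts = reducts(rows, ["a", "b", "c"])
found_core = core(rows, ["a", "b", "c"])
```

Here either `b` or `c` can be dropped but `a` cannot, so `a` is the only attribute in the Core.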

Written mathematically, it can be stated as

$$P_k^a : [v_k^a, v_{k+1}^a) \subseteq [\min(a(x_i), a(x_j));\ \max(a(x_i), a(x_j))) \tag{3}$$

where $k$ is the cutting number for various sections. The Information Table is a two-dimensional matrix. We sort all the attribute values with respect to the decision(s). Then, fictitious cutting points are assigned to each attribute. Equation (3) is used to determine the best cutting point among the fictitious cutting points in the Information Table. Here $\min(a(x_i), a(x_j))$ is the minimum number of the corresponding cutting point that occurs in the Information Table, and $\max(a(x_i), a(x_j))$ is the maximum number of the corresponding cutting point that occurs in the Information Table.

Step 3: Cutting points are calculated<br />

The fictitious cutting points in Equation (3) can be written as

$$P(S) = \left\{ \left( a_1, \frac{v_{k_1}^{a_1} + v_{k_1+1}^{a_1}}{2} \right), \left( a_2, \frac{v_{k_2}^{a_2} + v_{k_2+1}^{a_2}}{2} \right), \ldots, \left( a_r, \frac{v_{k_r}^{a_r} + v_{k_r+1}^{a_r}}{2} \right) \right\} \tag{4}$$

where $a_1, a_2, \ldots, a_r$ denote the attributes of the Information Table, and $v_k^a$ and $v_{k+1}^a$ are the values of each attribute.
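Equation (4) places each candidate cut halfway between two consecutive values of an attribute. A minimal sketch of that step follows, assuming the values are first sorted and de-duplicated per attribute; the function names and the two example columns are hypothetical.

```python
def candidate_cuts(values):
    """Midpoints between consecutive distinct sorted values of one attribute."""
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


def all_candidate_cuts(table):
    """Pairs (attribute, cut) for every attribute, mirroring the set P(S) in Equation (4)."""
    return [(a, c) for a, column in table.items() for c in candidate_cuts(column)]


# Hypothetical continuous attribute columns, not data from the study.
table = {"slope": [20.0, 26.0, 26.0, 31.0], "band": [0.25, 0.75]}
cuts = all_candidate_cuts(table)
```

Selecting the best cuts among these candidates is a separate step that this sketch does not attempt.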

Step 4: Generating classification rules<br />

Our final purpose is to find the minimal set of consistent rules that characterize the system. For a set of condition attributes $P = \{P_1, P_2, \cdots, P_N\}$ and a decision attribute $Q$, $Q \notin P$, these rules should have the form

$$\{P_i^a\} \cap \{P_j^b\} \cap \cdots \cap \{P_k^c\} \to Q^d \tag{5}$$

where $\{a, b, c\}$ are their respective attributes and $d$ is the decision. The symbol ‘$\cap$’ is the operator between sets, which means intersection. The symbol ‘$\to$’ means inference. This is a form typical of association rules, and the number of items in $U$ that match the condition/antecedent is called the support for the rule. The method for extracting such rules given in is to form a decision matrix corresponding to each individual value $d$ of decision attribute $Q$. Informally, the decision matrix for value $d$ of decision attribute $Q$ lists all attribute–value pairs that differ between objects having $Q = d$ and $Q \neq d$.
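The notions of rule support and decision matrix described above can be sketched as follows. The row layout (one dictionary per object), the attribute and class names, and the function names are assumptions made for illustration; the sketch only builds the matrix and counts support and does not derive the minimal rules.

```python
def matches(row, condition):
    """True when the row has every attribute-value pair in the condition."""
    return all(row[a] == v for a, v in condition.items())


def rule_support(rows, condition):
    """Number of objects in U that match the rule's antecedent."""
    return sum(1 for row in rows if matches(row, condition))


def decision_matrix(rows, decision, value):
    """For each pair (object with Q = d, object with Q != d), the
    attribute-value pairs of the first object that differ from the second."""
    positives = [r for r in rows if r[decision] == value]
    negatives = [r for r in rows if r[decision] != value]
    return [
        [{(a, p[a]) for a in p if a != decision and p[a] != n[a]} for n in negatives]
        for p in positives
    ]


# Hypothetical discretized samples, not data from the study.
rows = [
    {"slope": "steep", "vegetation": "sparse", "cls": "landslide"},
    {"slope": "steep", "vegetation": "dense", "cls": "timberland"},
    {"slope": "gentle", "vegetation": "sparse", "cls": "bare land"},
    {"slope": "steep", "vegetation": "sparse", "cls": "landslide"},
]
support = rule_support(rows, {"slope": "steep", "vegetation": "sparse"})
matrix = decision_matrix(rows, "cls", "landslide")
```

Each matrix entry shows what distinguishes a landslide sample from one non-landslide sample; a rule for the class has to cover at least one pair from every entry in a row.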

AQ8


International Journal of Geographical Information Science 9<br />

3.2.1.2. Program of RSES. The RSES program is used to handle the calculation of the above process. In this study, RSES software is used to create the knowledge for classification. This software was developed by Andrzej Skowron and his R&D team at Warsaw University (RSES 2.2 User’s Guide 2005). The main aim of RSES is to provide a tool for performing experiments on tabular datasets. The main purpose of DRS is to develop knowledge rules from satellite images. The operations of DRS can clearly express the relations between attributes and decisions. The concept of the Rough Set-based tree structure is adopted from Ananthanarayana et al. (2003). The Rough Set-based tree visualizes the tree structure of the rules. It is also obtained mathematically from DRS theory.

4. Steps for analysis and discussion

Step 1: Image fusion – combine spectrum image and panchromatic image

Image fusion is a process that combines data and information from multiple sources to achieve refined or improved information for decision making (Hall 1992, Lei et al. 2008). It can integrate disparate and complementary data to enhance the image information and to increase the reliability of interpretation. It can also combine two or more images of different resolution or scale into a new image at a single scale by using a kernel algorithm. In this study, we integrate the multi-spectral image (resolution: 20 m) and the panchromatic image (resolution: 10 m) from a SPOT image, pixel by pixel, through the PCA (Principal Component Analysis) method using ERDAS image software. The new image resulting from image fusion provides a better resolution for the preprocessing of the image classification.
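The fusion in Step 1 is performed in ERDAS, whose internals are not described here. As a rough illustration only, the following NumPy sketch shows the common PCA-substitution idea: project the bands onto principal components, replace the first component with the panchromatic image after matching its mean and spread, and transform back. The function name `pca_pansharpen`, the nearest-neighbour resampling, and the synthetic arrays are all assumptions made for this example.

```python
import numpy as np

def pca_pansharpen(ms, pan):
    """Fuse a multispectral stack (bands, H, W) with a panchromatic image (H, W).

    Both inputs must already share one pixel grid.
    """
    bands, height, width = ms.shape
    flat = ms.reshape(bands, -1).astype(float)
    band_means = flat.mean(axis=1, keepdims=True)
    centered = flat - band_means

    # Principal axes of the band covariance, strongest component first.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered))
    axes = eigvecs[:, np.argsort(eigvals)[::-1]]
    components = axes.T @ centered

    # Orient PC1 so that it is positively correlated with the pan image.
    pan_flat = pan.reshape(-1).astype(float)
    if np.corrcoef(components[0], pan_flat)[0, 1] < 0:
        axes[:, 0] *= -1
        components[0] *= -1

    # Match pan to the mean and spread of PC1, then substitute it for PC1.
    pc1 = components[0]
    components[0] = (pan_flat - pan_flat.mean()) / pan_flat.std() * pc1.std() + pc1.mean()

    fused = axes @ components + band_means
    return fused.reshape(bands, height, width)

# Synthetic stand-ins for a 20 m multispectral tile and a 10 m pan tile.
rng = np.random.default_rng(0)
ms_20m = rng.uniform(20, 120, size=(3, 4, 4))
ms_10m = ms_20m.repeat(2, axis=1).repeat(2, axis=2)  # nearest-neighbour resampling to 8 x 8
pan_10m = ms_10m.mean(axis=0) + rng.normal(0, 2, size=(8, 8))
fused = pca_pansharpen(ms_10m, pan_10m)
```

Because only the first component is replaced and its mean is preserved, each fused band keeps the mean of the corresponding input band while taking its spatial detail from the panchromatic image.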


Step 2: Applying image subtraction to find the location of landslide

In general, landslides can be identified by the method of image subtraction. The algorithm is based on a pair of images of the same area collected at different times. The process simply subtracts one digital image, pixel by pixel, from another, to generate a third image composed of the numerical differences between the pairs of pixels (Ridd and Liu 1998). After image subtraction, the denudation sites take on highlighted values, which can be easily used for landslide extraction. Figure 3 shows the knowledge rules obtained for different scenarios through the DRS method. Figure 4a shows the reference locations of the study samples in the imagery. The grid cells are double-checked against the knowledge rules from Tables 3 and 4. We also plot the elevation and slope of the selected spots (training samples) from Figure 3.
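The pixel-by-pixel subtraction described in Step 2 can be sketched as follows. This is a minimal illustration, assuming two co-registered single-band eight-bit images; the array values and the cutoff of 20 gray levels are hypothetical and are not taken from the study.

```python
import numpy as np

def difference_image(before, after):
    """Subtract two co-registered single-band images pixel by pixel."""
    # Widen the type first: subtracting uint8 arrays directly would wrap around.
    before = np.asarray(before, dtype=np.int32)
    after = np.asarray(after, dtype=np.int32)
    if before.shape != after.shape:
        raise ValueError("images must cover the same grid")
    return after - before

def highlight_changes(diff, cutoff):
    """Flag pixels whose absolute change reaches the (hypothetical) cutoff."""
    return np.abs(diff) >= cutoff

before = np.array([[30, 32], [95, 97]], dtype=np.uint8)
after = np.array([[31, 33], [40, 96]], dtype=np.uint8)
diff = difference_image(before, after)
mask = highlight_changes(diff, cutoff=20)
```

Only the lower-left pixel changes strongly between the two dates, so it is the only one flagged in `mask`.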

Step 3: Applying K-means to search the thresholds for landslide occurrence

Classification is the core part of detecting the landslide area. The previous step supplies the spectral information; however, spectral information alone is not adequate to separate the categories. Thus, the DEM data are used to improve the classification (Gooch and Chandler 1998). Binary decisions 1 and 2 represent the occurrence and nonoccurrence of landslide in the investigated samples, respectively. That is, decision 1 is category i and decision 2 is category j in Equation (6). Each sample represents one pixel on the map. The threshold (k) is obtained from

\[ k = \frac{x_{\max(i)} + x_{\min(j)}}{2} \qquad (6) \]

\[ d = \begin{cases} 1, & x < k \\ 2, & x > k \end{cases} \qquad (7) \]

where x_{max(i)} is the largest slope value of landslide nonoccurrence and x_{min(j)} is the smallest slope value of landslide occurrence (Table 1 and Figure 1). Slope angle, topographical elevation, shape of slope, and slope aspect maps are obtained from the DEM of the study area. We select 30 training samples (13 nonoccurrences and 17 occurrences) from the DEM data. The slopes of these samples are also calculated from the DEM data. The data are listed in Table 1 and their locations are shown in Figure 4a. We also plot them in Figure 4b to visualize their distributions. In addition, the minimum slope of the occurrence samples can provide a similar result for determining a threshold, although it may differ slightly from the K-means result. Applying the K-means method, the threshold of slope value is 23.01° (k value), and this value is then used to create the EKTP (in Step 6).

[Figure 3. Binary classifications for landslide occurrence/nonoccurrence (knowledge rules from different scenarios). Two rule trees are shown. The first tests NDVI < –0.0125 and VI > 20; the second (2006/10/20 knowledge rules) tests NDVI < –0.0131 and VI > 44. The leaves of each tree are 'The grid cell is landslide occurrence' and 'Nonoccurrence'.]
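As a worked check of Equation (6), the sketch below applies the midpoint rule to the slope values transcribed from Table 1. It illustrates the formula only and is not the K-means run reported in the text. The decision helper uses the Table 1 coding (1 = landslide, 0 = non-landslide) and assumes that slopes above k fall on the landslide side, which is how the Table 1 samples separate. The function names are hypothetical.

```python
# (slope in degrees, decision) pairs transcribed from Table 1.
SAMPLES = [
    (35.39, 1), (13.64, 0), (33.42, 1), (21.8, 0), (33.19, 1), (16.75, 0),
    (34.29, 1), (28.92, 1), (27.46, 1), (26.19, 1), (16.81, 0), (16.44, 0),
    (20.55, 0), (33.44, 1), (32.78, 1), (18.02, 0), (15.79, 0), (24.01, 1),
    (36.4, 1), (35.07, 1), (25.28, 1), (18.02, 0), (20.47, 0), (35.18, 1),
    (16.5, 0), (25.01, 1), (34.18, 1), (25.01, 1), (19.61, 0), (20.34, 0),
]

def midpoint_threshold(samples):
    """Equation (6): midpoint between the two classes' nearest slope values."""
    nonoccurrence = [slope for slope, decision in samples if decision == 0]
    occurrence = [slope for slope, decision in samples if decision == 1]
    return (max(nonoccurrence) + min(occurrence)) / 2

def decide(slope, k):
    """Assumed reading: a slope above k is treated as landslide (Table 1 coding)."""
    return 1 if slope > k else 0

k = midpoint_threshold(SAMPLES)  # (21.8 + 24.01) / 2
```

For these 30 samples the midpoint is about 22.9°, close to but not identical with the 23.01° reported for K-means, in line with the slight difference the text mentions.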

Step 4: Applying four vegetation indicators to enhance the classification

In our study, we used G (Green), R (Red), IR (Infrared), and some vegetation indicators to improve the understanding of the relations between vegetation condition and landslide. However, the selection of vegetation indicators became quite an obstacle for grass and timberland. Bannari et al. (1995), Wan et al. (2009), and Wan (2010b) studied the effective vegetation factors, as follows:

(1) Normalized Difference Vegetation Index

A common index for the density of plant growth is the Normalized Difference Vegetation Index (NDVI). Written mathematically, the formula is

\[ \mathrm{NDVI} = \frac{\mathrm{NIR} - R}{\mathrm{NIR} + R} \qquad (8) \]


[Figure 4. Locations and thresholds for training samples: (a) landslide locations detected by image subtraction (map, scale bar 0 to 24 km); (b) thresholds searched from the DEM database of training samples (scatter plot of slope against elevation (m) for nonoccurrence and occurrence samples).]

where NIR is the near-infrared band and R is the red band. The values for NDVI are obtained from the SPOT image. The range of this value is [–1, 1].

(2) Band Ratio

Band Ratio (BR) means dividing the pixel values in one band by the corresponding pixel values in a second band. Differences between the spectral reflectance curves



Table 1. The selected data from DEM (Decision = 1: landslide; Decision = 0: non-landslide).

Elevation (m) Slope (°) Decision

441.99 35.39 1<br />

1010.42 13.64 0<br />

617.72 33.42 1<br />

261.48 21.8 0<br />

552.03 33.19 1<br />

991.69 16.75 0<br />

297.79 34.29 1<br />

732.69 28.92 1<br />

990.08 27.46 1<br />

263.07 26.19 1<br />

712.45 16.81 0<br />

1030.75 16.44 0<br />

438.58 20.55 0<br />

1131.79 33.44 1<br />

515.9 32.78 1<br />

752.36 18.02 0<br />

889.58 15.79 0<br />

1146.76 24.01 1<br />

698.71 36.4 1<br />

1019.64 35.07 1<br />

754.05 25.28 1<br />

235.39 18.02 0<br />

619.76 20.47 0<br />

1137.64 35.18 1<br />

361.79 16.5 0<br />

787.45 25.01 1<br />

539.32 34.18 1<br />

229.35 25.01 1<br />

849.88 19.61 0<br />

971.79 20.34 0<br />

of surface types can be elicited. BR is a technique used in digital image processing to increase the contrast between selected features and superfluous features. It is normally used to identify vegetation concentrations. It can be formulated as

\[ \mathrm{BR} = \mathrm{IR}/R \qquad (9) \]

The BR indicates that the relationship holds for both shadowed and directly illuminated pixels in an image.

(3) Square Root of Band Ratio

Some of the vegetation responses cannot be verified merely by BR. Thus, the square root of BR is generated and can be formulated as

\[ \mathrm{SQBR} = \sqrt{\mathrm{IR}/R} \qquad (10) \]

Square Root of Band Ratio (SQBR) reduces the value of BR; thus, the advantage of using SQBR is that some dark green vegetation (such as foliage forest vs. coniferous forest) can be easily identified.



(4) Vegetation Index

The vast majority of natural surfaces are about equally bright in the red and near-infrared parts of the spectrum, with the notable exception of green vegetation (Lin et al. 2004). An index of vegetation (Equation (11)) can be used to distinguish green vegetation from natural surfaces:

\[ \mathrm{VI} = \mathrm{NIR} - R \qquad (11) \]

The values of the Vegetation Index (VI) for each sample were also obtained from the SPOT image.
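The four indicators in Equations (8)–(11) can be computed together, as in the sketch below. It assumes that IR and NIR denote the same near-infrared band, and it uses the R and IR means of sample 1 in Table 2 as input; the function name is hypothetical. The tabulated BR, NDVI, and SQBR need not agree with this calculation to the last digit, since it is not stated here how the per-sample values were averaged.

```python
import math

def vegetation_indices(red, nir):
    """Return NDVI, BR, SQBR and VI for one pair of red / near-infrared values."""
    if red <= 0 or nir < 0:
        raise ValueError("expected red > 0 and nir >= 0")
    band_ratio = nir / red
    return {
        "NDVI": (nir - red) / (nir + red),  # Equation (8), always within [-1, 1]
        "BR": band_ratio,                   # Equation (9)
        "SQBR": math.sqrt(band_ratio),      # Equation (10)
        "VI": nir - red,                    # Equation (11)
    }

# R and IR of sample 1 (Water) in Table 2.
indices = vegetation_indices(red=17.6486, nir=18.3514)
```

NDVI and VI always share the same sign, because both have NIR − R in the numerator and NIR + R is positive.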

From the ancillary indicators given by Equations (8)–(11), the binary classification (occurrence/nonoccurrence) helps to search for the governing factors among the indicators. Based on DRS, we found that the most dominant factors of the binary classification are VI and NDVI (Figure 3); that is, VI and NDVI are enhanced indices for detecting the location of the landslide. The threshold values in Figure 3 differ slightly between the two scenarios (–0.0125 vs. –0.0131 for NDVI and 20 vs. 44 for VI). Different atmospheric conditions can result in different image qualities. According to the predominant scientific understanding, the occurrence of landslide may be induced by geological and morphological factors. In our study, the sampling data were collected after a typhoon struck the area (July 2006). Similar outcomes can also be found in Wan et al. (2009). The pore water ratio or surface runoff may govern landslide occurrence, and vegetation conditions affect the pore water ratio or surface runoff. This is the reason why VI and NDVI are taken as the dominant factors in this study.


Step 5: Applying DRS method for vegetation condition on landslide map

Vegetation cover and some other indicators are also considered as environmental factors. In practical analysis, the many vegetation factors/indicators in the real world may require a data-driven (data mining) method to handle the GIS landslide database. This study uses DRS to handle the multi-category land-cover classification that occurs in the field of RS (Lei et al. 2008, Wan et al. 2010b). An optimal solution of knowledge extraction can be applied to discover their characteristics, which may involve uncertainties and imprecision.

There are three procedures involved in DRS analysis. In the first stage, an 'Information Table' is developed to describe the characteristic attributes (inputs). The Information Table consists of attributes and decisions, and displays a relation in a multi-attribute set. Then, all the attributes must be clustered into appropriate classes to construct a 'Decision Attribute.' The final step is to attain the Cores and Reducts of the data attributes. Attribute reduction should be done in such a way that the reduced set of attributes provides the same quality of approximation as the original set of attributes. The minimal subsets of attributes that discern all equivalence classes of the relation discernible by the entire set of attributes are called Reducts. The Core is the common part of all Reducts. Then, the classification process is started by applying the core factors.
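The notions of Reduct and Core can be made concrete with a brute-force search over attribute subsets. The sketch below is not the algorithm used by RSES. It assumes a small, consistent, already discretized decision table, and the table itself is hypothetical (the attribute VI is deliberately made redundant with IR).

```python
from itertools import combinations

def preserves_decisions(rows, attrs):
    """True if objects that agree on `attrs` never disagree on the class."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row["class"]) != row["class"]:
            return False
    return True

def find_reducts(rows, attributes):
    """Minimal attribute subsets that discern the classes as well as the full set."""
    reducts = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            # Skip supersets of reducts already found: they are not minimal.
            if any(set(found) <= set(subset) for found in reducts):
                continue
            if preserves_decisions(rows, subset):
                reducts.append(subset)
    return reducts

# Hypothetical discretized information table (not data from the study).
TABLE = [
    {"G": "low", "IR": "low", "VI": "low", "class": "water"},
    {"G": "low", "IR": "high", "VI": "high", "class": "timberland"},
    {"G": "high", "IR": "low", "VI": "low", "class": "stream"},
    {"G": "high", "IR": "high", "VI": "high", "class": "landslide"},
]

reducts = find_reducts(TABLE, ["G", "IR", "VI"])
core = set.intersection(*map(set, reducts)) if reducts else set()
```

In this toy table either IR or VI can be dropped but G cannot, so the Reducts are {G, IR} and {G, VI} and the Core is {G}. The exhaustive search grows exponentially with the number of attributes, which is acceptable only for tables this small.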

The task of classification is to find the appropriate classes. The Rough Set provides a perceivable solution by discretizing the chaotic information (Sinha and Laplante 2004). Through DRS analysis, the 'Cores' can be recognized as a series of key attributes that influence the decisions. The remaining attributes, which do not influence the decisions, can be eliminated. In addition, the knowledge rules for image classification can be established simultaneously.


Finding the distinctive points of the attributes aids in the search for the category classes in the satellite image. In this study, the images are gray-level images (gray color coded on eight-bit data). We propose a new concept to deal with the uncertainty in the classification problem of image data. Image data from two different dates are used to attain the rules of landslides through DRS. The first step is to use the data from Figure 3 to carry out binary classification of landslide occurrences and nonoccurrences. The training data are shown in Table 2 and the resulting rules (tree structure) are shown in Figure 5. It should be noted that the DRS tree structure integrates the concepts of dimensional reduction, threshold, and criterion rules. Each branch contains a segmentation point (threshold). Each branch also consists of features extracted from a datasheet (the dimensional reduction process), and can be represented as a criterion rule.

To implement the DRS, we apply the RSES program (see Figure 1). This software has a 'classified table to decomposition tree' function, whose major outcome is to classify the training data into a tree structure. This is very suitable for multi-category image analysis. The process follows the Boolean operation, and the tree structure is generated automatically. The Boolean operation was described by Wan et al. (2008). In Figure 5a, the first three dominant factors are G, R, and IR, which are extracted by the program in descending order of importance. The water-body, timberland, sensitive area, rock, stream, and landslide classes can be separated into different subdivisions through different segmentation values of R and IR. Also, if a training sample falls into the range of G > 61 and R > 41, the classification result is listed as an omission error.

The difference between a DRS-tree and a decision tree is quite interesting. In our study, band G roughly divides the categories into three sections. For instance, when G varies from 44 to 61, three possible classes can be detected: sensitive areas, rock, and grass. If any of the target categories are required to be found in the Thematic Map, the DRS-tree is a better choice for scientists or engineers than the decision tree. In addition, we apply the image subtraction method to attain the knowledge rules (see Figure 5b). From Figure 5b, the variation of this area can be observed. It is important to note that some of the categories, such as rock and stream, cannot be detected through this process. This implies that rock areas and streams do not change between the two dates; hence, they cannot be detected.

Step 6: Create EKTP

An ES is usually designed to provide solutions to a given problem. More specifically, the ES can record and provide the decision reached from the problem-solving point of view, providing not only the answer, but also the specific process by which the answer was reached. Therefore, the classification can be resolved by drawing on an expert's experience in the field. In this study, some obstacles in classifying multiple categories are encountered (such as mixed-up categories). Classification methods such as Maximum-Likelihood estimation, PCA, Neural Network Classifiers, and Decision Trees have been widely used to classify land covers from a variety of satellite images (Lei et al. 2008). Supervised and unsupervised classification techniques are two major methodologies that can be used to interpret remotely sensed data. For binary classification, they seem to work perfectly (Lei et al. 2008, Wan et al. 2010a); unfortunately, multi-category classification is unfeasible. Accordingly, in our study, many categories such as water-body versus stream and rock versus landslide are very hard to identify based on supervised and unsupervised classification approaches. That is, if a pair of sampling data is under different categories but has similar attributes, it



Table 2. Training sample of 2006/07/29.<br />

G R IR BR NDVI SQBR VI Category<br />

1 32.2432 17.6486 18.3514 1.0415 0.0194 1.0201 0.7027 Water<br />

2 33.3000 19.4333 19.3000 0.9934 −0.0038 0.9964 −0.1333 Water<br />

3 36.5000 20.2692 18.8077 0.9288 −0.0376 0.9634 −1.4615 Water<br />

4 35.2258 19.6129 17.6129 0.8986 −0.0542 0.9475 −2.0000 Water<br />

5 61.9333 40.8000 29.8667 0.7203 −0.1655 0.8472 −10.9333 Stream<br />

6 69.5500 47.5000 29.3500 0.6314 −0.2290 0.7929 −18.1500 Stream<br />

7 71.0123 63.7284 54.4568 0.8612 −0.0761 0.9273 −9.2716 Stream<br />

8 74.2099 67.7407 57.9506 0.8605 −0.0764 0.9269 −9.7901 Stream<br />

9 71.0893 62.9196 56.5446 0.8988 −0.0538 0.9478 −6.3750 Stream<br />

10 53.5326 46.9022 114.5430 2.4448 0.4190 1.5633 67.6413 Grass<br />

11 51.7244 45.6603 111.2240 2.4465 0.4181 1.5631 65.5641 Grass<br />

12 48.1000 40.2687 106.3630 2.6468 0.4495 1.6255 66.0938 Grass<br />

13 52.8734 47.2405 115.0510 2.4414 0.4173 1.5615 67.8101 Grass<br />

14 54.4196 48.5536 107.3480 2.2143 0.3767 1.4874 58.7946 Grass<br />

15 52.9057 47.3019 105.7360 2.2397 0.3803 1.4951 58.4340 Grass<br />

16 34.5103 24.4330 90.7680 3.7194 0.5757 1.9281 66.3351 Timberland<br />

17 35.3588 24.5954 99.5878 4.0531 0.6020 2.0111 74.9924 Timberland<br />

18 34.6389 24.7556 93.9000 3.7961 0.5820 1.9474 69.1444 Timberland<br />

19 37.8268 27.2333 97.5214 3.5852 0.5633 1.8930 70.2882 Timberland<br />

20 35.7157 25.2658 101.2470 4.0059 0.5987 1.9998 75.9816 Timberland<br />

21 37.1503 26.8627 117.6110 4.3785 0.6259 2.0901 90.7484 Timberland<br />

22 37.2773 26.3594 95.1699 3.6163 0.5651 1.9002 68.8105 Timberland<br />

23 36.3588 25.4286 99.8106 3.9339 0.5924 1.9813 74.3821 Timberland<br />

24 43.1456 29.1050 155.1650 5.3353 0.6837 2.3090 126.0600 Timberland<br />

25 39.4106 26.8261 145.8310 5.4430 0.6886 2.3317 119.0050 Timberland<br />

26 40.5619 28.3761 133.3780 4.7023 0.6472 2.1662 105.0020 Timberland<br />

27 40.7632 27.0327 169.2750 6.2710 0.7243 2.5032 142.2420 Timberland<br />

28 35.0954 24.8905 94.1767 3.7875 0.5811 1.9451 69.2862 Timberland<br />

29 35.1462 24.6522 91.5455 3.7183 0.5754 1.9276 66.8933 Timberland<br />

30 38.1421 26.6667 110.2020 4.1305 0.6087 2.0309 83.5355 Timberland<br />

31 85.0000 83.1250 81.1250 0.9764 −0.0124 0.9879 −2.0000 Landslide<br />

32 74.2424 77.0303 71.8788 0.9336 −0.0345 0.9661 −5.1515 Landslide<br />

33 72.8621 73.6207 67.5345 0.9197 −0.0424 0.9587 −6.0862 Landslide<br />

34 68.7143 69.6071 74.7143 1.0826 0.0366 1.0389 5.1071 Landslide<br />

35 64.0625 66.5625 77.4375 1.1742 0.0762 1.0816 10.8750 Landslide<br />

36 68.8182 68.9091 88.9091 1.3746 0.1184 1.1508 20.0000 Landslide<br />

37 47.1333 39.9000 74.9000 1.8865 0.3050 1.3723 35.0000 Sensitive<br />

38 49.2857 41.5714 77.8571 1.8762 0.3038 1.3692 36.2857 Sensitive<br />

39 50.3750 43.8750 89.8750 2.0643 0.3448 1.4353 46.0000 Sensitive<br />

40 45.4043 34.7872 148.5320 4.2810 0.6201 2.0679 113.7450 Sensitive<br />

41 48.3261 37.1522 153.3260 4.1312 0.6099 2.0322 116.1740 Sensitive<br />

42 60.4327 55.5481 51.0000 0.9198 −0.0421 0.9589 −4.5481 Rock<br />

43 54.9667 53.1778 56.7444 1.0713 0.0324 1.0340 3.5667 Rock<br />

44 58.8929 56.7143 57.2321 1.0138 0.0044 1.0056 0.5179 Rock<br />

45 52.8684 49.4605 52.0921 1.0533 0.0238 1.0252 2.6316 Rock<br />

Notes: G, Green; R, Red; IR, Infrared; BR, Band Ratio; NDVI, Normalized Difference Vegetation Index; SQBR, Square Root of Band Ratio; VI, Vegetation Index. Binary classification assigned all the landslide samples as 1 and others as 2.

is impossible to classify through supervised or unsupervised techniques. Alternatively, the best solution is to create a translation platform.

Figure 6 presents the EKTP. All the easily mixed-up categories from the database are loaded into this platform, where they fall into the appropriate categories automatically. We select


[Figure 5. Rules from DRS to derive various categories: (a) using single period from Table 2; (b) using image subtraction. Panel (a) is a tree that splits on G (< 44, between 44 and 61, > 61), then R (< 41, ≥ 41), then IR (< 63, between 63 and 90, > 90, or ≥ 63), with leaves Waterbody, Timberland, Sensitivity area, Rock, Grass, Stream, Landslide, and Omission error. Panel (b) is a tree that splits on G diff (< –33, between –33 and –21, ≥ –21), R diff (< –19, between –19 and –14, < –14, ≥ –14), and IR diff (< 23, ≥ 23), with leaves Landslide, Grass, Sensitivity area, and Timberland.]

the slope value of 23° (the threshold from K-means). The stream and water-body are classified very well by following the platform rules. A fundamental question arises: why can EKTP improve the accuracy of the multi-category classification? The main idea comes from some of the easily confused classes (similar image band with similar vegetation

[Figure 6. Expert knowledge translation platform (EKTP). The flowchart routes the classes Stream, Waterbody, Rock, and Landslide through the test 'slope > 23°' (yes/no), with outputs Rock, Stream, or the original class.]

indices) which have different geomorphological conditions (such as slopes and elevations). That is, they are usually located on different hillslopes. The rock area and landslide are also successfully identified.
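The platform in Figure 6 amounts to a post-classification rule on slope. The sketch below encodes one possible reading of that flowchart, which is an assumption rather than something the text states outright: a cell first labelled Stream or Waterbody is relabelled Rock when its slope exceeds the threshold, a cell first labelled Rock or Landslide is relabelled Stream when its slope does not exceed it, and every other cell keeps its original class. The function name and the example slopes are hypothetical.

```python
SLOPE_THRESHOLD_DEG = 23.0  # slope threshold selected in Step 6

def translate(original_class, slope_deg, threshold=SLOPE_THRESHOLD_DEG):
    """Relabel an easily confused class using slope (assumed reading of Figure 6)."""
    steep = slope_deg > threshold
    if original_class in ("Stream", "Waterbody") and steep:
        return "Rock"       # water features are not expected on steep hillslopes
    if original_class in ("Rock", "Landslide") and not steep:
        return "Stream"     # gentle terrain suggests a riverbed
    return original_class   # all other cases keep the DRS label

relabelled = [
    translate("Stream", 30.0),
    translate("Stream", 10.0),
    translate("Rock", 10.0),
    translate("Landslide", 35.0),
]
```

The design point is that the spectral classifier is left untouched and the DEM-derived slope is applied only afterwards, to the classes that the spectral bands cannot separate.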

Step 7: Discussion on accuracy

In the past, many parametric studies have attempted to improve our understanding of potential landslide areas. However, there is no agreement in the literature as to which factors should be included in the determination of landslide susceptibility areas. Depending on the characteristics of the study area, at least three factors including topography, vegetation, and geomorphological conditions have been considered in the analysis. In detailed studies, however, the number of factors can be increased depending on the characteristics of the study area. In general, our study considers a site that (1) lies on bare land without any vegetation cover; (2) has a steep slope; and (3) has a relatively high elevation surrounded by lower elevations. It should be noted that our development of EKTP is only suitable for the detection of landslide areas. Many special-purpose EKTPs can be generated to resolve the detection of other landscapes.

Observing the tree structure in Figure 5b, the algorithm can be formulated mathematically. The basic spirit of Data Mining is to extract a small number of samples to represent the behavior of a population. Following this concept, we randomly select only 45 points (see Table 2) to train the classification rules. The number of testing data is 250. The accuracy of three easily mixed-up categories on the DRS-tree is listed on the left sides of Tables 3 and 4. The outcomes of classification accuracy are greatly improved by using

Table 3. 2006/07/29 error matrix of three easily mixed-up categories.

Category     DRS                                 DRS+EKTP
             Producer accuracy   User accuracy   Producer accuracy   User accuracy
Stream       45.00               75.00           90.00               100.00
Landslide    97.50               88.64           97.50               97.50
Rock         60.00               50.00           90.00               90.00

Note: DRS, Discrete Rough Sets; EKTP, Expert Knowledge Translation Platform.


Table 4. 2006/10/20 error matrix of three easily mixed-up categories.

Category     DRS                                 DRS+EKTP
             Producer accuracy   User accuracy   Producer accuracy   User accuracy
Stream       45.00               69.23           80.00               100.00
Landslide    97.50               88.64           100.00              97.56
Rock         50.00               83.33           100.00              76.92

Note: DRS, Discrete Rough Sets; EKTP, Expert Knowledge Translation Platform.

the EKTP concept. Since it is quite difficult to determine the class of a grid cell by using only the given image data, the geomorphological conditions should also be considered. For instance, such conditions help to distinguish easily confused grid cells such as stream and water-body. Specifically, the streams are detected only 45% of the time when the image data attributes alone are followed. However, when the ancillary tool of EKTP is applied, the accuracy is enhanced to 90% (see Table 3). The improvement of accuracy in a different period is also verified, as seen in Table 4. We also calculate the overall accuracy and Kappa, as listed in Table 5.
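The accuracy measures quoted here follow from an error matrix by standard formulas. The sketch below applies them to the 2006/07/29 error matrix transcribed from Table 6 (rows are class outcomes, columns are ground truth). The function name is hypothetical, and the Kappa formula is the usual Cohen's Kappa, assumed to be the one used in the study.

```python
def accuracy_report(matrix):
    """User, producer and overall accuracy plus Cohen's Kappa of a square error matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]                               # classified totals
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]   # ground-truth totals
    diagonal = [matrix[i][i] for i in range(n)]
    observed = sum(diagonal) / total
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return {
        "user": [d / r for d, r in zip(diagonal, row_totals)],
        "producer": [d / c for d, c in zip(diagonal, col_totals)],
        "overall": observed,
        "kappa": (observed - expected) / (1 - expected),
    }

# Table 6, class order: Water, Stream (rock), Grass land, Timber land,
# Landslide, Potential landslide, Bare land.
TABLE_6 = [
    [10, 0, 0, 0, 0, 0, 0],
    [0, 18, 0, 0, 0, 0, 0],
    [0, 0, 22, 0, 1, 0, 0],
    [0, 0, 0, 123, 0, 2, 0],
    [0, 1, 0, 0, 39, 0, 0],
    [0, 0, 3, 2, 0, 18, 1],
    [0, 1, 0, 0, 0, 0, 9],
]

report = accuracy_report(TABLE_6)
```

For this matrix the sketch gives an overall accuracy of 239/250 = 95.6% and a Kappa of about 93.7%, which agrees with the values reported for DRS+EKTP on 2006/07/29.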

To take a closer look at the classification results, we select an example area (located in Figure 7a) to demonstrate how efficiently the EKTP works. Figure 7b applies the DRS for image classification and Figure 7c applies DRS+EKTP. Two different improvements are apparent:

Part A: This is the area of the well-known Chia-Yang landslide, shown in Figure 7b; the discrepancy is shown in Figure 7c. The Chia-Yang landslide occurred as relatively shallow slides on very steep slopes in stiff soils and jointed rock. In Figure 7b, however, it looks like a water-body (lake) with a stream on it. With the ancillary tool of EKTP, the landslide area appears manifestly different (Table 6).

Part B: This is a riverbed area. Applying DRS, most of the stream (riverbed) areas are judged as rocks. Fortunately, the ancillary tool (EKTP) provides information to distinguish rocks from streams.


5. Validation on proposed method

As part of this study, we carry out a pixel-based classification with MLC for a simple comparison. The main process of MLC is to generate statistical decision rules that examine the probability function of a pixel for each of the classes, and assign the pixel to the class with the highest probability. Figure 8a shows the overall outcomes based on the DRS+EKTP classification model of the National Park. The overall accuracy rate of Figure 8a is 95.6%.

Table 5. Overall accuracy and Kappa for different scenarios.

Period        DRS                          DRS+EKTP
              Overall accuracy   Kappa     Overall accuracy   Kappa
2006/07/29    92.00              88.50     95.60              93.70
2006/10/20    91.20              87.73     96.40              94.99

Note: DRS, Discrete Rough Sets; EKTP, Expert Knowledge Translation Platform.


[Figure 7. Study area: (a) locations; (b) Discrete Rough Sets; and (c) Discrete Rough Sets+EKTP. Legend for (b) and (c): Waterbody, Stream, Grassland, Timberland, Landslide area, Potential landslide area, Bare land. Scale bars: 0 to 6,000 m.]


Table 6. Error matrix of 2006/07/29 (rows: class outcomes; columns: ground truth).

Class outcomes       Water  Stream (rock)  Grass land  Timber land  Landslide*  Potential landslide*  Bare land  Total  User accuracy
Water                10     0              0           0            0           0                     0          10     100.00
Stream (rock)        0      18             0           0            0           0                     0          18     100.00
Grass land           0      0              22          0            1           0                     0          23     95.65
Timber land          0      0              0           123          0           2                     0          125    98.40
Landslide            0      1              0           0            39          0                     0          40     97.50
Potential landslide  0      0              3           2            0           18                    1          24     75.00
Bare land            0      1              0           0            0           0                     9          10     90.00
Total                10     20             25          125          40          20                    10         250
Producer accuracy    100.00 90.00          88.00       98.40        97.50       90.00                 90.00

Overall accuracy = 95.60%; Kappa = 93.70%

Note: *Landslide denotes a pixel where a landslide has already occurred; potential landslide denotes a pixel located on a steep slope (>23.1°).

Figure 8b presents the overall classification outcomes of MLC. On closer observation of Figure 8b, a great many omission errors and commission errors occur in the western part of the National Park. In addition, the salt-and-pepper effect is very serious when using the MLC approach. The overall accuracy of Figure 8b is 81.5%. The red pattern in Figure 8a represents the landslide/potential landslide area, which is displayed as grassland and timberland in Figure 8b. We also calculate the error matrix of MLC; Table 7 shows it for the entire area. The category of landslide is most likely to be confused with the category of rock beside the stream. Also, when applying MLC, the categories of landslide and potential landslide area cannot be distinguished effectively. This is because the potential landslide area is defined as an area without vegetation protection on a steep slope. Therefore, a large area in the west which should be categorized as landslide has been omitted.
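The MLC decision rule described in this section, assigning each pixel to the class with the highest probability, can be sketched with one Gaussian per class. This is a generic illustration under stated assumptions: equal class priors, a small ridge added to each covariance for numerical stability, and synthetic training pixels whose band means are only loosely inspired by the Water and Timberland rows of Table 2. It is not the software or the training set used in the study.

```python
import numpy as np

def fit_gaussians(X, y):
    """Estimate mean, inverse covariance and log-determinant for every class."""
    params = {}
    for label in np.unique(y):
        samples = X[y == label]
        cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        params[label] = (samples.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return params

def classify(params, pixels):
    """Assign each pixel to the class with the highest Gaussian log-likelihood."""
    labels = list(params)
    scores = []
    for label in labels:
        mean, inv_cov, log_det = params[label]
        diff = pixels - mean
        mahalanobis = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        scores.append(-0.5 * (log_det + mahalanobis))  # constant terms dropped
    return np.array(labels)[np.argmax(np.array(scores), axis=0)]

# Synthetic (G, R, IR) training pixels for two classes.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([34, 19, 18], 1.5, size=(50, 3)),
    rng.normal([36, 25, 100], 3.0, size=(50, 3)),
])
y = np.array(["water"] * 50 + ["timberland"] * 50)

predicted = classify(fit_gaussians(X, y), np.array([[35.0, 20.0, 19.0], [37.0, 26.0, 105.0]]))
```

Because the rule looks only at the band values of each pixel, two classes with overlapping spectral distributions cannot be separated, whatever their slope or elevation, which is consistent with the confusion between landslide and stream-side rock noted above.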

6. Summary and conclusion

With the progress of spatial data survey techniques in geosciences, massive amounts of data or information can be easily collected and monitored. This makes the spatial database complicated. Thus, the analysis of variables influencing landslides requires a more efficient method in order to present a Thematic Map. Moreover, the assessment of multiple categories by means of RS image data encounters many obstacles. There is also a notable difference between classifiers in regard to the outcomes of classification. Hence, some researchers have begun to study these classifiers. Previous related studies have focused on the SVM to handle these kinds of problems (such as Wan and Lei 2009). Unfortunately, the SVM approaches involve a 'black box model', which makes it quite difficult to display explicit knowledge rules. As an alternative, we propose a different concept through Data Mining approaches: the DRS approach integrated with the Rough Set tree analysis. We also study the variation among various categories of landforms and land covers. Specifically, our main effort is to establish the relations among different categories for an observed landslide occurrence.

In the past, multi-category classifiers of RS data were very difficult to develop. In our study, we integrate RS data and DEM data in an expert decision system to greatly enhance the accuracy of the Landslide Expert System. This study offers four major contributions:


[Figure 8. Comparison of the validation model and the DRS+EKTP model: (a) the classification model of DRS+EKTP (overall accuracy = 95.60%); (b) validation model of MLC* (overall accuracy = 81.48%). Legend for (a): Waterbody, Stream, Grassland, Timberland, Landslide area, Potential landslide area, Bare land. Legend for (b): Waterbody, Stream, Grassland, Timberland, Original landslide area, New landslide area, Potential landslide area, Bare land. Scale bars: 0 to 26,000 m.

Notes: (1) the original landslide area (yellow) is the same as in Figure 8a; (2) the newly detected landslide area (purple) is added by MLC; (3) the potential landslide area (red) cannot be detected by MLC. *The new landslide area is determined by MLC.]

(1) DRS is a prominent classifier: it extracts the core factors together with their thresholds.<br />

(2) The DEM data are successfully employed in our ES to analyze the instability of soil in<br />

the study area, and the landslide thresholds for the study samples are found.


22 S. Wan et al.<br />

Table 7. Validation model (MLC): producer accuracy and user accuracy.<br />

Rows are class outcomes; columns are ground truth. Accuracies are in %.<br />

Class outcome | Water | Stream (rock) | Grass land | Timber land | Landslide | Bare land | Total | User accuracy<br />

Water | 9 | 1 | 0 | 0 | 0 | 0 | 10 | 90.00<br />

Stream (rock) | 0 | 21 | 0 | 0 | 29 | 0 | 50 | 42.00<br />

Grass land | 0 | 1 | 22 | 0 | 3 | 1 | 27 | 81.48<br />

Timber land | 0 | 0 | 6 | 117 | 1 | 0 | 124 | 94.35<br />

Landslide | 0 | 4 | 2 | 2 | 39 | 0 | 47 | 82.98<br />

Bare land | 0 | 0 | 0 | 0 | 0 | 12 | 12 | 100.00<br />

Total | 9 | 27 | 30 | 119 | 72 | 13 | 270 |<br />

Producer accuracy | 100.00 | 77.78 | 73.33 | 98.32 | 54.17 | 92.31 | |<br />

Overall accuracy = 81.48%; Kappa = 74.24%<br />

(3) The ancillary tools of EKTP can enhance the classification of the stream category<br />

from 45% to 80% and of the rock category from approximately 50% to 100% (see Table 4).<br />

According to our observation, the categories of rock and streams are hard to determine<br />

from satellite image data alone. With the EKTP+DRS model, we improve the overall<br />

classification accuracy by approximately 3% to 5%.<br />

(4) The Rough Set tree is successfully applied to multi-category image classification, and<br />

the rules for each category are then found rationally. Results show that different categories<br />

can be separated by different ranges of the first dominant factor (in our study, band G).<br />

This will help researchers to reduce the time-consuming work of targeting categories on<br />

complex images.<br />
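The accuracy measures reported in Table 7 follow from the confusion matrix by the standard definitions of user accuracy, producer accuracy, overall accuracy and Cohen's kappa. The sketch below recomputes them from the Table 7 counts, assuming rows are class outcomes and columns are ground truth as in the table; the function and variable names are illustrative and do not come from the study.

```python
CLASSES = ["Water", "Stream (rock)", "Grass land",
           "Timber land", "Landslide", "Bare land"]

# Counts from Table 7: rows = class outcome (MLC), columns = ground truth.
TABLE_7 = [
    [9, 1, 0, 0, 0, 0],
    [0, 21, 0, 0, 29, 0],
    [0, 1, 22, 0, 3, 1],
    [0, 0, 6, 117, 1, 0],
    [0, 4, 2, 2, 39, 0],
    [0, 0, 0, 0, 0, 12],
]


def accuracy_report(matrix):
    """Compute per-class and overall accuracy measures from a square confusion matrix."""
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]        # samples assigned to each class
    col_totals = [sum(col) for col in zip(*matrix)]  # ground-truth samples per class
    diagonal = [matrix[i][i] for i in range(len(matrix))]

    user = [d / r for d, r in zip(diagonal, row_totals)]      # correct / classified
    producer = [d / c for d, c in zip(diagonal, col_totals)]  # correct / ground truth
    overall = sum(diagonal) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    return {"user": user, "producer": producer,
            "overall": overall, "kappa": kappa}


report = accuracy_report(TABLE_7)
for name, u, p in zip(CLASSES, report["user"], report["producer"]):
    print(f"{name:14s} user = {u:7.2%}  producer = {p:7.2%}")
print(f"overall = {report['overall']:.2%}, kappa = {report['kappa']:.4f}")
```

On the Table 7 counts this gives an overall accuracy of 81.48% and a kappa of about 0.74. The low producer accuracy for Landslide (54.17%) comes from the 29 ground-truth landslide samples that MLC assigned to Stream (rock).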

Acknowledgement<br />

We thank the National Science Council for sponsoring this work under grants<br />

98-2625-M-275-001 and 100-2410-H-275-009.<br />

References<br />

Ahlqvist, O., 2005. Using uncertain conceptual spaces to translate between land cover categories. International Journal of Geographical Information Science, 19, 831–857.<br />

Ahlqvist, O., Keukelaar, J., and Oukbir, K., 2000. Rough classification and accuracy assessment. International Journal of Geographical Information Science, 14, 475–496.<br />

Ahlqvist, O., Keukelaar, J., and Oukbir, K., 2003. Rough and fuzzy geographical data integration. International Journal of Geographical Information Science, 17, 223–234.<br />

Ananthanarayana, V.S., Narasimha Murty, M., and Subramanian, D.K., 2003. Tree structure for efficient data mining using rough sets. Pattern Recognition Letters, 24 (6), 851–862.<br />

Baeza, C. and Corominas, J., 2001. Assessment of shallow landslide susceptibility by means of multivariate statistical techniques. Earth Surface Processes and Landforms, 26, 1251–1263.<br />

Bannari, A., et al., 1995. A review of vegetation indices. Remote Sensing Reviews, 13, 95–120.<br />

Darken, C. and Moody, J., 1990. Fast adaptive k-means clustering: some empirical results. In: International Joint Conference on Neural Networks, 2, 233–238.<br />

Deogun, J.S., Raghavan, V.V., and Sever, H., 1994. Rough set based classification methods for extended decision tables. In: Proceedings of International Workshop on Rough Sets and Soft Computing, 302–309.<br />

Floris, M., et al., 2004. Modelling of landslide-triggering factors – a case study in the Northern Apennines, Italy. Lecture Notes in Earth Sciences, 104, 745–753.<br />

Goh, C. and Law, R., 2003. Incorporating the rough sets theory into travel demand analysis. Tourism Management, 24, 511–517.<br />


Gooch, M.J. and Chandler, J.H., 1998. Optimization of strategy parameters used in automated digital elevation model generation. In: D.N.M. Donoghue, ed. International archives of photogrammetry and remote sensing. Cambridge: ISPRS, Data Integration: Systems and Techniques, XXXII (2), 88–95.<br />

Hall, D.L., 1992. Mathematical techniques in multisensor data fusion. Boston, MA: Artech House.<br />

Hong, Y., et al., 2005. Quantitative assessment on the influence of heavy rainfall on the crystalline schist landslide by monitoring system – case study on Zentoku landslide, Japan. Landslides, 2 (1), 31–41.<br />

Huete, A.R., 1988. A soil-adjusted vegetation index (SAVI). Remote Sensing of Environment, 25, 53–70.<br />

Katzberg, J.D. and Ziarko, W., 1993. Variable precision rough sets with asymmetric bounds. In: Proceedings of International Workshop on Knowledge Discovery, 163–190.<br />

Khazai, B. and Sitar, N., 2003. Evaluation of factors controlling earthquake-induced landslides caused by Chi-Chi earthquake and comparison with the Northridge and Loma Prieta events. Engineering Geology, 71, 79–95.<br />

Krishnaiah, P.R. and Kanal, L.N., eds., 1982. Classification, pattern recognition, and reduction of dimensionality. In: Handbook of statistics. Amsterdam: North-Holland.<br />

Lee, S. and Choi, J., 2004. Landslide susceptibility mapping using GIS and the weight-of-evidence model. International Journal of Geographical Information Science, 18 (8), 789–814. Available from: http://www.informaworld.com/smpp/title~db=all~content=t713599799~tab=issueslist~branches=18 - v18<br />

Lee, S., et al., 2004. Determination and application of the weights for landslide susceptibility mapping using an artificial neural network. Engineering Geology, 71, 289–302.<br />

Lei, T.C., Wan, S., and Chou, T.Y., 2008. The comparison of PCA and discrete rough set method for feature extraction of remote sensing image classification – a case study on rice classification, Taiwan. Computational Geosciences, 12 (1), 1–14.<br />

Leung, Y., et al., 2007. A rough set approach to the discovery of classification rules in spatial data. International Journal of Geographical Information Science, 21 (9), 1033–1058.<br />

Lin, C.Y., et al., 2004. Vegetation recovery assessment at the Jou-Jou Mountain landslide area caused by the 921 earthquake in Central Taiwan. Ecological Modelling, 176, 75–81.<br />

Lin, W.T., 2008. Earthquake-induced landslide hazard monitoring and assessment using SOM and PROMETHEE techniques: a case study at the Chiufenershan area in Central Taiwan. International Journal of Geographical Information Science, 22 (9), 995–1012. Available from: http://www.informaworld.com/smpp/title~db=all~content=t713599799~tab=issueslist~branches=22 - v22<br />

Lin, W.T., Lin, C.Y., and Chou, W.C., 2006. Assessment of vegetation recovery and soil erosion at landslides caused by a catastrophic earthquake: a case study in Central Taiwan. Ecological Engineering, 28, 79–89.<br />

Lin, W.T., et al., 2007. WinBasin: using improved algorithms and GIS technique for automated watershed modeling analysis from digital elevation models. International Journal of Geographical Information Science, 22 (1), 47–69. Available from: http://www.informaworld.com/smpp/906147682-96899860/title~db=all~content=t713599799~tab=issueslist~branches=22 - v22<br />

Malet, J.-P., et al., 2005. Triggering conditions and mobility of debris flows associated to complex earthflows. Geomorphology, 66, 215–235.<br />

Mayoraz, F., Cornu, T., and Vuillet, L., 1996. Using neural networks to predict slope movements. In: Proceedings VII International Symposium on Landslides, June 1996 Trondheim. Rotterdam: Balkema, 295–300.<br />

Nguyen, H.S. and Skowron, A., 1995. Quantization of real values attributes, rough set and Boolean reasoning approaches. In: Proceedings of the Second Joint Conference on Information Sciences, October 1995 Wrightsville Beach, NC, 34–37.<br />

Nguyen, S.H. and Nguyen, H.S., 1998a. Pattern extraction from data. Fundamenta Informaticae, 34 (1–2), 129–144.<br />

Nguyen, S.H. and Nguyen, H.S., 1998b. Pattern extraction from data. In: Proceedings of the Conference of Information Processing and Management of Uncertainty in Knowledge-Based Systems IPMU’98, July 1998 Paris, France, 1346–1353.<br />

Pawlak, Z., 1982. Rough sets. International Journal of Information Computer Science, 11, 341–356.<br />

Pawlak, Z., 1991. Rough sets, theoretical aspects of reasoning about data. Boston, MA: Kluwer Academic Publishers.<br />


Pawlak, Z., et al., 1995. Rough sets. Communications of the ACM, 38 (11), 89–95.<br />

Pradhan, B. and Lee, S., 2010. Landslide susceptibility assessment and factor effect analysis: back propagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environmental Modelling & Software, 25 (6), 747–759.<br />

Rand, W.M., 1971. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846–850.<br />

Ridd, M.K. and Liu, J., 1998. A comparison of four algorithms for change detection in an urban environment. Remote Sensing of Environment, 63, 95–100.<br />

RSES 2.2 User’s Guide, 2005. Warsaw University. Available from: http://logic.mimuw.edu.pl/~rses<br />

Słowiński, R., Soniewicki, B., and Węglarz, J., 1994. DSS for multi objective project scheduling. European Journal of Operational Research, 79 (2), 220–229.<br />

Stefanowski, J., 1998. On rough set based approaches to induction of decision rules. In: L. Polkowski and A. Skowron, eds. Rough sets in knowledge discovery 1: methodology and applications. Heidelberg: Physica-Verlag, 500–529.<br />

Van Westen, C.J., Rengers, N., and Soeters, R., 2003. Use of geomorphological information in indirect landslide susceptibility assessment. Natural Hazards, 30 (3), 399–419.<br />

Walczak, B. and Massart, D.L., 1999. Rough sets theory. Chemometrics and Intelligent Laboratory Systems, 47, 1–16.<br />

Wan, S., 2009. A spatial decision support system for extracting the core factors and thresholds for landslide susceptibility map. Engineering Geology, 108, 237–251.<br />

Wan, S. and Lei, T.C., 2009. A knowledge-based decision support system to analyze the debris-flow problems at Chen Yu-Lan River, Taiwan. Knowledge-Based Systems, 22, 580–588.<br />

Wan, S., Lei, T.C., and Chou, T.Y., 2010a. An enhanced supervised spatial decision support system of image classification: consideration on the ancillary information of paddy rice area. International Journal of Geographical Information Science. DOI: 10.1080/13658810802587709.<br />

Wan, S., Lei, T.C., and Chou, T.Y., 2010b. A novel data mining technique of analysis and classification for landslide problems. Natural Hazards, 52, 211–230.<br />

Wan, S., et al., 2008. The knowledge rules of debris flow event: a case study for investigation Chen-Yu-Lan River, Taiwan. Engineering Geology, 98, 102–114.<br />

Wei, L.-Y., Huang, C.-L., and Chen, C.H., 2005. Data mining of the GAW14 simulated data using rough set theory and tree-based methods. BMC Genetics, 6 (1), 133.<br />

Yesilnacar, E. and Topal, T., 2005. Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Engineering Geology, 79, 251–266.<br />

Yilmaz, I., 2009. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: a case study from Kat landslides (Tokat–Turkey). Computers & Geosciences, 35 (6), 1125–1138.<br />

Zerger, A., 2002. Examining GIS decision utility for natural hazard risk modelling. Environmental Modelling & Software, 17 (3), 287–29.<br />

Ziarko, W., 1991. The discovery, analysis and representation of data dependencies in databases. In: G. Piatetsky-Shapiro and W.J. Frawley, eds. Knowledge discovery in databases. Cambridge, MA: American Association for Artificial Intelligence Press/Massachusetts Institute of Technology Press, 177–195.<br />
