27.03.2013 Views

Abstract Book (Buku Abstrak) - Universiti Putra Malaysia

Science, Technology & Engineering

Robust Multicollinearity Diagnostic Measures based on Robust Coefficient Determination

Assoc. Prof. Dr. Habshah Midi

Arezoo Bagheri

Institute of Mathematical Research, University Putra Malaysia,

43400 UPM Serdang, Selangor, Malaysia.

+603-8946 6876; habshahmidi@gmail.com

In this study, we propose Robust Variance Inflation Factors (RVIFs) for detecting multicollinearity due to high leverage points or extreme outliers in the X-direction. The RVIFs are computed from robust coefficients of determination, which we call RR2 (MM) and RR2 (GM (DRGP)). RR2 (MM) is the coefficient of determination of the high-breakdown-point, efficient MM-estimators, whereas RR2 (GM (DRGP)) is defined through an improved GM-estimator. GM (DRGP) is a GM-estimator whose main aim is to downweight high leverage points with large residuals. It uses S-estimators as initial values, the Diagnostic Robust Generalized Potential based on MVE [DRGP (MVE)] as the initial weight function, and Iteratively Reweighted Least Squares (IRLS) as the convergence method. The numerical results and a Monte Carlo simulation study indicate that the proposed RVIFs are very resistant to high leverage points and do not detect multicollinearity in the data, especially the RVIF based on RR2 [GM (DRGP)]. Hence, this indicates that the high leverage points are the source of the multicollinearity.

Keywords: Coefficient determination, generalised M-estimators, high leverage points, variance inflation factor<br />
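
The link between a coefficient of determination and a variance inflation factor can be illustrated with a minimal sketch. It assumes the usual definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors, and it uses the classical least-squares R^2 only. The names `r_squared_ols` and `vifs` are hypothetical; the robust RR2 (MM) and RR2 (GM (DRGP)) of the abstract are not implemented here and would have to be supplied through the `r2` argument.

```python
import numpy as np

def r_squared_ols(y, X):
    """Classical R^2 of y regressed on X (with intercept) by least squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss

def vifs(X, r2=r_squared_ols):
    """VIF_j = 1 / (1 - R_j^2); `r2` is a pluggable (assumed) R^2 function."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)  # all predictors except column j
        out.append(1.0 / (1.0 - r2(X[:, j], others)))
    return np.array(out)

# Two exactly uncorrelated predictors: classical VIFs equal 1 up to rounding.
X_clean = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
print(vifs(X_clean))

# One high leverage point makes the columns look almost collinear,
# pushing the classical VIFs far above the common rule-of-thumb cut-off of 10.
X_lev = np.vstack([X_clean, [100.0, 100.0]])
print(vifs(X_lev))
```

In this toy data the apparent collinearity comes entirely from the single added point, which is the situation the abstract describes; a resistant R^2 passed as `r2` would be expected to downweight that point.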

Extracting information from web pages has become very important because of the massive and growing amount of diverse semi-structured information sources available to users on the Internet. The variety of web pages makes information extraction from the web a challenging problem and an ever-growing area of research. Many researchers work on extracting information from web pages in different domains, such as business intelligence and products. Most previous work is limited because the approaches cannot handle (i) web pages containing both genuine and non-genuine web tables and (ii) attributes that appear under different names but refer to the same entity (i.e., synonyms). In this project, we propose a strategy for extracting and analysing information from semi-structured web data sources. It consists of two approaches: one for extracting and classifying information from various web pages, and one for analysing and simplifying the extracted and classified information. Two analyses were conducted on four different domains. The first analysis shows that, when extracting information from various web pages, it is important to handle genuine and non-genuine web tables as well as synonyms, since ignoring them may cause information relevant to the user to be missed. In the second analysis, for the Nokia products, our proposed approaches achieved an increase in F and P and a decrease in R compared with Ashraf et al. (2008). These analyses show that the proposed strategy, comprising the two approaches, can extract and analyse information from various web pages containing genuine and non-genuine web tables while also handling synonyms.

Keywords: HTML web pages, information extraction<br />
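
As a rough illustration of how synonym handling can affect evaluation scores, the sketch below assumes that P, R and F in the abstract stand for precision, recall and the balanced F-measure, which the source does not spell out. The attribute names, the `SYNONYMS` map and the functions `normalise` and `precision_recall_f` are all hypothetical and are not taken from the authors' system.

```python
# Hypothetical synonym map: attribute name -> canonical attribute name.
SYNONYMS = {"cost": "price", "colour": "color"}

def normalise(attributes, synonyms):
    """Replace each attribute by its canonical name when one is known."""
    return {synonyms.get(a, a) for a in attributes}

def precision_recall_f(extracted, relevant):
    """Precision, recall and balanced F-measure over sets of attributes."""
    extracted, relevant = set(extracted), set(relevant)
    hits = len(extracted & relevant)  # correctly extracted attributes
    p = hits / len(extracted) if extracted else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f

extracted = ["cost", "weight", "colour"]   # names as found on a web page
relevant = ["price", "weight", "display"]  # names the user asked for

# Without synonym handling, "cost" is not matched to "price".
print(precision_recall_f(extracted, relevant))

# With synonym handling, "cost" is mapped to "price" before scoring.
print(precision_recall_f(normalise(extracted, SYNONYMS), relevant))
```

In this made-up example only "weight" matches before normalisation, while both "weight" and "price" match afterwards, so all three scores rise.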

Extracting Information from Semi-structured Web Pages<br />

Assoc. Prof. Dr. Hamidah Ibrahim<br />

Mahmoud Sh. Al-Hassan, Ali Amer Alwan, Lili Nurliyana Abdullah and Aida Mustapha<br />

Faculty of Computer Science and Information Technology, University Putra Malaysia,

43400 UPM Serdang, Selangor, Malaysia.

+603-8943 6510; hamidah@fsktm.upm.edu.my<br />
