23.07.2013

Text Mining Post Project Reviews to Improve the Construction ...



processes of two construction companies. The left-hand column relates to the high-level ontology developed in Figure 3.

TABLE I. AN EXAMPLE OF CONTENT OF DOMAIN EXPERTISE

| Category (high-level ontology) | Content of domain expertise |
|---|---|
| General outcome: financial, purchase, contract period variation | How did the job go? Profit or loss? Extension of project period? Were resources effectively allocated? Quality of service, environmental issues, etc. |
| Time | Any delay? Reason for delay, such as weather, changes or unforeseen events. What is required to finish the work on time? Etc. |
| Quality | Any defect during the project? Redesign or repair, any specific problem: leaking, faults, errors, mistakes, damage, snags or hindsights? How did we achieve high quality? Was quality combined with speed? |
| Communication | Was there clear communication between management, client and the project team? How was the interaction with the design team? What did we do well? What went wrong? What major changes arose from meetings with clients? Was the customer's feedback positive or negative? |
| Building | Issues involved with the building, such as drains, floors, cladding, beams, frames, ceilings, glazing, partitions, walls, lifts, etc. |
| Security issues | Any material theft or loss, and measures to prevent them. Which security methods (CCTV, movement sensors or security guards) worked well? |
| Material issue/plant issue | How issues such as waste, delivery of materials, disposal and recycling are dealt with. Any problems that occurred during the purchasing or procurement stage? |
| Labour: team/subcontractor/contractor/supplier/consultant | Any problems or concerns related to the designer, supplier, engineers, consultants, surveyors, architect and builder which affected the CPSC? Did the designer and contractor work well together? |
| Project stages: pre-contract, lease, negotiation, agreement, site survey, estimation, design, tender, contract | What sort of practices were adopted through the various stages of the project? What good or bad practices were used? Were the documents clearly communicated? |
| Project stages: Planning | Any comments for or from the planners? Was the sequence shown on the programme correct? Was the duration reasonable? |
| Project stage: Procurement | Was there any problem in getting subcontractors on time? Any comments for or from the buyers? |
| Health and Safety: accidents, incidents, hazards | Are there any reported accidents or incidents? What are the causes? How did we perform overall? Any health and safety lessons to be learnt from this project? |
| Changes: plan, schedule, contract, order deadline, design, personnel, client or specification change, etc. | What changes were made in the project and why? How could they be avoided in future projects? |
| Mistakes/Errors | Were there any notable mistakes made on the site? What was the cause? How can these be corrected? |
| Waste/environmental issues | Were any particular operations wasteful of time or material? Could the operation be improved by changing the material or method? |
| Innovation | Did any innovative or interesting ideas emerge during the whole project? |
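Because the left-hand column of Table I corresponds to the high-level ontology, its categories could serve as labels for a simple keyword-based categorization of PPR sentences. The Python sketch below illustrates the idea under stated assumptions: `CATEGORY_KEYWORDS` is a hypothetical, heavily abridged vocabulary paraphrased from the right-hand column, and `tokenize` and `categorize` are illustrative names, not part of PolyAnalyst or the authors' system.

```python
import re

# Hypothetical, heavily abridged keyword lists paraphrased from Table I.
# The paper does not publish the full vocabulary behind its ontology.
CATEGORY_KEYWORDS: dict[str, set[str]] = {
    "Time": {"delay", "weather", "duration"},
    "Quality": {"defect", "repair", "leaking", "fault", "snag", "damage"},
    "Communication": {"client", "feedback", "meeting", "communication"},
    "Security issues": {"theft", "cctv", "security"},
    "Health and Safety": {"accident", "incident", "hazard"},
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(sentence: str) -> list[str]:
    """Return every category whose keywords share at least one token with the sentence."""
    tokens = set(tokenize(sentence))
    return [category for category, words in CATEGORY_KEYWORDS.items() if tokens & words]


if __name__ == "__main__":
    s = "Heavy weather caused a two-week delay and one leaking roof defect."
    print(categorize(s))  # ['Time', 'Quality']
```

Exact token matching misses inflected forms (for example, "defects" does not match "defect"), which is one reason stemming is applied during pre-processing.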

Text Mining Expertise: Several commercial text mining products are available. Table 2 compares the functionalities offered by different software packages.

TABLE 2: COMPARATIVE STUDY OF TEXT MINING SOFTWARE

Product FE TBN SR TC Clus. Sum. Asc InV
1 Smart Discovery: X X X X
2 Text Miner 1.0: X X X X
3 Autonomy: X X X X X
4 SAS Text Miner: X X X X
5 ClearForest: X X X X X X
6 PolyAnalyst 5.0/Text Analytics 2.3: X X X X X X
7 Intelligent Miner: X X X X
8 RetrievalWare: X X X X

FE: Feature extraction, TBN: Text-based navigation, SR: Search and retrieval, TC: Text categorization, Clus.: Clustering, Sum.: Summarization, Asc: Association, InV: Information visualization.

Examples of applying several of these TM operations to PPR documentation are given below to illustrate the potential benefits and to show how they can facilitate better knowledge reuse and exploitation from PPR processes.

The next section presents an illustrative example showing how PolyAnalyst [17] could be used to address the six exploitation and reuse requirements of PPRs.

6 A Case Study of UK Construction Projects

This example considers PPR documentation collected over the last three years from two construction companies. Although the report style differs between the companies, each report contains a textual narrative and a description of the project, with the review divided under different headings and text describing the lessons learned during the operation of the CPSC.
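Since each report divides the review under headings, a natural first step is to group the text under those headings before mining it. The sketch below assumes a hypothetical fixed set of heading names (`KNOWN_HEADINGS`) and that each heading sits on its own line; because the two companies' report styles differ, a real heading list would likely have to be built per company.

```python
# Hypothetical heading names; the real reports use company-specific headings.
KNOWN_HEADINGS = {"Time", "Quality", "Communication", "Health and Safety", "Lessons learned"}


def split_sections(report: str) -> dict[str, str]:
    """Group the non-empty lines of a report under the most recent known heading."""
    sections: dict[str, list[str]] = {}
    current = "Preamble"  # text that appears before the first heading
    for line in report.splitlines():
        text = line.strip()
        if text.rstrip(":") in KNOWN_HEADINGS:
            # A heading line (optionally ending in ":") starts a new section.
            current = text.rstrip(":")
            sections.setdefault(current, [])
        elif text:
            sections.setdefault(current, []).append(text)
    # Join each section's lines into a single string.
    return {name: " ".join(lines) for name, lines in sections.items()}


if __name__ == "__main__":
    sample = "Project: new school block\nTime:\nRoof works were delayed by rain.\nQuality:\nTwo snags found at handover."
    for heading, body in split_sections(sample).items():
        print(f"{heading}: {body}")
```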

6.1 KDT and Text Mining Process on PPRs

6.1.1 Transformation, Loading and Pre-processing<br />

Twenty-five PPRs, each 15 to 30 pages long, were selected and imported into an Excel file, which was then loaded into the PolyAnalyst software system. Pre-processing is done mainly to reduce information overload and to generate metadata. Textual pre-processing steps include removing "unwanted and non-informative" words (stop words) and stemming.
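The two pre-processing steps named above, removal of non-informative words and stemming, can be sketched in Python as follows. The stop-word list is a small hypothetical sample and the stemmer is a deliberately crude suffix stripper chosen for brevity; it is neither the Porter stemmer nor the algorithm PolyAnalyst uses.

```python
import re

# A tiny hypothetical stop-word list; production lists are far longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
              "was", "were", "is", "by", "for", "with"}

# Crude suffix stripping for illustration only (not Porter, not PolyAnalyst's stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Tokenize, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The roofing was delayed by leaking drains"))
    # ['roof', 'delay', 'leak', 'drain']
```

The minimum stem length stops short words such as "bus" from being cut down to "bu".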

6.1.2 Text Mining of PPRs

After pre-processing, the PPRs are ready for TM, which uses various tools and techniques to extract patterns, trends and other useful information. The following subsections discuss how the six major TM tasks (listed in Section V) address problems commonly identified in PPRs.

Summarization and generation of statistics for the CPSC: The summary statistics and summarization tools of TM are generally used for this purpose, as discussed below.

Summary Statistics: Basic statistics can be generated for the PPR text at various stages of TM to compare its attributes, keywords or generated rules. In the present example, these statistics give the frequencies of the most commonly used words in the PPRs. Summary statistics are therefore useful for identifying which reports are most relevant to a particular supply chain issue: a report in which delay-related terms occur frequently, for instance, is likely to be informative about time-related problems.
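The following sketch shows how word frequencies could be used to rank reports for a given issue. The report identifiers and texts are hypothetical, and `rank_reports` simply counts exact occurrences of one keyword per report; it is an illustration of the idea, not PolyAnalyst's statistics module.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter[str]:
    """Count how often each lower-cased alphabetic word occurs in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def rank_reports(reports: dict[str, str], keyword: str) -> list[tuple[str, int]]:
    """Rank reports by the frequency of `keyword`, highest first (ties keep input order)."""
    counts = [(name, word_frequencies(text)[keyword]) for name, text in reports.items()]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical report identifiers and texts.
    reports = {
        "PPR-01": "Delay on site. A second delay due to weather.",
        "PPR-02": "No delay reported; quality was good.",
        "PPR-03": "Quality snags at handover.",
    }
    print(rank_reports(reports, "delay"))  # [('PPR-01', 2), ('PPR-02', 1), ('PPR-03', 0)]
    print(word_frequencies(reports["PPR-01"]).most_common(1))  # [('delay', 2)]
```

Raw counts ignore context: in the example, "No delay reported" still counts towards the delay ranking, so frequency-based rankings are best treated as a first filter before the reports are read.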

Summarization: Summarization techniques condense the PPRs to a fraction of their original size while retaining their significant content. They determine the semantic weight of each sentence in the PPRs and keep only those sentences whose semantic weight exceeds a chosen threshold.
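A minimal sketch of threshold-based extractive summarization follows. The source does not say how PolyAnalyst computes semantic weight, so this example substitutes a simple frequency-based proxy: each sentence's weight is the mean normalised frequency of its content words. The stop-word list, the sentence splitter and the `threshold` values are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list used only to ignore function words when weighting.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "was", "were", "is"}


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence: str) -> list[str]:
    """Lower-cased alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, threshold: float) -> list[str]:
    """Keep sentences whose frequency-based weight exceeds `threshold`, in original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))
    peak = max(freq.values(), default=1)
    summary = []
    for sentence in sentences:
        words = content_words(sentence)
        # Weight = mean of the words' frequencies, normalised by the most frequent word.
        weight = sum(freq[w] / peak for w in words) / len(words) if words else 0.0
        if weight > threshold:
            summary.append(sentence)
    return summary


if __name__ == "__main__":
    text = "The roof leaked. The roof leak delayed handover. Staff enjoyed lunch."
    print(summarize(text, 0.55))  # ['The roof leaked.', 'The roof leak delayed handover.']
    print(summarize(text, 0.7))   # ['The roof leaked.']
```

Raising `threshold` keeps fewer sentences and so yields a shorter summary.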

The size of the summary can be modified by changing the
