28.11.2012 Views

Design and development of a concept-based multi ... - Citeseer

Design and development of a concept-based multi ... - Citeseer

Design and development of a concept-based multi ... - Citeseer

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Shiyan Ou, Christopher S.G. Khoo <strong>and</strong> Dion H. Goh<br />

Table 5<br />

Average precision, recall <strong>and</strong> F-measure for the system-extracted research <strong>concept</strong>s from the three combinations <strong>of</strong><br />

sections in the 50 structured abstracts<br />

All five Research objectives Research objectives + research<br />

Importance level sections (Section 2) results (Sections 2 <strong>and</strong> 4)<br />

For the most important Precision (%) 20.36 31.62 23.60<br />

<strong>concept</strong>s Recall (%) 92.26 76.06 87.37<br />

F-measure (%) 33.15 43.91* 36.80<br />

For the more important Precision (%) 31.02 44.51 34.28<br />

<strong>concept</strong>s Recall (%) 90.93 59.31 78.81<br />

F-measure (%) 45.94 50.27* 47.35<br />

For all the important <strong>concept</strong>s Precision (%) 46.18 59.05 49.76<br />

Recall (%) 89.93 46.65 75.64<br />

F-measure (%) 60.44 51.64 59.40<br />

Bold figures indicate the highest values at each importance level.<br />

The more important <strong>concept</strong>s include the most important <strong>concept</strong>s.<br />

The important <strong>concept</strong>s include the more important <strong>concept</strong>s.<br />

Asterisk indicates that the figure is significantly higher than other figures in the same row.<br />

abstract, <strong>and</strong> from these to identify the more important <strong>concept</strong>s <strong>and</strong> then the most important <strong>concept</strong>s,<br />

according to the focus <strong>of</strong> the dissertation research. Meanwhile, we also used the system to<br />

extract research <strong>concept</strong>s automatically from the following three combinations <strong>of</strong> sections:<br />

• from research objectives (section 2) only;<br />

• from research objectives + research results (sections 2 <strong>and</strong> 4);<br />

• from the whole text (i.e. all five sections).<br />

The system-extracted <strong>concept</strong>s under the above three combinations were compared against the<br />

human-extracted <strong>concept</strong>s at three importance levels. The average precision, recall <strong>and</strong> F-measure<br />

for the system-extracted research <strong>concept</strong>s from the three combinations <strong>of</strong> sections in the 50 structured<br />

abstracts are given in Table 5.<br />

As shown in Table 5, considering all the important <strong>concept</strong>s, the F-measures obtained from the<br />

whole text (60.4%) <strong>and</strong> from research objectives + research results (59.4%) were similar, both <strong>of</strong><br />

which were higher than that from research objectives only (51.6%). This suggests that the important<br />

<strong>concept</strong>s were not focused only in research objectives, but scattered in the whole text. Therefore,<br />

the discourse parsing may not be helpful for identifying the important <strong>concept</strong>s. For the more<br />

important <strong>concept</strong>s, the F-measure obtained from research objectives (50.2%) was significantly<br />

higher than those from research objectives + research results (47.4%) <strong>and</strong> from the whole text<br />

(45.9%). This suggests that the research objectives section places a bit more emphasis on the more<br />

important <strong>concept</strong>s. For the most important <strong>concept</strong>s, the F-measure obtained from research objectives<br />

(43.9%) was significantly higher than those from research objectives + research results (36.8%)<br />

<strong>and</strong> from the whole text (33.2%). This suggests that the research objectives section places more<br />

emphasis on the most important <strong>concept</strong>s. Moreover, the F-measure obtained from research objectives<br />

+ research results (36.8%) was significantly higher than that from the whole text (33.2%). This<br />

suggests that the research results section also places more emphasis on the most important <strong>concept</strong>s<br />

than the other three sections (i.e. background, research methods <strong>and</strong> concluding remarks). In conclusion,<br />

discourse parsing was helpful in identifying the more important <strong>and</strong> the most important<br />

<strong>concept</strong>s in structured abstracts. The more <strong>and</strong> most important <strong>concept</strong>s are more likely to be considered<br />

as research <strong>concept</strong>s.<br />

In addition, the other three kinds <strong>of</strong> information – relationships, contextual relations <strong>and</strong><br />

research methods – were extracted manually from the whole text <strong>of</strong> the 50 abstracts by two <strong>of</strong> the<br />

authors <strong>of</strong> this paper, who are deemed to be ‘experts’. Experts are needed to do this coding because<br />

these three kinds <strong>of</strong> information are difficult to identify without substantial knowledge <strong>and</strong> training.<br />

Journal <strong>of</strong> Information Science, XX (X) 2007, pp. 1–19 © CILIP, DOI: 10.1177/0165551507084630 13

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!