Design and development of a concept-based multi ... - Citeseer
Design and development of a concept-based multi ... - Citeseer
Design and development of a concept-based multi ... - Citeseer
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Shiyan Ou, Christopher S.G. Khoo <strong>and</strong> Dion H. Goh<br />
Table 5<br />
Average precision, recall <strong>and</strong> F-measure for the system-extracted research <strong>concept</strong>s from the three combinations <strong>of</strong><br />
sections in the 50 structured abstracts<br />
All five Research objectives Research objectives + research<br />
Importance level sections (Section 2) results (Sections 2 <strong>and</strong> 4)<br />
For the most important Precision (%) 20.36 31.62 23.60<br />
<strong>concept</strong>s Recall (%) 92.26 76.06 87.37<br />
F-measure (%) 33.15 43.91* 36.80<br />
For the more important Precision (%) 31.02 44.51 34.28<br />
<strong>concept</strong>s Recall (%) 90.93 59.31 78.81<br />
F-measure (%) 45.94 50.27* 47.35<br />
For all the important <strong>concept</strong>s Precision (%) 46.18 59.05 49.76<br />
Recall (%) 89.93 46.65 75.64<br />
F-measure (%) 60.44 51.64 59.40<br />
Bold figures indicate the highest values at each importance level.<br />
The more important <strong>concept</strong>s include the most important <strong>concept</strong>s.<br />
The important <strong>concept</strong>s include the more important <strong>concept</strong>s.<br />
Asterisk indicates that the figure is significantly higher than other figures in the same row.<br />
abstract, <strong>and</strong> from these to identify the more important <strong>concept</strong>s <strong>and</strong> then the most important <strong>concept</strong>s,<br />
according to the focus <strong>of</strong> the dissertation research. Meanwhile, we also used the system to<br />
extract research <strong>concept</strong>s automatically from the following three combinations <strong>of</strong> sections:<br />
• from research objectives (section 2) only;<br />
• from research objectives + research results (sections 2 <strong>and</strong> 4);<br />
• from the whole text (i.e. all five sections).<br />
The system-extracted <strong>concept</strong>s under the above three combinations were compared against the<br />
human-extracted <strong>concept</strong>s at three importance levels. The average precision, recall <strong>and</strong> F-measure<br />
for the system-extracted research <strong>concept</strong>s from the three combinations <strong>of</strong> sections in the 50 structured<br />
abstracts are given in Table 5.<br />
As shown in Table 5, considering all the important <strong>concept</strong>s, the F-measures obtained from the<br />
whole text (60.4%) <strong>and</strong> from research objectives + research results (59.4%) were similar, both <strong>of</strong><br />
which were higher than that from research objectives only (51.6%). This suggests that the important<br />
<strong>concept</strong>s were not focused only in research objectives, but scattered in the whole text. Therefore,<br />
the discourse parsing may not be helpful for identifying the important <strong>concept</strong>s. For the more<br />
important <strong>concept</strong>s, the F-measure obtained from research objectives (50.2%) was significantly<br />
higher than those from research objectives + research results (47.4%) <strong>and</strong> from the whole text<br />
(45.9%). This suggests that the research objectives section places a bit more emphasis on the more<br />
important <strong>concept</strong>s. For the most important <strong>concept</strong>s, the F-measure obtained from research objectives<br />
(43.9%) was significantly higher than those from research objectives + research results (36.8%)<br />
<strong>and</strong> from the whole text (33.2%). This suggests that the research objectives section places more<br />
emphasis on the most important <strong>concept</strong>s. Moreover, the F-measure obtained from research objectives<br />
+ research results (36.8%) was significantly higher than that from the whole text (33.2%). This<br />
suggests that the research results section also places more emphasis on the most important <strong>concept</strong>s<br />
than the other three sections (i.e. background, research methods <strong>and</strong> concluding remarks). In conclusion,<br />
discourse parsing was helpful in identifying the more important <strong>and</strong> the most important<br />
<strong>concept</strong>s in structured abstracts. The more <strong>and</strong> most important <strong>concept</strong>s are more likely to be considered<br />
as research <strong>concept</strong>s.<br />
In addition, the other three kinds <strong>of</strong> information – relationships, contextual relations <strong>and</strong><br />
research methods – were extracted manually from the whole text <strong>of</strong> the 50 abstracts by two <strong>of</strong> the<br />
authors <strong>of</strong> this paper, who are deemed to be ‘experts’. Experts are needed to do this coding because<br />
these three kinds <strong>of</strong> information are difficult to identify without substantial knowledge <strong>and</strong> training.<br />
Journal <strong>of</strong> Information Science, XX (X) 2007, pp. 1–19 © CILIP, DOI: 10.1177/0165551507084630 13