Semi Automatic Indexing State of the Art - FTP Directory Listing - Nato
6. CONCLUSIONS
This report should have made evident the feasibility of mechanized text processing. Although quality considerations were not discussed, it should be mentioned that most authors are satisfied with the results obtained with their methods. Automation of the indexing process should therefore go ahead, particularly as the other obstacles to automation in the IR process have also been largely overcome. These obstacles were primarily due to the lack of powerful programming languages and of appropriate software, either for the retrieval process or for data base management. However, the full advantage of computer-assisted indexing can only be taken when the textual information is in machine-readable form. As there is a trend towards automatic type-setting techniques, this condition may be fulfilled in the near future. Other techniques for transferring textual information into machine-readable form without re-writing and coding are already within the state of the art.
Automation in the storage of information can then be achieved by using fully automatic and semi-automatic indexing techniques.
The advantages of fully automatic indexing techniques are well known:
- consistency, which is attained because the computer assigns index terms directly from the natural language text of the document, applying the same algorithm to each document (in human indexing, the indexer makes a separate judgement for each document);
- simplicity of re-indexing, which is important because a scientific library is a living thing and classification schemes must always change according to either the aims of the library or developments in science;
- accuracy, which is guaranteed by the ability of the computer to select, transfer and re-arrange data reliably without making typographical errors;
- economy, achieved by large-scale processing and computing speed;
- facility for editing.
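The consistency and re-indexing points can be illustrated with a minimal Python sketch. It assumes a crude frequency-based selection rule with a hypothetical stop list and hypothetical thresholds (`max_terms`, `min_freq`); it is not one of the methods surveyed in this report, only an example of a deterministic procedure applied identically to every document, so that changing the rules and re-running it over the collection is all that re-indexing requires.

```python
import re
from collections import Counter

# Hypothetical stop list; a production system would use a much larger one.
STOP_WORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "are",
              "for", "by", "on", "with", "as", "be"}

def index_terms(text, max_terms=5, min_freq=2):
    """Assign index terms to one document by word frequency.

    The same deterministic procedure is applied to every document,
    so identical texts always receive identical index terms.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Sort by descending frequency, then alphabetically, so ties are broken consistently.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, freq in ranked if freq >= min_freq][:max_terms]

def reindex(collection, **params):
    """Re-index a whole collection, e.g. after the indexing rules change."""
    return {doc_id: index_terms(text, **params) for doc_id, text in collection.items()}

if __name__ == "__main__":
    docs = {
        "d1": "Automatic indexing of reactor physics reports. "
              "Indexing by computer assigns terms to reports.",
        "d2": "Neutron flux measurements in the reactor core; "
              "flux maps of the reactor core.",
    }
    print(reindex(docs))               # {'d1': ['indexing', 'reports'], 'd2': ['core', 'flux', 'reactor']}
    print(reindex(docs, max_terms=1))  # {'d1': ['indexing'], 'd2': ['core']}
```

Re-indexing under a new rule is a single call with different parameters; no document needs to be re-examined by hand.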
From the quality point of view, some fully automatic indexing techniques can be considered to have already reached the same level as purely intellectual indexing, at least in a production environment [37(1970), 100(1972)].
However, automatic indexing might not be fully satisfactory if high-standard indexing is required. In general, one may state that the better the index, the less intellectual effort is needed to search for information. Hence, the quality required of an index depends strictly upon the effort an average index user is willing to spend on retrieving information.
A considerable amount of research is still required before the machine can do this well and efficiently, especially in fully automatic text processing such as linguistic text analysis. Most approaches use surrogates (statistical analysis) to overcome the lack of linguistic knowledge. Thus, semi-automatic, computer-controlled techniques are often preferred. (They also play a useful role in linguistic research.) Machine-assisted indexing can be thought of as a simulation of a manual process combined with some of the advantages of machine processing, such as accuracy, economy and facility for editing. Because each document still passes through an indexer's judgement, semi-automatic indexing is not fully consistent, and re-indexing the entire data base is costly. Economy will be achieved only in the long run, because a better index can be produced, and a good index is the precondition for the success and effectiveness of any information system.
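The division of labour in machine-assisted indexing can be sketched in Python as follows. An automatic procedure (not shown; any statistical surrogate could supply it) proposes candidate terms, and the indexer's review rejects some of them and adds others, including assignment terms that are implied but not stated in the text. The `Review` structure and the `finalize_index` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    """An indexer's decisions on one document (hypothetical structure)."""
    rejected: set[str] = field(default_factory=set)
    # Assignment terms: concepts implied by the document but not expressly stated.
    added: list[str] = field(default_factory=list)

def finalize_index(candidates, review):
    """Combine machine-proposed terms with the indexer's review.

    candidates: terms proposed by an automatic procedure, best first.
    The machine supplies the proposals and the bookkeeping; the
    indexer keeps the final say over which terms are used.
    """
    final = [term for term in candidates if term not in review.rejected]
    for term in review.added:
        if term not in final:  # avoid duplicating an accepted proposal
            final.append(term)
    return final

if __name__ == "__main__":
    proposals = ["reactor", "flux", "core", "maps"]
    review = Review(rejected={"maps"}, added=["neutron physics"])
    print(finalize_index(proposals, review))  # ['reactor', 'flux', 'core', 'neutron physics']
```

Because the indexer's decisions are part of each document's result, re-indexing a collection processed this way means repeating the human review, which is one reason re-indexing a semi-automatically indexed data base is costly.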
Machine-aided indexing can also take full advantage of the strengths of intellectual indexing. Indexers (cited from [103(1969)]):
- are able to make discriminations as to the relative importance of technical concepts as they appear in an abstract or document;
- have access to the entire document and can go beyond the document itself to reference books, to consultation with experts, or to other sources as deemed appropriate, to aid in properly indexing the document at hand;
- can apply inductive reasoning to formulate and index concepts which are implied by the document but not expressly stated (assignment indexing);
- become familiar with the requirements of the users of the system by participating in search request analysis, search strategy formulation and search screening.
Thus, semi-automatic indexing will be preferable to purely intellectual indexing. On the other hand, it is believed that fully automatic indexing techniques satisfying these requirements can be developed in the near future. Machine-assisted methods may help in achieving this aim.
Acknowledgements<br />
I wish to express my gratitude to Miss G. Pozzi, Head of the European Scientific Information Processing Centre (C.E.T.I.S.) of the Commission of the European Community, who gave her support to the compilation of this report.
In particular, I am grateful to Prof. F. W. Lancaster and my colleague W. Kolar for fruitful discussions and some useful suggestions.
Last but not least, I acknowledge the helpful assistance that I received from the Library staff and the Publication and Typing Office of the EURATOM Joint Research Center in Ispra.