
Semi Automatic Indexing State of the Art - FTP Directory Listing - Nato



6. CONCLUSIONS

This report should have made evident the feasibility of mechanization in text processing. Although quality considerations were not discussed, it should be mentioned that most authors are satisfied with the results obtained with their methods. Automation of the indexing process should therefore go ahead, particularly as the other obstacles to automation in the IR process have also been largely overcome. These obstacles were primarily due to the lack of powerful programming languages and of appropriate software for either retrieval or data base management. However, full advantage can be taken of computer-assisted indexing only when the textual information is in machine-readable form. As there is a trend towards automatic typesetting techniques, this condition may be fulfilled in the near future. Other techniques for converting textual information into machine-readable form without rewriting and coding are already within the state of the art.

Automation in the storage of information can then be achieved by using fully automatic and semi-automatic indexing techniques.

The advantages of fully automatic indexing techniques are well known:

- consistency, attained because the computer assigns index terms directly from the natural-language text of the document, applying the same algorithm to every document (in human indexing, the indexer makes a separate judgement for each document);

- simplicity of re-indexing, which is important because a scientific library is a living thing and classification schemes must change continually in line with either the aims of the library or developments in science;

- accuracy, guaranteed by the ability of the computer to select, transfer and rearrange data reliably without making typographical errors;

- economy, achieved through large-scale processing and computing speed;

- facility for editing.
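The consistency and re-indexing arguments above can be illustrated with a minimal sketch. It assumes a hypothetical controlled vocabulary that maps single text words to index terms; the vocabulary, the function names and the example terms are illustrative only and do not come from the report, and the indexing systems it surveys were considerably more elaborate.

```python
import re

# Hypothetical controlled vocabulary: lower-case text words -> index terms.
VOCABULARY_V1 = {
    "reactor": "NUCLEAR REACTORS",
    "reactors": "NUCLEAR REACTORS",
    "neutron": "NEUTRONS",
    "neutrons": "NEUTRONS",
    "fuel": "REACTOR FUEL",
}


def index_document(text, vocabulary):
    """Assign index terms by looking up each word of the text in the vocabulary.

    The same deterministic procedure is applied to every document, so the
    same text always receives the same index terms (the consistency argument).
    """
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({vocabulary[w] for w in words if w in vocabulary})


def reindex(collection, vocabulary):
    """Re-index a whole collection simply by rerunning with a revised vocabulary."""
    return {doc_id: index_document(text, vocabulary)
            for doc_id, text in collection.items()}


if __name__ == "__main__":
    collection = {
        "d1": "Neutron flux in the reactor fuel",
        "d2": "Fuel element design",
    }
    print(reindex(collection, VOCABULARY_V1))
    # The classification scheme changes: "fuel" now maps to a new descriptor.
    vocabulary_v2 = {**VOCABULARY_V1, "fuel": "NUCLEAR FUELS"}
    print(reindex(collection, vocabulary_v2))
```

Changing the scheme only requires editing the vocabulary and rerunning `reindex`; no per-document human judgement is repeated, which is the point of the "simplicity of re-indexing" argument.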

In terms of quality, some fully automatic indexing techniques can already be considered to have reached the same level as purely intellectual indexing, at least in a production environment [37(1970), 100(1972)].

However, automatic indexing may not be fully satisfactory where a high standard of indexing is required. In general, the better the index, the less intellectual effort is needed to search for information. Hence the quality required of an index depends strictly on the effort the average index user is willing to spend on retrieving information.

A considerable amount of research is still required before the machine can do this well and efficiently, especially in fully automatic text processing such as linguistic text analysis. Most approaches use surrogates (statistical analysis) to compensate for the lack of linguistic knowledge. Thus, semi-automatic computer-controlled techniques are often preferred; they also play a useful role in linguistic research. Machine-assisted indexing can be thought of as a simulation of the manual process combined with some of the advantages of machine processing, such as accuracy, economy and facility for editing. Because a human judgement is still made for each document, semi-automatic indexing is not fully consistent, and re-indexing the entire data base is costly. Economy will be achieved only in the long run, because a better index can be produced, and a good index is the prerequisite for the success and effectiveness of any information system.
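The division of labour described above, in which a statistical surrogate proposes terms and the indexer decides, can be sketched as follows. This is a minimal illustration under stated assumptions: simple word-frequency ranking stands in for the statistical methods surveyed in the report, the stop list and thresholds are hypothetical, and the human indexer is modelled as a `review` callback.

```python
import re
from collections import Counter

# Hypothetical stop list; operational systems used much larger ones.
STOP_WORDS = {"the", "of", "and", "a", "an", "in", "is", "to", "for", "by", "on", "with"}


def propose_terms(text, max_terms=5, min_count=2):
    """Statistical surrogate: rank non-stop words by how often they occur."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    counts = Counter(words)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, count in ranked if count >= min_count][:max_terms]


def semi_automatic_index(text, review, extra_terms=()):
    """The machine proposes candidates; the indexer (`review`) accepts or rejects.

    `extra_terms` stands for concepts the indexer assigns from outside the
    text (assignment indexing), which frequency counting alone cannot find.
    """
    accepted = [term for term in propose_terms(text) if review(term)]
    return accepted + [t for t in extra_terms if t not in accepted]


if __name__ == "__main__":
    text = ("Neutron flux and neutron spectra in the reactor core. "
            "The reactor core uses uranium.")
    print(propose_terms(text))
    # A hypothetical indexer rejects "core" and adds an implied concept.
    print(semi_automatic_index(text, review=lambda t: t != "core",
                               extra_terms=["nuclear safety"]))
```

Because a person accepts or rejects each candidate, two indexers may assign different terms to the same document, which is why semi-automatic indexing is not fully consistent.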

Machine-aided indexing can also take full advantage of the strengths of intellectual indexing. Indexers (quoted from [103(1969)]):

- are able to make discriminations as to the relative importance of technical concepts as they appear in an abstract or document,

- have access to the entire document,

- can go beyond the document itself to reference books, consultation with experts, or other sources as deemed appropriate, to aid in properly indexing the document at hand,

- can apply inductive reasoning to formulate and index concepts which are implied by the document but not expressly stated (assignment indexing), and

- become familiar with the requirements of the users of the system by participating in search request analysis, search strategy formulation and search screening.

Thus, semi-automatic indexing will be preferable to purely intellectual indexing. On the other hand, it is believed that fully automatic indexing techniques that satisfy these requirements can be developed in the near future. Machine-assisted methods may help in achieving this aim.

Acknowledgements<br />

I wish to express my gratitude to Miss G. Pozzi, Head of the European Scientific Information Processing Centre, C.E.T.I.S., of the Commission of the European Community, who supported the compilation of this report.

In particular, I am grateful to Prof. F. W. Lancaster and my colleague W. Kolar for fruitful discussions and some useful suggestions.

Last but not least, I acknowledge the helpful assistance that I received from the Library staff and the Publication and Typing Office of the EURATOM Joint Research Center in Ispra.
