Semi Automatic Indexing State of the Art - FTP Directory Listing - Nato

Summary 

SEMI-AUTOMATIC INDEXING 

State of the Art 

by 

Hermann Fangmeyer 

Diplom-Mathematiker 

C.C.R.- EURATOM/CETIS 

21020 ISPRA (VA) Italy 

After an intensive period of research in information science from the late fifties to the late sixties, a lull can now be observed, in which people seem to be engaged in developing operational IR-systems at a relatively low level of sophistication. In these systems the bulk of the work is still done manually, and the methodology applied differs only in unimportant details. Such systems are often evaluated by short-sighted economic criteria that do not take into account prospective user needs, which tend towards increasingly specific and exhaustive information. Since redesigning a system and recompiling manually built data bases is prohibitively expensive, it is believed that most of these systems will not survive for very long.

Indexing is a fundamental part of an IR system, since it directly affects retrieval quality. The most advanced indexing techniques, which tend towards automatic full-text processing, should therefore be chosen.

Computer-assisted indexing has great advantages over manual methods. Especially preferable are those techniques which require the text to be indexed to be in machine-readable form, since they allow more highly automated indexing processes, once these become available, to be integrated easily by reindexing the collection at relatively low cost.

1. INTRODUCTION 

Semi-automatic indexing has not been strictly defined; there are as many interpretations as there are synonyms. The intervention of a computer may save the indexer from having to perform routine work, or the indexer may help the computer to make decisions where no sophisticated algorithms are available. Semi-automatic indexing methods must therefore be placed within the wide spectrum between manual indexing with a minimum of computer assistance and quasi-automatic indexing with a minimum of human intervention. An example of the former is the New York Times system, in which the indexer works at a video terminal on which the document to be indexed is displayed by a closed-circuit television system, without having access to the data bases [98(1972)]. The latter can be imagined as a process in which the indexer merely changes a threshold value, depending upon the number of index terms to be assigned automatically.
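The quasi-automatic end of this spectrum can be illustrated with a minimal sketch, assuming that candidate terms are weighted by their relative frequency in the text after removing a stop list. Both the weighting scheme and the names `STOP_WORDS`, `candidate_terms` and `assign_terms` are hypothetical choices for illustration, not the method of any system cited here; the only human intervention is the choice of `threshold`.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration; a real system would use a longer one.
STOP_WORDS = {"a", "an", "and", "are", "be", "by", "for", "from", "in", "is",
              "of", "on", "or", "the", "this", "to", "which", "with"}


def candidate_terms(text: str) -> dict[str, float]:
    """Weight each non-stop word by its relative frequency in the text."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    counts = Counter(words)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def assign_terms(text: str, threshold: float) -> list[str]:
    """Keep the terms whose weight reaches the indexer's threshold,
    highest weight first (ties broken alphabetically)."""
    weights = candidate_terms(text)
    selected = [t for t, w in weights.items() if w >= threshold]
    return sorted(selected, key=lambda t: (-weights[t], t))


doc = ("Indexing of reactor documents: reactor safety and reactor control "
       "depend on indexing quality and control procedures.")
print(assign_terms(doc, 0.2))   # ['reactor']
print(assign_terms(doc, 0.15))  # ['reactor', 'control', 'indexing']
```

Lowering the threshold admits more index terms; raising it restricts the assignment to the most heavily weighted ones.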

The terms computer-assisted indexing, computer- or machine-aided indexing, man-machine indexing, computer aids to indexing, computer-based or computerized indexing, and similar terms will be treated as synonyms in this report.

Semi-automatic indexing should be distinguished from automatic indexing, which Stevens [110(p.3), (1970)] defines as the use of machines to extract or assign index terms without human intervention, once programs or procedural rules have been established.

Another, contradictory definition of automatic indexing can be derived from Caras' [24(1968)] statement:

"The primary aim in automatic indexing is to derive index terms directly from the text with a minimum of human intervention."

This definition leaves room for an intellectual operation by a human.

For Maron [75(1961)] the term 'automatic indexing' denotes the problem of deciding mechanically to which category (subject or field of knowledge) a given document belongs. In his view, therefore, it concerns the automatic recognition of the content of a given document.

From these non-uniform definitions, no precise definition of machine-assisted indexing can be derived. The author has therefore adopted the following:

The indexing process will be called semi-automatic within this report if it consists of a combination of the intellectual efforts of scientific subject specialists and advanced computer techniques.

Thus, semi-automatic indexing is restricted here to those machine-aided methods which require the qualified intervention of both a computer and an indexer. Furthermore, methods which cannot be applied to an operational system are excluded.

Semi-automatic indexing is divided here into conversational and symbiotic indexing, in order to distinguish between indexing in continuous contact with the computer (conversational) and indexing in which the computer is integrated into the indexing process to perform certain clerical tasks (symbiotic).
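A typical clerical task in symbiotic indexing is checking the terms an indexer has chosen against a thesaurus. The following is a minimal sketch, assuming a tiny hypothetical thesaurus made of preferred descriptors plus "USE" references from non-preferred terms; `DESCRIPTORS`, `USE_REFERENCES` and `check_index_terms` are illustrative names, not taken from any system in this report. The computer substitutes preferred terms and removes duplicates, while unknown terms are returned to the indexer for an intellectual decision.

```python
# Hypothetical miniature thesaurus: preferred descriptors, plus
# "USE" references from non-preferred terms to preferred ones.
DESCRIPTORS = {"nuclear reactors", "radiation protection", "fuel elements"}
USE_REFERENCES = {
    "atomic piles": "nuclear reactors",
    "reactors": "nuclear reactors",
    "shielding": "radiation protection",
    "fuel rods": "fuel elements",
}


def check_index_terms(terms):
    """Normalize the indexer's terms against the thesaurus.

    Returns (accepted, rejected): accepted holds preferred descriptors
    (non-preferred terms replaced via USE references, duplicates removed,
    first-seen order kept); rejected lists the terms found nowhere in the
    thesaurus, which go back to the indexer."""
    accepted, rejected = [], []
    for term in terms:
        key = term.strip().lower()
        preferred = key if key in DESCRIPTORS else USE_REFERENCES.get(key)
        if preferred is None:
            rejected.append(term)
        elif preferred not in accepted:
            accepted.append(preferred)
    return accepted, rejected


print(check_index_terms(["Reactors", "shielding", "atomic piles", "heat transfer"]))
# (['nuclear reactors', 'radiation protection'], ['heat transfer'])
```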

This report covers the state of the art, up to December 1972, in

- semi-automatic derivative indexing;

- machine-aided assignment indexing (including automatic assignment indexing techniques based on previously created manual or semi-automatic indexing aids);

- semi-automatic dictionary construction, since indexing techniques often involve the setting up of thesauri.

(Since dictionary construction is often closely linked to indexing and employs similar methods, exact distinctions cannot always be made.)
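Automatic assignment indexing based on previously created manual indexing aids can be made concrete in the sense of Maron's definition given earlier: a collection already indexed by humans serves as training data for deciding mechanically to which category a new document belongs. The sketch below uses a naive Bayes classifier with add-one smoothing as a swapped-in, standard illustration; it is not a reconstruction of any specific technique surveyed in this report, and the training pairs, category names and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(indexed_docs):
    """indexed_docs: (text, category) pairs assigned earlier by human
    indexers. Returns category counts, per-category word counts, vocabulary."""
    cat_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, cat in indexed_docs:
        cat_counts[cat] += 1
        words = tokenize(text)
        word_counts[cat].update(words)
        vocab.update(words)
    return cat_counts, word_counts, vocab


def rank_categories(text, model):
    """Rank categories by log P(cat) + sum of log P(word | cat), with
    add-one smoothing; words never seen in training are ignored."""
    cat_counts, word_counts, vocab = model
    n_docs = sum(cat_counts.values())
    scores = {}
    for cat, n in cat_counts.items():
        total = sum(word_counts[cat].values())
        score = math.log(n / n_docs)
        for w in tokenize(text):
            if w in vocab:
                score += math.log((word_counts[cat][w] + 1) / (total + len(vocab)))
        scores[cat] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical manually indexed collection.
TRAINING = [
    ("neutron flux measured in the reactor core", "reactor physics"),
    ("reactor core temperature and neutron absorption", "reactor physics"),
    ("retrieval of documents by index terms", "information science"),
    ("automatic indexing of documents for retrieval", "information science"),
]
model = train(TRAINING)
print(rank_categories("indexing and retrieval of reactor documents", model))
# ['information science', 'reactor physics']
```

In a semi-automatic setting the ranked list would be offered to the indexer, who confirms or corrects the top category.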

Evaluation, in the sense of measuring the retrieval efficiency of the different approaches described in the literature, is not covered here, because the authors usually content themselves with general and often contradictory statements.

For some computer-aided indexing techniques, computer analysis of text is the fundamental step [103(1969)]. These techniques require the data in machine-readable form.

There are essentially two methods of obtaining machine-readable text for computer indexing:

- as a by-product of the printing process; and

- through some kind of conversion procedure, using keyboard devices to produce cards or tape, or using optical scanning devices [9(1968)].

Converting natural-language data into machine-readable form can be extremely expensive relative to the application for which it will be used. This is probably why automation in natural language processing is less well developed than other fields of computer application. (Indeed, there are only a few original approaches; most applications are simple modifications of these, i.e. theoretical advances have been minimal in the last decade.)
