Semi Automatic Indexing State of the Art - FTP Directory Listing - Nato

To distinguish techniques that require the text to be indexed in machine-readable form from those that do not, the terms computer controlled and intellectually directed are used, respectively. This review reports on the means by which the computer can be integrated into the indexing process in order to supplement the intellectual effort and so render human intervention more reliable.
Semi-automatic indexing, in which the computer takes over the mechanical and repetitive tasks of indexing and leaves the human indexer to deal with the intellectual problems involved [13(1968), 30(1972)], has the advantages of allowing more consistent indexing, reducing training time, and cutting costs by speeding up the actual indexing relative to purely intellectual indexing [74(1962)], since the abilities of both human beings and computers can be combined to form a perfect symbiosis. Computerized indexing is viewed in a similar way by some other authors [103(1969)]. Scheffler and Smith developed their computer-aided indexing concept out of a desire to use the computer to perform the routine aspects of the indexing process while permitting the indexers to use their intellects to resolve non-routine situations. By allocating the treatment of commonly occurring and recurring indexing situations to the computer, the indexer is freed from making and remaking the same decisions when encountering them from one document to the next. The function could then be upgraded to making decisions regarding only new indexing situations (learning process (Auth.)). As such decisions were made, they could be added to the computer's capability. This higher-level human activity would be more stimulating and would thus alleviate much of the tediousness of the indexing function.
On the other hand, a greater responsibility is placed upon the human decision maker, since a decision would affect not only the indexing of the document at hand but also the indexing of all subsequent documents in which that indexing situation occurred.
For Wessel [121(1968)], machine-aided indexing asks of the machine only that machines do well what they can do in providing help to the human indexer in doing well what the human can do: The machine can provide on-line and direct error-checking and error-corrective guidance; it can provide consistency checks. In fact, if we desire, the machine can help to enforce consistency among different human indexers. The machine, if appropriately programmed, can also provide a learning or instructional period for training indexers as they begin to use a new index list.
Bernier [18(1968)] summarizes these general remarks by saying: One of the first improvements (in indexing (Author)) may be computer-aided indexing in which the computer, by display of terms or by translation into standard index terminology, saves the indexer from functioning essentially as a lexicographer. The so-called natural-language indexing separates the processes of indexing from those of lexicography very neatly by having the computer provide table look-up for terms used by author and indexer to translate them into standard index terminology. New terms not in the computer vocabulary are sent back not to the indexer but to a lexicographer, who enters them into the system either as synonyms, under a more general or broader term, or as a new heading. This kind of computer-aided indexing seems to be attractive. It has the problems associated with multiple-word terms, homographs, and the like.
Semi-automatic indexing is adequate as long as no fully automatic procedures are available.
(Some examples of automatic indexing techniques do already exist [99(1968), 37(1970)], but there are still open problems, especially in computational linguistics, in improving efficiency. As these tend to be of a high degree of sophistication, computer-assisted algorithms could be very useful.) Generally, one may state that the need for man-machine symbiosis or interaction grows with the complexity of the problem in question. This is well explained by Licklider [67(1960)]: Present day computers are designed primarily to solve pre-formulated problems or to process data according to pre-determined procedures. The course of the computation may be conditional upon results obtained during the computation, but all the alternatives must be foreseen in advance. (If an unforeseen alternative arises, the whole process comes to a halt and awaits the necessary extension of the program.) The requirement for pre-formulation or pre-determination is sometimes no great disadvantage. It is often said that programming for a computing machine forces one to think clearly, that it disciplines the thought process. If the user can think his problem through in advance, symbiotic association with a computing machine is not necessary. However, many problems are very difficult to think through in advance. They would be easier to solve, and they could be solved faster, through an intuitively guided trial-and-error procedure in which the computer cooperated, turning up flaws in the reasoning or revealing unexpected turns in the solution. One of the main aims of man-computer symbiosis is to bring the computing machine effectively into the formulative parts of technical problems.
There will undoubtedly be various omissions of activities in the field.
They probably have been omitted for one of the following reasons:
- not published,
- published in some not easily accessible report, not quoted in one of the analyzed items,
- no significant title or abstract;
or have been neglected because the methods adopted did not contain new ideas. Neither the inclusion nor the omission of any report is intended to reflect an evaluation.
2. SEMI-AUTOMATIC INDEXING TECHNIQUES
2.1. Conversational or Interactive Indexing Techniques
Interactive indexing is defined here as an indexing procedure with a reciprocal interaction between the indexer and the computer, in a conversational time frame (3-5 seconds). Such a process can be either intellectually directed or computer controlled. In the former case there is no proper dialogue between man and machine, as the computer activities are restricted to clerical tasks only, such as table look-up. These techniques do not necessarily require the text to be in machine-readable form. Computer controlled indexing could be an automated process which comes to a halt if an unforeseen alternative arises, in order to receive further instructions from the indexer, or one in which the computer is programmed to display a working sheet for the indexing on the screen. It needs the whole, or at least considerable parts, of the text to be indexed in machine-readable form. Little has been done to realize conversational systems for indexing or dictionary construction using this latter type of dialogue. Applications of this type of conversation are found, for example, in those systems in which the indexer or lexicographer has to decide the disposition of an unknown word at once (in a time-sharing environment) [90(1971), 54(1971), 61(1973)]. A computer controlled system can be called a learning system if the information furnished by the indexer can be used to resolve similar problems automatically at a later time. This is the case, for example, in dictionary updating.
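
The computer controlled dialogue with a learning component described above can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the function name `index_with_learning`, the `ask_indexer` callback and the convention that a dictionary value of `None` marks a rejected word are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical dictionary layout: word -> standard index term,
# or None when the indexer has decided the word is not to be indexed.
Dictionary = Dict[str, Optional[str]]


def index_with_learning(
    words: List[str],
    dictionary: Dictionary,
    ask_indexer: Callable[[str], Optional[str]],
) -> List[str]:
    """Assign index terms by table look-up, halting on unknown words.

    When a word is not in the dictionary, the process stops and asks the
    indexer for a decision. The decision is stored, so the same word is
    resolved automatically the next time it occurs (dictionary updating).
    """
    assigned: List[str] = []
    for word in words:
        key = word.lower()
        if key not in dictionary:
            # Unforeseen alternative: wait for the indexer's instruction.
            dictionary[key] = ask_indexer(key)
        term = dictionary[key]
        if term is not None and term not in assigned:
            assigned.append(term)
    return assigned
```

In an interactive setting `ask_indexer` would prompt the indexer at a terminal; here it is just a function argument, so the stored decisions can be inspected afterwards in `dictionary`.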
Intellectually directed interactive systems have been developed primarily for retrieval purposes. Some of these are: ESRO/RECON for aerospace literature and others [76(1973)], MEDLINE for medical literature, ENDS [116(1970)] for nuclear information, BOLD [22(1967)], etc. At best, these and similar systems provide access to the document data base and to the documentation language in order to help the user to formulate his question. At a user's request the computer displays:
- from the document base: document identifications, index terms assigned, titles or abstracts;
- from the thesaurus file: alphabetically adjacent index terms and/or those semantically or hierarchically related to a given one. (The latter display requires structured thesauri.)
The user then proceeds to narrow the range of his search by imposing a set of constraints and observing, interactively, the effects of these constraints [27(1970)]. He can proceed on different levels depending upon the data bases to which he has access. The query can be evaluated with regard to its precision and ambiguity using the thesaurus file, and with regard to the user's real need using the search file in an interactive mode (feedback).
It should be observed that the use of these conversational systems is restricted to query formulation. Indexing in most of the systems mentioned, and in similar ones, is still done completely manually. But what is the difference between formulating a query and indexing a document? The problems inherent in both are very similar. In fact, Herr [48, 49(1970)] observes that subject indexers often compare a new document with items under various subject headings to determine the most appropriate slot for the new document. Hence, indexing could be done faster and more consistently by using a conversational indexing system. Bennet [14(1969)] goes still further by requiring that: when adding a document to a collection an indexer should choose a representation which makes evident both the content of the document and its relation to other documents already in the collection. This requirement is based on the observation that users, on the average, are dissatisfied if more than 50 documents are presented in response to a subject search. This might suggest that no individual content identifier should characterize more than 50 documents. The system can inform the indexer when he uses an identifier which is beyond this threshold, whereupon he can consider an alternative, more refined, subdivision.
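
The threshold check just described can be sketched as follows. This is only an illustration under stated assumptions: the class name `PostingFile`, its `assign` method and the in-memory data layout are hypothetical, and only the limit of 50 documents comes from the text.

```python
from collections import defaultdict
from typing import DefaultDict, Set

MAX_POSTINGS = 50  # limit suggested in the text; everything else is assumed


class PostingFile:
    """Hypothetical record of which documents carry which content identifier."""

    def __init__(self, max_postings: int = MAX_POSTINGS) -> None:
        self.max_postings = max_postings
        self.postings: DefaultDict[str, Set[str]] = defaultdict(set)

    def assign(self, doc_id: str, identifier: str) -> bool:
        """Record the assignment and report whether the identifier now
        characterizes more than max_postings documents, in which case the
        indexer may want to consider a more refined subdivision."""
        self.postings[identifier].add(doc_id)
        return len(self.postings[identifier]) > self.max_postings
```

A return value of `True` is the point at which an interactive system would inform the indexer; what the indexer does next remains an intellectual decision.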
This re-assessment of content identification, occurring in a planned and continuous manner, could benefit both librarian and user. As another approach, Markus [74(1962)] suggests that: each choice of an indexing term could place in front of the indexer a display of questions or possible additional indexing terms. These would be arranged to guide his thinking to the next logical choice of an indexing term (see also [45(1973)]). According to J. Herr [48, 49(1970)], further advantages of access to the data bases during indexing are that decentralized indexers can communicate through their work and that new indexers can be trained with minimal contact with experienced indexers by attempting to duplicate indexing patterns in the system. Access to the data base during indexing also permits re-indexing, which could be most desirable to improve discrimination between similar documents.
The most considerable barrier inhibiting the use of on-line systems for indexing may still be their cost, although this has come down considerably in recent years. However, the most economic indexing process does not necessarily give the best results. Therefore quality considerations should be taken into account too. The best index will be the most economic one in the long term, as the prospective user of information will have more sophisticated requirements [109(1972)].
2.2. Symbiotic Indexing Techniques
Symbiotic or off-line indexing means the integration of the computer into the indexing process without its being permanently in contact with the indexer. The computer or indexer furnishes data which can be used for decision making at a later moment by the indexer or computer, respectively. The process can alternate repeatedly. This kind of indexing is applied for economic reasons, preferably in cases where large amounts of text are to be processed, such as primary index and dictionary construction of any kind.
Most techniques can be defined as computer controlled, since the text is needed in machine-readable form and since the computer makes the choice of index terms. The final decision on this choice most often remains, however, a human prerogative, but it is usually a binary decision whether to accept or reject an index term selected by the computer. Intellectually directed semi-automatic indexing techniques could be defined as those techniques which require a 'go-word' dictionary. In this special application the indexer has made his choice of index terms a priori and the computer is used to find their occurrences in the text.
The computer's reliability and speed as a searching, matching, comparing and arithmetic device can be exploited in two extremely useful ways. The computer can be used to edit the work of the indexer; it can also help to redesign an index so that it is sensitive to, and responds to, changes in the information content of a collection [12(1965)].
The editing function of the computer is expressed as follows: Since the computer is to take over the role of the editor, the indexer or author can now freely assign terms to a document and allow the computer to determine whether or not an assigned term is allowed by the index, whether or not the spelling of the term is acceptable, and whether the format of the term meets specifications. If desired, cross-references can also be added automatically [78(1968), 77(1969)]. The methods adopted for this task consist of simple dictionary comparisons. For error detection, the terms not found in the dictionary can be checked for simple errors such as a missing letter or the transposition of two adjacent letters. If the error cannot be corrected automatically, the term is displayed so that it can be rectified manually [78(1968), 77(1969), 117(1970)].
Redesigning an index with the aid of a computer capitalizes on the arithmetic features of the machine.
Using these, it is possible to keep a running tally of all the activities of the system, e.g., how often a term has been assigned to the documents of the collection, how many questions have used a given term, and so on. When specified thresholds on such empirical data are reached, a computer can indicate that a revision of the index is necessary and can determine the documents that will be affected by the revision. For example, as a document collection grows, a given index term that is assigned to too large a proportion of documents loses power as a discriminator during search. This implies that the concept needs to be subdivided into more specific categories and that the original term should be used to designate a class. To control such circumstances one might specify, for example, that whenever a subject heading or an index term is assigned to one percent of the document collection, once the size of the collection has reached the range of 10,000 to 12,500 documents, the computer program must provide a print-out of the subject heading together with a list of the accession numbers of the documents to which that heading has been assigned. The use of a range rather than an absolute number would allow the system to continue effectively where the documents being added, and therefore now subject to revision, had already been indexed under the old heading. It would further accommodate the transition period which always accompanies revision [12(1965)].
Symbiotic indexing, as it is defined here, often also requires an intellectually performed editing function to prepare the input for (semi-)automatic processing (pre-editing), or to decide on the index terms chosen by the computer (post-editing). Text preparation may be at any level, for all kinds of indexes (see also [85(1964)]):
1. Addition of special codes (escape sequences) for special signs, such as the integral sign, or codes to represent uppercase, italics, boldface, etc.
2. Marking of document places, which means assigning to each word and non-verbal text expression its place, such as title, abstract, summary, heading, main text, footnote, etc.
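
The revision rule given earlier in this section (a heading assigned to one percent of the collection, once the collection has reached a certain size) can be sketched as a tally check. This is an assumed reading, not the procedure of [12(1965)]: the function name `headings_due_for_revision` and its parameters are hypothetical, the check is simply switched on at the lower end of the 10,000 to 12,500 range, and the upper end of the range is not modelled.

```python
from typing import Dict, Iterable, List


def headings_due_for_revision(
    assignments: Dict[str, Iterable[int]],
    collection_size: int,
    min_collection_size: int = 10_000,  # assumed: lower end of the range in the text
    percent: int = 1,                   # one percent of the collection
) -> Dict[str, List[int]]:
    """Return each heading that has reached the threshold, together with the
    sorted accession numbers of the documents indexed under it.

    assignments maps a subject heading (or index term) to the accession
    numbers of the documents to which it has been assigned.
    """
    if collection_size < min_collection_size:
        return {}
    report: Dict[str, List[int]] = {}
    for heading, accession_numbers in assignments.items():
        unique = set(accession_numbers)
        # Integer arithmetic avoids rounding problems at the exact threshold.
        if len(unique) * 100 >= percent * collection_size:
            report[heading] = sorted(unique)
    return report
```

The returned mapping corresponds to the print-out mentioned in the text: one entry per heading that needs attention, with the documents a revision would affect.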
