11.07.2015 Views

Semi-Automatic Indexing of Documents with a Multilingual Thesaurus

Semi-Automatic Indexing of Documents with a Multilingual Thesaurus

Semi-Automatic Indexing of Documents with a Multilingual Thesaurus

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Semi</strong>-<strong>Automatic</strong> <strong>Indexing</strong> <strong>of</strong> <strong>Documents</strong> <strong>with</strong> a <strong>Multilingual</strong><strong>Thesaurus</strong>Ulrich Schiel,Universidade Federal de Campina GrandeCampina Grande / BRAZILulrich@dsc.ufpb.brIanna M. S. F. de Sousa,Universidade Federal de Campina GrandeCampina Grande / BRAZILianna@dsc.ufpb.brAbstractWith the growing significance <strong>of</strong> digitallibraries and the Internet, more and moreelectronic texts become accessible to a wide andgeographically disperse public. This requiresadequate tools to facilitate indexing, storage, andretrieval <strong>of</strong> documents written in differentlanguages. We present a method for semiautomaticindexing <strong>of</strong> electronic documents andconstruction <strong>of</strong> a multilingual thesaurus, whichcan be used for query formulation and informationretrieval. We use special dictionaries and userinteraction in order to solve ambiguities and findadequate canonical terms in the language and anadequate abstract language-independent term. Theabstract thesaurus is updated incrementally bynew indexed documents and is used to search fordocuments using adequate terms1. IntroductionThe growing relevance <strong>of</strong> Digital Libraries isgenerally recognized [Ha97]. A Digital Librarytypically contains hundreds or thousands <strong>of</strong>documents. It is also recognized that, insteadEnglish is the dominant language, documents inother languages are <strong>of</strong> great significance and,moreover, users want to retrieve documents inseveral languages associated to a topic, stated inhis own language [Ha97, Go02]. This is especiallytrue in regions as the European Community orAsia. Therefore a multilingual environment isneeded to attend user requests to Digital Libraries.The multilingual communication between usersand the library can be realized in two ways:• The user query is translated to the severallanguages <strong>of</strong> existing documents andsubmitted to the library;• The documents are indexed and theextracted terms are converted to alanguage neutral thesaurus (calledmultilingual thesaurus). The same occurs<strong>with</strong> the query, and the correspondencebetween query terms and documents isobtained via the neutral thesaurus.The first solution is the most widely used in theCross-Language Information Retrieval (CLIR)community [Go02], [OD00] [Oa99]. It applies alsoto other information retrieval environments, as theWorld Wide Web. For Digital Libraries, <strong>with</strong>thousands <strong>of</strong> documents, indexing <strong>of</strong> incomingdocuments and a good association structurebetween index terms and documents can becomecrucial for efficient document retrieval. This paperpresents a system based on the second approach.In order to get an extensive and preciseretrieval <strong>of</strong> textual information, a correct andconsistent analysis <strong>of</strong> incoming documents isnecessary. The most broadly-used technique forthis analysis is indexing. An index file becomes anintermediate representation between a query andthe document base.One <strong>of</strong> the most popular structures for complexindexes is a semantic net <strong>of</strong> lexical terms <strong>of</strong> a


language, called thesaurus. The nodes are single orcomposed terms and the links are pre-definedsemantic relationships between these terms, suchas synonyms, hyponyms and metonyms.Despite the importance <strong>of</strong> multilingual thesaurihas been recognized [Go02] nearly all researcheffort in Cross-Lingual Information Retrieval hasbeen done on the query side and not on theindexing <strong>of</strong> incoming documents [OD00, Oa99,Ha97].<strong>Indexing</strong> in a multilingual environment can bedivided in three steps: (1) language-dependentcanonical term extraction (including stop-wordelimination, stemming, word-sensedisambiguation), (2) language-neutral termfinding, and (3) update <strong>of</strong> the term-documentassociation lattice.Bruandet [Br89] has developed an automaticindexing technique for electronic documents,which was extended by Gammoudi [Ga93] tooptimal thesaurus generation for a given set <strong>of</strong>documents. The nodes <strong>of</strong> the thesaurus arebipartite rectangles where the left side contains aset <strong>of</strong> terms and the right side the set <strong>of</strong> documentsindexed by the terms. Higher rectangles in thethesaurus contain broader term sets and fewerdocuments. One extension to this technique is thealgorithm <strong>of</strong> Pinto [Pi97], which permits anincremental addition <strong>of</strong> index terms <strong>of</strong> newincoming documents, updating the <strong>Thesaurus</strong>.We show in this paper how this extendedversion <strong>of</strong> Gammoudi's technique can be used inan environment <strong>with</strong> multilingual documents andqueries whose language need not be the same asthat <strong>of</strong> the searched documents. The main idea isto use monolingual dictionaries in order to, <strong>with</strong>the user's help, eliminate ambiguities, andassociate to each unambiguous term an abstract,language independent, term. The terms <strong>of</strong> a queryare also converted to abstract terms in order to findthe corresponding documents.In the next section we introduce themathematical background needed to understandthe technique, whereas section 3 introduces ourmultilingual rectangular thesaurus. The followingsection 4 shows the procedure <strong>of</strong> term extractionfrom documents, finding the abstract concept andthe term-document association and inclusion <strong>of</strong> thenew rectangle in the existing rectangular thesaurus.Section 5 shows the query and retrievalenvironment and, finally, section 6 discusses somerelated work and concludes the paper.2. Rectangular <strong>Thesaurus</strong>: BasicConceptsThe main structure used for the indexing <strong>of</strong>documents is the binary relation. A binary relationcan be decomposed in a minimal set <strong>of</strong> optimalrectangles by the method <strong>of</strong> RectangularDecomposition <strong>of</strong> a Binary Relation [Ga93,BBG94]. The extraction <strong>of</strong> rectangles from a finitebinary relation has been extensively studied in thecontext <strong>of</strong> Lattice Theory and has proven to bevery useful in many computer science applications.A rectangle <strong>of</strong> a binary relation R is a pair <strong>of</strong>sets (A, B) such that A × B ⊆ R. More preciselyDefinition 1: RectangleLet R be a binary relation defined from E to F. Arectangle <strong>of</strong> R is a pair <strong>of</strong> sets (A,B) such that A ⊆E, B ⊆ F and A x B ⊆ R. A is the domain <strong>of</strong> therectangle where B is the co domain.A rectangle (A,B) <strong>of</strong> a relation R is maximal iff,for each rectangle (A’,B’)A x B ⊆ A’ x B’ ⊆ R → A = A’ e B= B’.A binary relation can always be represented bysets <strong>of</strong> rectangles, but this representation is notunique. In order to gain storage space, thefollowing coefficient is important for the selection<strong>of</strong> an optimal set <strong>of</strong> rectangles representing therelation.Definition 2: Gain in storage spaceThe gain in storage space <strong>of</strong> a rectangle RE=(A,B)is given by:g(RE)= [Card(A) x Card (B)] - [Card (A) +Card(B)]where Card(A) is the cardinality <strong>of</strong> the set A.The gain becomes significant if Card(A) > 2and Card(B) > 2, then g(RE) > 0 and grows <strong>with</strong>Card(A) and Card(B). On the other hand, there isno gain (g(GE)


A maximal rectangle containing an element (x, y)<strong>of</strong> a relation R is called optimal if it produces amaximal gain <strong>with</strong> respect to other maximalrectangles containing (x, y).Figure 2.1(a) presents an example <strong>of</strong> a relationR, and Figures 2.1(b), 2.1(c) and 2.1(d) representthree maximal rectangles containing the element(y,3). The corresponding gains are 1, 0 e -1.Therefore, the optimal rectangle containing (y,3)<strong>of</strong> R is the rectangle <strong>of</strong> Figure 2.1(b).(a)RectangleRx 1yz2345xy(b)RE 1(y, 3)123(c)RE 2(y, 3)yz35y(d)RE 3(y, 3)g( RE1(y, g( RE 2(y, g( RE 3(y, 3)3) ) = 1 3) ) = 0 ) = --1Figure 2.1: Finding the optimal rectangleDefinition 4: Rectangular GraphLet “≤≤” be a relation defined over a set <strong>of</strong>rectangles <strong>of</strong> a binary relation R, as follows:∀ (A 1 ,B 1 ) e (A 2 , B 2 ) two rectangles <strong>of</strong> R:(A 1 ,B 1 ) ≤≤ (A 2 , B 2 ) ⇔ A 1 ⊆ A 2 and B 2 , ⊆ B 1.We call (R, ≤≤) a Rectangular Graph.Note that “≤≤” defines a partial order over the set<strong>of</strong> rectangles.Definition 5: LatticeA partially ordered set (R,


R4R8SystemDevelopmentSystemInformationDevelopmentD1D4D1factdatasearchqueryinformationrecuperationdocumentreportformularyarticleR5SystemInformationD1D2Figure 2.4: Example <strong>of</strong> synonyms andpseudo-synonymsR2SystemD1D2D3, D4Finally we define a Rectangular <strong>Thesaurus</strong> asa graph where the nodes are optimal rectangles andthe edges are the semantic relations defined above.R4R8DevelopmentD1D4R5 InformationD23. <strong>Multilingual</strong> Rectangular<strong>Thesaurus</strong>The idea <strong>of</strong> creating a language-independentconceptual thesaurus has been defined by Sosoaga[So91]. The textual structure <strong>of</strong> specific languagesis mapped to a conceptual thesaurus (see Fig. 3.1).R2SystemFigure 2.3: Example lattice <strong>of</strong>rectangles – full and simplified versionD3Lexical structure<strong>of</strong> language 1Lexical structure<strong>of</strong> language 2Conceptual<strong>Thesaurus</strong>containing the relation between sets <strong>of</strong> index termsand <strong>of</strong> sets <strong>of</strong> documents. Each term in a rectangleis the representative <strong>of</strong> an equivalence relation <strong>of</strong>synonyms. The terms in the rectangle are calledpseudo-synonyms. Note that two terms are pseudosynonymsif they index the same set <strong>of</strong> documents.For a lattice <strong>of</strong> rectangles <strong>with</strong> the hierarchicrelation (A 1 ,B 1 ) ≤≤ (A 2 , B 2 ), defined above, weconsider a simplified representation, in order tosave space. We eliminate the repetition <strong>of</strong> terms inthe hierarchy. If (A 1 ,B 1 ) ≤≤ (A 2 , B 2 ) the rectangle(A 2 , B 2 ) is represented by (A 2 -A 1 , B 1 -B 2 ), <strong>with</strong>outloss <strong>of</strong> information content.Figure 2.3 illustrates a lattice <strong>of</strong> four rectangles,disrespecting language independence at thismoment, in its full and its simplified versions.Figure 2.4 shows a relation between synonymsand pseudo-synonyms. The terms Information,Retrieval and Document are pseudo-synonymsrepresenting: fact and data, search and query, andreport, formulary and article, respectively.Figure 3.1: <strong>Multilingual</strong> <strong>Thesaurus</strong>One problem <strong>of</strong> this mapping is the elimination<strong>of</strong> multiple senses <strong>of</strong> terms. For instance, the term‘word’ has 10 distinct meanings in the WordNetsystem [WN]. In general, each meaning occurs in adifferent context <strong>of</strong> using the word. We extend thedefinition <strong>of</strong> multilingual thesaurus <strong>of</strong> Sosoaga inorder to include a notion <strong>of</strong> contexts that permitsthe elimination <strong>of</strong> ambiguities.A multilingual thesaurus is a classificationsystem defined asMTh = (V, n, r; L 1 , ..., L n , C 1, ..., C m , t 1 , ... , t n*m )composed <strong>of</strong> a unique set <strong>of</strong> abstract concepts(V), a set <strong>of</strong> lexicons {L 1 ,…, L n }, a set <strong>of</strong> contexts{ C 1 , ... , C k } a set <strong>of</strong> functions t k : L i × C j → Vwhich associates to each term <strong>of</strong> the lexicon in agiven context a unique abstract term. Thehierarchic and non-hierarchic relationships aregiven by n (narrower term) and r (related term).Therefore, both n and r are subsets <strong>of</strong> V × V.


A rectangular multilingual thesaurus is arectangular thesaurus as defined in the previoussection, where the terms at the left part <strong>of</strong> eachrectangle are elements <strong>of</strong> the abstract concepts (V)<strong>of</strong> a multilingual thesaurus, and the right side areidentifiers <strong>of</strong> documents.Note that the way to obtain the rectangularmultilingual thesaurus <strong>of</strong> a set <strong>of</strong> documents is (1)indexing each document, (2) define the correctmeaning <strong>of</strong> each term, and (3) finding the abstractconcept applying the corresponding function t.In order to construct rectangles <strong>with</strong>representative terms, we can decompose eachfunction t in two parts t 0 and t 1 , <strong>with</strong> t(x, c) =t 1 (t 0 (x, c)). The function t 0 is responsible for theselection <strong>of</strong> the canonical representative for thesynonyms in a given context, and t 1 is an injectivefunction that determines the abstract termassociated to the original term in a given language.4. Construction and maintenance <strong>of</strong> aRectangular <strong>Multilingual</strong> <strong>Thesaurus</strong>In a digital library each incoming document(which can be a full document, an abstract or anindex card) must be indexed and integrated in thelibrary. Instead <strong>of</strong> indexing documents one-by-onewe can consider lots <strong>of</strong> incoming documents to beinserted in the library.The construction <strong>of</strong> a Rectangular <strong>Multilingual</strong><strong>Thesaurus</strong> is done in three steps:• Term extraction and disambiguationfrom one or more documents using one ormore monolingual dictionary (semiautomaticindexing), and determination <strong>of</strong>the abstract concept;• Generation <strong>of</strong> one or more optimalrectangles;• Optimal insertion <strong>of</strong> the newrectangles in the existing abstractthesaurus.Figure 4.1 shows the main modules <strong>of</strong> the system,called SIM-System for <strong>Indexing</strong> <strong>of</strong> <strong>Multilingual</strong><strong>Documents</strong>.4.1 <strong>Indexing</strong>, disambiguation and abstractionThe construction <strong>of</strong> a rectangular thesaurusreferencing a set <strong>of</strong> electronic documents in naturallanguage begins <strong>with</strong> relevant term extractioncontained in the document. Our semi-automaticmethod allows the user to eliminate ambiguitiesinteractively. In the current version <strong>of</strong> the systemthe format <strong>of</strong> incoming document can be pure text(txt), Micros<strong>of</strong>t Word documents (doc) or html.Other formats must be converted to pure text.DOCUMENT<strong>Indexing</strong>RECTANGLEUpdatingTHESAURUSDICTIONARYNEW WORDScontextclassificationlibrarianWordNet4.1 <strong>Semi</strong>-automatic indexingThe first step consists <strong>of</strong> words selection, stopword elimination and, for significant words,finding <strong>of</strong> the abstract term. As shown in Figure4.2, two dictionaries are use for this step. First, thedictionary <strong>of</strong> term variations contains all lexicalvariations <strong>of</strong> words in a language and determinesits normal form. Compound words, such as 'DataBase' or 'Operating System' must be considered assingle terms, since in other language they arewritten as single word (e.g. 'Datenbank' and'Betriebssystem' in German) and should beassociated to a single abstract term code. These areidentified by a look-ahead step when the dictionaryidentifies a candidate compound word.Having found the normal form <strong>of</strong> a term, themain dictionary is then used to find the abstractlanguage-independent term, depending on thecontext.In the main dictionary, the column"Representative" and the list <strong>of</strong> "Related terms"will be used in the construction <strong>of</strong> the thesaurus ina given language for query formulation (seesection 5 below).Each rectangle obtained in the previous step relates a set<strong>of</strong> terms to a set <strong>of</strong> documents. If we are processing a single


document, one rectangle is generated <strong>with</strong> the significantTerm VariationsTermCompound TermData 1BaseDatabaseMain DictionaryTermCategoryContext Concept Represent.Data Base noun C.10125 DatabaseScienceData noun C.10230 InformationScienceData noun Ling. DataDate noun History ageFigure 4.1: Dictionariesterms indexing that document. We must now insert the newrectangles in the existing abstract rectangular thesaurus.4.2 Updating the rectangular thesaurusFigure 4.3 shows the rectangular thesaurus for adocument base, where the abstract terms <strong>of</strong> thedomains <strong>of</strong> the rectangles has been exchanged byits representatives in english. Since it is in thesimplified form, term redundancy has beeneliminated.Distributed 2, 11CObject-oriented, library, programming, 2, 11, 12Database, S<strong>of</strong>tware, development, inheritanceTool, object, class, model, concept, system 2, 11, 12, 14∅Structured 11Relational, client-server 11, 12Figure 4.3: Document thesaurus∅DNote that in a rectangular thesaurus, we canidentify several cardinality levels, due to thecardinality <strong>of</strong> the rectangle domain. This levelgoes from 0 to n, where n is the total number termsthe thesaurus. Each new rectangle must be placedin the corresponding level. In the example, thesecond level has cardinality 5 and the third levelhas cardinality 12, since 7 terms has been added tothe level below.The following algorithm, from [Pi97], providesthe insertion <strong>of</strong> a new rectangle in an existingrectangular thesaurus. We consider the thesaurusin its original definition, <strong>with</strong>out simplification.The simplification is straightforward.1. Check if the cardinality level <strong>of</strong> the newrectangle exists in the thesaurus1.1. If it does not exists, create the new level forthe rectangle1.2. else1.2.1. If the domain <strong>of</strong> the new rectanglecoincides <strong>with</strong> an existing rectangle,then add the new document to the codomain <strong>of</strong> the existing rectangleelse insert the new rectangle in thelevel2. If a new rectangle has been created, establishthe hierarchic connections, searching for thehigher level rectangles containing the newterms.3. New terms not occurring in the supremum, areinserted there4. If the descendants <strong>of</strong> the new rectangle areempty, connect it to the infimum5. Information RetrievalThe purpose <strong>of</strong> an information retrieval systemis to return to the user a set <strong>of</strong> documents thatmatch the keywords expressed in a query. In oursystem we present an interface, using the user'spreferred language, representing the thesaurus <strong>of</strong>concepts occurring in the document database. Thisthesaurus includes synonyms, related terms andhierarchic relations. Using pre-existing terms,obtained from the documents in the library, helpsusers to formulate its query in an adequateterminology, reducing significantly naturallanguage interpretation problems.The dictionary-like form <strong>of</strong> the interfaceguarantees fast access to the documents searchedfor. As it was reported by LYCOS, typical userqueries are only two or three words long. Figure


5.1 shows the prototype's interface [So98] <strong>with</strong>terms defined in Portuguese.In a rectangular thesaurus, the retrieval processconsist <strong>of</strong> finding a rectangle Ri = Ci x Di, suchthat Ci is a minimal domain containing the set <strong>of</strong>terms from the query Q. If Ci ≠ Q the user canreceive a feedback from the system concerningother terms which index the retrieved documents.This fact is identified as a Galois connection in[Ga93]. Note that we can obtain several rectanglesmatching the query. On the other hand, the usercan eliminate some terms from the query in orderto obtain more documents.As can be seen in the figure, the prototypeallows one to choose a language and, as he/she isselecting the terms, the system lists thecorresponding documents.Figure 5.1: Query interfaceThe prototype has been implemented in theDelphi Programming Environment and, in its firstrelease, recognizes Word (.doc), html and text(.txt) documents.5. Related workThe model HiMeD [RN01, Li98] deals <strong>with</strong> theindexing and retrieval <strong>of</strong> documents. It isspecialized on medical documents and theindexing step is completely automatic. Since thedomain is restricted, the occurrence <strong>of</strong> polysemicterms is not so frequent as for general digitallibraries. As language neutral ontology <strong>of</strong> medicalterms they use the medical vocabulary MeSH[NLM00] combined <strong>with</strong> a generic dictionary.Gilarranz, Gonzalo and Verdejo [GGV02],proposed an approach <strong>of</strong> indexing documentsusing the information stored in the EuroWordNetdatabase. From this database they take thelanguage-neutral InterLingual Index. For theassociation <strong>of</strong> queries to documents they use thewell-known vectorial approach.The MULINEX project [ENU99] is a EuropeanUnion effort to develop tools to allow crosslanguagetext retrieval for the WWW, conceptbasedindexing, navigation tools and facilities formultilingual WWW sites. The project considersseveral alternatives for document treatment, suchas translation <strong>of</strong> the documents, translation <strong>of</strong>index terms and queries, relevance feedback <strong>with</strong>translation.6. ConclusionMost work on thesaurus construction usesautomatic indexing <strong>of</strong> a given set <strong>of</strong> documents[Br89, Ga93] and, in case <strong>of</strong> a multilingualframework, uses machine translation techniquesapplied on the query [Ya98]. In [Pi97] anincremental version <strong>of</strong> the approach on automaticgeneration <strong>of</strong> rectangular thesauri has beendeveloped. The main contribution <strong>of</strong> this paper isto integrate the incremental approach <strong>with</strong> adictionary-based multilingual indexing andinformation retrieval, including an interactiveambiguity resolution. This approach eliminatesproblems <strong>of</strong> automatic indexing, linguisticvariations <strong>of</strong> a single concept and restrictions <strong>of</strong>monolingual systems. Furthermore the problem <strong>of</strong>terms composed <strong>of</strong> more than one word has beensolved <strong>with</strong> a look-ahead algorithm for candidatewords found in the dictionary.It is clear that the interaction <strong>with</strong> the user isvery time consuming. But, it seems to be a goodtrade-<strong>of</strong>f between the precision <strong>of</strong> manualthesaurus construction and the efficiency <strong>of</strong>automatic systems. With an 'apply to all' option oncan avoid a repetition <strong>of</strong> the same conflictresolution.


Lexical databases, such as WordNet and theforthcoming EuroWordNet can be useful to <strong>of</strong>fer asemantic richer query interface using thehierarchic relations between terms. Thesehierarchies must also be included in the abstractconceptual thesaurus.References[BBG94] Belkhiter, N., Bourhfir, C., Gammoudi,M.M., Jaoua, A. Le Thanh, N. and Reguig, M.Décomposition Rectangulaire Optimale d’uneRelation Binaire: Aplication aux Bases deDonnées Documentaires. INFOR vol. 32, n° 1,pp. 33-54, 1994.[Br89] Bruandet, M.-F. ‘Outline <strong>of</strong> a Knowledge-Base Model for an Intelligent InformationRetrieval System’. Information Processing &Management. Vol. 25, N o 1, pp. 89-115, 1989.[Ga93] Gammoudi, M. M.. Méthode deDécomposition Rectangulaire d'une RelationBinaire: Une base formelle et uniforme pour lagénération automatique des thesaurus et larecherche documentaire. Thése de doctorat.Université de Nice − Sophia Antipolis EcoleDoctorale des Sciences pour L'Ingenieur, 1993.[ENU99] G. Erbach, G. Neumann & H. Uszkoreit‘MULINEX <strong>Multilingual</strong> <strong>Indexing</strong>, Navigationand Editing Extensions for the World-WideWeb’, proceedings <strong>of</strong> the Third DELOSworkshop -- Cross-Language InformationRetrieval and Proceedings, Zürich/CH, 1997[Fe97] Ferneda, E. ‘Construção Automática de um<strong>Thesaurus</strong> Retangular’, Master thesis,Departamento de Sistemas e Computação,Universidade Federal da Paraíba, CampinaGrande, 1997.[Fr02] Freitas-Jr, H.R., Laender, A. H., Lima, L.,Ribeiro-Neto, B., Vale, R. Recuperação deInformação Médica Interlínguas, Proc. <strong>of</strong> theXVII Brazilian Symposium on Databases,Gramado/Brazil, 2002, pp.209-223[GGV01] J. Gilarranz, J. Gonzalo & F. Verdejo‘An approach to conceptual text retrieval usingthe EuroWordNet <strong>Multilingual</strong> SemanticDatabase’ in Working Notes <strong>of</strong> the AAAISymposium on Cross Language Text and SpeechRetrieval, 1997[Ha97] H. Haddouti ‘Survey: <strong>Multilingual</strong> TextRetrieval and Access’ in Working Notes <strong>of</strong> theAAAI Symposium on Cross Language Text andSpeech Retrieval, 1997[Li98] L.R.S. Lima, A.F. Laender & B. Ribeiro-Neto ‘A hierarchical approach to the automaticcategorization <strong>of</strong> medical documents’ in Proc. <strong>of</strong>the 7 th Intl. Conf. on Information KnowledgeManagement (1998) pp. 132-138[NLM00] National Library <strong>of</strong> Medicine-USA‘Tree Structures & Alphabetic List – 12 th Edition,(2000)[Oa99] D. Oard ‘Global Access to <strong>Multilingual</strong>Information’ keynote address at the FourthInternational Workshop on Information Retrieval<strong>with</strong> Asian Languages-IRAL99, Taipei Taiwan,1999[OD00] W.Ogden & M.Davis ‘Improving CrosslanguageText Retrieval <strong>with</strong> Human Interaction’Proc. <strong>of</strong> the 33rd Hawaii Intl. Conference onSystem Sciences, Mauai, HI/USA, 2000[Pi97] Pinto, W. S. ‘Sistema de Recuperação deInformação com navegação através de Pseudo<strong>Thesaurus</strong>’. Master Thesis. Universidade Federaldo Maranhão, 1997.[RN01] B. Ribeiro-Neto, A.F. Laender & R.S.Lima ‘An experimental study in automaticallycategorizing medical documents’ Journal <strong>of</strong> theAmerican society for Information science andTechnology (2001) pp. 391-401[So98] Sodré, I. M. ‘SISMULT - Sistema deIndexação <strong>Semi</strong>-automática Multilíngüe’. MasterThesis, Universidade Federal da Paraíba/COPIN,Campina Grande, 1998.[So91] Sosoaga, C. L. De ‘<strong>Multilingual</strong> access toDocumentary Databases’ In A. Lichnerowicz,Editor. Proceedings <strong>of</strong> a Conference onIntelligent Text and Image Handling (RIAO91),Amsterdam, April 1991, P. 774-778[WN] WordNet – a lexical database for theEnglish language, Princeton,http://www.cogsci.princeton.edu/~wn/[Ya91] Yang, Y., Carbonell, J., Brown, R. &Frederking, R. ‘Translingual informationretrieval: learning from bilingual corpora’,Artificial Intelligence 103 (1998) 323-345

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!