Recent Researches in Information Science and Applications

Formal Representations of Bulgarian Possessive and Reflexive-possessive Pronouns

VELISLAVA STOYKOVA
Institute for Bulgarian Language - BAS
52, Shipchensky prohod, str., bl. 17
Sofia 1113
BULGARIA
vstoykova@yahoo.com

Abstract: - In this paper*, we present a comparison of different approaches to the formal representation of the inflectional morphology of Bulgarian possessive and reflexive-possessive pronouns. The interpretation is based on a detailed analysis of the grammar features and semantics of possessive and reflexive-possessive pronouns, and the related formal representations are based on the use of semantic networks. The problem is interpreted as a grammar knowledge representation task. The principles of two different formal representations and of the related programming encodings, using the DATR language for lexical knowledge representation and the Universal Networking Language, are analyzed. Finally, more general principles and conclusions about the formal representation of pronominal inflectional morphology are drawn.

Key-Words: - Semantic Networks, Knowledge Engineering, Computational Linguistics.

1 Introduction
Semantic networks are a widely used knowledge representation formalism. They offer a hierarchical semantic representation of grammar knowledge, covering both inflectional and conceptual knowledge, mostly by means of rule-based encodings.
At the same time, different rule-based applications offer different techniques for encoding almost all grammar features, including different encodings of one and the same grammar feature in similar semantic network formalisms.
Further, we compare two applications of the inflectional morphology of Bulgarian possessive and reflexive-possessive pronouns represented in semantic networks: one using the DATR language for lexical knowledge representation, and one using the Universal Networking Language (UNL).

2 Linguistic and computational approaches to inflectional morphology
The traditional interpretation of inflectional morphology given in the descriptive academic grammars [4] is a presentation of tables. The tables list all possible inflected forms of a word with respect to its grammar features. The standard computational approach to inflectional morphology is to represent words as a rule-based concatenation of morphemes, and the main task is to construct relevant rules for their combination.
Natural language processing applications use different techniques to represent phonological, morphological and syntactic knowledge. Sound alternations influence inflectional morphology and, as a result, produce irregular word forms. Thus, we have a rather unsystematically formed variety of regular and irregular word forms [8], which requires a non-monotonic rule-based formal interpretation. Additional difficulties come from the fact that sound alternations can occur in stems, prefixes and suffixes, and also at their boundaries, which calls for rather complex solutions.

* The research is supported by the project BG051PO001-3.3-05/0001 “Science and Business” of the Ministry of Education, Youth and Science of Bulgaria.
ISBN: 978-1-61804-150-0 144

3 The DATR language for lexical knowledge representation

The DATR language is a non-monotonic language for defining inheritance networks through path/value equations [2]. It has both an explicit declarative semantics and an explicit theory of inference allowing efficient implementation, and, at the same time, it has the expressive power necessary to encode the lexical entries presupposed by work in the unification grammar tradition. In DATR, information is organized as a network of nodes, where a node is a collection of related information.
Each node has associated with it a set of equations that define partial functions from paths to values, where paths and values are both sequences of atoms. Atoms in paths are sometimes referred to as attributes. DATR is functional: it defines a mapping which assigns a unique value to each node/attribute-path pair, and the recovery of these values is deterministic.
The semantics of DATR uses non-monotonic inference and default inheritance, which allows a generalization-capturing representation of inflectional morphology. DATR has enough expressive power to encode and process both syntactic and morphological rules, and it allows grammar knowledge to be represented by means of semantic networks. The DATR language has many implementations; the analyzed application was made with QDATR 2.0 [13] (see the related file bul_det.dtr). This PROLOG encoding uses the Sussex DATR notation.
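The default inheritance described above can be illustrated with a minimal Python sketch. It is not DATR syntax and not the QDATR implementation; it only mimics the core evaluation idea under our own assumptions: each node holds path/value equations, the equation with the longest path that is a prefix of the query wins, the unmatched part of the query is appended to referenced paths, and a "global" reference is evaluated at the node where the query started (the role of quoted paths in DATR). The node names `Adj`, `Negov`, `Nash`, the `Ref` class and the Latin-transliterated suffixes are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Path = Tuple[str, ...]


@dataclass(frozen=True)
class Ref:
    """A path reference on the right-hand side of an equation (hypothetical).

    node=None, is_global=False: evaluate `path` at the current node (local).
    node="X":                   evaluate `path` at node X.
    is_global=True:             evaluate `path` at the node the query started
                                from (the role of quoted paths in DATR).
    """
    path: Path
    node: Optional[str] = None
    is_global: bool = False


Item = Union[str, Ref]
Network = Dict[str, Dict[Path, List[Item]]]


def query(net: Network, node: str, path: Path,
          origin: Optional[str] = None) -> List[str]:
    """Return the atom sequence that `node` assigns to `path`."""
    origin = origin or node
    equations = net[node]
    # Default inheritance: the longest equation path that is a prefix of the
    # query path wins, so specific equations override general ones.
    matches = [p for p in equations if path[:len(p)] == p]
    if not matches:
        raise KeyError(f"{node}:<{' '.join(path)}> is undefined")
    best = max(matches, key=len)
    suffix = path[len(best):]  # unmatched part, appended to every Ref path
    result: List[str] = []
    for item in equations[best]:
        if isinstance(item, str):
            result.append(item)  # a literal atom
        elif item.is_global:
            result += query(net, origin, item.path + suffix, origin)
        else:
            result += query(net, item.node or node, item.path + suffix, origin)
    return result


# Toy fragment: Adj states the default suffixes once; lexical nodes only add
# their root and, where needed, an override.
NET: Network = {
    "Adj": {
        ("mor", "masc"): [Ref(("root",), is_global=True)],
        ("mor", "fem"): [Ref(("root",), is_global=True), "a"],
        ("mor", "neut"): [Ref(("root",), is_global=True), "o"],
        ("mor", "pl"): [Ref(("root",), is_global=True), "i"],
    },
    "Negov": {(): [Ref((), node="Adj")], ("root",): ["negov"]},
    "Nash": {
        (): [Ref((), node="Adj")],
        ("root",): ["nash"],
        ("mor", "neut"): [Ref(("root",)), "e"],  # overrides the Adj default
    },
}


def form(node: str, *path: str) -> str:
    """Concatenate the atoms returned for `node:<path>` into a word form."""
    return "".join(query(NET, node, tuple(path)))


if __name__ == "__main__":
    for lex in ("Negov", "Nash"):
        print(lex, [form(lex, "mor", g) for g in ("masc", "fem", "neut", "pl")])
```

In this toy network only the neuter of `Nash` ('nashe' rather than 'nasho') comes from its own equation; every other form is inherited from `Adj`. The encoding discussed in Section 5.1 additionally covers definiteness and sound alternations, which this sketch leaves out.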
DATR allows the construction of various types of language models (language theories), and the implementation can process words written in the Cyrillic alphabet.

4 The Universal Networking Language
In the UNL approach, information conveyed by natural language is represented as a hyper-graph composed of a set of directed binary labeled links (referred to as relations) between nodes or hyper-nodes (the Universal Words (UWs)), which stand for concepts [12].
Universal Words represent universal concepts and correspond to the nodes of a UNL graph, which are interlinked by relations or modified by attributes. They can be associated with the open lexical categories of natural language (noun, verb, adjective and adverb). Additionally, UWs are organized in a hierarchy (the UNL Ontology), are defined in the UNL Knowledge Base, and are exemplified in the UNL Example Base, which are the lexical databases for UNL. As language-independent semantic units, UWs are equivalent to sets of synonyms in a given language, approaching the concept of a synset used by WordNet.
Attributes are arcs linking a node to itself. They correspond to one-place predicates, i.e., functions that take a single argument. In UNL, attributes have normally been used to represent information conveyed by natural language grammatical categories (such as tense, mood, aspect, number, etc.). Attributes are annotations made to nodes or hyper-nodes of a UNL hyper-graph. They denote the circumstances under which these nodes (or hyper-nodes) are used. Attributes may convey three different kinds of information: (i) information on the role of the node in the UNL graph; (ii) information conveyed by bound morphemes and closed classes, such as affixes (gender, number, tense, aspect, mood, voice, etc.) and determiners (articles and demonstratives); (iii) information on the (external) context of the utterance. Attributes represent information that cannot be conveyed by UWs and relations.
Relations are labeled arcs connecting a node to another node in a UNL graph.
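The split between attributes (one-place annotations on a node) and relations (labeled two-place arcs between UWs) can be sketched as a small Python data structure. This is a toy model, not an official UNL API. The UW headwords (`see`, `prince`, `rose`, `he`) are hypothetical, and the relation labels `agt`, `obj`, `pos` and the attributes `@entry`, `@def` are used as we understand UNL conventions; treat them as illustrative rather than as entries from the UNL Knowledge Base.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class UnlGraph:
    """Toy UNL-style graph (illustrative, not an official UNL API)."""
    # UW -> set of attributes. An attribute is an arc from a node to itself,
    # i.e. a one-place predicate, so it is stored on the node itself.
    attributes: Dict[str, Set[str]] = field(default_factory=dict)
    # Relations: labeled binary arcs, i.e. two-place predicates label(src, dst).
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_uw(self, uw: str, *attrs: str) -> None:
        self.attributes.setdefault(uw, set()).update(attrs)

    def relate(self, label: str, source: str, target: str) -> None:
        # Both ends of a relation must be existing nodes (UWs).
        for uw in (source, target):
            if uw not in self.attributes:
                raise KeyError(f"unknown UW: {uw}")
        self.relations.append((label, source, target))

    def _render(self, uw: str) -> str:
        # UNL-like notation uw.@attr1.@attr2 (attributes sorted for stability)
        return ".".join([uw] + sorted(self.attributes[uw]))

    def to_text(self) -> List[str]:
        return [f"{label}({self._render(s)}, {self._render(t)})"
                for label, s, t in self.relations]


def example_graph() -> UnlGraph:
    # "The prince sees his rose" (hypothetical UW headwords)
    g = UnlGraph()
    g.add_uw("see", "@entry")
    g.add_uw("prince", "@def")
    g.add_uw("rose", "@def")
    g.add_uw("he")
    g.relate("agt", "see", "prince")  # agent
    g.relate("obj", "see", "rose")    # object
    g.relate("pos", "rose", "he")     # possessor
    return g


if __name__ == "__main__":
    print("\n".join(example_graph().to_text()))
```

Here the possessor is a separate UW linked by `pos`, while definiteness is only an attribute on the possessed UW; a Bulgarian surface form such as the definite possessive 'negovata' would have to be produced by the de-conversion grammar and its inflectional rules. In this sketch, relations are simply the labeled tuples stored in `relations`.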
They correspond to two-place semantic predicates holding between two UWs and define semantic roles. In UNL, relations have normally been used to represent semantic cases or thematic roles (such as agent, object, instrument, etc.) between UWs.
UNL-NL grammars are sets of rules for translating UNL expressions into natural language (NL) sentences and vice versa. They are normally unidirectional, i.e. either an en-conversion grammar (NL-to-UNL) or a de-conversion grammar (UNL-to-NL), even though both share the same basic syntax. In the UNL grammar there are two basic types of rules: (i) transformation rules, used to generate natural language sentences out of UNL graphs and vice versa, and (ii) disambiguation rules, used to improve the performance of transformation rules by constraining their applicability. UNL offers a universal, language-independent and open-source platform for multilingual applications [1]. Formal interpretations of inflectional morphology can also be used in e-Learning and web-based applications [3], which rely on the same basic principles and related encoding techniques.

4.1 Representing inflectional morphology in UNL frameworks
The UNL specifications offer types of grammar rules particularly designed to interpret inflectional morphology, with respect both to prefixes, suffixes and infixes and to the sound alternations taking place during inflection. Thus, UNL allows two types of inflectional transformation rules: (i) A-rules (affixation rules), which apply over isolated word forms (e.g. to generate possible inflections), and (ii) L-rules (linear rules), which apply over lists of word forms (e.g. to provide transformations in the surface structure). Affixation rules are used for adding morphemes to a given base form; they generate inflections or derivations.
There are two types of A-rules: (i) simple A-rules involve a single action (such as prefixation, suffixation, infixation or replacement), and (ii) complex A-rules involve more than one action (such as circumfixation). There are four types of simple A-rules: (i) prefixation, for adding morphemes at the beginning of a base form; (ii) suffixation, for adding morphemes at the end of a base form; (iii) infixation, for adding morphemes in the middle of a base form; and (iv) replacement, for changing the base form. The proposed application was made within the framework of 'The Little Prince Project' of the UNL Foundation, aimed at developing UNL grammar and lexical resources for several languages based on the book 'The Little Prince' [5]. It offers an interpretation of inflectional morphology which uses complex A-rules for adding, suffixation, prefixation and replacement.
The UNL interpretation of the possessive and reflexive-possessive pronouns defines 5 inflectional word types, each of which uses its own rules to generate all possible inflected forms for the features of number and definiteness.

5 The semantics and the grammar features of possessive pronouns
The semantics of the possessive pronouns in Bulgarian includes various relationships, such as possession (depending on whether it refers to an object or a subject of possession), part-of-whole, relational, etc. The main semantic relationship of possession varies depending on whether it refers to the possessor or to the thing being possessed [6]. Only the full forms of the possessive pronouns have inflection [4].
The full forms of the possessive pronouns are: 'moj' (my), 'tvoj' (your), 'negov' (his), 'nein' (her), 'nash' (our), 'vash' (your), 'tehen' (their). They have the grammar features of person, number, gender, and definiteness. The inflectional morphology of the possessive pronouns is given in the Appendix.
The grammar feature of person is not inflectional and expresses information both at the level of syntax and at the hypertext level through agreement. The full forms carry information both about the possessor and about the object being possessed, using agreement in number and gender.
The grammar feature of definiteness conveys information about the possession at the syntactic level through agreement and is expressed by a formal morphological marker, an ending morpheme [4]. This marker differs across genders. For the masculine gender, two types of definite morphemes exist (the full and the short definite article), used for entities that are made definite in different ways, and each of them has two phonetic variants. For the feminine and the neuter gender, only one definite morpheme exists. For the plural, two definite morphemes are used, depending on the final vowel of the main plural form.
The gender and number features of the definite article are different from the gender and number features of the possessive pronouns themselves. The former are inflectional, whereas the latter are not, even though both can express agreement. Thus, our task is to analyze the architecture and principles of the rule-based interpretations of the inflectional morphology of possessive pronouns using the DATR language for lexical knowledge representation, and UNL.

5.1 The DATR encoding of Bulgarian possessive pronouns inflectional morphology
The available published DATR encoding of the inflectional morphology of Bulgarian possessive pronouns is given in [7]. It presents an inheritance semantic network consisting of different inflectional type nodes, which combines a rule-based formal grammar with a lexical database (the pronouns).
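The definite-morpheme choices described in Section 5 (two masculine morphemes, one feminine, one neuter, two plural ones selected by the final vowel) are the kind of information a definite-morpheme node in such a network has to encode. The Python function below is a rough sketch of that choice under our own assumptions: the Latin transliteration uses "j" for Cyrillic й and "ja" for я, and any sound alternation (e.g. 'nein' to "nejn", 'tehen' to "tehn" or "tjahna") is assumed to have been applied to the input already. It is not the DET node of [7].

```python
def definite(base: str, gender: str, full: bool = True) -> str:
    """Attach the definite morpheme to `base` (illustrative sketch).

    base:   the form the article attaches to, with sound alternations
            already applied (e.g. "nejn" for 'nein' in the masculine,
            "tjahna" for the feminine of 'tehen').
    gender: "masc", "fem", "neut" or "pl".
    full:   masculine only; full article (True) vs short article (False).
    """
    if gender == "masc":
        # Two masculine morphemes (full and short), each with two shapes:
        # after a stem-final "j" we get -at / -a, otherwise -ijat / -ija.
        if base.endswith("j"):
            return base + ("at" if full else "a")
        return base + ("ijat" if full else "ija")
    if gender == "fem":
        return base + "ta"  # single feminine morpheme
    if gender == "neut":
        return base + "to"  # single neuter morpheme
    if gender == "pl":
        # Two plural morphemes, chosen by the final vowel of the plural form.
        return base + ("ta" if base.endswith("a") else "te")
    raise ValueError(f"unknown gender/number value: {gender!r}")


if __name__ == "__main__":
    for base, g in [("negov", "masc"), ("negova", "fem"),
                    ("negovo", "neut"), ("negovi", "pl")]:
        print(definite(base, g))
```

For example, "negov" gives "negovijat" (full) and "negovija" (short), while "moj" gives "mojat". Keeping the alternations outside this function mirrors the separation, described in Section 5.1, between the node for definite morphemes and the nodes for the individual inflectional types.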
The queries to be evaluated correspond to the related inflected word forms. The encoding also accounts for sound alternations [8].
The interpretation is based on the encoding of the adjectives [9] and takes linguistic motivation as its starting point, in particular the priority of one grammar feature over another. Thus, the feature of gender is accepted as a specific trigger that changes the values of the inflected forms for the features of number and definiteness. The DATR account of Bulgarian inflectional morphology also offers a representation in terms of space semantic networks [10]. The encoding is as follows*:

* Here and elsewhere in the description, we use the Latin alphabet instead of the Cyrillic one to present morphemes. Because of the mismatch between the two alphabets, some typically Bulgarian phonological alternations are written with two letters, whereas in the Cyrillic alphabet they are marked by one.

The DATR interpretation of possessive pronouns uses an inheritance-based hierarchical formal representation to interpret the inflectional morphology rules, and uses 4 inflectional rules, most of which were defined for the adjectives. It accounts for sound alternations and for irregular inflected forms. It also uses a hierarchical semantic representation of the inflectional grammar features of gender, number and definiteness, which keeps the encoding concise.
The encoding presents the inflectional rules for generating all related inflected forms of the possessive pronouns. Node DET defines the definite inflectional morphemes, and all other nodes define the inflectional rules for the 4 related inflectional types. Thus, node Adj defines the rules for the pronouns 'negov' and 'nein'; node Adj_2 defines the rules for the pronoun 'tehen'; node Adj_4 defines the rules for the pronouns 'nash' and 'vash'; and node Adj_5 defines the inflectional rules for the pronouns 'moj' and 'tvoj'. The pronoun 'moj' is given as a separate node defined by its person, number, gender and inflectional roots. Its generated inflected forms are as follows:

5.2 The UNL encoding of Bulgarian possessive pronouns inflectional morphology
In the UNL account of possessive pronouns presented in [11], the inflectional grammar features gender, number and definiteness are accepted as the starting point of the encoding, and the inflectional rules are defined accordingly. The grammar features which are not inflectional (such as person and non-inflectional gender) are presented as invariables (according to the UNL formalism definitions) and are included in the UNL dictionary database. In the following description, we use the notation defined by the UNL specifications [12]. We start with the analysis of the inflectional rules for the possessive pronouns 'moj' (my) and 'tvoj' (your), which belong to one common inflectional type, M165.
The possessive pronoun 'negov' (his) does not undergo phonetic alternations during inflection. Its inflectional grammar rules (type M167) define the masculine, feminine, neuter and plural indefinite and definite inflected word forms by attaching the gender, plural or definite morphemes to the base word form.
The possessive pronoun 'nein' (her) undergoes one phonetic alternation (the transition of "i" into "j") during inflection. The inflectional rules (type M168) for the masculine, feminine, neuter and plural indefinite and definite inflected word forms are as follows:

The possessive pronouns 'nash' (our) and 'vash' (your) do not undergo phonetic alternations during inflection (type M166). They use the same inflectional rules as the pronoun 'negov' (his). The only difference is the inflectional rule for the neuter indefinite and definite forms, which is as follows:

The most complicated inflectional rules (type M169) are those for the pronoun 'tehen' (their), which undergoes two phonetic alternations during inflection in the feminine and neuter gender only (interpreted by applying replacement rules), and one phonetic alternation in the masculine and plural inflected forms.
The UNL account of the possessive pronouns uses complex A-rules for adding, suffixation, prefixation and replacement. It defines 5 inflectional types (M165-M169), very similar to those of the adjectives. The encoding uses a separate rule definition for each inflected form and accounts for sound alternations and irregular inflected forms.

6 The semantics and the grammar features of the reflexive-possessive pronoun
The semantics of the reflexive-possessive pronoun combines the semantics of the possession relationship with that of reflexivity. That is, it expresses the possession relationship between the possessor (defined by the subject of the sentence, with which it agrees in gender and number) and the thing being possessed (to which the pronoun refers, and with which it agrees in gender and number). There is only one reflexive-possessive pronoun; it has a full and a short form, and both can be used with respect to agreement. However, only its full form 'svoj' (-self) has the inflectional grammar features of gender, number, and definiteness (Fig. 3), which are similar to those of the adjectives and of the possessive pronouns.
The DATR formal account of the inflectional morphology of the reflexive-possessive pronoun uses the inflectional rules already defined at node Adj_5 and follows the same principle as for the possessive pronouns. The UNL formal interpretation of the reflexive-possessive pronoun is consistent with the encoding of the possessive pronouns 'moj' (my) and 'tvoj' (your) and is exactly the same (Fig. 2 (a)). It uses the inflectional rules already defined for inflectional type M165 and accounts for all related inflected forms.

7 Conclusion
The above description presents the encoding of the inflectional morphology of Bulgarian possessive and reflexive-possessive pronouns using the DATR language for lexical knowledge representation and using UNL. Both encodings define grammar rules for generating all related inflected forms based on the inflectional grammar features of gender, number, and definiteness. In both interpretations, the pronouns are defined in the dictionary, as a lexical database, by their base forms and their non-inflectional features of gender and person.

