24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


2.2. Lexical Knowledge Formalisms and Resources


[Graph nodes: sea, climbing, mud, clay, ice, fire, gravel, sand, water, earth, peat, soil, air, rock, jazz, music]

Figure 1.1: Graph snippet reflecting the ambiguity of the word rock. There are two different meanings of rock represented in the graph: rock “stone” and rock “music”.

Figure 2.7: A term-based lexical network with the neighbours of the word rock in a corpus (from Dorow (2006)). This word might refer to a stone or to a kind of music.

Semantic categories tend to aggregate in dense clusters in the word graph, such as the “music” and the “natural material” clusters in Figure 1.1. These node clusters are held together by ambiguous words such as rock, which link otherwise unconnected word clusters. We attempt to divide the word graph into cohesive semantic categories by identifying and disabling these semantic “hubs”.
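The idea of disabling hubs can be illustrated with a minimal sketch. The text does not say how hubs are identified, so the criterion used here is an assumption: a node counts as a hub when its local clustering coefficient (the fraction of its neighbour pairs that are themselves linked) falls below a threshold. The edge list, the threshold, and all function names are hypothetical and serve only to illustrate the principle.

```python
from itertools import combinations

# Hypothetical toy edges loosely modelled on the rock example; not data from the source.
EDGES = [
    ("rock", "gravel"), ("rock", "sand"), ("gravel", "sand"),
    ("sand", "soil"), ("gravel", "soil"),
    ("rock", "jazz"), ("rock", "pop"), ("jazz", "pop"),
    ("jazz", "soul"), ("pop", "soul"),
]


def build_graph(edges):
    """Return an undirected adjacency map: word -> set of neighbouring words."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of neighbour pairs of `node` that are directly linked."""
    neighbours = graph[node]
    if len(neighbours) < 2:
        return 1.0  # assumption: nodes with fewer than two neighbours are never hubs
    pairs = list(combinations(sorted(neighbours), 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)


def find_hubs(graph, threshold=0.5):
    """Assumed hub criterion: clustering coefficient below `threshold`."""
    return {n for n in graph if clustering_coefficient(graph, n) < threshold}


def components_without(graph, removed):
    """Connected components of the graph after disabling the `removed` nodes."""
    seen, result = set(removed), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        result.append(component)
    return result


graph = build_graph(EDGES)
hubs = find_hubs(graph)
categories = components_without(graph, hubs)
```

On this toy graph only rock falls below the threshold, and removing it leaves one component of material words and one of music words.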

In addition, we investigate an alternative approach which divides the links in the graph into clusters instead of the nodes. The links in the word graph contain more specific contextual information and are thus less ambiguous than the nodes, which represent words in isolation. For example, the link (rock, gravel) clearly addresses the “stone” sense of rock, and the link between rock and jazz unambiguously refers to rock “music”. By dividing the links of the word graph into clusters, links pertaining to the same sense of a word can be grouped together (e.g. (rock, gravel) and (rock, sand)), and links which correspond to different senses of a word (e.g. (rock, gravel) and (rock, jazz)) can be assigned to different clusters, which means that an ambiguous word is naturally split up into its different senses.
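A small sketch may make the link-based view concrete. The text does not specify the clustering criterion, so the one used here is an assumption: two links that share a word are placed in the same cluster when their other endpoints are also linked, that is, when the three words form a triangle. The toy edge list and all names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy edges loosely modelled on the rock example; not data from the source.
EDGES = [
    ("rock", "gravel"), ("rock", "sand"), ("gravel", "sand"),
    ("sand", "soil"), ("gravel", "soil"),
    ("rock", "jazz"), ("rock", "pop"), ("jazz", "pop"),
    ("jazz", "soul"), ("pop", "soul"),
]


def canonical(u, v):
    """Order-independent representation of an undirected link."""
    return tuple(sorted((u, v)))


def cluster_links(edges):
    """Group links with union-find; links in a common triangle are merged."""
    links = {canonical(u, v) for u, v in edges}
    parent = {link: link for link in links}

    def find(link):
        while parent[link] != link:
            parent[link] = parent[parent[link]]  # path halving
            link = parent[link]
        return link

    neighbours = {}
    for u, v in links:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Assumed criterion: (node, a) and (node, b) share a cluster if a and b are linked.
    for node, adjacent in neighbours.items():
        for a, b in combinations(sorted(adjacent), 2):
            if canonical(a, b) in links:
                parent[find(canonical(node, a))] = find(canonical(node, b))

    clusters = {}
    for link in links:
        clusters.setdefault(find(link), set()).add(link)
    return list(clusters.values())


def senses(word, clusters):
    """One set of linked partner words per cluster in which `word` takes part."""
    result = []
    for cluster in clusters:
        partners = {u if v == word else v for u, v in cluster if word in (u, v)}
        if partners:
            result.append(partners)
    return sorted(result, key=sorted)


clusters = cluster_links(EDGES)
rock_senses = senses("rock", clusters)
```

Here the links of rock end up in two clusters, one with gravel and sand and one with jazz and pop, so the ambiguous word is split by its links rather than assigned to a single node cluster.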

relations, the nodes can be seen as word senses. In this case, an edge E(wi, wj) indicates that one sense of wi is related to one sense of wj.

PAPEL (Gonçalo Oliveira et al., 2008, 2010b) and CARTÃO (Gonçalo Oliveira et al., 2011) are resources that may be used as term-based lexical networks for Portuguese. They are structured in relational triples automatically extracted from dictionary definitions. Each triple t = {w1, R, w2} represents a semantic relation identified by R, which occurs between a sense of the lexical item w1 and a sense of the lexical item w2. A lexical network is established if the arguments of the relational triples, w1 and w2, are used as nodes, connected by an edge labelled as R, the type of the relation. Figure 2.8 shows part of a term-based lexical network with relations.
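As a rough illustration of how relational triples yield a labelled network, the sketch below uses the two arguments of each triple as nodes and stores the relation type on the edge between them. The triples and relation names are invented for illustration and are not taken from PAPEL or CARTÃO; the function names are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical triples (w1, R, w2); invented for illustration, not extracted from PAPEL.
TRIPLES = [
    ("assento", "hypernym-of", "banco"),
    ("instituição", "hypernym-of", "banco"),
    ("banco", "has-part", "cofre"),
]


def build_network(triples):
    """Map each lexical item to its outgoing edges as (relation, target) pairs."""
    network = defaultdict(list)
    for w1, relation, w2 in triples:
        network[w1].append((relation, w2))
        network.setdefault(w2, [])  # make sure the second argument is also a node
    return dict(network)


def relations_involving(triples, word):
    """All triples in which `word` occurs as either argument."""
    return [t for t in triples if word in (t[0], t[2])]


network = build_network(TRIPLES)
banco_relations = relations_involving(TRIPLES, "banco")
```

Because the nodes are plain lexical items rather than senses, both the "seat" and the "institution" triples attach to the same banco node, which is the term-based ambiguity the text describes.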

1.3 Idioms

Idioms pose an even greater challenge to systems trying to understand human language than ambiguity. Idioms are used extensively both in spoken and written, colloquial and formal language. Any successful natural language processing system must be able to recognize and interpret idiomatic expressions.

Figure 2.8: A term-based lexical network with relations where banco is one of the arguments (from PAPEL 2.0 (Gonçalo Oliveira et al., 2010b)). In Portuguese, besides other meanings, banco might refer to a bench/stool or to a financial institution.

[Additional graph node labels: pop, soul, art, film]
