Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
2.2. Lexical Knowledge Formalisms and Resources 17
[Figure image reproduced from Dorow (2006), with its original caption, Figure 1.1: Graph snippet reflecting the ambiguity of the word rock. There are two different meanings of rock represented in the graph: rock "stone" and rock "music". Nodes shown include sea, climbing, mud, clay, ice, fire, gravel, sand, water, earth, peat, soil, air, jazz and music.]
Figure 2.7: A term-based lexical network with the neighbours of the word rock in a corpus (from Dorow (2006)). This word might refer to a stone or to a kind of music.
Semantic categories tend to aggregate in dense clusters in the word graph, such as the
"music" and the "natural material" clusters in Figure 1.1. These node clusters are held
together by ambiguous words such as rock, which link otherwise unconnected word clusters.
We attempt to divide the word graph into cohesive semantic categories by identifying and
disabling these semantic "hubs".
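To make the hub idea concrete, here is a minimal Python sketch. It assumes the word graph is stored as an adjacency dict and that the hub (here rock) has already been identified. The edge list is a hypothetical toy fragment, and the way hubs are actually detected in Dorow (2006) is not reproduced. Disabling the hub and then taking connected components separates the two semantic categories.

```python
from collections import deque

# Hypothetical toy word graph: an illustrative fragment in the spirit of the
# rock example, NOT Dorow's (2006) corpus data.
EDGES = [
    ("rock", "gravel"), ("rock", "sand"), ("rock", "clay"),
    ("gravel", "sand"), ("sand", "clay"), ("clay", "mud"),
    ("rock", "jazz"), ("rock", "pop"), ("jazz", "pop"), ("jazz", "soul"),
]


def build_graph(edges):
    """Return an undirected adjacency dict: word -> set of neighbouring words."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def components_without(graph, hubs):
    """Connected components (as sorted word lists) after disabling the given hub words."""
    seen = set(hubs)  # hubs are marked as seen so the search never enters them
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        cluster, queue = [], deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            cluster.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(cluster))
    return clusters


graph = build_graph(EDGES)
print(components_without(graph, set()))
# → [['clay', 'gravel', 'jazz', 'mud', 'pop', 'rock', 'sand', 'soul']]
print(components_without(graph, {"rock"}))
# → [['clay', 'gravel', 'mud', 'sand'], ['jazz', 'pop', 'soul']]
```

With rock present, every word falls into one component. Once rock is disabled, the "stone" words and the "music" words separate into distinct clusters.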
In addition, we investigate an alternative approach which divides the links in the graph,
rather than the nodes, into clusters. The links in the word graph contain more specific contextual
information and are thus less ambiguous than the nodes, which represent words in
isolation. For example, the link (rock, gravel) clearly addresses the "stone" sense of rock,
and the link between rock and jazz unambiguously refers to rock "music". By dividing the
links of the word graph into clusters, links pertaining to the same sense of a word can
be grouped together (e.g. (rock, gravel) and (rock, sand)), and links which correspond to
different senses of a word (e.g. (rock, gravel) and (rock, jazz)) can be assigned to different
clusters, which means that an ambiguous word is naturally split up into its different senses.
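Link clustering can be sketched in the same way. The grouping rule below is a deliberately simple stand-in chosen for illustration, not the algorithm used in Dorow (2006). Two links (w, a) and (w, b) are placed in the same cluster when a and b are themselves linked, so the two links close a triangle. Clusters are then merged transitively with a small union-find. The edge list is again hypothetical.

```python
from itertools import combinations

# Hypothetical toy edge list in the spirit of the rock example; not Dorow's data.
EDGES = [
    ("rock", "gravel"), ("rock", "sand"), ("rock", "clay"),
    ("gravel", "sand"), ("sand", "clay"),
    ("rock", "jazz"), ("rock", "pop"), ("jazz", "pop"),
]


def sense_clusters(word, edges):
    """Group the links of `word` into clusters.

    Links (word, a) and (word, b) end up in the same cluster when a and b are
    connected within the subgraph induced by the neighbours of `word`; a direct
    a-b edge means the two links close a triangle.
    """
    linked = {frozenset(e) for e in edges}
    neighbours = sorted({b if a == word else a for a, b in edges if word in (a, b)})
    parent = {n: n for n in neighbours}  # union-find; link (word, n) is keyed by n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(neighbours, 2):
        if frozenset((a, b)) in linked:  # (word, a) and (word, b) close a triangle
            parent[find(a)] = find(b)

    groups = {}
    for n in neighbours:
        groups.setdefault(find(n), []).append((word, n))
    return sorted(groups.values())


for cluster in sense_clusters("rock", EDGES):
    print(cluster)
# → [('rock', 'clay'), ('rock', 'gravel'), ('rock', 'sand')]
# → [('rock', 'jazz'), ('rock', 'pop')]
```

Each resulting link cluster corresponds to one sense of rock. The ambiguous node is therefore split across clusters and is never forced into a single one.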
relations, the nodes can be seen as word senses. In this case, an edge E(wᵢ, wⱼ)
indicates that one sense of wᵢ is related to one sense of wⱼ.
PAPEL (Gonçalo Oliveira et al., 2008, 2010b) and CARTÃO (Gonçalo Oliveira
et al., 2011) are resources that may be used as term-based lexical networks for
Portuguese. They are structured as relational triples automatically extracted from
dictionary definitions. Each triple t = {w₁, R, w₂} represents a semantic relation,
identified by R, that holds between a sense of the lexical item w₁ and a sense of
the lexical item w₂. A lexical network is obtained by using the arguments of the relational
triples, w₁ and w₂, as nodes, connected by an edge labelled with R, the type
of the relation. Figure 2.8 shows part of a term-based lexical network with relations.
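The construction of a lexical network from relational triples can be sketched as follows, assuming triples are Python tuples (w1, R, w2). The triples and the relation label hypernym_of are hypothetical examples chosen to mirror the banco ambiguity. They are not taken from PAPEL or CARTÃO, whose actual relation labels may differ.

```python
from collections import defaultdict

# Hypothetical (w1, R, w2) triples for illustration only; NOT extracted from
# PAPEL or CARTÃO, and the relation label "hypernym_of" is an assumption.
TRIPLES = [
    ("assento", "hypernym_of", "banco"),            # seat -> bench (furniture sense)
    ("assento", "hypernym_of", "cadeira"),          # seat -> chair
    ("instituição", "hypernym_of", "banco"),        # institution -> bank (financial sense)
    ("instituição", "hypernym_of", "universidade"), # institution -> university
]


def build_network(triples):
    """Term-based network: each word maps to (relation, other word, direction) entries.

    Direction is "out" when the word is the first argument w1 and "in" when it is w2,
    so the original triple can be recovered from either endpoint.
    """
    net = defaultdict(set)
    for w1, rel, w2 in triples:
        net[w1].add((rel, w2, "out"))
        net[w2].add((rel, w1, "in"))
    return net


def edges_of(net, word):
    """Return every labelled edge touching `word`, rebuilt as sorted (w1, R, w2) triples."""
    return sorted(
        (word, rel, other) if direction == "out" else (other, rel, word)
        for rel, other, direction in net.get(word, ())
    )


net = build_network(TRIPLES)
print(edges_of(net, "banco"))
# → [('assento', 'hypernym_of', 'banco'), ('instituição', 'hypernym_of', 'banco')]
```

Because the nodes are words rather than senses, banco collects one edge from its "seat" sense and one from its "institution" sense. As a result, the network indirectly connects cadeira and universidade through the ambiguous node.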
1.3 Idioms<br />
Idioms pose an even greater challenge than ambiguity to systems trying to understand human
language. Idioms are used extensively in both spoken and written, colloquial and formal
language. Any successful natural language processing system must be able to recognize<br />
and interpret idiomatic expressions.<br />
Figure 2.8: A term-based lexical network with relations where banco is one of the
arguments (from PAPEL 2.0 (Gonçalo Oliveira et al., 2010b)). In Portuguese, besides
other meanings, banco might refer to a bench/stool or to a financial institution.