15.11.2013 Views

Syntactic analysis driven by a dictionary of patterns of ...


semantic network to give a measure of “semantic nearness” between constituents. For this purpose, we assign different weights to relations, to links in the concept hierarchy, and to implicit relations.
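The section does not say how the weighted network is searched. One plausible reading, sketched here purely as an assumption, takes the cheapest path between two concepts (Dijkstra's algorithm) using a cost per link type and converts that path cost into a nearness score. The costs in `LINK_COST`, the `1 / (1 + cost)` conversion, and the name `semantic_nearness` are hypothetical.

```python
import heapq

# Hypothetical costs per link type (lower = semantically closer); the
# section does not give the actual weights.
LINK_COST = {"hierarchy": 0.5, "relation": 1.0, "implicit": 2.0}


def semantic_nearness(edges: list[tuple[str, str, str]], source: str, target: str) -> float:
    """Nearness in (0, 1] from the cheapest path between two concepts; 0.0 if unconnected."""
    graph: dict[str, list[tuple[str, float]]] = {}
    for a, b, kind in edges:  # links are treated as undirected
        cost = LINK_COST[kind]
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    # Dijkstra's algorithm over the weighted network.
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return 1.0 / (1.0 + dist)  # assumed mapping from path cost to nearness
        if dist > best[node]:
            continue  # stale heap entry
        for neighbor, cost in graph.get(node, []):
            new_dist = dist + cost
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return 0.0
```

Giving hierarchy links a lower cost than implicit relations is only one possible choice. The point is that the relative costs, not the search itself, encode which kinds of links count as "nearer".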

In our model, the advanced government patterns dictionary is the most practical means of resolving most structural ambiguities. The dictionary reflects the properties of the language itself: it gives the syntactic constructions of each predicative word, i.e., the complete subcategorization information for each specific word.
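An entry of such a dictionary can be pictured as a predicative word mapped to its valences, each listing the prepositions that may introduce it. The sketch below assumes that layout and shows how an entry lets a parser reject a prepositional attachment the word does not govern. The class `GovernmentPattern`, the helper `can_attach`, and the toy entries are illustrative only, not the authors' format.

```python
from dataclasses import dataclass, field


@dataclass
class GovernmentPattern:
    """Illustrative entry: a predicative word plus, for each prepositional
    valence, the set of prepositions that can introduce it (hypothetical layout)."""
    word: str
    valences: list[frozenset[str]] = field(default_factory=list)

    def licenses(self, preposition: str) -> bool:
        # True if some valence of the word can be introduced by this preposition.
        return any(preposition in valence for valence in self.valences)


# Toy entries for illustration only; not taken from the authors' dictionary.
DICTIONARY = {
    "hablar": GovernmentPattern("hablar", [frozenset({"de", "sobre"}), frozenset({"con"})]),
    "depender": GovernmentPattern("depender", [frozenset({"de"})]),
    "confiar": GovernmentPattern("confiar", [frozenset({"en"})]),
}


def can_attach(word: str, preposition: str) -> bool:
    """Can a prepositional phrase headed by `preposition` fill a valence of `word`?"""
    entry = DICTIONARY.get(word)
    return entry is not None and entry.licenses(preposition)
```

With these entries, a phrase introduced by "con" may fill a valence of "hablar", while one introduced by "de" cannot fill a valence of "confiar". A parser could drop or penalize variants that rely on such unlicensed attachments.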

For Spanish, there are no dictionaries with complete subcategorization information; only scattered information has been compiled by several authors. A large dictionary requires thousands of entries, and compiling them manually is labor-intensive and time-consuming. We therefore propose a statistical method to compile the syntactic information.

Specifically, the method compiles the frequencies of combinations, each consisting of a specific predicative word and a preposition that introduces one of its valences. In our dictionary, the weight of a combination is defined as the quotient of its frequency in the correct parsing variants, i.e., in the texts, and its frequency in the incorrect variants of syntactic structure produced by the specific analyzer.

The statistical model used to obtain these frequencies is based on two sources: one generating the true structures and one generating noisy variants that represent the parser's mistakes. Because the weight is a quotient, some combinations can have a weight greater than 1, meaning that they appear more frequently in correct variants than in incorrect ones. Others can have a weight less than 1, meaning that they appear more frequently in incorrect variants. Finally, some combinations may have a weight of 1, meaning that they are useless for disambiguation, even if they are frequent in the texts.
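A minimal sketch of the weight and its three readings follows. The function names, the tolerance used to decide that a weight "is" 1, and the convention for combinations never seen in incorrect variants are our assumptions.

```python
def combination_weight(freq_correct: float, freq_incorrect: float) -> float:
    """Quotient of a combination's frequency in correct variants and in incorrect ones."""
    if freq_incorrect == 0:
        # Assumption: unseen in incorrect variants -> infinitely favorable,
        # unless it was never seen at all (no evidence -> neutral weight 1).
        return 1.0 if freq_correct == 0 else float("inf")
    return freq_correct / freq_incorrect


def interpret(weight: float, tol: float = 1e-9) -> str:
    """Read a weight as in the text: greater than, less than, or (about) 1."""
    if abs(weight - 1.0) <= tol:
        return "useless for disambiguation"
    if weight > 1.0:
        return "appears more often in correct variants"
    return "appears more often in incorrect variants"
```

For example, with hypothetical frequencies of 12 in correct variants and 4 in incorrect ones, the weight is 3.0 and the combination counts as evidence for a variant. Equal frequencies give exactly 1.0, however large they are.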

The statistical weights thus make it possible to change the whole point of view on the nature and use of a dictionary for disambiguation, since they reflect the kinds of errors that a given analyzer makes. We obtained these weights by an iterative process. The process begins with an empty dictionary. In the first iteration, all hypotheses about the syntactic structure of each sentence produced by the parser receive the same weight. Once the frequency of each combination found in the correct variants of the texts and its frequency in the incorrect variants of syntactic structure are determined, the weights of all variants are recalculated. These steps are repeated until the difference between the weights obtained in the previous iteration and those of the current iteration does not exceed the established threshold.
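The iterative procedure can be sketched as follows. This is a minimal sketch under our own assumptions, not the authors' implementation. Each parse variant is reduced to the set of (predicative word, preposition) combinations it contains. A variant's current weight is read as the probability that it is the correct structure, so it adds that share to each of its combinations' "correct" frequency and the remainder to their "incorrect" frequency. Both frequencies are normalized to relative frequencies. The smoothing constant `eps`, the scoring of a variant as the product of its combination weights, the check of convergence on variant weights, and the names `combination_weights` and `estimate` are all hypothetical choices.

```python
from collections import defaultdict
from math import prod

# Sketch under stated assumptions; not the authors' implementation.
Combination = tuple[str, str]   # (predicative word, preposition)
Variant = list[Combination]     # combinations in one parse hypothesis


def combination_weights(sentences: list[list[Variant]],
                        variant_w: list[list[float]],
                        eps: float = 1e-3) -> dict[Combination, float]:
    """Quotient of a combination's relative frequency in (soft) correct
    variants and in (soft) incorrect variants, smoothed by `eps`."""
    correct: dict[Combination, float] = defaultdict(float)
    incorrect: dict[Combination, float] = defaultdict(float)
    for variants, ws in zip(sentences, variant_w):
        for variant, w in zip(variants, ws):
            for comb in set(variant):        # presence, counted once per variant
                correct[comb] += w           # share of this variant judged correct
                incorrect[comb] += 1.0 - w   # share judged incorrect
    c_total = sum(sum(ws) for ws in variant_w) or 1.0
    i_total = sum(len(ws) - sum(ws) for ws in variant_w) or 1.0
    return {c: (correct[c] / c_total + eps) / (incorrect[c] / i_total + eps)
            for c in correct}


def estimate(sentences: list[list[Variant]], threshold: float = 1e-4,
             max_iter: int = 100):
    # First iteration: every hypothesis of a sentence has the same weight.
    variant_w = [[1.0 / len(vs)] * len(vs) for vs in sentences]
    weights: dict[Combination, float] = {}   # the dictionary starts empty
    for _ in range(max_iter):
        weights = combination_weights(sentences, variant_w)
        # Recompute every variant's weight from its combinations, normalized per sentence.
        new_w = []
        for variants in sentences:
            scores = [prod(weights[c] for c in set(v)) for v in variants]
            total = sum(scores)
            new_w.append([s / total for s in scores])
        delta = max((abs(a - b) for old, new in zip(variant_w, new_w)
                     for a, b in zip(old, new)), default=0.0)
        variant_w = new_w
        if delta <= threshold:   # stop once no variant weight changes more than the threshold
            break
    return weights, variant_w
```

Each sentence is assumed to have at least one variant. On a toy input where an unambiguous sentence contains a combination that also occurs in one variant of an ambiguous sentence, that combination's weight rises above 1 and the competing combination's weight falls below 1. A combination present in every variant stays at weight 1, which matches the remark that such combinations are useless for disambiguation.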

Since such a dictionary must contain statistical weights of combinations for specific words, we apply the method to the texts of the LEXESP Spanish corpus. We test the syntactic information compiled for the advanced government patterns dictionary on a set of 100 sentences extracted from that corpus and parsed by the CFG module. In our experiments, the true structures of the input sentences proved to be classified in the 35% rank.

