12.09.2013

Programme booklet (pdf)




CLIN 21 – CONFERENCE PROGRAMME

A database for lexical orthographic errors in French

Abstract

Manguin, Jean-Luc

GREYC - Univ. de Caen - France

This work describes the construction of a database of lexical orthographic errors in French. The construction uses several techniques from the field of NLP to serve a goal in psycholinguistics. In psycholinguistics, collecting enough data from experiments with real people is often difficult and slow. Here the data are collected online and come from the requests made to an online dictionary. This large body of data (about 160 million words, 4 million distinct forms) contains enough errors to yield sound statistics for an in-depth study of errors. The questions addressed here are the link between a "bad" form and its correction, and the classification of errors into a small number of types. Several programs and techniques are combined to achieve these tasks: detection of graphic neighbours, phonetization, and pattern matching. Together they reach 70% correction with no ambiguity, and 80% if we accept that the system gives several possible corrections. The classification of errors is also useful for predicting where errors may appear in words, and thus for understanding how children learn orthography.

Corresponding author: jean-luc.manguin@unicaen.fr
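
As an illustration of the first technique named in the abstract, the detection of graphic neighbours might be sketched as follows. The abstract does not define a graphic neighbour, so this sketch assumes it is a lexicon word exactly one edit (insertion, deletion, substitution or transposition) away from the requested form. The names `one_edit_variants`, `graphic_neighbours`, `classify_request` and `TOY_LEXICON` are hypothetical, and the toy lexicon stands in for the real dictionary; this is not the author's implementation.

```python
# Assumption: a "graphic neighbour" is a lexicon word one edit away from
# the requested form. The abstract does not give the actual definition.
ALPHABET = "abcdefghijklmnopqrstuvwxyzàâçéèêëîïôùûüœ"

# Hypothetical toy lexicon standing in for the on-line dictionary.
TOY_LEXICON = {"orthographe", "dictionnaire", "erreur", "langage", "pain", "bain"}


def one_edit_variants(form):
    """Return every string one insertion, deletion, substitution
    or transposition away from `form` (excluding `form` itself)."""
    splits = [(form[:i], form[i:]) for i in range(len(form) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    transpositions = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    substitutions = {
        left + char + right[1:]
        for left, right in splits
        if right
        for char in ALPHABET
    }
    insertions = {left + char + right for left, right in splits for char in ALPHABET}
    return (deletions | transpositions | substitutions | insertions) - {form}


def graphic_neighbours(form, lexicon):
    """Return the lexicon words that are one edit away from `form`, sorted."""
    return sorted(one_edit_variants(form) & lexicon)


def classify_request(form, lexicon):
    """Label a requested form as known, unambiguous (one candidate),
    ambiguous (several candidates) or unresolved (no candidate)."""
    if form in lexicon:
        return ("known", [form])
    candidates = graphic_neighbours(form, lexicon)
    if len(candidates) == 1:
        return ("unambiguous", candidates)
    if candidates:
        return ("ambiguous", candidates)
    return ("unresolved", [])


if __name__ == "__main__":
    for request in ["ortographe", "dictionaire", "vain", "erreur"]:
        print(request, classify_request(request, TOY_LEXICON))
```

The "unambiguous" and "ambiguous" labels mirror the distinction the abstract draws between corrections with no ambiguity and cases where the system gives several possible corrections. A fuller system would presumably combine this step with phonetization and pattern matching, which are not sketched here.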
