25.08.2013 Views

PDF (Online Text) - EURAC


Pooling of corpora is not as frequent as, for example, the pooling of dictionaries. The main reason for this may be that corpora are very specific and document a cultural heritage. Pooling them with corpora of different languages, subject areas, registers, and so forth is only of limited use. Nevertheless, there are some computational-linguistics pools that integrate corpora for computational purposes and that may, therefore, integrate your corpora and maintain them for you. A description of these mostly very complex pools is beyond the scope of this paper, but the interested reader might check the following projects:

• GATE (http://gate.ac.uk);
• Natural Language Toolkit (http://nltk.sourceforge.net); and
• XNLRDF (http://140.127.211.213/xnlrdf).

Projects targeting language documentation may also host your corpora (e.g. the TITUS Project [http://titus.uni-frankfurt.de/]). In addition, LDC (http://www.ldc.upenn.edu) and ELRA (http://www.elra.info) host and distribute corpora (and dictionaries), so that your institute might profit financially from sold copies of the corpus you created.

Once you decide to create your own free software (including corpora, dictionaries, etc.), you have to think about the license and the format of the data. From the great number of possible licenses you might use for your project (cf. http://www.gnu.org/philosophy/license-list.html for a commented list of licenses), you should consider the GNU General Public License, as this license, through the notion of Copyleft, doesn’t give a general advantage to someone who copies and modifies your software. Copyleft refers to the obligation that:

…anyone who redistributes the software, with or without changes, must pass along the freedom to further copy and change it. (...) Copyleft also provides an incentive for other programmers to add to free software.
(http://www.gnu.org/copyleft/copyleft.htm)

With Copyleft, modifications have to be made freely available under the same conditions as you originally distributed your data, and if the modifications are of general concern, you can integrate them back into your software. The quality of your resources improves, as others can find and point out mistakes or shortcomings in the resources. They will report to you as long as you remain the distributor. In addition, you may ask people to cite your publication on the resource whenever they use the resource in one of their publications. Without Copyleft, important language
