Introduction to a Field Linguist's Toolbox
HTMD: From Corpus to Lexicon (II)
Sascha Griffiths
Bielefeld, 2005
Bielefeld University
How to Make a Dictionary / Dokumentation Bedrohter Sprachen (Documentation of Endangered Languages)
Prof. Dr. Dafydd Gibbon
Winter Term 05/06


What is Toolbox?
● Toolbox is a computational tool developed by SIL International (formerly known as the Summer Institute of Linguistics)
● Toolbox is designed for field work purposes
● Toolbox is a database application that interlinearizes, analyses and stores text and can convert this into an alphabetically ordered dictionary


Interlinearize
● “Interlinear Text: Language data in tiered format incorporating 'text, phrase, word and morpheme levels.' (Bird, Bow, Hughes, 2003)” 1

1 From: http://emeld.org/school/glossary.html#interlinear 11/12/05 20:45:34
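
As a rough illustration of the tiered format, the following sketch prints a morpheme line above a gloss line so that each item sits over its gloss. The function name `align_tiers` and the example glosses are made up for illustration; the markers \mb and \ge are the ones that appear later in these slides. This is not how Toolbox itself lays out its tiers internally.

```python
def align_tiers(words, glosses):
    """Return two strings in which each word sits directly above its gloss."""
    if len(words) != len(glosses):
        raise ValueError("each word needs exactly one gloss")
    # Each column is as wide as the longer of the two items in it.
    widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]
    top = " ".join(w.ljust(n) for w, n in zip(words, widths)).rstrip()
    bottom = " ".join(g.ljust(n) for g, n in zip(glosses, widths)).rstrip()
    return top, bottom


# Hypothetical example data, not taken from a real Toolbox project.
words = ["talk", "-ing"]
glosses = ["speak", "-PROG"]
top, bottom = align_tiers(words, glosses)
print("\\mb " + top)
print("\\ge " + bottom)
```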


History of Toolbox
● Original Name: Shoebox
● Why was it called Shoebox (up to version 5.0)?
  – “The name "Shoebox" is a nostalgic reference to the pre-computer times when linguists used to store cards with language examples in old shoeboxes.” 2

2 From: Nevskaya, Irina: Lecture Tutorial: An Introduction to Toolbox and Shoebox (Abstract), on: http://titus.fkidg1.uni-frankfurt.de/curric/dobes/sslectut.htm 11/12/05 17:47:58


What can Toolbox do?
● “Shoebox is not a database system, nor an information retrieval system, nor a text corpus database in the strict sense. But it combines all these aspects in a particular way, which eases the work of the linguist who does descriptive and analytical work.”

From: Hirzel, Hannes: How to Optimize analysing an African language text corpus by exploiting old and new features of the Shoebox 5.0 interlinearization program, on: http://www.unizh.ch/spw/tools/shoebox/LeidenCALL2001HH.pdf 11/12/05 18:05:30


SIL International lists its Toolbox under the following headwords:
● concordance
● database
● dictionary
● field notes
● interlinear text analysis

You might want to look up these terms and later include them in the portfolio


What SIL International writes about Toolbox:


Basic functions
● Viewing and Searching
● Browsing
● Editing
● Sorting


Viewing and searching
● Click on the arrow buttons to view the next/previous dictionary entry (“record”)
● To search, enter a word and press 'OK'


Browsing
Press Alt-R


Editing


Sorting


Getting started
● First download Toolbox 1.4 from http://www.sil.org/computing/catalog/index.asp#software
● Then download Toolbox Training from http://www.sil.org/computing/toolbox/downloads.htm
● Open the Generic Start Up Kit
● Then follow the instructions on the following slides


Before starting...
● Copy the file 'Dictionary&Text.prj' into your working directory
● To do this, right-click on the file (don't use drag & drop) and choose 'copy', or simply click on the file and press ctrl-c
● Go to your working directory, right-click and choose 'paste', or press ctrl-v
● Rename the file
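
The same copy-and-rename step can also be scripted. The sketch below is only one possible way to do it: the function name `copy_project`, the folder name and the new file name are hypothetical, and only the file name 'Dictionary&Text.prj' comes from the slide above.

```python
import shutil
from pathlib import Path


def copy_project(source_file, working_dir, new_name):
    """Copy source_file into working_dir under new_name; the original stays untouched."""
    working_dir = Path(working_dir)
    working_dir.mkdir(parents=True, exist_ok=True)
    target = working_dir / new_name
    if target.exists():
        # Refuse to overwrite an existing project file.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(source_file, target)
    return target
```

A call might look like `copy_project("Dictionary&Text.prj", "my_working_directory", "MyProject.prj")`, where both the directory and the new name are placeholders.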


The Startup Kit:


Starting
● Select database and choose properties
● Either set to English or vernacular
● After that delete all the fields one does not need for the first steps:
  – Free translation
  – Notes
  – Morphemes
  – Gloss
  – Part of Speech


Entering text
● Choose a text as a source (example 'if')
● Mark the text in the source document using either the mouse or the cursor
● Press ctrl-c
● Go to the text row (\tx) and paste (ctrl-v) the text into the right column
● Enter the title of the text into the text identification line
● Enter an abbreviation of the title into the reference line (the references can later be automatically numbered)
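
To make the resulting record structure concrete, here is a minimal sketch that assembles such a text record as plain text. It assumes that the text identification line uses the marker \id and the reference line the marker \ref; these marker names, the numbering format 'SMP.001' and the function name `make_text_record` are assumptions for illustration, and only \tx is named on the slide.

```python
def make_text_record(title, abbreviation, lines):
    """Build a text record: one \\id line, then a numbered \\ref and a \\tx line per text line."""
    out = ["\\id " + title]
    for number, line in enumerate(lines, start=1):
        out.append("")  # blank line between references for readability
        out.append(f"\\ref {abbreviation}.{number:03d}")
        out.append("\\tx " + line)
    return "\n".join(out) + "\n"


# Hypothetical sample text.
record = make_text_record(
    "Sample text", "SMP", ["first line of the text", "second line of the text"]
)
print(record)
```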


Entering text
1) Use ctrl-c and ctrl-v to copy and paste marked text passages into Toolbox
2) When the text has been added, press enter so that \ft (free translation) reappears


Entering text
After entering the text press Alt-I (\mb, \ge & \ps will reappear).


Enter text into Toolbox line by line


Interlinearize the text
Press Alt-I
Morpheme is not recognized!


Morphemes
● One can adjust morphemes manually by putting a space between them in the text line and adding them to the dictionary
  – Toolbox will later remember the morpheme and parse it correctly
  – It might not work the first time, though
● One can also wait until Toolbox “learns” a morpheme
  – If it parses talk and later talking, it will remember these and later automatically find the ending (e.g. -ing)
  – For this, dictionary entries also need to be made
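
The idea that parsing depends on dictionary entries can be illustrated with a toy lookup. This is a deliberately simplified sketch and not Toolbox's actual parsing procedure: the function `parse_word`, the two small lexicon sets and the leading asterisk for unrecognized words are all choices made for this example.

```python
def parse_word(word, roots, suffixes):
    """Split word into root + suffix if both are in the lexicon, else mark it as unparsed."""
    if word in roots:
        return [word]
    # Try longer suffixes first so that the most specific match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix) and word[: -len(suffix)] in roots:
            return [word[: -len(suffix)], "-" + suffix]
    return ["*" + word]  # not recognized: a dictionary entry is still missing


roots = {"talk"}
suffixes = {"ing"}

print(parse_word("talking", roots, suffixes))
print(parse_word("walked", roots, suffixes))

# After adding the missing entries, the same word can be parsed.
roots.add("walk")
suffixes.add("ed")
print(parse_word("walked", roots, suffixes))
```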


Making an entry
● Mark a word in the line \mb
● Click on this word using the right mouse button
● Click 'Insert'
● Enter the lexical properties into the dictionary field at the bottom of the screen


Entering new text
Move the cursor to the end of the free translation line and press enter.
The new reference can be assigned a number by selecting 'tools' from the menu and then selecting 'renumber text'.


The result should look like this:


Wordlist, concordance and dictionary
● The dictionary is entered manually as shown
● A wordlist can be produced using the menu 'tools' and in this menu 'wordlist' (or by pressing alt-l)
● A concordance can be produced using the menu 'tools' and in this menu 'concordance' (or by pressing ctrl-l)
● A new text window can be added to a new text file by choosing the menu 'database', where one will find 'new record'
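
For readers unfamiliar with the two terms, the sketch below shows what a wordlist and a concordance contain, using a few lines of Python instead of Toolbox. The function names, the simple `\w+` tokenization and the sample sentence are assumptions for illustration; Toolbox's own output is laid out differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def wordlist(text):
    """Return (word, count) pairs in alphabetical order."""
    return sorted(Counter(tokenize(text)).items())


def concordance(text, keyword, width=2):
    """Return each occurrence of keyword with `width` words of context on both sides."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Hypothetical sample text.
sample = "The linguist talks. The speaker is talking to the linguist."
for word, count in wordlist(sample):
    print(word, count)
for left, key, right in concordance(sample, "linguist"):
    print(f"{left:>10} [{key}] {right}")
```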


Wordlist


Concordance


Exporting
● It is possible to export the Toolbox data into a word processor file

NB: Every active window can be exported!


The Dictionary in RTF
To get 'headers' for each letter, use 'modify' and tick the appropriate box before exporting.
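
What such letter headers amount to can be sketched as follows: the headwords are sorted and each group of entries is preceded by its initial letter. The function name `with_letter_headers` and the sample headwords are hypothetical, the headwords are assumed to be non-empty, and the plain alphabetical sort used here ignores any language-specific sort order.

```python
from itertools import groupby


def with_letter_headers(headwords):
    """Sort headwords and put the initial letter as a header before each group."""
    ordered = sorted(headwords, key=str.casefold)
    lines = []
    for letter, group in groupby(ordered, key=lambda w: w[0].upper()):
        lines.append(letter)
        lines.extend("  " + w for w in group)
    return lines


# Hypothetical headwords.
for line in with_letter_headers(["talk", "book", "tool", "bird"]):
    print(line)
```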


Adding new data categories
To add a new data category (e.g. the pronunciation), click on the left column in the text window and press ctrl-e.

Note on dictionary making:
What is seen in the left column of the text window are the data categories (or datcats), which are called fields in Toolbox. What can be seen on the right side is the (language) data, or records, as Toolbox refers to them.
The fields represent the microstructure of a dictionary.
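
The relation between field markers (left column) and data (right column) can be made explicit by reading a record into marker/data pairs. This is a minimal sketch under the assumption that every field starts on a new line with a backslash marker; the function name `parse_record`, the sample entry and the marker \lx for the headword are assumptions, while \ps and \ge appear earlier in these slides.

```python
def parse_record(record_text):
    """Turn lines of the form '\\marker data' into (marker, data) pairs."""
    fields = []
    for line in record_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("\\"):
            marker, _, data = line[1:].partition(" ")
            fields.append((marker, data.strip()))
        elif fields:
            # Continuation line: append it to the data of the previous field.
            marker, data = fields[-1]
            fields[-1] = (marker, (data + " " + line).strip())
    return fields


# Hypothetical dictionary entry.
entry = "\\lx talk\n\\ps v\n\\ge speak\n"
for marker, data in parse_record(entry):
    print(marker, "=", data)
```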


Saving
Do not forget to save the results. To do this, open 'file' in the menu and click on 'SAVE ALL'!


References/Further Reading
For further reading, use the reference on which this presentation is based: the Toolbox Manual/Guide, which is part of Toolbox Training and can be downloaded from the SIL International website (also from the 'Toolbox Home').
http://www.sil.org/computing/toolbox/downloads.htm
&
Toolbox website:
http://www.sil.org/computing/catalog/show_software.asp
15/12/05 01:11:17