Introduction to a Field Linguist's Toolbox - Computational Linguistics ...
Introduction to a Field Linguist's Toolbox
HTMD: From Corpus to Lexicon (II)
Sascha Griffiths
Bielefeld, 2005
Bielefeld University
How to Make a Dictionary / Dokumentation Bedrohter Sprachen (Documentation of Endangered Languages)
Prof. Dr. Dafydd Gibbon
Winter Term 05/06
What is Toolbox?
● Toolbox is a computational tool developed by SIL International (formerly known as the Summer Institute of Linguistics)
● Toolbox is designed for fieldwork purposes
● Toolbox is a database application that interlinearizes, analyses and stores text and can convert it into an alphabetically ordered dictionary
Interlinearize
● “Interlinear Text: Language data in tiered format incorporating 'text, phrase, word and morpheme levels.' (Bird, Bow, Hughes, 2003)” 1
1 From: http://emeld.org/school/glossary.html#interlinear 11/12/05 20:45:34
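To make the idea of tiers concrete, here is a minimal Python sketch that reads one interlinear record and lines up its morpheme-level tiers. The markers \tx, \mb, \ge, \ps and \ft are the ones named in these slides; the \ref marker, the sample sentence and its glosses are assumptions made up for illustration, and the code is not part of Toolbox itself.

```python
# Hypothetical interlinear record: each line starts with a backslash marker
# naming its tier. The sentence and glosses are invented for this sketch.
RECORD = r"""
\ref IF 001
\tx  She is talking
\mb  she is talk -ing
\ge  3SG.F be speak -PROG
\ps  pro v v -sfx
\ft  She is talking.
"""

def parse_record(text):
    """Return a dict mapping each tier marker (without backslash) to its content."""
    tiers = {}
    for line in text.strip().splitlines():
        marker, _, content = line.partition(" ")
        tiers[marker.lstrip("\\")] = content.strip()
    return tiers

def align(tiers, markers=("mb", "ge", "ps")):
    """Pair up the whitespace-separated items of the morpheme-level tiers."""
    columns = [tiers[m].split() for m in markers]
    return list(zip(*columns))

tiers = parse_record(RECORD)
for morpheme, gloss, pos in align(tiers):
    print(f"{morpheme:8}{gloss:8}{pos}")
```

Each printed row holds one morpheme with its gloss and part of speech, which is the vertical alignment that an interlinear display shows.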
History of Toolbox
● Original name: Shoebox
● Why was it called Shoebox (up to version 5.0)?
– “The name "Shoebox" is a nostalgic reference to the precomputer times when linguists used to store cards with language examples in old shoeboxes.” 2
2 From: Nevskaya, Irina: Lecture Tutorial: An Introduction to Toolbox and Shoebox (Abstract), on: http://titus.fkidg1.uni-frankfurt.de/curric/dobes/sslectut.htm 11/12/05 17:47:58
What can Toolbox do?
● “Shoebox is not a database system, nor an information retrieval system, nor a text corpus database in the strict sense. But it combines all these aspects in a particular way, which eases the work of the linguist who does descriptive and analytical work.”
From: Hirzel, Hannes: How to Optimize analysing an African language text corpus by exploiting old and new features of the Shoebox 5.0 interlinearization program, on: http://www.unizh.ch/spw/tools/shoebox/LeidenCALL2001HH.pdf 11/12/05 18:05:30
SIL International lists its Toolbox under the following headwords:
● concordance
● database
● dictionary
● field notes
● interlinear text analysis
You might want to google these terms and later include them in the portfolio.
What SIL International writes about Toolbox:
Basic functions
● Viewing and Searching
● Browsing
● Editing
● Sorting
Viewing and searching
Click on these arrows to view the next/previous dictionary entry (“record”).
Enter a word and press 'OK'.
Browsing
Press Alt-R
Editing
Sorting
Getting started
● First download Toolbox 1.4 from http://www.sil.org/computing/catalog/index.asp#software
● Then download Toolbox Training from http://www.sil.org/computing/toolbox/downloads.htm
● Open the Generic Start Up Kit
● Then follow the instructions on the following slides
Before starting...
● Copy the file 'Dictionary&Text.prj' into your working directory
● To do this, click on the file with the right mouse button (don't use drag & drop) and choose 'copy', or simply click on the file and press ctrl-c
● Go to your working directory, click the right mouse button and choose 'paste', or press ctrl-v
● Rename the file
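The same copy-and-rename step can also be scripted. The following Python sketch assumes that 'Dictionary&Text.prj' sits directly in the start-up kit folder; the function name, the folder names and the new file name are hypothetical and only serve as an illustration.

```python
import shutil
from pathlib import Path

def copy_project(kit_dir, work_dir, new_name):
    """Copy 'Dictionary&Text.prj' from kit_dir into work_dir under new_name."""
    source = Path(kit_dir) / "Dictionary&Text.prj"
    target_dir = Path(work_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / new_name
    if target.exists():
        raise FileExistsError(f"{target} already exists; choose another name")
    shutil.copy2(source, target)  # copy, so the start-up kit stays untouched
    return target

# Hypothetical usage (folder and file names are placeholders):
# copy_project("Generic Start Up Kit", "my_working_directory", "MyLanguage.prj")
```

Copying instead of moving mirrors the advice above not to drag and drop: the original kit remains available for a fresh start.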
The Startup Kit:
Starting
● Select the database and choose properties
● Set it to either English or vernacular
● After that, delete all the fields that are not needed for the first steps:
– Free translation
– Notes
– Morphemes
– Gloss
– Part of Speech
Entering text
● Choose a text as a source (example 'if')
● Mark the text in the source document using either the mouse or the cursor
● Press ctrl-c
● Go to the text row (\tx) and paste (ctrl-v) the text into the right column
● Enter the title of the text into the text identification
● Enter an abbreviation of the title into the reference line (the references can later be numbered automatically)
Entering text
1) Use ctrl-c and ctrl-v to copy and paste marked text passages into Toolbox
2) When the text has been added, press enter so that \ft (free translation) reappears
Entering text
After entering the text, press Alt-I (\mb, \ge & \ps will reappear).
Enter text into Toolbox line by line.
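The line-by-line layout described above can be pictured with a short Python sketch that turns a plain text into one record per line. The markers \tx and \ft come from these slides; the markers \id and \ref, the three-digit numbering scheme and the sample text are assumptions for illustration, not a description of how Toolbox stores or numbers its data.

```python
def build_records(title, abbreviation, text):
    """Turn a plain text into backslash-marked lines, one record per text line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    out = [f"\\id {title}"]  # assumed marker for the text identification
    for number, line in enumerate(lines, start=1):
        # assumed reference format: abbreviation plus a running number
        out.append(f"\\ref {abbreviation} {number:03d}")
        out.append(f"\\tx {line}")
        out.append("\\ft")  # free translation, to be filled in by hand
    return "\n".join(out)

sample = """The dog barks.
The dogs are barking."""
print(build_records("Sample text", "ST", sample))
```

Keeping one text line per record is what makes it possible to number the references automatically afterwards.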
Interlinearize the text
Press Alt-I
Morpheme is not recognized!
Morphemes
● One can adjust morphemes by putting a space between them in the text line and adding them to the dictionary
– Later Toolbox will remember the morpheme and parse it correctly
– It might not work the first time, though
● One can also wait until Toolbox “learns” a morpheme
– If it parses talk and later talking, it will remember these and later automatically find the ending (e.g. -ing)
– For this, dictionary entries also need to be made
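Why dictionary entries matter for parsing can be shown with a small Python sketch of lexicon-driven suffix stripping. This is a simplified illustration of the general idea, not Toolbox's actual parsing procedure; the function name and the tiny lexicon are assumptions.

```python
def parse_word(word, roots, suffixes):
    """Split a word into root + suffixes using only what the lexicon lists.

    Returns a list of morphemes, or None if the lexicon cannot account
    for the whole word.
    """
    if word in roots:
        return [word]
    for suffix in suffixes:
        bare = suffix.lstrip("-")
        if word.endswith(bare) and len(word) > len(bare):
            stem = parse_word(word[: -len(bare)], roots, suffixes)
            if stem is not None:
                return stem + [suffix]
    return None

roots = {"talk"}
suffixes = ["-ing"]
print(parse_word("talking", roots, suffixes))  # ['talk', '-ing']
print(parse_word("walking", roots, suffixes))  # None: 'walk' is not in the lexicon yet
roots.add("walk")
print(parse_word("walking", roots, suffixes))  # ['walk', '-ing']
```

In this sketch a word is only segmented once both its root and its ending are known, which matches the observation that parsing may fail the first time and succeed after the entries have been made.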
Making an entry
● Mark a word in the line \mb
● Click on this word using the right mouse button
● Click 'Insert'
● Enter the lexical properties into the dictionary field at the bottom of the screen
Entering new text
Move the cursor to the end of the free translation line and press enter.
The new reference can be assigned a number by selecting 'tools' from the menu and then selecting 'renumber text'.
The result should look like this:
Wordlist, concordance and dictionary
● The dictionary is entered manually as shown
● A wordlist can be produced using the menu 'tools' and in this menu 'wordlist' (or by pressing alt-l)
● A concordance can be produced using the menu 'tools' and in this menu 'concordance' (or by pressing ctrl-l)
● A new text window can be added to a new text file by choosing the menu 'database', where one will find 'new record'
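What a wordlist and a concordance contain can be sketched in a few lines of Python. This is an independent illustration of the two concepts, not the output format of Toolbox; the function names, the context width and the sample lines are assumptions.

```python
import re
from collections import Counter

def tokenize(line):
    """Lower-case a text line and split it into word tokens."""
    return re.findall(r"\w+", line.lower())

def wordlist(lines):
    """Count how often each word form occurs, sorted alphabetically."""
    counts = Counter(tok for line in lines for tok in tokenize(line))
    return sorted(counts.items())

def concordance(lines, keyword, width=2):
    """Return (line number, left context, keyword, right context) tuples."""
    hits = []
    for number, line in enumerate(lines, start=1):
        tokens = tokenize(line)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((number, left, tok, right))
    return hits

text_lines = ["The dog barks.", "The dogs are barking at the dog."]
for word, count in wordlist(text_lines):
    print(f"{word}\t{count}")
for hit in concordance(text_lines, "dog"):
    print(hit)
```

The wordlist answers "which forms occur and how often", while the concordance shows every occurrence of one form together with its neighbouring words.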
Wordlist
Concordance
Exporting
● It is possible to export the Toolbox data into a word processor file
NB: Every active window can be exported!
The Dictionary in RTF
To get 'headers' for each letter, use 'modify' and tick the appropriate box before exporting.
Adding new data categories
To add a new data category (e.g. the pronunciation), click on the left column in the text window and press ctrl-e.
Note on dictionary making:
What is seen in the left column of the text window are the data categories (or datcats), which are called fields in Toolbox; what can be seen on the right side is the (language) data, or records, as Toolbox refers to them.
The fields represent the microstructure of a dictionary.
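The relation between fields and data can be modelled in Python as an ordered list of marker and value pairs per entry. The marker names \lx, \ph, \ps and \ge, the sample entries and the SAMPA-style transcription are assumptions chosen for this sketch; a real project defines its own set of fields.

```python
def make_record(lexeme, part_of_speech, gloss):
    """An entry is an ordered list of (field marker, data) pairs."""
    return [("lx", lexeme), ("ps", part_of_speech), ("ge", gloss)]

def add_field(record, marker, data, after):
    """Insert a new field directly after the field named by `after`."""
    position = next(i for i, (m, _) in enumerate(record) if m == after)
    return record[: position + 1] + [(marker, data)] + record[position + 1:]

def to_text(record):
    """Render an entry as backslash-marked lines."""
    return "\n".join(f"\\{marker} {data}" for marker, data in record)

entries = [make_record("talk", "v", "speak"), make_record("dog", "n", "canine")]
# Add a pronunciation field (SAMPA-style transcription, invented for the sketch).
entries[0] = add_field(entries[0], "ph", "tO:k", after="lx")
# Sort alphabetically by headword, as in a printed dictionary.
for record in sorted(entries, key=lambda r: r[0][1]):
    print(to_text(record))
    print()
```

The fixed order of the fields inside each entry corresponds to the microstructure, while the alphabetical order of the entries gives the dictionary its overall arrangement.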
Saving
Do not forget to save the results. To do this, open 'file' in the menu and click on 'SAVE ALL'!
References/Further Reading
For further reading, see the basis of this presentation: the Toolbox Manual/Guide, which is part of Toolbox Training and can be downloaded from the SIL International website (also from the 'Toolbox Home').
http://www.sil.org/computing/toolbox/downloads.htm
&
Toolbox website:
http://www.sil.org/computing/catalog/show_software.asp
15/12/05 01:11:17