25.08.2013 Views

PDF (Online Text) - EURAC


• Researchers should be able to address linguistic problems with linguistic solutions.

Often, a linguistic problem, or a problem initially described in language terms (e.g. retranscribing the data using a different phoneset), has to be redescribed in programming terms before it can be addressed. Problems should be addressable in the terms in which they occur.

• The toolkit should be increasingly simple to maintain and develop.

Over its lifetime, any toolkit increases in functionality: new problems occur and new tasks become possible. If extensions are increasingly difficult to implement, the toolkit eventually disintegrates (e.g. into a library of loosely related scripts), becomes impossible to maintain, and falls into disuse. A well-designed toolkit can avoid this fate.

• The toolkit should encourage its own use and development.

Using the toolkit should be preferable to reverting to the bad old ways. Beyond that, continued use of the toolkit should stimulate researchers to confront it with new problems, and to think of new areas in which it might be used. If possible, the toolkit should be extensible by the researchers themselves, rather than relying on a separate maintainer. In this case, the design of the toolkit should promote the writing of readable, reusable code.
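To make the first requirement concrete, the retranscription example mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any existing toolkit: the phone symbols, the `PHONE_MAP` table, the `retranscribe` function and the `(start, end, phone)` segment layout are all hypothetical.

```python
# Hypothetical mapping from one phoneset to another (illustrative symbols only).
PHONE_MAP = {"aa": "A", "ih": "I", "sh": "S", "sil": "_"}


def retranscribe(segments, phone_map):
    """Return new (start, end, phone) segments with each phone remapped.

    Phones that are missing from the map are kept unchanged, so that
    nothing is silently dropped from the transcription.
    """
    return [(start, end, phone_map.get(phone, phone))
            for start, end, phone in segments]


# A made-up labelled utterance: times in seconds, phones in the source phoneset.
segments = [
    (0.00, 0.12, "sil"),
    (0.12, 0.25, "sh"),
    (0.25, 0.40, "ih"),
    (0.40, 0.55, "p"),   # "p" is not in PHONE_MAP, so it passes through as-is
]

converted = retranscribe(segments, PHONE_MAP)
for start, end, phone in converted:
    print(f"{start:.2f} {end:.2f} {phone}")
```

The point of the sketch is that the researcher edits only the mapping table, which is stated in linguistic terms, while the surrounding code stays the same.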

3. A Solution<br />

3.1 Introduction<br />

Our first (and current) attempt at a software artefact that meets these requirements is the SpeechCluster package (Uemlianin 2005a). SpeechCluster is a collection of small programs written in the Python programming language. Python has a very clear, readable syntax, and is especially suited to projects with several programmers or with novice programmers. As such, it suited our aim of encouraging non-programmers to write their own tools.

The SpeechCluster package consists of a main SpeechCluster module with the basic API, and a number of modules that can be used as command line tools. The tools are intended to be used as such, but they can also be used as ‘examples’, or as a basis for customisation or further programming with SpeechCluster.

Table 1 shows a list of the tools currently available as part of SpeechCluster. Below, we look at two of these in more detail before exploring the use of SpeechCluster as an API. Finally, we look at SpeechCluster in a larger system.
