19.11.2012 Views

Best Practices for Speech Corpora in Linguistic Research Workshop ...


The ‘externality’ of DA arises from its willingness to identify specific features of talk (classically, particular speech acts) and apply these as what CA would see as a priori categories. The advantage of such an approach is that it allows the sort of coding that makes extensive data sets accessible to the analyst; the disadvantage is that it does so at the expense of failing to capture aspects of the construction of the talk, with the result that it may all too easily miss what is actually getting done through the talk. It is this focus on action that characterises CA and explains its insistence on the importance of the sequential unfolding of interaction. Schegloff (1991, p. 46) captures this essential relationship well:

    . . . the target of its [CA’s] inquiries stands where talk amounts to action, where action projects consequences in a structure and texture of interaction which the talk is itself progressively embodying and realizing, and where the particulars of the talk inform what actions are being done and what sort of social scene is being constituted.

The advantage of this approach is that it enables the analyst to understand what is being achieved through the talk in a way that is not open to the discourse analyst using coding to analyse large data sets; the disadvantage is the demands it places on the analyst in terms of time and resources. The transcription system itself demands close attention to minutiae of delivery, and collecting data is often a slow and painstaking process, which might be described in terms of ‘tracking the biography of the phenomenon’s emergence’ (Jefferson, 1983, p. 4) or ‘having accumulated a batch of fragments’ (ibid., p. 16).

Our basis for bringing these apparently incommensurate approaches together lies in finding a way of applying CA in order to identify an ‘action’, then using DA in the form of a coding system based on pragmatic features to identify patterns across stretches of talk that mark this action, then applying CA to the instances thus identified in order to check the accuracy of the specified pattern. While no CA practitioner would accept this as a legitimate analysis in itself, since it will always be possible that other things are being accomplished through the talk, it does allow specific actions to be identified and thereby makes it possible to develop increasingly rich pictures of how particular actions are distributed through the talk. What follows focuses on the tools that can be used to maximise the benefits derivable from an action-based analysis.

3. Data

The data used for this project are drawn from audio-recorded interdisciplinary scientific research project meetings, ranging from large collaborative funded projects with at least six participants in each meeting to interdisciplinary PhD supervision meetings consisting of two supervisors and a student. The disciplines represented in these meetings are mathematics, statistics, biology and bioinformatics. The data have been collected since March 2011, producing about 120 hours of audio recordings to date as part of a collection that will continue to grow as we follow a number of research projects to completion. What we are presenting here is based only on the small part of the data that have been transcribed (amounting to 20 hours to date).

It is also necessary to emphasise that this represents an early stage in the project. Development is currently focused on an action in which the speaker introduces a question and then follows this with a series of turns leading to a suggestion. What makes this a particularly attractive starting point for our analysis is that the sequence is marked by turns with ‘so’ in the turn-initial position (‘so-clusters’), which are easily identifiable in the data. Extract 1 provides an example of this action (all names are pseudonyms):

Extract 1

01  ALF   was it a strict criterion for it
02  ROY   or no not very strict
03  ALF   so you don’t think it’s normal
04        it’s (xxx) fourteen that’s
05        really the
06  ROY   I probably could find more yes
07        I mean I didn’t use any strict
08        criteria just I applied some
09        (xxx) two or three (xxx) also
10        my eye each if I believe.
11  ALF   yeah so it should be about the
12        number.
13  GARY  yeah
14  ROY   depends you know if I make it
15        less strict (xxx) fifty (xxx).
16  ALF   so if say eighty percent of them
17        are thought to be affected by
18        the wash if we did the whole
19        mock wash micro array data it
20        would allow us to identify
21        twenty genes that are affected
22        by pulse so we don’t know
23        whether that’s relevant it’s
24        worth its worth finding out (4.0)
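Because turn-initial ‘so’ is a surface feature, candidate so-cluster turns can be located automatically in a basic transcription before any closer analysis. The following is a minimal sketch of one way this might be done, not the procedure used in the project. It assumes that each transcript line starts with a line number, that speaker labels are written in capitals with at least two letters (so that the pronoun ‘I’ is not read as a speaker), and that lines without a label continue the current turn. The names `parse_turns`, `so_initial_turns` and the `prefaces` option are hypothetical, and the sample reproduces only part of Extract 1.

```python
import re

# Assumed line shape: line number, optional speaker label in capitals
# (two or more letters, so "I" is not mistaken for a speaker), then talk.
LINE_RE = re.compile(r"^(\d+)\s+(?:([A-Z]{2,})\s+)?(.*)$")

# Part of Extract 1 (lines 06-10 and 17-24 omitted for brevity).
EXTRACT = """
01 ALF was it a strict criterion for it
02 ROY or no not very strict
03 ALF so you don't think it's normal
04 it's (xxx) fourteen that's
05 really the
11 ALF yeah so it should be about the
12 number.
13 GARY yeah
14 ROY depends you know if I make it
15 less strict (xxx) fifty (xxx).
16 ALF so if say eighty percent of them
"""


def parse_turns(transcript):
    """Group numbered transcript lines into turns (speaker, first line, text)."""
    turns = []
    for raw in transcript.strip().splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip anything that is not a numbered line
        number, speaker, text = match.groups()
        if speaker is not None:
            # A speaker label opens a new turn.
            turns.append({"speaker": speaker, "line": number, "text": text})
        elif turns:
            # No speaker label: treat the line as a continuation.
            turns[-1]["text"] += " " + text
    return turns


def so_initial_turns(turns, prefaces=()):
    """Return turns whose first word is 'so', optionally after preface tokens."""
    hits = []
    for turn in turns:
        words = re.findall(r"[a-z']+", turn["text"].lower())
        # Optionally skip tokens such as "yeah" before testing for "so".
        while words and words[0] in prefaces:
            words = words[1:]
        if words and words[0] == "so":
            hits.append(turn)
    return hits


turns = parse_turns(EXTRACT)
strict_hits = [t["line"] for t in so_initial_turns(turns)]
loose_hits = [t["line"] for t in so_initial_turns(turns, prefaces=("yeah",))]
print(strict_hits, loose_hits)
```

Whether a turn such as line 11 (‘yeah so ...’) should count as so-initial is an analytic decision, which is why the preface tokens are left as an explicit option. Any automatically flagged turn would still need to be checked against the talk itself.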

In terms of interdisciplinary talk, once the analysis is complete it will be interesting to see how this action is distributed in the data. If, for example, quantitative analysis reveals that such exchanges are inter-disciplinary (as opposed to intra-disciplinary), this would provide prima facie evidence of genuine interdisciplinary exchanges. It would also enable us to collect examples of this across different data sets in order to understand more about how such sequences work towards the building of shared understanding and action.
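The kind of quantitative comparison envisaged here could be sketched as a simple tally. The example below is illustrative only: the speaker identifiers, their discipline labels and the three exchanges are invented placeholders rather than project data, and the rule that an exchange counts as inter-disciplinary when its participants come from more than one discipline is our assumption about how the distinction might be operationalised.

```python
from collections import Counter

# Hypothetical speaker-to-discipline mapping (placeholder values only).
DISCIPLINE_OF = {
    "S1": "statistics",
    "S2": "biology",
    "S3": "biology",
}

# Hypothetical so-cluster exchanges, each given as the speakers taking part.
EXCHANGES = [
    ("S1", "S2"),
    ("S2", "S3"),
    ("S1", "S2", "S3"),
]


def classify_exchange(speakers, discipline_of):
    """Label an exchange 'inter' if its speakers span several disciplines."""
    disciplines = {discipline_of[s] for s in speakers}
    return "inter" if len(disciplines) > 1 else "intra"


def tally_exchanges(exchanges, discipline_of):
    """Count inter- and intra-disciplinary exchanges."""
    return Counter(classify_exchange(e, discipline_of) for e in exchanges)


counts = tally_exchanges(EXCHANGES, DISCIPLINE_OF)
print(counts["inter"], counts["intra"])
```

A tally of this sort says nothing about what is being accomplished in any individual exchange; it only indicates where closer analysis might usefully be directed.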

At this stage we are working with basic transcriptions of the sort illustrated above and limiting more delicate transcription to examples of the relevant action, though the differences between these can be considerable, as a comparison of Extract 2 with its ‘equivalent’ in lines 05 and 06 in Extract 1 demonstrates:
