AUDIO DESCRIPTION, A VISUAL ASSISTIVE DISCOURSE:<br />
An Investigation into Language Used to<br />
Provide the <strong>Visual</strong>ly Disabled<br />
Access to Information in Electronic Texts<br />
A Thesis<br />
Submitted to the Faculty of the<br />
Graduate School of Arts and Sciences<br />
of Georgetown University<br />
in partial fulfillment of the requirements for the<br />
degree of<br />
Master of Arts<br />
in <strong>Communication</strong>, Culture and Technology<br />
By<br />
Philip J Piety, B.S.<br />
Washington, DC<br />
February 24, 2003
Copyright 2003 By Philip Piety<br />
All Rights Reserved<br />
ABSTRACT<br />
<strong>Visual</strong>ly impaired and blind individuals face challenges in accessing many types of<br />
texts including television, films, textbooks, software, and the Internet because of the rich<br />
visual nature of these media. In order to provide these individuals with access to this visual<br />
information, special assistive technology allows descriptive language to be inserted into the<br />
text to represent the visual content. This study investigates this descriptive language,<br />
viewing it as a system of human communication, and examines the process of creating<br />
descriptions, a process that involves an intermediary called a describer and the modifications<br />
the describer makes to a text in order to render it accessible and usable. There are different<br />
practices of creating these descriptive insertions, and many terms refer to them, including<br />
<strong>Audio</strong> <strong>Description</strong>,<br />
described video, and textual and verbal equivalents. This study considers these practices as<br />
variants of a type of communication called <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> that has specific and<br />
definable properties. This study is the first academic investigation into the language process<br />
since the practice was conceptually described as a technique for television by Frazier in 1975.<br />
It addresses broad questions about this unique communication form. Who does it? Why is<br />
it unique? What does it look like? And how can it be analyzed?<br />
The approach taken is structured as a study within a study. The outer study looks at<br />
the theoretical issues of using language as a visual prosthetic and shows it having properties<br />
in common with both prototypical spoken and written discourse as well as communication<br />
like sign language interpretation that relies upon an intermediary. The inner study uses a<br />
corpus of more than 23,000 words of <strong>Audio</strong> <strong>Description</strong> drawn from four movies described<br />
by three organizations proficient in the practice of describing film. Analysis of this data<br />
shows it to be a language system with distinctive constituent and discursive structures. This<br />
study shows that the fundamental nature of a unit of an inserted description is not an<br />
isolated representation of isolated visual information, but rather, a semantic unit that is<br />
situated in several definable ways within a multimodal text.<br />
ACKNOWLEDGMENTS<br />
Producing this work has allowed me to learn about language, research, and some of<br />
the challenges that accrue to the study of nontraditional subjects. Many people both inside<br />
and outside the University have contributed to this effort, and I am forever grateful to them.<br />
Within Georgetown University, I must first mention and thank my thesis advisor,<br />
Professor Randy Bass, whose help was invaluable in connecting my research interests to the<br />
completion of this process. Next, I am thankful for knowing and benefiting from the advice<br />
of Professor Ron Scollon, my second reader and an informal advisor almost since the time I<br />
considered this as a potential topic. Ron has provided constant encouragement and support<br />
and, as the process was nearing completion, I found myself recollecting many things he said<br />
about the choices that scholars make that directly related to my work. I am also grateful to<br />
Professors Jay Lemke and John Castellani of the University of Michigan and Johns Hopkins<br />
University Center for Technology in Education. Professor Castellani was the teacher who<br />
exposed me to this area as an educational challenge and Professor Lemke provided<br />
extremely valuable feedback on an early draft of this document.<br />
Also within Georgetown, Professors Shukla, Tannen, Tinkcom, and Tyler were<br />
gracious in spending time discussing the challenges of this topic and provided me with<br />
encouragement and kind thoughts. I also appreciate the many email exchanges with Daniel<br />
Loehr and Kristin Mulrooney and the early discussions with Elisa Everts: all doctoral<br />
students at Georgetown. I also thank Professors Hamilton and Schiffrin whose classes at<br />
Georgetown were memorable, enriching, and relevant to my work. And, of course, Dr.<br />
Suzanne Wong Scollon who gave me the nudge in this direction and then advised me<br />
(correctly) of some of the challenges I would encounter in studying this as a language topic.<br />
Outside of academia, the first person to thank is someone to whom everyone involved<br />
with the field of <strong>Audio</strong> <strong>Description</strong> owes much. Nobody, with the exception of her<br />
husband Cody Pfanstiehl, has made a larger contribution to the method and the advocacy of<br />
this emerging field than Dr. Margaret Pfanstiehl. The Pfanstiehls discussed their work with me many<br />
times, provided documents relating to their methods, and referred me to others whose<br />
input was essential. My path was made smoother by being able to say, “the Pfanstiehls<br />
suggested I call.” Among those who responded were Joel Snyder, Director of Described<br />
Media with the National Captioning Institute, himself a major contributor to the field of<br />
<strong>Audio</strong> <strong>Description</strong>, and Larry Goldberg, Director of Media Access at the WGBH<br />
Educational Foundation who was helpful as was his staff, including Theresa Maggiore and<br />
Bryan Gould. I also thank Jim Stovall, founder of the Narrative Television Network who<br />
discussed in vivid detail how a person uses <strong>Audio</strong> <strong>Description</strong>.<br />
Within the advocacy community, I owe much to the American Foundation for the<br />
Blind and the many supportive phone conversations with Dr. Elaine Gerber; and the<br />
support of Drs. Jaclyn Packer and Corrine Kirchner. I also thank Curtis Chong from<br />
the National Federation of the Blind, who was the first person in the visually impaired<br />
advocacy community to speak with me about my research, and Melanie Brunson of the<br />
American Council of the Blind.<br />
At Recording for the Blind and Dyslexic, I am grateful to Judy Vollmer and the<br />
staff in the Boston studio; and Chris Smith in Washington, DC, who spent many hours with<br />
me and provided me with equipment, tapes, and insights into their process.<br />
From the Center for Applied Special Technology, I appreciate the thoughts, time,<br />
and encouragement of Dr. Robert Dolan. And, from the United States Access Board, both<br />
David Baquis and Doug Wakefield were a tremendous help and spent a good deal of time as<br />
did Dr. Judith Dixon from the Library of Congress. I also thank John Weber, a good<br />
personal friend, National Public Radio producer, and volunteer audio describer who is one<br />
of the people who sparked my interest in this area. Bob Regan, Product Manager for<br />
Macromedia Corporation and a student with the Trace Center at the University of Wisconsin<br />
at Madison, was also helpful as I began this investigation.<br />
My largest thanks must go to the person I dedicate this work to. My wife Sarah has<br />
encouraged me to follow my intellectual pursuits for their value alone. From the beginning<br />
of this process, she has served as a constant reminder in times of doubt that my work is<br />
worthwhile.<br />
TABLE OF CONTENTS<br />
1. INTRODUCTION......................................................................................................1<br />
1.1 Overview............................................................................................................. 1<br />
1.2 Research Goals & Questions............................................................................... 3<br />
1.3 Anticipated Benefits and Limitations of this Research....................................... 5<br />
1.4 Organization of the Sections............................................................................... 7<br />
1.5 Note for <strong>Visual</strong>ly Impaired Readers ................................................................... 7<br />
2. BACKGROUND INFORMATION...........................................................................8<br />
2.1 The Primary Consumers of <strong>Visual</strong> <strong>Description</strong>.................................................. 8<br />
2.2 Practices of Describing Images for <strong>Assistive</strong> Technology ............................... 10<br />
2.2.1 <strong>Audio</strong> <strong>Description</strong>..................................................................................... 10<br />
2.2.2 <strong>Audio</strong> Books ............................................................................................. 14<br />
2.2.3 Software and Interactive Media ................................................................ 15<br />
2.2.4 Multimedia and Internet Sites................................................................... 16<br />
2.3 The Goal of Accessible Media: Usability and Experiential Equivalence......... 18<br />
2.4 Previous Qualitative Evaluations of <strong>Description</strong>.............................................. 19<br />
2.5 The Practical Case for a Unified Model............................................................ 20<br />
2.5.1 Reasons to Consider as Separate Practices ............................................... 21<br />
2.5.2 Reasons to Consider as a Single Process .................................................. 21<br />
3. VISUAL ASSISTIVE DISCOURSE.......................................................................24<br />
3.1 <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> Defined.................................................................. 24<br />
3.1.1 The “<strong>Discourse</strong>” in <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong>........................................ 24<br />
3.1.2 Common Properties of Different Descriptive Practices............................ 25<br />
3.1.3 Comparison of VAD and Other <strong>Communication</strong> Systems ....................... 26<br />
3.1.4 The Components of the VAD <strong>Communication</strong> System............................ 28<br />
3.2 Conceptual Issues with Words for Images........................................................ 29<br />
3.2.1 Sequential vs. Parallel............................................................................... 30<br />
3.2.2 Raw vs. Processed Information................................................................. 30<br />
3.2.3 Schema Theory......................................................................................... 31<br />
3.2.4 Translation-Interpretation-Transrepresentation........................................ 32<br />
3.3 <strong>Description</strong>s: Situated/Constrained in Multimodal Texts................................. 32<br />
3.3.1 Textually Situated <strong>Description</strong>s................................................................ 33<br />
3.3.2 Constraints: Detail vs. Interpretation........................................................ 35<br />
3.4 Discussion: The Role of the Describer ............................................................. 37<br />
3.4.1 Describing as a Way of Thinking ............................................................. 37<br />
3.4.2 Describer as Intermediary......................................................................... 39<br />
3.4.3 <strong>Description</strong> as a Group Process ................................................................ 39<br />
3.5 Discussion: The Role of the Consumer ............................................................ 41<br />
3.5.1 The Consumers: Actively Building a Text Model.................................... 41<br />
3.5.2 The Purposes and Goals of Consumers .................................................... 43<br />
3.6 Cultural Issues with <strong>Description</strong>? ..................................................................... 45<br />
4. STUDY OF AUDIO DESCRIPTION......................................................................47<br />
4.1 The Study Corpus ............................................................................................. 47<br />
4.2 Methodology..................................................................................................... 48<br />
4.3 The Structural Components of <strong>Audio</strong> <strong>Description</strong>........................................... 49<br />
4.3.1 Insertions ................................................................................................... 50<br />
4.3.2 Utterances.................................................................................................. 53<br />
4.3.3 Representations ......................................................................................... 56<br />
4.3.4 Words........................................................................................................ 67<br />
5. USING THE DEFINITIONS FOR ANALYSIS......................................................69<br />
5.1 Descriptive Mass............................................................................................... 69<br />
5.2 The Textual Role of Insertion Content ............................................................. 72<br />
5.3 Sample Analysis: Persistent Entity Development............................................. 72<br />
5.4 Sample Analysis: The Scene as Frame ............................................................. 74<br />
5.5 Sample Analysis: Utterance Patterns ................................................................ 75<br />
5.6 Sample Analysis: Representational Combinations ........................................... 77<br />
5.7 Analysis Challenges: Time, Reality, and Cultural Elements............................ 78<br />
6. SUMMARY: A LANGUAGE SYSTEM ................................................................79<br />
7. FUTURE STEPS......................................................................................................83<br />
7.1 Consumers Study .............................................................................................. 83<br />
7.2 Supporting Further Developments of <strong>Audio</strong> <strong>Description</strong>................................. 84<br />
7.3 Descriptive & Comparative Studies of Other Forms of VAD.......................... 85<br />
7.4 Human Subjects Studies with AD..................................................................... 86<br />
7.5 Educational Materials Study............................................................................. 86<br />
7.6 <strong>Assistive</strong> Technological Research .................................................................... 87<br />
APPENDIX A: GLOSSARY.............................................................................................89<br />
APPENDIX B: TRANSCRIPTION & MULTIMODAL ISSUES...................................92<br />
Transcription Conventions ........................................................................................... 92<br />
Multimodal Issues........................................................................................................ 92<br />
Sub-second timing ........................................................................................................ 93<br />
APPENDIX C: VERBAL DESCRIPTIONS FOR FIGURES .........................................94<br />
REFERENCES ..................................................................................................................97<br />
NOTES.............................................................................................................................104<br />
LIST OF FIGURES<br />
Figure 1 – Different practices of visual description in 2002................................................ 10<br />
Figure 2 - Conceptual view of Internet descriptions ........................................................... 16<br />
Figure 3 - Overview of VAD and other prototypical communication processes................. 27<br />
Figure 4 - Chafe's view of immediate mode........................................................................ 38<br />
Figure 5 – Conceptual view of the description process.................................................... 40<br />
Figure 6 - Conceptual diagram of consumer's process........................................................ 42<br />
Figure 7 - Length of utterances in corpus........................................................................... 54<br />
Figure 8 - Chart version of Table 4 data.............................................................. 71<br />
LIST OF TRANSCRIPTS<br />
Transcript 1- From “The Gift of Acadia” 1:06................................................................... 52<br />
Transcript 2 - From "A Star is Born" 16:20........................................................................ 55<br />
Transcript 3 - From "Gladiator" 40:26............................................................................... 59<br />
Transcript 4- From "Gladiator" 1:42:30.............................................................................. 60<br />
Transcript 5 - From "Gladiator" 47:49............................................................................... 62<br />
Transcript 6 - From "LA Story" 3:38................................................................... 62<br />
Transcript 7 - From "LA Story" 19:00................................................................................ 62<br />
Transcript 8 - From "LA Story" 20:36................................................................................ 63<br />
Transcript 9 - From "A Star is Born" 1:58......................................................................... 64<br />
Transcript 10 - From "Gladiator" 6:10............................................................................... 64<br />
Transcript 11- From "Gladiator" 1:15:29............................................................................ 65<br />
Transcript 12- From "Gladiator" 12:00............................................................................... 65<br />
Transcript 13 - From "LA Story" 1:50................................................................................ 65<br />
Transcript 14 - From "Gladiator" 58:42.............................................................................. 66<br />
Transcript 15 - From "LA Story" 7:50................................................................................ 66<br />
Transcript 16 - From "LA Story" 90:38............................................................................... 67<br />
Transcript 17- From "LA Story" 25:36................................................................................ 67<br />
LIST OF TABLES<br />
Table 1 - Study corpus material.......................................................................................... 48<br />
Table 2 - Summary of structural components of audio description......................................... 50<br />
Table 3 - Comparison of description mass in four different texts....................................... 70<br />
Table 4- Distribution of description mass by insertion length............................................. 71<br />
Table 5 - Referring terms for main character of "Gladiator"............................................... 73<br />
Table 6- Comparison of description styles.......................................................................... 76<br />
1. INTRODUCTION<br />
1.1 Overview<br />
The title of this work, “<strong>Audio</strong> <strong>Description</strong>, a <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong>,” may<br />
introduce one or two terms into the vocabulary of the readers. The practices that form the<br />
basis for this study are recent, having only become established towards the end of the<br />
twentieth century. They are based on communication and digital technologies, and rely<br />
upon language for their success.<br />
<strong>Audio</strong> <strong>Description</strong> is a way to provide a visually impaired person access to visually<br />
rich productions including movies, television programming, plays, live events, and museum<br />
exhibitions. With <strong>Audio</strong> <strong>Description</strong>, a describer inserts spoken words to provide<br />
representations of information contained in the visual field of the production. The inserted<br />
description, when combined with existing audio content, including dialog from the original<br />
production, creates a new text that is more accessible than it would be without the addition<br />
of the description. This study looks at <strong>Audio</strong> <strong>Description</strong> as a language system and<br />
describes its features and structural components and shows how it can be analyzed in a<br />
manner similar to other discourse types.<br />
<strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> is a term introduced in this study to name the type of<br />
communication process to which <strong>Audio</strong> <strong>Description</strong> belongs. <strong>Audio</strong> <strong>Description</strong> and practices like it<br />
rely upon special assistive technology; <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> is a linguistic view of the<br />
communication processes that operate over and through that technology but remain<br />
fundamentally human communication processes.<br />
Specific productions and practices of language use can be viewed as parts of broader<br />
classes: a telephone conversation can be viewed as a type of conversation, electronic mail can<br />
be viewed as a form of correspondence, and conversation that utilizes a sign language<br />
interpreter can be characterized by its reliance upon a communicative intermediary. Further,<br />
these classifications are not purely hierarchical: a conversation interpreted with sign language<br />
is also a conversation, while an interpreted unidirectional event such as a political speech is<br />
not.<br />
This study is descriptive. It is the first attempt to look at this area, not as a service or<br />
accessibility issue, but as a language system. There is an asymmetry to this domain because<br />
language and vision operate differently, and viewing this system within a language framework<br />
may require a conceptual shift. That shift is from the perspective of the sighted describer<br />
who works with both vision and language to the perspective of the consumer of this service<br />
who often receives just the language. From that perspective, it is a language issue. While<br />
many human interactions from buying coffee to expressing amity or hostility can be<br />
accomplished with language in a secondary or non-existent role, in this area, language is the<br />
essential medium of exchange. The contents of this study could be expressed as theses, as<br />
propositions, or within the framework of problem statements. At the core of these different<br />
ways to frame this study is the belief that <strong>Audio</strong> <strong>Description</strong> is a type of <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong><br />
that is, in turn, a type of language system. This language system is used by people who have different<br />
perceptual abilities, but are members of the same speech communities. Understanding this<br />
language system can involve many different levels of analysis including the participant<br />
structure, its environmental constraints, the conditions under which it is practiced, and its<br />
external form from small parts such as words to larger discourse units.<br />
The sister practices that are also types of <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong>, like <strong>Audio</strong><br />
<strong>Description</strong>, are recent and in their formative stages, so this study may provide benefits to these<br />
practices by illuminating the techniques and methodologies that <strong>Audio</strong> <strong>Description</strong> employs.<br />
Because this study comes at an early stage in the lives of these steadily growing practices,<br />
and because there is little academic work to build on, it takes a broad approach and tries to<br />
address, from several important perspectives, some fundamental questions about what this<br />
process is like.<br />
Readers of this document should see what the author has become convinced of: that<br />
these practices are vast, both in the range of conceptual issues they raise and in their impacts on people’s<br />
lives. This is a technical study, and it will not address many issues that people who are<br />
experienced in visual description consider important, such as the interpretive aspects of<br />
description, which content should be described, or describing under specific textual<br />
constraints. These are important areas to be sure, but the focus of this study is to help<br />
define the process characteristics and participants in a way that will support many types of<br />
ongoing qualitative discussions.<br />
1.2 Research Goals & Questions<br />
As a descriptive study, this research can potentially serve a range of purposes from<br />
qualitative analysis to comparative studies that require a formal definition of the language<br />
use. For the purposes of this document, this study’s intent can be formalized in two sets of<br />
research questions in two different types of studies. The first study looks at all of these<br />
communication practices as variations of a type of process that is distinctive from other<br />
forms of language use. This study defines the process called <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong><br />
(VAD) through the following questions:<br />
1. What properties are common to the practices of providing visual information<br />
through language in electronic texts?<br />
2. How is this communication process similar to and distinct from other forms<br />
of language use in terms of participants and development of a text?<br />
3. Other than providing a theoretical definition, what other reasons exist for<br />
viewing these practices as variations of a common process?<br />
The method employed in the conceptual study is descriptive and logical. It describes<br />
the properties and components that are broadly at work in <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> and<br />
then draws in related literature and concepts to assemble a conceptual view that reflects<br />
what the practices have in common and the shared implications of their structure.<br />
The second or inner study looks at the language produced in one of the visual<br />
description practices. The practice chosen for this study is <strong>Audio</strong> <strong>Description</strong>. While <strong>Audio</strong><br />
<strong>Description</strong> (AD) will not show every possible linguistic form of this descriptive process, it<br />
is a practice area with a strong methodological history. This second study has two main<br />
research questions:<br />
1. Is there a constituent structure that is different from other language uses?<br />
2. What types of information are provided within <strong>Audio</strong> <strong>Description</strong> and what<br />
patterns in representation exist?<br />
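As an illustration of the kind of measurement the second question invites, one can count the words in each inserted description and the share of the combined text that description occupies, the sort of distributional measure reported later in tables 3 and 4. The functions and sample insertions below are invented for this sketch and are not taken from the study corpus.<br />

```python
def insertion_lengths(insertions):
    """Word count of each description insertion."""
    return [len(text.split()) for text in insertions]

def descriptive_mass(insertions, total_words_in_text):
    """Fraction of the combined text's words contributed by description."""
    return sum(insertion_lengths(insertions)) / total_words_in_text

# Invented sample insertions (not from the corpus).
sample = ["He turns away.",
          "She lifts the letter to the light and reads."]

print(insertion_lengths(sample))
print(descriptive_mass(sample, 100))
```

Counts of this kind let different described texts be compared on the same scale regardless of their length or genre.<br />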
These two types of studies support each other to provide a more comprehensive<br />
view of this unique communication process and thereby support the proposition that <strong>Audio</strong><br />
<strong>Description</strong> is an example of a type of language process called <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong>.<br />
1.3 Anticipated Benefits and Limitations of this Research<br />
This research is intended as an initial investigation into a way of using language that has<br />
largely gone unexplored linguistically. An anticipated benefit of this research is an<br />
understanding of this process as a formal system with specific characteristics that can be<br />
observed, measured, and taught. At the time of writing, a number of discussions involving<br />
standards and guidelines are occurring in the community of those who provide and those<br />
who use visual description (Levine, 2002). Historically, these practices<br />
are in their earliest and formative stages. And, as <strong>Audio</strong> <strong>Description</strong> and its sister language<br />
forms are like other language systems, they will evolve through negotiation of the<br />
participants. This research is thus positioned at an important stage in this emerging language<br />
system and aims to support its development through a description of its process-level<br />
characteristics that can inform this ongoing negotiation.<br />
As with other areas that benefit a disabled community, most of the resources<br />
previously available to support the blind and visually impaired in accessing visual information<br />
have been focused on service, delivery, and regulatory areas rather than research. Some may<br />
see parallels to the struggles of the deaf community to gain acceptance for sign language,<br />
which for years was banned as ‘deviant.’ Some may find that, compared to other fields, the<br />
amount of previous research in this area is small, especially considering the conceptual issues<br />
involved and that this study must cover a broad area without the ability to focus on specific<br />
and important components.<br />
Will this study yield insights that help the larger population of language users? While<br />
making no claim that this area will have broad relevance, it is important to note the historical<br />
precedent for research into areas of communicative challenge, including the documentation<br />
of American Sign Language (Stokoe, 1965) and Vygotsky’s work with deaf-blind children<br />
(Hardy, 2000) that have provided insights into general properties of language and cognition.<br />
One of the dimensions of this research is that forms of information such as pictures, actions,<br />
and gestures that are not usually considered language or equivalent to language are being<br />
replaced with language. The fact that these non-linguistic forms do not have the same<br />
properties as speech or writing could be a basis for considering this research as not a ‘linguistic’<br />
effort, or alternatively may reinforce the need to define the non-linguistic visual information<br />
with the type of structure often used to study language and language use.<br />
Independent of the benefits for the study of language, there is a rich history of<br />
efforts originally intended to benefit a disabled group that resulted in benefits for the general<br />
population. Some examples include illuminated elevator floor lights, originally for the deaf;<br />
ramps and curb cuts originally for wheelchairs that are now used by baby strollers and<br />
bicyclists; and the secondary audio programming (SAP) channel, now used for multilingual support in television programming. Because Audio Description and its sister practices are so new, perhaps they will also prove to benefit those challenged by language, learners of a new language, or other groups as yet unknown.
1.4 Organization of the Sections<br />
This study is organized from the outside in: it starts with current practices and ends by describing some paths for further study. Section two contains background information that discusses the blind and visually impaired communities, the existing practices of providing visual description, and some of the relevant literature. Section three looks specifically at Visual Assistive Discourse (VAD) as a formal system by defining it in terms of properties and a participant framework, as well as implications for the principal roles and processes. Section four looks at Audio Description (AD) with a review of several productions of movies described by different organizations. This section develops a set of terms for the structural and functional components based on examples from transcripts of AD. Section five builds upon the definitions in section four with a series of analytic perspectives that look at referring sequences (Schiffrin, 1994), experiential frames (Goffman, 1974, Tannen, 1993a), foci of consciousness (Chafe, 1994), and sites of engagement (Scollon, 2001a). Section six relates the findings from AD to the larger practices of VAD, and section seven describes some paths for additional research.
1.5 Note for Visually Impaired Readers
In order to make this document more usable if it is translated into an accessible<br />
form, the topical headings and subheadings use a numbering system rather than the standard<br />
American Psychological Association (APA) style. Also, all of the images used in this report<br />
have associated verbal descriptions contained in Appendix C. These descriptions will also be<br />
extracted as a separate file that can be accessed concurrently with this document.<br />
2. BACKGROUND INFORMATION<br />
The goal of this section is to provide some orientation to readers who will likely have<br />
different experiences and understandings. Below is a discussion of the consumers of visual description, the main practices of visual description, the goals of accessible media, and some practical benefits of viewing these disparate practices as variations of a single process.
2.1 The Primary Consumers of Visual Description
Most, but not all, of the consumers of visual description are the blind or visually<br />
impaired. Because of the variety of factors that impair vision, the effects that occur when<br />
vision loss comes at different ages, and the way statistics on visual impairments are collected,<br />
a simplified description of the communities of visually impaired individuals is not possible in<br />
this forum. Not only is there no typical blind person, there are numerous ways to categorize<br />
the population, including grouping those who have had an impairment from birth<br />
(congenital) or acquired it at some point in life (adventitious), or by looking at the nature of<br />
the impairment, such as focus, loss of visual field, or ability to discern details. Current estimates are that 1.3 million citizens in the United States are affected with some form of visual impairment, 98,000 of whom are students (AFB, 2001a). The National Federation of the Blind reports that 3.5% of the population over 65 years of age has some visual impairment (NFB, 2000), while the American Foundation for the Blind reports a much higher 16% for the same age group¹.
The lack of timely access to accessible educational information, including textbooks, multimedia, and educational videos, is considered a significant impediment to education and training. Many blind children are educated in mainstream schools, but less than half complete high school. Those who do complete high school are just as likely as sighted individuals to take some college courses, but much less likely to graduate (AFB, 2001b). Nationally, unemployment among working-age visually impaired individuals is over 50%, with fewer than one in three of those who are of working age and legally blind being employed (AFB, 2000).
The visually impaired community does not have a separate language. Braille is a way<br />
to represent a natural language such as English in a tactile form, but most members of this<br />
community do not read Braille, or do not read it well. While Braille is an important mode of<br />
communication for the blind, it is not a universal solution for communicating information.<br />
At present, Braille literacy is in decline (Schroeder, 1994). Visually impaired and blind (VIB) individuals use language in essentially the same ways², including visual references, as do those with sight (Warren, 1994; Elbers, 1999). For those who lost sight after critical developmental ages, concepts grounded in vision, such as colors, perspective, and visually based inferences (Wyver, 2000), remain relevant even though they cannot see. For the others, including the congenitally blind, there is
also interest in visual information and in understanding the visual world. No research or<br />
other indications emerged in this study that descriptive language for the VIB should be<br />
different from language that would be used with sighted consumers of visual description<br />
who were not accessing the visual information. The differences between regular language<br />
use and the language used in VAD (discussed later in this document) are not related to visual<br />
impairments but to the fundamental nature of the visual descriptive process.<br />
2.2 Practices of Describing Images for Assistive Technology
Using language in electronic texts to replace vision is practiced by a variety of<br />
organizations and with a variety of text types. This analysis divides these into four mostly<br />
unrelated areas, as illustrated in figure 1. Except for serving a common set of customers, there was little apparent integration of these activities at the time of this study.
[Figure 1 – Different practices of visual description in 2002: Audio Description (beginning with live description in 1981, then described video and audio tours), Audio Books (1948/1971), Software & Interactive Media (late 1990s), and The Internet & Multimedia (late 1990s)]
2.2.1 Audio Description
One of the two most developed areas of these practices is called Audio Description (AD). AD actually encompasses several different usages that share the use of a human voice rather than voice synthesis. They also work primarily with texts that are in motion rather than fixed. Further, they share a common methodological history.
Beginning with Live Description
Although it appears that the very first described performances were recorded on tape<br />
and broadcast through radio reading services (Packer, 1997b), the first sustained and<br />
standardized AD program began in 1981 with the work of Dr. Margaret Pfanstiehl and her<br />
organization called the Metropolitan Washington Ear. The “Ear” began providing live AD<br />
for plays at a Washington DC area theater using local FM transmitters and a describer who<br />
would insert narration between sections of dialog. The practices and techniques developed<br />
for live description were influential for all of the major areas of AD that followed.<br />
The most significant difference between live description and the other forms<br />
discussed below is that, as a result of being live, the describer must be sensitive to events in<br />
real time and the description cannot be pre-packaged because even theatrical performance<br />
(sometimes by design) varies in terms of events and timing. This is not to indicate that the<br />
description is fully spontaneous. Before a theatrical production is described, it is previewed<br />
by the describer and in some cases by two describers, one for the program information and<br />
one for the performance, so there is some redundancy as well as preparation in the creation<br />
of the descriptions (Weber, 2002).<br />
Live description is also now used in non-theatrical settings such as weddings and<br />
ceremonies. Occasionally, live events are broadcast simultaneously (simulcast) on the<br />
Internet.<br />
Described Video<br />
The next major development in Audio Description, and now its most prevalent form, is called described video. Described video includes television, films, and streaming media, and this form of visual description reaches thousands of viewers/hearers daily. As stated earlier, some of the earliest described performances occurred experimentally in the 1970s. In 1975, a theoretical approach was developed as part of a Master’s thesis by Gregory Frazier (Frazier, 1975). But it was not until 1982, when the Washington Ear connected with WGBH, the public broadcasting station in Boston that had pioneered closed captioning and
continues to pioneer accessible media including the secondary audio program (SAP) channel,<br />
that described video became a broadcast reality (Packer, 1997b). WGBH, with consultation from “the Ear,” began broadcasting described shows and later, with support from the U.S. Department of Education, launched the Descriptive Video Service (DVS) in 1990 and the
National Center for Accessible Media (NCAM) in 1993. DVS is now one of the two largest<br />
providers of described video material in the U.S. and, in addition to providing broadcast<br />
products, sells a collection of described videos while NCAM focuses on newer technologies<br />
such as interactive media and devices. WGBH is not the only organization providing<br />
described video products. In 1988, seemingly independent of these efforts on the East<br />
Coast, Jim Stovall, who lost his sight as a young adult, developed another descriptive<br />
approach. Rather than broadcast on a separate channel, his company used the existing audio<br />
channel with description inserted in between dialog so that all viewers receive the same<br />
audio content. His company, the Narrative Television Network (NTN), also began with an<br />
emphasis on television programming and was a commercial enterprise that was sustained by<br />
advertising as well as funding from the U.S. Department of Education. Initially, the descriptive style used at NTN was sparser than the original style used at “the Ear” and WGBH, but today, Stovall indicates, their styles are fairly close (Stovall, 2002).
With a recent Federal Communications Commission (FCC) ruling³ that requires several prime-time hours of television programming each day to be described, a number of other organizations have entered the Audio Description market. Recently, the National Captioning Institute (NCI), known for real-time closed captioning, launched a program of described media led by Joel Snyder, who has been active in Audio Description since the early 1980s, including work with the Ear and the National Endowment for the Arts, and in developing audio tours of museums.
A significant feature of all of the organizations that provide description for television<br />
and film is that their staffs are usually paid and they employ an extensive pre-description<br />
process including scripts, writers, and editors.<br />
Recorded Tours<br />
Starting in the mid-1980s, museums began offering Audio Description tours (Snyder, 2002a), and the practice has spread extensively (ASTC, 2001). Since this type of Audio Description is geared more towards fixed stimuli such as paintings and museum exhibits, it would seem similar in some ways to the description of content found in books.
Methodological Ancestry<br />
All of the major providers of described video products in the US had early and substantive consultations with, and training from, the Washington Ear, and these organizations readily attribute much of their understanding of the principles of description to the Ear and the Pfanstiehls (Goldberg, 2002, Snyder, 2002a, Stovall, 2002). In addition, at the time of this report, there are active AD programs throughout the world, and many of these have had substantive contact with those based in the US (Simpson, 2001), including with the Ear (Pfanstiehl, 2002a). While the practice of Audio Description is now spread over dozens of organizations, there is a common methodological history derived from the live performances that began in 1981.
2.2.2 Audio Books
Historically, printed material has been a very different medium from film and video<br />
and so it follows that the description practices used for printed material would be unrelated<br />
to Audio Description. In the United States, a single organization, Recording for the Blind and Dyslexic (RFB&D), appears to provide most of the descriptions of visual information in books. The American Printing House for the Blind (APH) may also do so, but repeated inquiries provided no confirmation. And while APH does provide material in audio form, material in tactile form (Braille and tactile images) seems to be its focus, so that today RFB&D is essentially the main supplier of audio textbooks (Burnham, 2002, Wall, 2002).
RFB&D began as Recording for the Blind (RFB) based in Princeton, New Jersey. It<br />
was chartered in 1948 to help visually impaired soldiers returning from World War II and is<br />
funded by government, private industry, and subscription services. This organization<br />
provides textbooks on a range of subjects in audio format to students with documented<br />
visual and/or learning disabilities. These textbooks are read along with descriptions of<br />
images⁴ into a digital recording system that allows page-by-page access. The material is then distributed either digitally or on cassette tape. RFB&D operates 32 recording studios nationally, uses over 5,300 volunteers, and serves over 25,000 blind members (RFB&D, 2001a). The RFB&D volunteers usually have deep experience in the subject areas they read for and are often retired professionals.
The National Braille Association (NBA) produced a manual in 1971 for recording<br />
books on tape that included instructions on descriptions of images, including maps,<br />
diagrams, and charts. RFB&D uses the NBA guidelines and also has developed an extensive<br />
set of procedures for describing images in the range of disciplines taught in public schools<br />
and post-secondary institutions. Subjects include chemistry, computer science, social<br />
studies, geography, and math. This process includes volunteer readers and staff that support<br />
those readers by selecting which images will be described and where in the audio stream the<br />
descriptions will be placed. The volunteer who is reading the text then creates the<br />
descriptions. RFB&D policy suggests that the image descriptions be written out prior to<br />
reading by the volunteer, but linguistic evidence indicates that some (and perhaps most) of<br />
the descriptions are spontaneous. Like WGBH, RFB&D has subject focus areas and specialists, and often assigns material of certain types to studios in different cities (Smith, 2002, Vollmer, 2002).
There is also a worldwide consortium effort underway to develop a digital talking book (DTB) standard (Kerscher, 2001a). This effort includes publishers and library systems, with the goal of providing a standard document interchange format based on the World Wide Web Consortium (W3C) Extensible Markup Language (XML) (Kerscher, 2001b). As on the Internet, these standards use the term textual equivalent in relation to images.
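To make the idea of an in-document textual equivalent concrete, the sketch below builds a small talking-book fragment and extracts its image description using only Python's standard library. The element names (`imggroup`, `prodnote`) are modeled loosely on the DAISY/DTBook vocabulary and should be read as illustrative, not as the actual standard; the sample content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical talking-book fragment: the image and its textual
# equivalent travel together inside one grouping element, so any
# playback system can render the description in place of the image.
fragment = """
<imggroup>
  <img src="fig04.png"/>
  <prodnote render="optional">A line graph showing Braille
  literacy declining between 1960 and 1990.</prodnote>
</imggroup>
"""

root = ET.fromstring(fragment)

# Pull every description back out of the document structure.
descriptions = [note.text.strip() for note in root.iter("prodnote")]
print(descriptions[0][:12])  # → "A line graph"
```

The design point is that the description is part of the document's markup rather than a separate audio channel, which is what lets a single source file serve Braille, audio, and large-print renderings alike.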
2.2.3 Software and Interactive Media<br />
Software and interactive media are relatively new areas for visual description. While the prototypical texts for visual description, plays and books, have been around for thousands of years, the software industry is less than fifty years old. And it is only within the last twenty years (more recently in a major way) that software and interactive media components have come into contact with the general population, which includes the visually impaired.
Interactivity is used here to denote a human-technology interchange where the<br />
human is presented with options to direct the technology to alter its form and/or functions.<br />
The term “rich media” is used by NCAM to mean “elements on a web page (or in a separate player) which exhibit dynamic motion over time or in response to user interaction” (NCAM, 2002). Convergent media is another term used at NCAM for interactive elements, mostly in relation to digital television (Wlodkowski, 2002). The same regulations that cover the Internet also cover interactive elements on web pages, including navigational graphics (client- and server-side maps), applets, and other dynamic media. This area is fairly new and there is
little history to study.<br />
2.2.4 Multimedia and Internet Sites<br />
[Figure 2 - Conceptual view of Internet descriptions: the Internet encompasses text and hypertext, still images, moving images, interactive elements, and live (simulcast) events]
The term multimedia is often used (like multimodal) to describe content that is conveyed through more than one representational form (Corn, 2002). The Internet can be viewed as an interconnected multimedia environment. The most important differentiator between the Internet/multimedia and the practices mentioned before is that all of the earlier publication types, with the exception of live performances, can be included in multimedia
texts. An Internet text can include live elements as well. The Internet, as illustrated in figure<br />
2, is really the superset of all other descriptive practices.<br />
Descriptions of non-textual components on the Internet are addressed through several standards and guidelines. The most widely known of these is section 508 of the United States Rehabilitation Act, as amended in 1998⁵, which covers many Federal government websites. Non-textual components (still images, moving images, and interactive elements) on executive branch websites are required under section 508 to have a description. Known commonly as the “508 standard,” it is often used as a measure by other organizations, including local governments, non-governmental organizations, and universities, who wish to make their websites accessible but are not bound by the 508 mandate. There is also a voluntary guideline developed by the World Wide Web Consortium (W3C, 1999) that covers essentially the same territory as section 508 but differs in some details. There are several software tools that check websites for compliance with these standards, allowing websites to claim conformance. There is also a movement called “Speech Friendly Sites” that connects to the 508 and W3C guidelines as well (Artic Technologies, 2002).
There have also been many books published just in the last few years on developing<br />
accessible websites. None seems to provide more than a handful of pages related to the<br />
challenges associated with describing images and creating textual equivalents.<br />
All of the standards, guidelines, and books reviewed in this study take a similar<br />
approach to non-textual components. They rely upon features inherent in the hypertext<br />
markup language (HTML) standard for descriptions to be placed on these non-textual<br />
elements. And, they specify these elements should have a “textual equivalent” description.<br />
The guidance for that equivalent specifies that it must convey “the meaning of the image”<br />
(Board, 2001a, Board, 2001b). For many non-textual elements such as buttons and audio<br />
files, the description options are straightforward and formulaic. For images that carry<br />
essential communicative content rather than just decoration, for example the images found<br />
in textbooks, the descriptions are more challenging. None of these guidelines really<br />
addresses the range of descriptive options that might exist or how the person ‘surfing’ the<br />
web would cognitively process different descriptive approaches. In short, the standards and<br />
guidelines require the existence of descriptions, but provide little detail as to what those<br />
descriptions should include or how they should be constructed.<br />
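The gap is easy to see if one sketches what such a compliance check involves. The simplified illustration below (not any actual tool used by the guidelines; the sample markup is hypothetical) uses Python's standard html.parser to flag image elements that lack a textual equivalent. Note that a check of this kind can only detect a missing `alt` attribute; it cannot judge whether a present description actually conveys “the meaning of the image.”

```python
from html.parser import HTMLParser

# Elements commonly expected to carry a textual equivalent via the
# "alt" attribute; img is the central case in the guidelines.
DESCRIBABLE = {"img", "area"}

class AltTextAuditor(HTMLParser):
    """Collects describable elements that lack an alt attribute."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag in DESCRIBABLE and "alt" not in dict(attrs):
            self.missing.append(tag)

# Hypothetical page fragment: the first image has a textual
# equivalent, the second does not.
sample = (
    '<img src="chart.png" alt="Bar chart of employment rates">'
    '<img src="logo.gif">'
)

auditor = AltTextAuditor()
auditor.feed(sample)
print(len(auditor.missing))  # → 1
```

This is exactly the kind of mechanical test the existing checkers perform: presence or absence of a description, with nothing to say about its descriptive quality or how a listener would process it.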
No estimates of the number of organizations attempting to make Internet sites<br />
accessible were available. And, since the guidelines are so open, it is likely that there are<br />
hundreds if not thousands of different approaches to describing images in these new media.<br />
2.3 The Goal of Accessible Media: Usability and Experiential Equivalence<br />
The goals of the accessibility practices discussed here have been expressed through two types of approaches: usability and experiential equivalence. Some distinguish accessibility from usability as follows: accessibility means the technology can access the information, while usability means the information is meaningful and has its intended effect upon the listener (Baquis, 2002). Accessibility can thus be viewed as a component of usability (Slatin, 2001). Experiential equivalence takes the concept of usability further, to consider a usable experience as one equivalent to the experience the non-disabled would have. Section 508 requirements for the Internet specify that a description should “communicate the same information as its associated element” (Access Board, 2001). The NBA guidelines used at RFB&D similarly direct describers to “allow the author to make his own impression on the listener.”
Each practice approaches this concept with different terminology, but the underlying principle is the same: detailed pictorial description is not required; rather, the goal of the process is to provide essential information that allows consumers to experience the original text much as a sighted consumer would, including the opportunity to assign their own meanings to the texts rather than relying on the interpretations of others.
2.4 Previous Qualitative Evaluations of Description
It seems that little research has been done to measure the benefits of visual description. Some studies related to Audio Description in recent years, based mostly on subjects’ self-reports, have indicated that described video and live performances are beneficial on a number of levels⁶. In “Video Description in North America,” Packer cites seven types of benefits from description (Packer, 1996):
1. Gaining knowledge about the visual world,<br />
2. Gaining better understanding of televised materials,<br />
3. Feeling independent,<br />
4. Experiencing social connection,<br />
5. Feeling equality with fully sighted,<br />
6. Experiencing enjoyment, and<br />
7. Relief of burden on sighted viewers<br />
These findings were based on analysis of DVS customer feedback and are consistent<br />
with other reports that a production with Audio Description provides conceptual and
cultural inclusion that is not possible otherwise (Lovering, 1993, Packer, 1997a, Packer,<br />
1997b, Pfanstiehl, 2002c).<br />
Outside the realm of Audio Description, no studies of the benefits of describing visual information were found by this investigation. An appreciable portion of textbook images contain substantive content, and studies have certainly shown that images in texts often improve comprehension over the text alone (Levie, 1982). Whether described images deliver similar benefits does not seem to have been studied. The Internet and interactive media areas are too new, and perhaps too diverse, to provide any generalized practices for comment. And what little research has been done in this area from the consumer’s perspective indicates that these areas are much less successful than Audio Description and audio books (Gerber, 2002a, Gerber, 2001, Gerber, 2002b).
No studies seem to have been conducted as to whether description of other types of graphics holds benefits for people who are not visually impaired, such as those with cognitive impairments (including Alzheimer’s disease), those with learning disabilities, and second-language learners, all of whom could benefit from additional spoken descriptions to accompany visual information.
2.5 The Practical Case for a Unified Model<br />
While there are clearly academic reasons to consider these practices as variations of a single process, doing so introduces a new layer of complexity into what are already challenging and little-studied fields. There are also practical benefits to viewing this area as a single process from the perspective of those providing these services. Below, considerations for and against this view are presented.
2.5.1 Reasons to Consider as Separate Practices<br />
The two most important sets of facts to support separate analyses of descriptive<br />
discourses are the differences in traditional media types and consumer interests. Books and<br />
dynamic moving media (video, film) operate according to different audience dynamics. A<br />
book is read according to the schedule of the reader and video productions such as<br />
television shows and movies are linear and have their own fixed timescales. Textbooks<br />
contain still images including diagrams and illustrations that convey specific and often<br />
conceptual messages, while cultural video and film rarely contain diagrams or illustrations.<br />
And while both books and dynamic moving media contain segments, scenes and chapters,<br />
the organizational content of these segments is different. Books present subsections/topics<br />
with hierarchical structures (Lemke, 2002, Raman, 1994) and video productions contain<br />
characters, locations, action, and dialogue that interrelate on a continuous basis as the scene<br />
evolves. There are also substantial differences among types of consumers and their backgrounds. The congenitally or early adventitiously blind, who have little or no experience of the visual world, want different types of descriptions than those who have a memory of the visual world and usually want much more (Pfanstiehl, 2002b), raising questions about what kinds of description are appropriate.
2.5.2 Reasons to Consider as a Single Process<br />
The reasons for a more integrated model of this unique communication process fall<br />
into four areas. First, while more traditional media such as books and television shows can<br />
be viewed as very different interactional experiences for the consumer, these lines are not so<br />
neatly drawn with respect to newer digital media. Educational software and the Internet<br />
have characteristics common to books and videos. Furthermore, educational videos, while<br />
using the same media as cultural productions, contain many of the structural characteristics<br />
of textbooks including purposeful illustrations and the ability for the viewer to navigate the<br />
text’s structure. Textbooks now also frequently come with digital media in the form of a<br />
compact disc or references to an accompanying website. In addition, Internet sites can and<br />
frequently do contain all of the characteristics of both books and video as well as additional<br />
characteristics of interactivity and hypermodality. In other words, the web can be like a<br />
book, or a movie, or both, and more.<br />
Second, regarding the audience differences, one could view this as an issue of relative proportions. Both the congenitally and the adventitiously blind of different age groups are consumers of books, movies, television, and the Internet, but perhaps in different ratios. It is more probable that the reader of a textbook will be an adolescent, but it is conceivable that adults returning to school or helping their children with homework might use the same instructional texts. Likewise, children may choose, for personal interest or family participation, to watch a film or video genre geared towards an older audience. The Internet
as a media space is used by all ages. And, while many government websites might be geared<br />
towards older (more likely to be adventitiously blind) populations, some websites, including<br />
those for museums and cultural collections, are often oriented to the needs of a young<br />
audience.<br />
Third, digital technologies present new opportunities for the ways that information is managed and delivered. Digital technology allows the traditional boundaries between text types, based on publishing restrictions, to be blurred, creating new hybrid text types (Iedema, 2003; Piety, 2001). Currently, descriptive technology is developing (along with almost all media technology) as a digital technology, with the same power of distribution and dependence on software as other digital media. It is likely that descriptive technology will evolve, as other information technologies have, to have not just a dependence upon software but an architecture that is governed by software (Lessig, 1999). Trends in information technology continue in the direction of special-purpose hardware being replaced by general-purpose hardware governed by special-purpose software and, more recently, by general-purpose software governed by conceptual models (Gamma, 1995). The power of model-based technology is that we may see future assistive technology derive its behavior from a conceptual model of the fundamental human processes the technology supports. If this is the case, the more general this model can be, the more widely it can be applied to different textual and technological situations.
Fourth, a dimension that will be discussed in more detail below is that of the<br />
describer. <strong>Description</strong>s come from sighted individuals who face a number of significant<br />
choices in constructing their descriptions. Once this unique communication system and the<br />
special and powerful role of the describer are understood, a role that is similar in some ways<br />
to that of a sign-language interpreter, there may be reasons to consider this a professional skill that<br />
transcends media type. There are already a number of organizations such as NCAM and<br />
NTN discussed earlier that work to make several types of media accessible. By focusing on<br />
the properties of this system that are common to all media types, perhaps an economy of<br />
scale can be achieved in future training and possibly accreditation.<br />
3. VISUAL ASSISTIVE DISCOURSE<br />
The practices described in the previous section share many properties. They are<br />
intended to support the same types of people, they all rely on technology, and they have<br />
essentially the same goals. This allows them, despite their differences, to be viewed as<br />
members of the same family of communication practices. This study introduces a term for<br />
these practices, <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> (VAD). Below, it is formally defined, including<br />
the characteristic properties that distinguish it, how it compares with other better known<br />
communication systems, as well as a conceptual discussion that covers issues related to using<br />
words for images and implications for understanding the principal roles of describers and<br />
consumers of description.<br />
3.1 <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> Defined<br />
This process can be defined from a number of perspectives. Below, four different<br />
types of definitions are presented: 1) by the meanings intended in the term discourse, 2) five<br />
distinctive properties of VAD, 3) how VAD compares as a process to other prototypical<br />
communications processes, and 4) the essential components of VAD.<br />
3.1.1 The “<strong>Discourse</strong>” in <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong><br />
The term “discourse” is often used in different ways (Scollon, 2001b) and several<br />
different meanings are intended with the use of it in VAD. First, some have used it to<br />
indicate units of language larger than a sentence and language in use (Schiffrin, 1994). These<br />
are two of the senses in which it is used here, because visual descriptive language is often extended<br />
beyond words and clauses to larger text sub-units. <strong>Discourse</strong>s have also been looked at as<br />
social practices where language is but one element used in social activities that exist for<br />
specific purposes (Gee, 1999; Scollon, 2001a). From this perspective the roles and activities<br />
of the describers and receivers are considered to be important parts of the discursive<br />
process. <strong>Discourse</strong> has also been defined as a multimodal process when it includes different<br />
forms of communicative content (Kress, 2001). And finally, discourse is used here as<br />
Tannen defined it as ‘language in context across all forms and modes’ (Tannen, 1981), and<br />
she said, linguists in their study of discourse are concerned “with the central questions of<br />
structure, of meaning, and how these function to create coherence.” The search for<br />
coherence is an important goal of studying this communication process because in order for<br />
visual description to be meaningful and true to its intention, it must be coherent to the<br />
listener and coherent with the intention of the author(s) of the texts. <strong>Discourse</strong> then is used<br />
in <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> to include the linguistic and non-linguistic elements that<br />
interact to create meaning with accessible media using assistive technology.<br />
3.1.2 Common Properties of Different Descriptive Practices<br />
There are five basic properties that will be evident in all of the practices where VAD<br />
is employed.<br />
1. Technology enables the reception of remote descriptions.<br />
2. Descriptive messages are non-interactive.<br />
3. The descriptions are subordinate to another text.<br />
4. The description process is non-transparent.<br />
5. The process is constrained by the text.<br />
The first of these properties, that technology enables the reception of remote descriptions,<br />
means that descriptions are created in a different place and possibly different time from<br />
when they are received and it is through technology that these descriptions are brought to<br />
the receiver. The second property is that descriptive messages are non-interactive. This<br />
means that they are received as one-way messages without the listener being able to<br />
influence the form or ask for clarification. The third property is that descriptions are<br />
subordinate to another text. The inserted language is not an independent communication<br />
process, but is a component in a text that plays a very specific role. Fourth, the description<br />
process is non-transparent. The act of describing creates a new text, different from the<br />
original, that is influenced by the insertion of descriptive language, the choices reflected in<br />
that language, and the acoustic and prosodic properties of the insertions. Fifth, the process is<br />
constrained by the text. The nature of the descriptions can be constrained by the type of the<br />
text and the technology used to transmit it. And, for <strong>Audio</strong> <strong>Description</strong> only, because<br />
descriptions are inserted in between dialog and other audio content, the types of gaps in the<br />
specific text affect the description.<br />
3.1.3 Comparison of VAD and Other <strong>Communication</strong> Systems<br />
As a class or type of language, the system of VAD has specific components and<br />
roles. Figure 3 shows an abstracted representation of VAD compared to typical written,<br />
conversational, and sign language interpretation processes. Looking at the broad<br />
characteristics of visual description, it is possible to see both how it is a unique communicative<br />
process and how it is related to some other systems. It is like conversation in that it is<br />
received in a spoken form. It is similar to reading because the receiver of it cannot interact<br />
with the text as they cannot with a book. And there is an intermediary that facilitates the<br />
communication, as in sign language interpretation. Unlike any of these other<br />
processes, however, information is inserted to replace visual information only, creating a secondary or<br />
modified text.<br />
Figure 3 - Overview of VAD and other prototypical communication processes (diagram: face-to-face conversation between conversants via a co-constructed text; written communication from author to reader via a composed text; interactional sign language interpretation between hearing and deaf conversants via spoken, signed, and visual texts with an interpreter; and <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> from author to consumer via a source text, describer, verbal insertions, and modified text)<br />
The similarities and differences extend beyond the reception of the text to the<br />
creation of texts. In conversation, the text is co-constructed and shared between the<br />
interlocutors (Schiffrin, 1987). It emerges rather than being static. Conversely, in most<br />
written communication, an author creates the text that he expects to be used by a reader<br />
who then reads that work in the form the author produced it. 7 With sign language<br />
interpretation (and all interpretations and translations), there are several texts. The<br />
interpreter exchanges spoken and signed texts with the conversant, while all of the<br />
conversants share a common visual text that could also be called part of the context. With<br />
VAD, however, the describer is in a position to, in fact is required to, add to and modify the text<br />
created by the author. In order to make the text accessible, the describer must alter it by<br />
inserting language that reflects the describer’s choices for representing the visual<br />
information. Where the sign language (SL) interpreter is responsible for converting discrete<br />
discourse units from sign to speech or vice versa, the visual describer must select certain<br />
elements from the visual field and determine which elements will be described, in what<br />
order, and using what terminology.<br />
3.1.4 The Components of the VAD <strong>Communication</strong> System<br />
Any instance of VAD consists of at least four components:<br />
• Source text<br />
• Modified text that includes insertions of descriptive content<br />
• Describer<br />
• Consumer.<br />
The source text is a production that was developed by some individual or<br />
organization to communicate with an audience in a specific way and with specific messages.<br />
Generally, the audience that is anticipated in the development of a source text is a sighted<br />
audience and the source text includes both visual and verbal content. In most cases, the<br />
source text is prewritten and recorded. The modified text contains both original verbal<br />
content produced by the author of the source text and descriptive insertions that are<br />
anonymous and disembodied messages (Goffman, 1963) sent to a receiver who is unknown<br />
to the describer. The modified text is created by either a third party describer or by the<br />
author of the source text. The describer is a conceptual role and in practice may be one or<br />
more individuals responsible for creating and placing descriptive insertions. In some cases,<br />
such as audio books, the describer is also presenting other content, while in other cases, such<br />
as described video, the describer presents only the descriptive content. The consumer is also<br />
a conceptual role and equates to a range of individuals. The consumer is generally not<br />
known to the describer, either specifically as the person who will be listening to the<br />
description or by known characteristics, because the consumer can be from any age<br />
group and may possess varying visual impairments, as described earlier. There is no typical<br />
consumer and there may be different preferences for types of information to be provided.<br />
These four components will be present in any VAD event.<br />
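Because the describer and consumer are conceptual roles and the two texts are distinct artifacts, the four components lend themselves to a simple data model. The following Python sketch is purely illustrative: the class and field names are taken from the component list above and do not correspond to any existing assistive technology implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Insertion:
    """One descriptive insertion placed into the text by a describer."""
    position: int   # word position in the source text where it is placed
    wording: str    # the language chosen to represent the visual content

@dataclass
class VADEvent:
    """Minimal model of the four components present in any VAD event."""
    source_text: str            # the author's original production
    describer: str              # conceptual role: one or more individuals
    consumer: str               # conceptual role: the (unknown) receiver
    insertions: List[Insertion] = field(default_factory=list)

    def modified_text(self) -> str:
        """The modified text: original verbal content interleaved with
        descriptive insertions (a crude concatenation, for illustration)."""
        words = self.source_text.split()
        for ins in sorted(self.insertions, key=lambda i: i.position,
                          reverse=True):
            words.insert(ins.position, f"[{ins.wording}]")
        return " ".join(words)
```

For example, inserting the hypothetical description before a line of dialog, `VADEvent("Coffee?", "describer", "consumer", [Insertion(0, "A waitress approaches him")]).modified_text()` yields `"[A waitress approaches him] Coffee?"`, the amalgam of original material and insertion that the consumer actually receives.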
3.2 Conceptual Issues with Words for Images<br />
At its core, VAD is a process of using words as a substitute for information available<br />
in visual form. The differences between the visual and acoustic channels and the types of<br />
information usually conveyed through language and the visual field are significant and worth<br />
at least a brief discussion. Language and vision are fundamentally different phenomena with<br />
vision being a primary process and language a secondary process (Bateson, 1972) that<br />
reflects a level of interpretation. As Lemke says, “No text is an image. No text has the exact<br />
same set of meaning affordances as any image” (Lemke, 2002). These differences can be<br />
seen in two aspects: sequential vs. parallel processing, and the ways and volumes that each<br />
provides information. This topic connects to schema theory and it is difficult to classify as<br />
either an interpretation or a translation activity.<br />
3.2.1 Sequential vs. Parallel<br />
Language is the result of thoughts being encoded and retrieved in a sequential form,<br />
whereas visual information is perceived in parallel, through a gestalt, as a dynamic whole.<br />
Encoding information in language involves specific representational choices including the<br />
words to use and the sequence in which they are placed. When it is spoken, language<br />
becomes a linear medium. As Slatin says of receiving the description of a web page graphic<br />
“it is an experience of time, not in space” (emphasis in original) (Slatin, 2001). While<br />
language becomes fixed when written or spoken, visual information can change as it is<br />
viewed. It changes in the environment and in moving media such as video and film, so that the<br />
person viewing is seeing multiple abstract elements (color, shape, line) and recognizable<br />
objects (people, places, things, actions) concurrently. This dynamic property can also be true<br />
of still images because when viewing a static picture, the eye moves and recognizes different<br />
information over time (Holsánová, 2001; Kress, 1996) and many still images have attentional<br />
foci or cues (Dwyer, 1978) and/or motion called vectors built into them (Kress, 1996).<br />
3.2.2 Raw vs. Processed Information<br />
The second main difference to discuss in relation to this topic is the amount of<br />
information a unit of language and image can provide and the way each provides it. In<br />
general, the visual field can provide vast amounts of information and detail where language<br />
requires specific encoding into words that, by their very nature, summarize and categorize<br />
information, often into prototypical concepts (Rosch, 1978). The following hypothetical<br />
example of two statements of visual description is used to highlight what a description could<br />
be and some of the word/concept issues presented.<br />
1. A hobo sits at the counter of a greasy spoon<br />
2. A waitress approaches him<br />
These two statements describe a scene and they show certain choices a describer<br />
might make. One of these choices is vocabulary: the terms “greasy spoon” and “hobo” are<br />
specific to a type of location and person. For a listener who knows what these terms mean,<br />
they likely convey a clearer image with fewer words than the more generic alternatives “run<br />
down lower class dining establishment” and “poor traveler.” Another decision the<br />
describer makes is to keep these characters anonymous. They could have names in a story<br />
that have yet to be revealed, but the description in this case only provides the externally<br />
known information. Yet another decision is the choice to focus on these details (waitress,<br />
hobo, counter, greasy spoon) over others that are present in the same image such as time of<br />
day or other characters. Clearly, these two statements do not equate to a visual scene, but<br />
they may provide the information necessary for a hearer to construct a visual scene.<br />
3.2.3 Schema Theory<br />
At the same time that words cannot provide all of the information available in a<br />
visual field, certain words and concepts associated with them imply a whole range of related<br />
meanings that are often available visually as well. The word “restaurant” for example will<br />
usually imply many physical and conceptual elements (kitchens, wait staff, menus, food) that<br />
do not have to be seen. Complexes of concepts and expectations are thought by some to be<br />
stored in mental structures called schemata (Schank, 1977). When schemata are activated, as<br />
by words and/or visual stimuli, expectations of other information and perhaps assumed<br />
values for information not provided and events that might follow become activated as well<br />
(Tannen, 1993a). If these schemata guide expectations in interaction and conversation<br />
(Tannen, 1993b) and are at work in the process of reading (Perfetti, 1988; Wilson, 1986),<br />
two activities that VAD is similar to, then they may play a role in allowing a few words to provide<br />
a large amount of content in VAD.<br />
3.2.4 Translation-Interpretation-Transrepresentation<br />
As a language process, visual description could be seen as a member of a family of<br />
translation and interpretation activities. According to Metzger (1999), “Both<br />
translation and interpretation deal with the rendering of a given text into another language”.<br />
A key issue here is that the text that is being rendered is made up of visual elements (scenes,<br />
signs, smiles, etc.) rather than formalized language. If translation is what happens when the<br />
written is converted into the written and interpretation when the conversational is converted<br />
into the conversational (Hatim, 1997), then visual description might be viewed as<br />
transrepresentation because it takes raw visual information and converts it into language. The<br />
transrepresentation is affected by the sequential/parallel, high detail/high concept, and<br />
author encoded/receiver determined meaning boundaries.<br />
3.3 <strong>Description</strong>s: Situated/Constrained in Multimodal Texts<br />
In addition to the general differences between language and visual information<br />
described above, visual descriptions are strongly affected by their environment. They are<br />
both situated in and constrained by where they are placed in a text and the textual nature of<br />
the visual elements that are being represented. While it might seem that descriptions would<br />
relate to those elements that are the most salient visual components in a multimodal text<br />
(Norris, 2002), the nature of VAD imposes further effects on the nature of the description,<br />
in ways constraining it.<br />
3.3.1 Textually Situated <strong>Description</strong>s<br />
The selection, meaning, and content of any descriptive insertion are influenced by its<br />
textual position, which can be viewed in three ways: 1) as the textual environment it is<br />
within, 2) its position in the text in relation to other occurrences of the same visual stimuli,<br />
and 3) the relative importance of the stimuli independent of its salience. The hypothetical<br />
description of a hobo in the example above would probably be placed in between other<br />
salient textual information as shown in the example below:<br />
3. <strong>Audio</strong>: Clock chimes 5 times<br />
4. Describer: A hobo sits at the counter of a greasy spoon<br />
5. Describer: A waitress approaches him<br />
6. Woman: Coffee?<br />
In this example, the descriptions in lines 4 and 5 are contextualized with time and<br />
dialog. The information from the source text in lines 3 and 6 provides context for the<br />
description and equally important, the description provides context for the source text<br />
elements. This mutual relationship between the descriptive insertions and existing source<br />
text content is an essential aspect to the meaning of a description because it is all these<br />
elements that the consumer of a described production will receive. These same lines of<br />
description take on yet other meanings if part of a textbook, as the hypothetical example<br />
below illustrates:<br />
7. Reader: During the depression, many people lost their jobs and<br />
traveled searching for work, see picture 1<br />
8. Picture 1 shows a hobo sitting at the counter of a greasy spoon<br />
9. a waitress approaches him<br />
These examples should show one of the most important aspects of visual<br />
description: the textual environment. Descriptions do not come ‘naked’, with a<br />
seemingly unlimited set of meanings, as they appear in lines 1-2. Rather, as lines 3-6 and 7-9 illustrate,<br />
descriptions are textually situated and understanding their meanings involves an understanding<br />
of the information provided in the environment of the source text.<br />
The descriptions are textually situated in a second way by the sequential position of<br />
visual information in the source text. If the examples above were not the first time that the<br />
image or place is described, but a successive occurrence where the audience would be<br />
expected to recognize the place, then a different description could be called for entirely, even<br />
if the visual image was exactly the same. It would be different, if for no other reason than that it<br />
was known (“the greasy spoon” rather than “a greasy spoon”) and also because additional<br />
details could be provided if the previously supplied information is expected to be<br />
understood.<br />
An additional factor in textually situating descriptions is the relevance that<br />
visual elements have to the text. If there were details about the scene that were important for<br />
future plot reasons but might not be the most salient feature (as we assume the hobo and<br />
waitress are), the description becomes<br />
situated not only by its enclosing information or its appearance in the text but also because<br />
of its significance to the internal structure of the text (Gould, 2002; Pfanstiehl, 1984; Snyder,<br />
2002a), as judged by the describer.<br />
3.3.2 Constraints: Detail vs. Interpretation<br />
Each type of text creates conditions that affect the nature of the description. The<br />
limits or restrictions on visual description are imposed by the technology used in the modified<br />
text and by the methods of the describer. These restrictions can place pressures on the<br />
manner in which the description is rendered. A concept behind these constraints is that,<br />
as the language used in descriptions covers more information with fewer words, the more the<br />
scene might be evaluated and summarized by the describer, leaving the consumer of the<br />
description less opportunity to independently assign meaning (Baquis, 2002). Alternatively,<br />
providing the building blocks of information to allow the consumer more opportunities to<br />
infer meaning may take more time to describe and hear. In the hypothetical examples of<br />
description from above, there were 14 words and 19 syllables. If the text allowed a smaller<br />
insertion, certain changes would need to be made that would either reduce the amount of<br />
detailed information or summarize the situation with less verbal content. For example,<br />
“hobo” might become “bum” or “man” and “greasy spoon” might become “diner.” Each<br />
of these choices in language might have a slightly different effect on the hearer.<br />
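The trade-off between time and detail can be made concrete with a rough calculation. The sketch below is a hypothetical illustration only: the vowel-group syllable heuristic is crude (hand counts, such as the 19 syllables cited above, may differ slightly from what it reports), and the delivery rate of 3.5 syllables per second is an assumed figure, not a measured norm of description practice.

```python
import re

def estimate_syllables(text: str) -> int:
    """Rough syllable count: one syllable per vowel group in each word.
    A crude heuristic, adequate only for illustration."""
    words = re.findall(r"[A-Za-z]+", text)
    return sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
               for w in words)

def fits_gap(description: str, gap_seconds: float,
             rate: float = 3.5) -> bool:
    """Would this description fit in a gap between dialog segments,
    at an assumed delivery rate in syllables per second?"""
    return estimate_syllables(description) / rate <= gap_seconds

desc = ("A hobo sits at the counter of a greasy spoon. "
        "A waitress approaches him.")
print(estimate_syllables(desc))   # heuristic count for the example
print(fits_gap(desc, 6.0))        # fits a six-second gap
print(fits_gap(desc, 3.0))        # too long for a three-second gap
```

Shortening “hobo” to “man” or “greasy spoon” to “diner” lowers the syllable count, which is exactly the kind of substitution a describer makes when the available gap shrinks.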
The books recorded by RFB&D can use either a special format audiocassette tape or<br />
a special digital format. In both cases, the image descriptions are placed in specific locations<br />
(usually after a paragraph where they are referenced) in the linear recording. While the<br />
consumer of the RFB&D media can scan forwards and backwards, they cannot access the<br />
images separately from the body of the text because both are combined into the same audio<br />
stream. The RFB&D technology does not impose any restrictions on the amount of<br />
description, so they would be called unrestricted descriptions.<br />
Internet sites have built-in functions for describing pictures, either directly in a property<br />
attached to an image (Alt-text) or in a separate page (Long-Desc) that the user can navigate<br />
to. <strong>Assistive</strong> technology either renders the descriptions into refreshable Braille devices or<br />
uses voice synthesis. Some browsers place limits on the amount of Alt-text that can be<br />
displayed, so direct descriptions are potentially restricted while indirect descriptions are<br />
unrestricted. Slatin, in “Maximum Accessibility,” recommends no more than 150 characters<br />
(Slatin, 2001), including punctuation, while Alonzo recommends no more than 300 words<br />
for long description (Alonzo, 2001). Internet descriptions could be viewed as restricted by<br />
convention.<br />
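The conventional limits just cited can be illustrated with a small check. This is a hypothetical sketch applying Slatin's 150-character and Alonzo's 300-word recommendations, not a validator used by any of the organizations discussed:

```python
def within_alt_text_limit(alt: str, max_chars: int = 150) -> bool:
    """Check a direct (Alt-text) description against Slatin's
    150-character recommendation, punctuation included."""
    return len(alt) <= max_chars

def within_long_desc_limit(desc: str, max_words: int = 300) -> bool:
    """Check an indirect (Long-Desc) description against Alonzo's
    300-word recommendation."""
    return len(desc.split()) <= max_words

short = "A hobo sits at the counter of a greasy spoon."
print(within_alt_text_limit(short))      # True: 45 characters
print(within_alt_text_limit(short * 4))  # False: exceeds 150 characters
```

A description that fails the direct-description check would, under these conventions, be moved to the unrestricted indirect form that the user navigates to separately.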
Restrictions for descriptions in <strong>Audio</strong> <strong>Description</strong> are more complicated. As Frazier<br />
had envisioned, description is mostly inserted into the gaps or bridges (Frazier, 1975) in the<br />
audio. A basic rule for all <strong>Audio</strong> <strong>Description</strong> practices is that descriptions do not impede<br />
dialog. Theoretically, then, the entire span between dialog segments could be used for<br />
description. But, there are cases where non-dialog audio cues (crashes, bumps, etc.) require<br />
the description to be synchronized so that the consumer is able to make sense out of the<br />
original audio material. Also, in some cases such as opera or musicals, music or parts of a<br />
soundtrack are considered more important than description, so visual description is further<br />
restricted by both the specific text and the conventions of the describer.<br />
<strong>Description</strong>s of software and interactive elements tend to be short and functional,<br />
describing the interactive effect of a function associated with an image rather than an<br />
elaborate description of the image itself (Board, 2001b; Gerber, 2002a; Slatin, 2001; W3C,<br />
1999). These descriptions are unrestricted, but usually brief.<br />
3.4 Discussion: The Role of the Describer<br />
Ultimately, communication, or a failure to communicate effectively, happens between<br />
people. In VAD, these people are primarily either describers or consumers. Having<br />
considered some of the conceptual issues with representing visual information in language,<br />
the different characteristics of language, and the effect of textual constraints, it is now<br />
appropriate to discuss the role of describer again. Describers are the first human link in the<br />
unidirectional process of making media accessible. What the describer selects for<br />
description, the manner in which it is described, and how it is positioned in the modified text are<br />
final. The describer is a gatekeeper of information. It is a role that is both powerful and difficult.<br />
The describer must balance all of the visual and linguistic factors, must select which<br />
information is to be presented and how it will be presented within the textual constraints.<br />
Three dimensions of the describer role worth discussing are the cognitive process for<br />
describing, the role as intermediary, and the practice effects that add other dimensions to<br />
how descriptions are created.<br />
3.4.1 Describing as a Way of Thinking<br />
Practitioners of <strong>Audio</strong> <strong>Description</strong>, one of the most methodologically established<br />
forms of VAD, say that providing visual description does not seem to be a natural process<br />
for most people and often requires specific training (Gould, 2002; Pfanstiehl, 2002; Stovall,<br />
2002). Snyder says, “We must learn to see the world anew” (Snyder, 2002b). Perhaps a<br />
reason why it requires special training is that, while descriptions are produced using the tools<br />
of everyday communication (speech and writing), the way language is used in visual<br />
description is not common.<br />
Figure 4 - Chafe's view of speaking in the immediate mode (diagram: within extroverted<br />
consciousness, the mind perceives, acts on, and evaluates the environment; the represented is<br />
then turned, through representing and speaking, into language)<br />
Chafe describes two modes of conversation: immediate and displaced (Chafe,<br />
1994). Immediate conversation deals with an extroverted consciousness where the mind is<br />
focused on perceiving, acting on, and evaluating information that is in the present; while<br />
displaced conversation uses a consciousness that is introverted to remember and imagine.<br />
Chafe describes the majority of everyday talk as<br />
dealing with the introverted and displaced consciousness: events that have happened, did not<br />
happen, or might happen, as well as possible conditions and realities. Much of the<br />
grammar of languages such as English is designed to accommodate these unreal (irrealis)<br />
situations. As we can see in figure 4, Chafe’s view of immediate consciousness places<br />
evaluating as a natural and integral part of the perceptual process. But, this conflicts with the<br />
goals of VAD, so the describer is required to suppress (or control) the evaluative process so<br />
that the described information is in such a form as to allow the listener to do evaluation<br />
from the raw informational materials. While describing in some ways may be as old as<br />
language itself, only describing what is actually occurring visually at the moment, and<br />
suppressing evaluation, is like asking the describer to pull an “end-run” around the natural<br />
thought process of extroverted consciousness by proceeding directly to speaking. Further,
the environment of VAD is neither the environment that surrounds the describer nor the<br />
consumer, but rather the environment of the text and its visible boundaries. The describer is<br />
also performing this conceptual end run while only focusing on a small window of the<br />
immediate experience, which is the world of the text.<br />
3.4.2 Describer as Intermediary<br />
The describer is in an intermediary position that is similar to both a sign language<br />
interpreter and an author. As an interpretive mediator, the describer is responsible for<br />
creating language that is to serve as an equivalent to visual information. This role places<br />
the describer in a position of relative power similar to the role that Metzger found<br />
interpreters may take in “Sign Language Interpreting: Deconstructing the Myth of<br />
Neutrality” (Metzger, 1999). Furthermore, since the words of the describer, unlike the<br />
words of conversational SL interpreters, are final, the role of describer has some qualities of<br />
an author (Harris, 2000). While the goals of VAD practices do not call for the describer to<br />
take a substantive role in the communication process, but rather to act, as Pfanstiehl says, “as a color<br />
video camera” (1984), the fundamental nature of VAD places the describer in a non-transparent<br />
position that will affect the process through the choices they make.<br />
3.4.3 <strong>Description</strong> as a Group Process<br />
For many producers of description, creating the modified text is not a one-person<br />
task. It involves teams with formalized procedures, reference manuals, and working<br />
documents. While this study has not investigated the nature of description teams and their<br />
documents, as illustrated in figure 5, it should follow that the process that gives birth to the<br />
modified text could have significant impacts upon the quality of the result.<br />
Both of the <strong>Audio</strong> <strong>Description</strong> organizations interviewed for this study use teams in<br />
the creation of their descriptive content, but the teams seem configured differently. In the<br />
case of RFB&D, each book is pre-read, and which images are to be described, as well as<br />
where the descriptions will be positioned, is marked up for the readers assigned to it. The<br />
reader then only describes those pictures that have been pre-selected (Smith, 2002; Vollmer,<br />
2002). Once an RFB&D reader encounters a picture notation, he or she describes it in a<br />
seemingly individual way. For RFB&D the reader is also the author of the descriptions.<br />
Conversely, the established organizations doing recorded <strong>Audio</strong> <strong>Description</strong><br />
use a script that can be written and edited by people other than the person who is the voice<br />
of description.<br />
Figure 5 – Conceptual view of the description process (diagram: a text producer's source text enters a team-based description process in which describers, supported by a team, working documents, and manuals and style guides, produce the descriptions that are combined with the original information to form the modified text)<br />
All of the organizations doing recorded <strong>Audio</strong> <strong>Description</strong> interviewed for this<br />
research use a prescreening process where some members of the description team listen to<br />
just the audio to identify critical points of comprehension failure. And, all of these<br />
organizations indicated that the script writing and editing are extensive processes that<br />
include evaluations about the amount and the type of description that can be inserted. As a<br />
result, the <strong>Audio</strong> <strong>Description</strong> insertions rarely show markers of spontaneity such as<br />
nonfluencies or hesitations.<br />
3.5 Discussion: The Role of the Consumer<br />
The role of the consumer is a disadvantaged position in VAD. Where readers of<br />
books and recipients of SL interpretation each have a different communicative challenge,<br />
using a non-interactive text and requiring a communicative intermediary respectively, the<br />
visually impaired face both of these challenges with the described visual information.<br />
Consumers do not experience both a source text and a modified text; the text they<br />
receive is the modified one, an amalgam of original material and inserted<br />
descriptions. Throughout the investigations leading to this report, those involved with<br />
providing descriptions across all media types consistently expressed an interest in the internal<br />
mental process of consumers, in how the descriptions they provide are effective or not in<br />
fulfilling the goals of description. In this way, VAD is similar to the other language systems<br />
described above because there is no consensus on universal principles for how language and<br />
thought actually work inside someone’s mind. There are, however, some indications of what<br />
type of activity listening to VAD could be similar to.<br />
3.5.1 The Consumers: Actively Building a Text Model<br />
Research indicates that, in terms of basic perception and comprehension, the process of<br />
listening is very similar to reading (Townsend, 1987). So, a person listening to a text<br />
might be engaging in the same or similar cognitive processes as would someone reading the<br />
same text. Reading is a process that involves both the decoding of the graphic, or tactile,<br />
symbols for Braille (Knowlton, 1996), and cognitive processes of developing a set of mental<br />
representations called a "text model" (Carpenter, 1986, Haberlandt, 1988). Because of the<br />
structural similarities to reading, it can be deduced that the descriptive insertions are<br />
providing information that contributes to the process of building a text model in the mind of<br />
the listener.<br />
If the process of receiving a described text is similar to reading, then a number of<br />
factors will influence the text model that is built. The listener’s personal history and world<br />
knowledge (schemata), as well as their goals, will all probably influence what type of mental<br />
representation is built from the text they receive (Wilson, 1986). Since consumers<br />
create their own understanding as they experience the text, the descriptions should<br />
be viewed not as brush strokes in a painting made by the describer but as cognitive tools<br />
(Vygotsky, 1934): pieces of information that are the building blocks of mental structures in<br />
the mind of the consumer as he or she actively develops an understanding of the text.<br />
Figure 6 - Conceptual diagram of consumer's process (diagram: the modified text, made up of original information and descriptions, feeds the consumer's process, which draws on the consumer's purposes and goals, personal history, and world knowledge to build a text model)<br />
Anecdotal evidence from those who practice <strong>Audio</strong> <strong>Description</strong> supports the view that<br />
receiving description is mentally active. Frazier (1975) describes how an<br />
individual listening to a production assembles an understanding of the action from audio<br />
clues. A quarter century later, DVS customers in feedback sessions reported that when the<br />
appearance of a character is described long after the character is introduced to the audience,<br />
the new descriptive information can clash with a "mental picture" that the listener has<br />
already created (Gould, 2002). Stovall, himself blind and the founder of the Narrative<br />
Television Network, one of the two largest U.S. producers of <strong>Audio</strong> <strong>Description</strong>, explained<br />
that the listener has a mental picture in his or her mind – it may not be exactly what the<br />
person or place looks like on the film, but it is sufficient for understanding the production<br />
(Stovall, 2002).<br />
3.5.2 The Purposes and Goals of Consumers<br />
Another important dimension in understanding the role of the consumer is<br />
motivation and purpose. Why do people listen to <strong>Audio</strong> Described productions or listen to<br />
textbooks? While these questions may seem elementary, it is important to recognize that this<br />
area is probably the most important part of the study of VAD and also the one with the least<br />
real data. The few formal studies of the consumption of VAD have been in <strong>Audio</strong> <strong>Description</strong>, with<br />
a small amount on visually impaired people using the Internet. There is little documentation<br />
on the larger practices that people listening to visual description are engaged in. And, the<br />
discussion below provides no more than a few selected examples that may adumbrate some<br />
of the larger issues that remain to be explored in individual motivations and uses for<br />
described material.<br />
As part of the research for this publication, members of an online community for<br />
<strong>Audio</strong> <strong>Description</strong> were polled on this topic and encouraged to provide insight into this<br />
process. Most of the responses received indicated that the service of <strong>Audio</strong> <strong>Description</strong> is<br />
essential to providing access to productions, but there was little specificity as to the types of<br />
productions or information that were of interest. This general view is supported by<br />
publications from the American Foundation for the Blind that say that <strong>Audio</strong> <strong>Description</strong> is<br />
used both for the enriching, aesthetic experience of the content of the text and for cultural<br />
inclusion (AFB, 1991). <strong>Visual</strong>ly impaired individuals often report watching movies with<br />
sighted friends and family, and they find that understanding culturally relevant texts, whether<br />
from individual or group viewing, is useful in social activities.<br />
One post to an <strong>Audio</strong> <strong>Description</strong> online community that predated the poll just<br />
mentioned indicated that facial expressions were specifically interesting to a congenitally<br />
blind listener who said “Even though <strong>Audio</strong> <strong>Description</strong> does not give me a concrete<br />
example of the various ways of smiling, it does provide me with very valuable information<br />
about what kind of expressions may be exchanged between people” (Miller, 2002). In<br />
another area, WGBH and the American Foundation for the Blind conducted a survey of<br />
audio description customers that indicated a strong interest in science programming (Kuhn,<br />
1992b).<br />
In the area of visual descriptions for textbooks, while it is logical that the consumers<br />
are visually impaired individuals using this material for educational purposes, the visually<br />
impaired now make up only 25% of the customers of RFB&D, with the remaining members<br />
having dyslexia and/or other learning disabilities (RFB&D, 2001b). It was also reported<br />
informally that the recorded RFB&D materials have been used in classes of reading-<br />
challenged students who are neither blind nor dyslexic. Naturally, these uses are not related<br />
to VAD, but may impact the type of service provided to the visually impaired students.<br />
The Internet, being a broad publishing medium, can by definition be used for a range<br />
of situations from purely informational to entertainment and education. Research from the<br />
American Foundation for the Blind indicates that much of the Internet is difficult to use for<br />
the visually impaired despite the accessible technology (Gerber, 2002a, Gerber, 2001, Slatin,<br />
2002). But, all indications are that the visually impaired attempt to use the Internet for<br />
similar reasons as the rest of the population: eCommerce, information, entertainment, etc.<br />
3.6 Cultural Issues with <strong>Description</strong>?<br />
Is there a cultural dimension to VAD? <strong>Communication</strong> between the sighted<br />
describers and visually impaired consumers raises interesting challenges regarding traditional<br />
definitions of culture. Culture is often viewed as a phenomenon that both transmits and is<br />
transmitted through language. And, culture can be defined as a phenomenon that operates<br />
on non-linguistic levels (Scollon, 2001b), some of which are influenced by vision.<br />
And while deaf communities have distinct cultural boundaries with linguistically<br />
perceptible features (Lucas, 1989, Valli, 2001), members of the blind and visually impaired<br />
communities, not having a separate language, might not be viewed as a separate culture.<br />
Further, since the majority of blind individuals have had sight at one time and all presumably<br />
interact with sighted individuals daily, it is difficult to draw a cultural boundary around the<br />
consumer community. However, within the process of description, certain communication<br />
issues appear that are similar to the types of issues that appear when people of different<br />
cultures try to communicate. For example, if a describer uses language that encodes visual<br />
assumptions (e.g., perspective, color) and the receiver of that description does not<br />
understand the associated or implied meanings, then miscommunication similar to cross-<br />
cultural miscommunication, although not according to traditional definitions of culture 8 ,<br />
might occur. Further, as Scollon and Scollon state: one culture does not actually<br />
communicate with another culture; individuals from different cultures do (Scollon, 2001b).<br />
And, when people communicate they do so in places and with purposes that influence the<br />
nature of the communication produced (Scollon, 2001a). Within VAD, the places that the<br />
describers and consumers participate in – the sites of engagement – are very different and, unlike<br />
in face-to-face communication, these sites of engagement are separated physically, and<br />
usually by time. While they are at the outer edge of this study’s focus, these types of<br />
questions are important to ask because, even if the communication span between describers<br />
and consumers cannot be classified as a cultural divide, there may be sufficient differences<br />
between the historical, locational, and perceptual orientations of these two groups to<br />
foster miscommunication similar to culturally influenced communication failures, where<br />
cross-cultural sensitivities may be important.<br />
4. STUDY OF AUDIO DESCRIPTION<br />
The previous section provided a top-down conceptual framework called <strong>Visual</strong><br />
<strong>Assistive</strong> <strong>Discourse</strong> (VAD) with a discussion of specific types of roles and factors that might<br />
affect its success. This section provides a complementary bottom-up and data-driven<br />
analysis of one specific form of VAD called <strong>Audio</strong> <strong>Description</strong>. Of the different varieties of<br />
VAD, <strong>Audio</strong> <strong>Description</strong> (AD) is the most practical to study in this forum. Since<br />
its development as an active process in the early 1980s, AD has been practiced mostly with<br />
methods that stem from one source and that adhere to specific principles. The other<br />
established VAD practice, <strong>Audio</strong> Books, was also investigated as part of this study. But, for<br />
a variety of methodological and practical reasons, <strong>Audio</strong> Books was determined to be too<br />
large and to have too many complicating issues to make it a good candidate for the detailed<br />
language analysis in the time frame for this study.<br />
This study looks at the stream of descriptive statements in <strong>Audio</strong> <strong>Description</strong> as an<br />
example of language use and aims to answer two questions: 1) what is the constituent<br />
structure in AD and 2) what types of information are provided and what patterns in<br />
representation exist within AD? Because this study is the first analysis of the language of<br />
AD (or VAD, for that matter), the information presented will be broad and many important<br />
opportunities for more research will remain.<br />
4.1 The Study Corpus<br />
Within the practice of <strong>Audio</strong> <strong>Description</strong>, there are a number of important sub-practices,<br />
each with its own specific challenges. The sub-practice chosen for this study is the<br />
description of films. The data for this study comes from four different video productions<br />
described by three different description organizations as shown in table 1.<br />
Table 1 - Study corpus material<br />
Source Text | Producer | Describer | Length | Description Words | Description Length<br />
A Star Is Born (1937) | Selznick International | Narrative TV Network | 114 Min | 6110 | 37 Min<br />
L.A. Story (1991) | Artisan Entertainment | WGBH/DVS | 90 Min | 4123 | 29 Min<br />
Gladiator (2000) | DreamWorks SKG | WGBH/DVS | 148 Min | 12337 | 86 Min<br />
Gift of Acadia (2000) | National Parks Service | The Washington Ear | 14 Min | 763 | 4 Min<br />
4.2 Methodology<br />
The techniques used to analyze this data are based in large part on spoken discourse<br />
analysis where the descriptive language is transcribed and then analyzed for structural and<br />
functional properties. Much of traditional spoken discourse analysis deals with interactional<br />
conditions. While the process studied is certainly not interactional, there are important<br />
reasons that these analysis techniques were used as the starting point for the analysis of AD.<br />
First, the messages the consumer receives are units of speech and so will display properties<br />
specific to speech. Second, the level of detail used in spoken discourse analysis, which focuses<br />
on words and utterances in larger contextualized units, is a useful way to view units of AD.<br />
Third, since the creators and consumers of visual description speak the same language and<br />
are members of the same types of speech communities, this language use can be considered<br />
a form of spoken discourse, although a very special one.
This approach prioritizes the words of description and does not focus on the<br />
multimodal issues involved with movies. These multimodal properties are significant, but in<br />
the interests of space and for publishing concerns, they are subordinated in this analysis to<br />
the surface representations of the description and relevant dialog. Specific transcription<br />
conventions and technical issues related to the multimodal issues of these productions are<br />
discussed in Appendix B.<br />
4.3 The Structural Components of <strong>Audio</strong> <strong>Description</strong><br />
Some basic structural definitions are necessary to begin this study. These definitions<br />
have been derived from the analysis of this AD corpus and also other corpora of textbooks<br />
and Internet sites that are not included in this publication, with the hope that the terms and<br />
definitions would be generalizable.<br />
It would have been convenient and desirable to use the same structural definitions<br />
used in other areas of linguistic inquiry for VAD. And, in some ways, the language found in<br />
AD is similar to other language uses. Below the word, at the morphological and<br />
phonological (word-part and sound-component) levels, the constituents in AD are identical<br />
to other forms of spoken language. Above that level, however, at the level that can be<br />
considered the discourse, different types of structures clearly appear.<br />
This study proposes a discourse constituent structure based on four components:<br />
insertions, utterances, representations, and words. Table 2, below, provides some definitions for<br />
these components.<br />
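The four-level hierarchy of insertions, utterances, representations, and words can be pictured as a nested data structure. The following Python sketch is an editor's illustration only: the class names, fields, and timing values are invented, and the example tagging of transcript 1, line 20 reflects one possible reading rather than an annotation from the study.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of the four-level constituent structure proposed in
# this study: insertions contain utterances, utterances contain
# representations, and representations are realized by words.

@dataclass
class Representation:
    focus: str         # the person, place, or thing the representation is about
    rep_type: str      # one of the seven types, e.g. "appearance", "action"
    words: List[str]   # the words that realize this representation

@dataclass
class Utterance:
    representations: List[Representation] = field(default_factory=list)

    def word_count(self) -> int:
        # Count only the words that realize representations.
        return sum(len(r.words) for r in self.representations)

@dataclass
class Insertion:
    start: float       # seconds into the production (invented values below)
    end: float
    utterances: List[Utterance] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end - self.start

# Line 20 of transcript 1, "A black-headed loon drifts by", carries both
# an appearance and an action representation focused on the same object.
loon = Utterance([
    Representation("loon", "appearance", ["black-headed"]),
    Representation("loon", "action", ["drifts", "by"]),
])
ins = Insertion(start=66.0, end=70.5, utterances=[loon])
print(ins.duration)       # 4.5
print(loon.word_count())  # 3
```

The nesting mirrors table 2 directly: each level is simply a container for the level below it, plus the timing properties used in the duration figures reported later.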
Table 2 - Summary structural components of audio description<br />
Basic Structural Hierarchy<br />
Element Definition<br />
Insertion: A contiguous stretch of description (analogous to paragraphs and turns in written and spoken discourse) uninterrupted by other significant audio content such as dialog.<br />
Utterance: A continuous stream of words (similar to a sentence) containing one or more representations, separated from other utterances by more than ½ second of time 9 . Often a gap of 1 second or more separates utterances.<br />
Representation: An interpreted component of an utterance that conveys information about the visual field. Representations have different properties, including a focus and a type.<br />
Word: The words used in <strong>Audio</strong> <strong>Description</strong>, presumably like other forms of VAD, are a subset of words used in normal spoken/written language. They can provide content or function.<br />
4.3.1 Insertions<br />
The largest contiguous unit of description is an insertion. This is what Frazier called<br />
a “bridge” and is almost always bounded by dialog. The term insertion was chosen because<br />
this is fundamentally what is being done in VAD: descriptions are inserted into another text.<br />
As examples below will show, this unit could not be considered a paragraph in the traditional<br />
sense because it does not always express a consistent unit of thought. There are no topic<br />
sentences or summaries, and the only cohesion devices within them are based on common<br />
pronominal reference (Halliday, 1985). Neither could it be called a turn or other structure<br />
50
often considered part of spoken discourse. Without digressing into the issues associated<br />
with coherence in spoken and written discourse (well beyond the scope of this study), what<br />
is evident from the transcripts of AD in this study is that insertions are essentially collections of<br />
utterances. There are no differences in structure or function between the first utterance in an<br />
insertion and the last or those in between. The utterances inside an insertion are essentially<br />
interchangeable. Insertions can be of variable length. This corpus contains 842 insertions.<br />
The shortest are less than one second in length and the longest is over five minutes. The<br />
mean duration of an insertion in this corpus is 11.09 seconds.<br />
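Corpus figures such as these are simple aggregates over timed insertion spans. As a minimal sketch of that arithmetic (the (start, end) pairs below are invented for illustration and are not taken from the study corpus, which contains 842 insertions):

```python
# Illustrative only: three invented insertion spans, in seconds.
insertions = [(12.0, 14.5), (40.0, 51.0), (90.0, 95.5)]

durations = [end - start for start, end in insertions]
mean_duration = sum(durations) / len(durations)

print(len(insertions))           # 3 insertions in this toy sample
print(round(mean_duration, 2))   # 6.33
```

Applied to the real corpus timings, the same computation would yield the 11.09-second mean reported above.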
Transcript 1 below shows these properties of an insertion. In this section taken<br />
from one movie, there are three insertions (lines 2-5, 16-17, and 19-24). The only words<br />
in this production come from a narrator and a describer. The narrator is part of the original<br />
audio from the source text and a describer is speaking the AD insertions. The describer and<br />
narrator alternate in a structure that appears similar to turn taking. But, unlike the turn<br />
structure of a conversation, these two voices are not speaking to each other but each is<br />
speaking to the audience independently. The describer does not address any of the topics<br />
in the narration and the narration does not respond to any of the information that is in the<br />
description. Further, within the descriptions, each utterance reflects an independent<br />
thought. An analysis of the narrator’s words reveals an expository structure with<br />
elaboration, contrast, hypothetical constructions and other features of language used to<br />
convey a range of ideas, while an analysis of the describer's words reveals a very different<br />
type of language use that will be the focus of later portions of this study.<br />
Transcript 1- From “The Gift of Acadia” 1:06<br />
1 Narrator: The gifts of Acadia and they are many…are simple<br />
2 Describer: We move over the still waters of Jordan pond toward the twin bare<br />
domes of the bubble mountains<br />
3 Bold letters swing out of the screen toward us the gift of Acadia<br />
4 We continue across Jordan pond toward south bubble mountain<br />
5 At otter point a huge wave crashes onto a square granite rock ..<br />
white spray flying<br />
6 Narrator: It is many ways a gift<br />
7 First a gift of NATURE .. crafted by the sea<br />
8 By 500 million years of sediment pressed into rock<br />
9 Rock rising then subsiding<br />
10 Glaciers overwhelming and scouring the tops of that rock .. until today<br />
some scoured rock tops are held by the sea called islands<br />
11 While some rise free as MOUNtains<br />
12 Mountains .. westerners laugh .. but in fact that up thrust of granite called<br />
Cadillac .. its wrinkled bald head gazing out at the sea from 1500 and 30<br />
feet up is the highest mountain on our nation’s east coast.<br />
13 But forget that .. because Acadia is not a place for superlatives<br />
14 On the contrary .. Acadia reminds a society sated with superlatives …<br />
highest biggest fastest richest … that there are other BETTER values<br />
15 The value of solitude .. and in solitude contemplation<br />
16 Describer: A young woman in blue shirt and shorts lies on her back on a rocky<br />
ridge overlooking the sea below<br />
17 She is reading a book<br />
18 Narrator: The value of diversity and in diversity harmony<br />
19 Describer: A small brown fawn looks at us twitching his left ear<br />
20 A black-headed loon drifts by<br />
21 A thin black dragonfly on a green leaf opens and closes its wings<br />
22 Two little orange-breasted baby robins wiggle their heads<br />
23 Under water two white-sided dolphins swim smoothly side by side<br />
24 On the quiet surface of the sea two black triangular dorsal fins<br />
emerge then curve back down under water<br />
25 Narrator: Acadia is a meeting ground<br />
This transcript shows that the describer’s language consists mostly of separate<br />
thoughts. There are only two places where the wording of one statement depends on another.<br />
The first one occurs in line 4, which says "we continue" in reference to line 2, which says "we<br />
move." The second is the use of the pronoun "she" in line 17 to refer to the same woman<br />
shown in line 16.<br />
4.3.2 Utterances<br />
Once a descriptive insertion begins, the listener will encounter a series of one or<br />
more utterances. The term utterance was chosen because it is a unit of analysis that is<br />
relevant to the range of speech productions found in conversation (Schiffrin, 1987).<br />
Utterances can be, but need not be, grammatical and were initially defined by Harris as "any<br />
stretch of talk by one person, before and after which there is silence on the part of that<br />
person” (Harris, 1951).<br />
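Harris's silence-based definition, combined with the ½-second threshold given in table 2, suggests a straightforward segmentation rule: group timed words into one utterance until a gap longer than the threshold appears. The sketch below is a hypothetical illustration of that rule, not the transcription procedure actually used for this corpus, and the word timings are invented.

```python
# Hypothetical gap-based utterance segmentation: words separated by more
# than 0.5 s of silence begin a new utterance (see table 2).

def segment_utterances(timed_words, max_gap=0.5):
    """timed_words: list of (word, start, end) tuples in temporal order."""
    utterances = []
    current = []
    prev_end = None
    for word, start, end in timed_words:
        if prev_end is not None and start - prev_end > max_gap:
            utterances.append(current)   # gap exceeded: close the utterance
            current = []
        current.append(word)
        prev_end = end
    if current:
        utterances.append(current)
    return utterances

# Invented timings for two utterances from transcript 1 (lines 19 and 20).
words = [("A", 0.0, 0.1), ("small", 0.15, 0.4), ("brown", 0.45, 0.7),
         ("fawn", 0.75, 1.0), ("looks", 1.05, 1.3), ("at", 1.32, 1.4),
         ("us", 1.45, 1.6),
         # a 1.2-second gap: a new utterance begins here
         ("A", 2.8, 2.9), ("black-headed", 2.95, 3.5), ("loon", 3.55, 3.8),
         ("drifts", 3.85, 4.1), ("by", 4.15, 4.3)]

for u in segment_utterances(words):
    print(" ".join(u))
```

Run on these toy timings, the rule recovers the two utterances, splitting at the 1.2-second gap.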
Utterances in AD convey a set of representations about the visible field. Because<br />
they are independent structures and could usually be rearranged without becoming<br />
incoherent (in form not in terms of actions), they can be considered like a series of<br />
snapshots. Utterances can usually be arranged in any way to fill the time available in the<br />
insertion, and they can be as long or as short as the describer chooses them to be within the<br />
insertion space. But, as figure 7 shows, most are very short. Almost 60% of utterances are<br />
between one and two seconds in length and more than 30% are between three and four<br />
seconds in length. The effect for the listener is that these snapshots of visual information are<br />
produced as if by a strobe effect where the field is visible for a short period of time and then<br />
represented as language and then visible again and represented again until dialog or<br />
meaningful audio from the source text takes over.<br />
Figure 7 - Length of utterances in corpus (bar chart of utterance durations in seconds: 0-.99, 7.98%; 1-2, 59.56%; 3-4, 32.46%; 5-6, 8.15%; 7-8, 1.56%; 10+, 0.25%)<br />
Much like spoken discourse, utterances are often grammatical, but need not always<br />
be because context often makes their meanings clear when they are not. Below are two<br />
examples from transcript 1 that are not grammatical, but meaningful.<br />
3 Bold letters swing out of the screen toward us the gift of Acadia<br />
5 At otter point a huge wave crashes onto a square granite rock<br />
white spray flying<br />
In line 3, the first part of the utterance describes that something is being read on the<br />
screen (a type of representation that will be covered below) with “Bold letters swing out of<br />
the screen toward us." The describer then continues with the content of what was read<br />
with "the gift of Acadia." If an introduction, for example "it reads," preceded the part that<br />
was read, the statement would become grammatical and it would also then consume a few<br />
more syllables. The second example from line 5 is similar. The first part describes a scene<br />
with action “At otter point a huge wave crashes onto a square granite rock” and the<br />
ungrammatical clause “white spray flying” is appended without any introduction. But, here<br />
also the meaning is clear that the white spray relates to the wave crashing that precedes it. In<br />
these ways, much as in spoken discourse, the elimination of words that are unnecessary for<br />
meaning to be conveyed can result in more efficient but technically ungrammatical forms.<br />
Furthermore, utterances can come in many patterns. They can contain a single visual<br />
feature or action or can include several pieces of information a sequence. For example line 2<br />
in transcript 2 below indicates a simple action: one character makes a gesture. In line 3<br />
however, there are two actions: 1) she takes the arm 2) they stroll away. These two actions<br />
are joined by the connective “and,” but the same effect could have been achieved with the<br />
use of “then.” The combination of actions need not be sequential. For example, line 9<br />
shows two actions that occur simultaneously. In this case, they are joined by “as,” but<br />
simultaneous action is also indicated with “while” preceding the first action.<br />
Transcript 2 - From "A Star is Born" 16:20<br />
1 Danny: You’re going to buy me a drink come on<br />
2 Describer: He holds out his arm<br />
3 she takes it and they stroll away<br />
4 Later in a bar<br />
5 Danny: That’s right George there’s nothing like a little rum to take away that<br />
milk flavor<br />
6 Describer: The bartender pours two shots of rum into a glass of milk<br />
7 Later Esther and Danny are drinking the drinks<br />
8 She playfully punches him<br />
9 He punches back and catches her as she falls off her stool<br />
10 Danny: I beg your pardon<br />
11 Esther: Certainly<br />
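As a rough illustration of the connective patterns just described, a trivial classifier might label a two-action utterance by the connective joining its clauses. This is an editor's sketch, not an analysis tool from the study; keyword matching is a deliberate simplification, and real description would require proper parsing.

```python
# Toy classifier for the connectives discussed above: "and"/"then" can join
# sequential actions, while "as"/"while" signal simultaneous ones.
# Simultaneity markers are checked first, since an utterance such as
# transcript 2, line 9 contains both "and" and "as".

SEQUENTIAL = {"and", "then"}
SIMULTANEOUS = {"as", "while"}

def classify_connective(utterance: str) -> str:
    tokens = utterance.lower().split()
    if any(t in SIMULTANEOUS for t in tokens):
        return "simultaneous"
    if any(t in SEQUENTIAL for t in tokens):
        return "sequential"
    return "single action"

print(classify_connective("she takes it and they stroll away"))   # sequential
print(classify_connective(
    "He punches back and catches her as she falls off her stool"))  # simultaneous
print(classify_connective("He holds out his arm"))                 # single action
```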
4.3.3 Representations<br />
The previous discussion of utterances introduced the fact that an utterance can<br />
contain more than one unit of information. In the examples discussed above, the units of<br />
information were primarily actions presented in sequence or actions that occurred<br />
simultaneously; or in the case of line 9, two actions envelop a third. But even within an<br />
utterance that is action based, there can be other types of information. For example, line 20<br />
from transcript 1 reads:<br />
20 A black-headed loon drifts by<br />
This single action is of an object in motion. It also contains a description of a visual<br />
appearance that tells the listener that this object (a loon) has a black head. Assessing the<br />
complete set of meanings any unit of language provides is a task much larger than the scope of<br />
this study. That discussion includes semantics and lexical semantics, the studies of<br />
meanings encoded in sentences and words (O'Grady, 2001), as well as pragmatics, the study<br />
of meanings received by the listener that are not contained within the semantic content<br />
(Levinson, 1983), and further understanding that is communicated at the discourse level.<br />
Because the language used in AD is so specialized and does not include many of the<br />
structures found in the language forms that semantics and pragmatics often draw upon, a<br />
simplified classification of the different types of information is proposed in this study.<br />
While it departs somewhat from established linguistic meaning analysis, it is more<br />
appropriate to the restricted nature of AD. The term chosen for this classification of<br />
meaning is representation.<br />
The term representation, rather than other linguistic constructs such as "phrase" or<br />
"clause," is used because, as the data presented below will show, identifiable units of meaning<br />
can come from a range of linguistic forms from words to sentences. The concept of<br />
representation presented here is the visual information that has been selected by the<br />
describer to be sent over the auditory channel to the consumer. Representations have both a<br />
focus and a type. The focus is the person, place, or thing that the representation is about, and<br />
the type will fall into one of the following seven categories:<br />
1. Appearance: The external appearance of a person, place, or thing.<br />
2. Action: Something in motion or changing.<br />
3. Position: Location of description, location of characters.<br />
4. Reading: Written or understood information being literally read,<br />
summarized, or paraphrased.<br />
5. Indexical: Indicates who is speaking or what is making some sound.<br />
6. Viewpoint: Relates to text-level information and the viewer as viewer.<br />
7. State: Not always visible information, but known to the describer<br />
and conveyed in response to visual information.<br />
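The seven-way classification can be captured as a small enumeration paired with a focus, matching the definition of a representation given above. The sketch below is illustrative, and the example tagging of transcript 1, line 3 is this editor's reading, not an annotation from the study.

```python
from enum import Enum
from typing import NamedTuple

# The seven representation types proposed in this study.
class RepType(Enum):
    APPEARANCE = "appearance"
    ACTION = "action"
    POSITION = "position"
    READING = "reading"
    INDEXICAL = "indexical"
    VIEWPOINT = "viewpoint"
    STATE = "state"

# A representation has a focus (what it is about) and a type.
class Representation(NamedTuple):
    focus: str
    rep_type: RepType

# Illustrative tagging of transcript 1, line 3,
# "Bold letters swing out of the screen toward us the gift of Acadia":
line3 = [
    Representation("letters", RepType.APPEARANCE),  # "Bold letters"
    Representation("letters", RepType.ACTION),      # "swing out of the screen toward us"
    Representation("letters", RepType.READING),     # "the gift of Acadia"
]
print([r.rep_type.value for r in line3])
```

The single-utterance example shows how one focus ("letters") can carry several representation types at once, which is the pattern the following subsections examine.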
Some examples of these types of representations have already been shown in the first<br />
two transcripts. Below, each of these types of representations will be discussed using<br />
additional transcripts.<br />
Appearance<br />
Appearance is in some ways the antecedent of all of the other types of representation<br />
because all representations require an appearance of something in the source text in order to be<br />
realized in the description. But the others, with the possible exception of actions, do not<br />
convey properties that are externally describable with accuracy.<br />
Appearance representations are the subset of description that provides information about the direct visual properties of something in the source text, including luminance, color,
size, and shape. Appearance is usually realized through adjectives and the nouns they modify,
as some examples from transcript 1 illustrate.<br />
From Transcript 1<br />
2 We move over the still waters of Jordan pond toward the twin bare<br />
domes of the bubble mountains<br />
3 Bold letters swing out of the screen toward us the gift of Acadia<br />
5 At otter point a huge wave crashes onto a square granite rock .. white<br />
spray flying<br />
16 A young woman in blue shirt and shorts lies on her back on a rocky<br />
ridge overlooking the sea below<br />
19 A small brown fawn looks at us twitching his left ear<br />
20 A black-headed loon drifts by<br />
21 A thin black dragonfly on a green leaf opens and closes its wings<br />
22 Two little orange-breasted baby robins wiggle their heads<br />
23 Under water two white-sided dolphins swim smoothly side by side<br />
24 On the quiet surface of the sea two black triangular dorsal fins emerge<br />
then curve back down under water<br />
External appearance can also be conveyed with prepositional attachment, as line 16 shows, and also through adverbials. It follows that if a consumer is interested in visual information -- in what things look like normally or in certain situations -- the information would often be provided through appearance representations.
Action<br />
Consistent with the examples from transcript 1, shown above, most utterances are<br />
based on some form of action. Actions can include gestures, movement, and activities, and they
can act as the core representation that other representations are clustered around. Transcript<br />
3 contains a typical set of action-oriented sequences.<br />
Transcript 3 - From "Gladiator" 40:26<br />
1 Maximus: At least give me a clean death .. a soldier’s death<br />
2 Describer: One guard moves behind Maximus<br />
3 Then rests his sword point on the back of his neck<br />
4 Maximus bows his head as the guard raises the sword<br />
5 Maximus leaps up and butts the guard off balance then catches the
blade and spears him in the throat<br />
6 Spinning he chases the second guard whose blade sticks in its<br />
scabbard<br />
7 Maximus: The frost .. sometimes it makes the blade stick<br />
8 Describer: With bound hands Maximus slices the sword across the guard's face<br />
9 Nearby two other praetorians sit on restless horses<br />
10 One gallops into the clearing, then twists in his saddle<br />
11 A sword flies at him end over end<br />
12 It buries itself in his back<br />
13 Maximus steps out from the trees glaring<br />
14 Maximus: Praetorian<br />
The two insertions in transcript 3 contain action in every utterance. This is a<br />
representative sample because it shows some of the different ways that action is presented.<br />
Line 2 shows action that relates the position of one person to another. Line 3 shows an<br />
action with an object (sword point) and location of the object. Line 4 is an example of<br />
simultaneous actions and line 5 contains a sequence of actions presented in a list form. Line<br />
6 represents one action (spinning) as part of another action (although it is likely that the<br />
meaning intended is two actions in sequence). Lines 8 and 13 describe the manner of an
action (with bound hands, glaring), and lines 11 and 12 indicate an action where the agent is<br />
inanimate.<br />
The scope of this study does not allow as full an analysis of the action<br />
representations as would be desirable. The types of meanings associated with different<br />
English verbs would be a good starting point for a more thorough analysis of the action in<br />
AD. But, while actions are represented with verbs, not all verbal forms are expected in AD.<br />
Further, a data-driven analysis focusing only on action, the largest part of the AD content
pie, would probably yield a further subset of action types that are relevant to this domain.<br />
Position<br />
Another type of representation that is often associated with actions identifies the<br />
positions or locations for information being described. Positional representations can act as<br />
action setters or scene-shifters, as transcript 4 shows.
Transcript 4- From "Gladiator" 1:42:30<br />
1. Cassius: People of Rome … on the fourth day of Antioch .. we can celebrate<br />
.. the sixty-fourth day of the games
2 Describer: In the crowd Maximus’ servant Cicero looks around<br />
3 Cassius: In his majestic charity the Emperor has deigned this day to favor the<br />
people of Rome with an historical final match<br />
4 Returning to the Coliseum after five years in retirement .. Caesar is<br />
pleased to bring you the only undefeated champion in Roman<br />
History .. the legendary Tigris the Gaul<br />
5 Describer The crowd stands as four galloping horses draw a chariot into the<br />
arena<br />
6 Next to the driver a Gladiator salutes the crowd<br />
7 He wears leather straps across his stocky chest and a metal helmet<br />
shaped like a tiger's head<br />
8 On one of the underground ramps leading to an arena gate,<br />
Maximus swings a short sword<br />
9 Proximo: He knows too well how to manipulate the mob<br />
10 Maximus: Marcus Aurelius .. had a dream that was Rome Proximo<br />
Location information working as an action setter relates characters to each other and to the setting, as shown in lines 2, 5, and 6. Scene shifting occurs when a complex scene
contains multiple perspectives that are alternately presented to the audience. Scene shifting changes the viewpoint of the audience but does not advance the action of the movie to a
new scene. In some ways, it is similar to a flashback or dream sequence that allows for a<br />
suspension of action. Line 8 from transcript 4 above shows an example of this, as the main<br />
scene is in the coliseum before a gladiator match, but attention has shifted to a quiet spot<br />
below the arena.<br />
While all location information would seem important to viewers not accessing the visual component of the text, these scene-shifting descriptions would seem especially important in allowing viewers who cannot see the change in context to comprehend the action as a complex scene (typical of film climaxes) unfolds.
Reading<br />
Reading occurs when some language or recognizable symbols come on the screen<br />
and are literally read “as is” by the describer. Line 3 from transcript 1 above is an example<br />
of information being read. Reading often comes at the beginnings and endings of movies<br />
when there are credits and titles. It also appears quite frequently throughout some movies in<br />
various forms. In transcript 5, line 4 from “Gladiator,” a set of words is introduced and read to indicate the location where the movie’s action now takes place.
Transcript 5 - From "Gladiator" 47:49<br />
1 Juba: Better now? Clean you see<br />
2 Describer: Maximus lowers his lolling head back onto the wagon<br />
3 Later the caravan approaches a congested desert town<br />
4 Words appear Zucchabar Roman Province<br />
5 A crude amphitheater dwarfs the surrounding red clay buildings<br />
6 Now in a busy open air tavern an older man with a tough leathered<br />
face sits by himself swaddled in robes his head wrapped in a black<br />
turban<br />
7 He takes careful sips from a small brass cup<br />
8 Trader: Proximo my old friend<br />
Transcripts 6 and 7, from “LA Story,” also show the describer reading signs that are
part of the sets rather than just screen text.<br />
Transcript 6 - From "LA Story" 3:38
1 Describer he rides in a park with other stationary bikers<br />
2 a sign reads "stationary bike riding park ...no running”<br />
Transcript 7 - From "LA Story" 19:00<br />
1 large white-lettered signs reading "now" hang on the wall<br />
2. blue lights bathe the hip shoppers<br />
Transcript 8 below shows a case where a sign in the movie becomes like a character<br />
(lines 7, 10) and the reading of it is like the speaking of a character. This same transcript<br />
(lines 13-15) shows that a character is reading from language that is visible in the movie.<br />
A sighted viewer who could read English would certainly understand what was being<br />
communicated in this case, but it is unclear if the same would always be true for an AD<br />
consumer without some descriptive support.<br />
Transcript 8 - From "LA Story" 20:36<br />
1 Describer: The car's engine dies<br />
2 He glides off the road stopping behind a digital road sign which<br />
flashes freeway clear<br />
3 Harris climbs out of Trudy's Mercedes and lifts the hood while she<br />
stays seated in the car studying the tilt of her hat in the visor's mirror<br />
4 A wind shifts the leaves of a weeping willow behind them<br />
5 Suddenly the lighted sign goes black<br />
6 Noticing the darkness Harris slowly turns around<br />
7 The sign flashes "Hi ya"<br />
8 Harris frowns and returns his attention to the car engine<br />
9 Harris whips around as the light bulbs explode<br />
10 Then miraculously regroup to spell "I said Hi ya"<br />
11 bewildered, Harris points to himself eyebrows raised skeptically<br />
12 Harris: Hi<br />
13 ruok?<br />
14 don’t make me waste letters<br />
15 R .. U .. O .. K ?<br />
16 Oh.. Are you OK? Yeah I’m Fine<br />
17 Describer: The sign says "hug me"<br />
18 Harris: What?<br />
In a manner very similar to the way that the speech of a person is reported/constructed in conversation (Tannen, 1989), the information that is being read is introduced through a verb of introduction, for example “read,” “reading,” “says,” “flashes,” and “appear,” as the examples in transcripts 6-8 above show.
Indexical<br />
Indexical or deictic information is information whose meaning can only be<br />
determined from context (Levinson, 1983). In conversation, words such as “here” and<br />
“now” provide meanings for conversants, but understanding those meanings requires understanding the place and time in which the conversation is situated. In Audio Description, a few types of indexical representation were found. In line 5 of transcript 9, the describer indicates what object the character in line 4 had just mentioned. In this case, in order to recover the meaning of this piece of description, the prior dialog is required.
Transcript 9 - From "A Star is Born" 1:58
1 Father: Well daughter how was the movie tonight?<br />
2 Esther: Lovely<br />
2 She takes off her coat<br />
3 Boy: Mush that’s what it was just a lot of mush .. there wasn’t anybody<br />
killed in the whole thing<br />
4 Father: Oh well then I’ll stick to these.. these don’t talk<br />
5 Describer: Looking at pictures<br />
6 Boy That big cluck Norman Maine was in the picture tonight
Transcript 10 shows another form of indexing where the describer indicates who the<br />
next speaker is. In line 2, the name Quintus is said by the describer and from accessing the<br />
video portion of the source text, it is clear that this statement identifies a character as the<br />
speaker.<br />
Transcript 10 - From "Gladiator" 6:10<br />
1 Describer: Across the battlefield at the edge of the forest hundreds of barbarians<br />
wave their swords<br />
2 Quintus<br />
3 Quintus: Load the catapults<br />
4 Describer: On a hill through a light snow the elderly white bearded man watches<br />
the army prepare<br />
Viewpoint<br />
Viewpoint representations relate to what the viewer would perceive as affecting the<br />
entire visual field or text. These include scene changes/shifts and screen and special effects. Scene
changes are commonly indicated with the marker “now” or “later” as discussed above.<br />
Transcript 13 is a kind of scene shifter because at this point in the movie, a number of<br />
different screen effects were appearing in succession and so “next” indicates a change, but in<br />
this case not necessarily a formal scene change.<br />
Descriptions of camera effects were fairly rare in this corpus, but transcripts 11-13 below each show different ways that the viewer’s total perspective can be represented in
description. Another approach is reflected in the beginning of transcript 1 above when the<br />
describer says, “we are moving” and “we continue.” A description that preceded the<br />
transcript began with “we are flying” which reflected what the camera effect was like. It<br />
should be noted that the only use of “we” (interpreted to be inclusive of the listener and describer) occurs at the beginnings of productions.
Transcript 11- From "Gladiator" 1:15:29<br />
1 Describer Now in the palace a blurred face comes into focus<br />
Transcript 12- From "Gladiator" 12:00<br />
1. Describer Surrounded by flames hundreds of men battle in a blur of muted<br />
color<br />
Transcript 13 - From "LA Story" 1:50<br />
1 Describer: Next a montage of funky LA architecture<br />
State<br />
Descriptions sometimes provide information that is not visually evident but is available through the describer’s knowledge of the text. This happens through providing identity or naming, providing relational information about visible entities, providing internal states including emotions and intentions, and specifying time.
Transcript 14 shows the naming of places. While in the movie “Gladiator” locations
can be named with screen text as shown in transcript 5 above, in transcript 14, the location<br />
“Imperial Rome” was not named by the movie producers in this way. This information was<br />
added by the describers. Also, the buildings were not named in the movie; the describer<br />
added this information as well.<br />
Transcript 14 - From "Gladiator" 58:42<br />
1 Describer: As they look at the stands that encircle them the arena seems to spin<br />
like a carousel blurring the cheering crowd<br />
2: Now imperial Rome stretches far below<br />
3: A flock of birds soars over the Circus Maximus and the Coliseum
Transcript 15 below shows both the naming of a character and the relationship of<br />
the character to another character in the same utterance. This type of naming appeared to<br />
occur more with minor characters than main ones. Transcript 15 also shows a common indication of a time shift, “Later,” which signals that the action is later in the story. Because movies can contain flashbacks, when a scene changes, viewers may not always know immediately that the scene has changed. The use of “later” identifies it as a change that is farther in time.
Transcript 15 - From "LA Story" 7:50<br />
1 Describer: Later in his girlfriend Trudy's apartment<br />
Transcripts 16 and 17 show examples of a character’s internal state being evaluated and described. Transcript 16 is an example of the state baldly described, while transcript 17
shows it embedded in an action.<br />
Transcript 16 - From "LA Story" 90:38
1. Describer: Slowly “conditions clear” is spelled over the screen
2 Content Harris smiles<br />
3 In an aerial view, other digital road signs along the highways echo the<br />
same message<br />
Transcript 17 - From "LA Story" 25:36
1. Describer: Now a deluge of mail shoots through the letter slot in Harris' front<br />
door<br />
2. from the kitchen he irritatedly kicks wastebasket underneath the<br />
opening where it catches the streaming mail<br />
A variation on describing a character’s internal state is to have “appears” precede the evaluative phrase.
4.3.4 Words<br />
The words used in AD are the same as the words used elsewhere, but they are a subset of the lexical items normally employed in other language systems. Because AD only allows descriptions of what is immediately available in the visual field, large portions of normal vocabulary should never appear in AD, or in VAD for that matter, unless being read. For
example, there should be no negation, modals, conditionals, past or future tense, anything<br />
that is hypothetical, or any references outside of the text at the time it is being described.<br />
Further, because the audience can include members of different disability experiences and<br />
different backgrounds, a widely accessible vocabulary is used.<br />
Some words in AD seem to be taking on special roles. For example, because the<br />
reference time in AD is understood to always mean the current time of the Source Text,<br />
words such as “now” and “later” can serve new functions as scene changers. Also, words<br />
such as “as” and “while” are markers of parallel action.<br />
5. USING THE DEFINITIONS FOR ANALYSIS<br />
The previous section provided some structural and functional definitions of <strong>Audio</strong><br />
<strong>Description</strong> (AD). These definitions and the way they are organized should be viewed as an<br />
initial and proposed framework for understanding a stream of description. This section adds<br />
another dimension, opening new territory and also tying together aspects of the two<br />
previous sections when these structural and functional definitions are presented within the<br />
context of analysis. Naturally, the analysis of any language system is a large conceptual field, and this study has already addressed two significant aspects of this type of communication, so this section is an abbreviated version of what it might have been had the definitions of the conceptual framework of VAD and the definition of AD existed prior to this research. In
addition to connecting concepts raised in the two previous sections, it also relates the<br />
analysis of this language system to analyses used with other discourse types. In essence, this<br />
section also connects AD (and VAD) to other systems of language use.<br />
5.1 Descriptive Mass<br />
As described in section 4, the insertion is largely a collection of independent utterances, and insertions are placed mostly where the text allows them. Because of this role, they provide a window on the total impact of AD on a text. As table 1 above illustrated, the amount of time AD takes up in texts is significant, not less than 20 percent and in one case almost 60 percent of the total span of the text. Because insertions appear
where dialog and other audio cues are not, these figures can conversely be seen as<br />
representing the dialogue free portions of these texts, as the negative space, the portions that<br />
allow for and may require description. The insertions can be viewed as a quantity of representations that are distributed where the text allows. This descriptive mass could then be
analyzed according to types and patterns of representations to see the impacts on the<br />
production. If, for example, this descriptive mass does not contain any descriptions of<br />
scenery or facial expressions, consumers wanting this information will probably not get it.<br />
The descriptive mass can also be analyzed in terms of how it occupies the mind of<br />
the consumer compared to the dialog and other content. Frazier (1975) identified what he called “low audio” periods in a performance, where dialog and other clues were lacking and where different types of character, setting, or continuity information (his terms)
could be inserted. He described how the insertions would provide essential information,<br />
mostly at the beginnings of scenes. This study reveals a much more pervasive use of<br />
description than Frazier presented. While Frazier’s 90-minute production, “The Autobiography of Miss Jane Pittman,” had 34 insertions or bridges, contemporary described productions have comparatively more, as table 3 illustrates, and their insertions are distributed throughout scenes.
Table 3 - Comparison of description mass in four different texts

Text                     Length    Insertions  Utterances  Length Described  Amount Described
A Star is Born (1937)    111 min.  382         692         37 min.           20%
Jane Pittman (1976) 10   109 min.  34          Unknown     27 min.           25%
LA Story (1991)          90 min.   156         451         29 min.           32%
Gladiator (2000)         148 min.  269         1391        86 min.           58%
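The “Amount Described” figure is simply described time divided by total running time: for “Gladiator,” 86 of 148 minutes is roughly 58%, and for “LA Story,” 29 of 90 minutes is roughly 32%. A minimal sketch of that arithmetic follows; this is my own illustration, not the study’s tooling:

```python
def descriptive_mass(described_minutes: float, total_minutes: float) -> float:
    """Share of a text's running time occupied by description."""
    return described_minutes / total_minutes

# Two rows from Table 3 that the ratio reproduces directly
print(f"Gladiator (2000): {descriptive_mass(86, 148):.0%}")
print(f"LA Story (1991): {descriptive_mass(29, 90):.0%}")
```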
Taking these feature films as data points, a trend towards less dialog and more description is evident: the newer films have a greater need for description, and the role of the describer is increasing historically. Using these films as representative samples, not only is
the amount of description increasing, but the amount of description in each insertion is<br />
increasing as well. The 1937 film had more than half of its description in short insertions of<br />
1-5 utterances in length, while the 2000 production had almost half of its description mass<br />
allocated to sequences with more than 20 utterances and several of these insertions<br />
contained over 70 utterances.<br />
Table 4 - Distribution of description mass by insertion length

                      Number of utterances per insertion
                      21+    16-20   11-15   6-10   1-5
Star is Born (1937)    0%     13%     4%     24%    60%
LA Story (1991)       21%      5%    12%     21%    42%
Gladiator (2000)      47%      4%    15%     14%    20%
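Percentages like those in table 4 can be computed by weighting each insertion by its utterance count and summing within length bins. The sketch below uses invented insertion sizes, since the per-insertion counts themselves are not published here:

```python
from collections import Counter

def mass_distribution(insertion_sizes: list[int]) -> dict[str, float]:
    """Share of total description mass (utterances) per insertion-length bin."""
    def bin_label(n: int) -> str:
        if n >= 21:
            return "21+"
        if n >= 16:
            return "16-20"
        if n >= 11:
            return "11-15"
        if n >= 6:
            return "6-10"
        return "1-5"

    mass = Counter()
    for n in insertion_sizes:
        mass[bin_label(n)] += n  # weight by utterances, not by insertion count
    total = sum(insertion_sizes)
    return {label: count / total for label, count in mass.items()}

# Invented data: five insertions of 2, 3, 7, 12, and 25 utterances
print(mass_distribution([2, 3, 7, 12, 25]))
```

Note that the shares are weighted by utterances rather than by number of insertions, matching the text’s framing of “description mass allocated to” each length class.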
Figure 8 - Chart version of table 4 data: allocation of description mass by utterances per insertion (21+, 16-20, 11-15, 6-10, 1-5) for “Star is Born” (1937), “LA Story” (1991), and “Gladiator” (2000). [chart not reproduced]
5.2 The Textual Role of Insertion Content<br />
In addition to looking broadly and quantitatively at the descriptive mass, individual insertions can be analyzed qualitatively to examine the portion of a text they occupy and the roles they play in the text. It is clear that long descriptive sequences occur in these texts. But are they conveying information that is supplemental or essential to the text?
In several parts of this corpus, it is clear that insertions do contain essential plot<br />
information that is conveyed without dialog. For example, in “Gladiator,” there is a scene<br />
(35:50) where the Emperor’s son murders his father by suffocation. The dialog and sound<br />
effects do not make clear that this has happened and when the dead man is discovered in the<br />
following scene, there is no indication that the son is responsible. In “LA Story,” there is a<br />
restaurant scene (15:23) where the main romantic characters in the film meet and exchange a<br />
number of non-verbal signals that are described in detail in the AD. Without the AD to fill<br />
in the gaps, it is quite possible that visually impaired viewers would not have access to this information, which is essential to understanding the plot, until much later in the production, if at all.
This aspect of the textual significance of insertions has been integrated into both<br />
NTN and DVS practices, as they will review a movie using only audio cues to determine<br />
areas where the text is not clear without description (Gould, 2002; Stovall, 2002).
5.3 Sample Analysis: Persistent Entity Development 11<br />
Most of the content of AD is about people and things. Many of these entities will<br />
exist over extended parts of a text. How and when information is presented about these entities, when they get named, and when they are referred to as new or given, is a
potentially important aspect to modeling the consumer experience. By whatever term is<br />
chosen, whether a text model (Wilson, 1986) or information state that has been applied to<br />
conversation (Schiffrin, 1987; Schiffrin, 1994), there exists in the mind of the consumer a set
of mental representations that reflect their understanding; and one of the main sources of<br />
these representations is the contents of the streams of description in AD. The streams of<br />
description and the order that information is presented then become potentially important<br />
topics of analysis. For example, the main character in “Gladiator” is referred to by five<br />
different terms in the first fifteen minutes of the production. Table 5 shows these different terms and the locations (times) in the text where they appear. This type of approach seems designed
to reflect the revelation of information that the authors of the text intended sighted viewers<br />
to experience because it is only after fifteen minutes that this character is referred to by<br />
name in the text.<br />
Table 5 - Referring terms for main character of "Gladiator"<br />
Time Utterance<br />
2:32   Now a warrior lifts his head and blinks as if waking from a dream
3:34   Now the scruffy warrior mounts the earthworks
3:44   The scruffy faced warrior general smiles and nods to his men
3:59   Under a heavy mist the general makes his way through hundreds of soldiers taking positions in the mud
15:51  Maximus looks away his eyes searching the field
This use of different terms for the same individual, called a referring sequence (Schiffrin, 1994), is also found in spoken discourse. In spoken discourse, new entities that are being
introduced into a conversation are often marked with a specific introduction such as “there<br />
is” (Schiffrin, 1994). In AD, however, new entities are not marked this way; they are usually introduced with “a x,” as “a warrior” from the utterance at 2:32 above shows. Subsequently they can be referred to definitely with “the,” as illustrated in the utterance at 3:34. It follows that there may be similar patterns for things and places, and that the ways they are repeatedly referred to might provide a baseline for how referents can be handled in Audio Description.
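Tracking this indefinite-then-definite pattern is mechanically simple once article-plus-noun mentions have been extracted from the description stream. In the sketch below, the (article, noun) pairs are hand-extracted from the “Gladiator” referring sequence in table 5; the function and its “new”/“given” labels are my own illustration:

```python
def classify_mentions(mentions):
    """Label each (article, head_noun) mention 'new' on the first
    occurrence of the noun and 'given' thereafter."""
    seen, labeled = set(), []
    for article, noun in mentions:
        labeled.append((article, noun, "given" if noun in seen else "new"))
        seen.add(noun)
    return labeled

# Hand-extracted mentions, in text order
mentions = [("a", "warrior"), ("the", "warrior"), ("the", "general")]
for article, noun, status in classify_mentions(mentions):
    print(article, noun, "->", status)
```

A corpus-scale version would need real noun-phrase extraction and coreference handling (“the warrior” and “the general” are the same person), which this toy deliberately ignores.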
5.4 Sample Analysis: The Scene as Frame<br />
While the structures of insertions and utterances are significant to understanding<br />
AD, the productions that AD makes accessible (films, television, plays) are structured<br />
according to scenes and shots. Frazier had envisioned that description would be inserted at the beginnings of scenes, but this study reveals a different pattern in which insertions occur throughout the scene: they appear before and during scenes, and scene changes are often contained within an insertion. Analyzing AD in terms of scenes is a potentially
useful perspective because it is through scenes that the author intended their audience to<br />
perceive the text.<br />
An important concept in understanding what a scene is experientially is the frame of experience (Goffman, 1974; Tannen, 1993a; see also Bateson, 1972). Tannen
connects Goffman’s frame of experience to the concept of schema that was described in<br />
section 3. For the sighted viewer of a production, much of the information associated with a<br />
scene or frame will likely come through visual cues: they see that the scene has changed and that different characters are present or absent. For the consumer of AD, the cues need to be
embedded in the description. The following example from transcript 2 shows a scene<br />
change that occurs at the end of an insertion as they sometimes, but not always, do:<br />
1. Describer: He holds out his arm<br />
2 she takes it and they stroll away<br />
3 ----> Later in a bar<br />
-------- New Scene/Frame<br />
4 Danny: That’s right George there’s nothing like a little rum to take away that<br />
milk flavor<br />
5 Describer: The bartender pours two shots of rum into a glass of milk<br />
6 Later Esther and Danny are drinking the drinks<br />
Similar to the restaurant example often used to illustrate schema theory (Schank,<br />
1977), the fact that the action has been shifted into a bar allows the reference to a bartender<br />
as an existing entity (“the bartender”) even though it is the first reference to him. As a<br />
consumer is told about a scene or frame change, it may be important to provide additional<br />
cues to support the conceptual transition the consumer should make, including when new characters or objects become relevant. A challenge for consumers may arise when scenes change in the middle of dialog and there is no opportunity for an insertion to indicate the scene change.
The sequence in which information is presented matters not only for persistent entities, as described above, but also for frames of experience, as shown here.
5.5 Sample Analysis: Utterance Patterns<br />
It may be of consequence to consumers how information is presented. Just as in<br />
spoken discourse, the utterance is a unit that is perceived by the listener as a coherent<br />
thought. 12 The utterances in AD and their form can be analyzed to create a picture of the manner in which information is being presented. One such analysis that may be useful is to look
at the patterns of representation structure. Table 6 shows short sections from movies by<br />
two different describers: “A Star is Born” by Narrative Television Network (NTN) and<br />
“Gladiator” by the Descriptive Video Service (DVS).<br />
Table 6 - Comparison of description styles

“A Star is Born” 14:46
Danny gives Mr. Randall a confused glance and smiles at Esther
She steps closer and Danny takes a step back
Danny and Mr. Randall nod at each other
Esther turns and runs up the stairs in tears
Danny follows her
He stops her as she begins to enter her room

“Gladiator” 132:54
Underground Maximus runs through a narrow passageway
In the yard gladiators ram a praetorian with a table
Arrows pierce Hagen in the back then the chest
Two soldiers stab him
He kneels blood dripping from his mouth then falls
In the passageway Maximus turns a corner
He stops and tosses aside his torch
Directly in front of him a stone archway leads outside
The structures of the utterances reveal different approaches to describing what are<br />
essentially sequences of actions. In the NTN description, each utterance begins with the<br />
subject (representational focus) and then the action is presented. The theme of each of these<br />
utterances is the person or group performing the action (Halliday, 1985), while the DVS<br />
selection shows a wider range of representational combinations. The DVS style has a more varied structure, with the theme or focus often differing from the actor. A question for consumers might be whether this variety in representation and utterance structure is helpful and interesting or confusing and distracting.
5.6 Sample Analysis: Representational Combinations<br />
In the example above showing a change of scenes, two representations were used in<br />
combination, the state representation “later” and the location representation “in a bar,” to<br />
signal the new scene. Within this corpus, a location by itself rarely started a scene; it usually shifted focus to a different viewpoint of the same scene. Also, “later” by itself rarely started a scene; it indicated that time had shifted. While a detailed analysis of these combinations is
beyond the scope of this study, this is an area that deserves more attention because these<br />
combinations may be important for the development of standards that allow a few key<br />
words to operate as markers on a number of levels in AD as they can in conversation<br />
(Schiffrin, 1987).<br />
Representational combinations may also be important because they may reflect a type of filtering or implicit encoding on the part of the describer that may or may not be optimal for the consumer. For example, co-occurrences of location and appearance representations in the same utterance were extremely rare (less than .5%) in this corpus. Either seemed to co-occur easily with an action representation, but they rarely occurred together. Is this related to the concept of consciousness having primary and peripheral foci (Chafe, 1994), such that attention to location reduces attention to visual appearance? Or is it possible that this is a result of the fact that describers are able to see the visual image of the location as they are describing (something consumers cannot do), and so the need to describe the appearance of a place is not evident to describers when descriptions are created?
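Such co-occurrence rates can be computed mechanically once utterances are tagged with representation types. The sketch below is purely illustrative: the set-based tagging scheme and the toy counts are hypothetical, not the coding instrument or data used in this study.

```python
from itertools import combinations
from collections import Counter

def cooccurrence_rates(utterances):
    """Count how often pairs of representation types appear in the
    same utterance, as a fraction of all utterances.

    `utterances` is a list of sets of representation-type labels,
    e.g. {"action", "location"} -- a hypothetical tagging scheme."""
    pair_counts = Counter()
    for types in utterances:
        for pair in combinations(sorted(types), 2):
            pair_counts[pair] += 1
    total = len(utterances)
    return {pair: n / total for pair, n in pair_counts.items()}

# Toy corpus of 200 tagged utterances: location and appearance each
# co-occur freely with action, but rarely with each other.
corpus = (
    [{"action", "location"}] * 40
    + [{"action", "appearance"}] * 30
    + [{"action"}] * 129
    + [{"location", "appearance"}] * 1
)
rates = cooccurrence_rates(corpus)
print(rates[("appearance", "location")])  # 1/200 = 0.005, i.e. 0.5%
```

With a real corpus, a rate like the 0.5% reported above would fall out of exactly this kind of pairwise count.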
5.7 Analysis Challenges: Time, Reality, and Cultural Elements
There are three other areas that will be briefly discussed as analytical challenges. All three are significant and cannot properly be addressed in the scope of this study. The first topic is time, and it affects AD in a number of ways. First, it can constrain the amount of space for description and the location in the text where the descriptions can be inserted. For some productions this is a major issue for describers. Another time aspect arises within an insertion when certain audio cues (e.g., a glass falls) require part of the insertion to reflect the short audio element, further constraining the arrangement of utterances. The second of these is reality, which presents other challenges for describers because, while everything is supposed to be described in a direct and non-evaluative manner, certain visual effects are intended by the producers to reflect an imaginary or dreamlike state. Language and visual presentations have different approaches to treating reality (Leuween, 2002). Time and reality intersect with flash-backs and flash-forwards that suspend the time and reality that is active in the text and present another reality and time that is like a special frame inserted into some other text frame. Third is the cultural significance of certain scenes. As many have discussed, images of various celebrities (Barthes, 1957) and culturally recognized images can be used as independent symbols, and their appearance in a cultural production is often expected by the author of a text to have specific significance (Stephens, 1998). All three of these aspects of AD -- time, reality, and cultural significance -- are important factors relating to the decisions that describers must make.
6. SUMMARY: A LANGUAGE SYSTEM
This study began with the assertion that the practice of using language as a substitute for visual information in electronic textual settings is a specific type of language use. This study has provided evidence from four different perspectives, and in each of these perspectives this language system, Visual Assistive Discourse (VAD), and its variant, Audio Description (AD), have been shown to have characteristics both in common with and distinct from other uses of language.
VAD was explored initially in section two as a set of practices that exist for social reasons: to provide access to visually dependent electronic texts through assistive technology. These existing social practices act as the foundation for this language system because language is a socially constituted phenomenon (Gee, 1999; Halliday, 1978; Scollon, 2001b). The four practice areas that were discussed in this study are all new additions to human existence, far newer than speech and writing, which are thought to have developed roughly 100,000 and 5,000 years ago, respectively. Even though these practices are performed by different types of organizations and based on different methods, they serve a common set of consumers and have the same goals: to provide access to information that, without the use of language, would be largely inaccessible; and to provide that information using language in a way that allows the recipient the maximum opportunity to use the text in a manner that the authors of the text intended.
In section three, these practices are abstracted to present a definition of this language system, including role types and textual components. The common communicative properties and participant structure, including a describer, consumer, and source and modified texts, were presented and compared to other language processes. Similar to spoken discourse, the product of VAD is usually received as speech. Similar to written communication, the communication process of VAD is unidirectional, from author to consumer (through the describer). Similar to interpreted sign language/speech events, VAD requires an intermediary to enable the process. But, unlike any of these other language systems, in VAD the source information is not language but visual information, which is a different phenomenon, and the describer, rather than just converting visual information to language, is making decisions that are affected by the multimodal texts that the visual information sits in. The describer is making additions that are like informational prosthetics to create a new text that is accessible for the consumer, who will process it to create in his own mind a representation influenced by both descriptive language and residual information including dialog and auditory cues (Frazier, 1975).
Section four is data driven. It takes one form of VAD, Audio Description (AD), as the subject of a descriptive study. A corpus of four films with more than 150 minutes of AD was transcribed and used in a study that looked at the structural and functional components of the streams of description. This data provides clear evidence of a type of language that is different in form and function from much of written and spoken discourse. While the words in AD are drawn from a select group of the common language, they represent a restricted set. Because the words used in AD are restricted based on tense, type, and modality -- reflecting only real rather than any unreal (irrealis) states -- the language system of AD cannot be considered simply a dialect or a register (Gregory, 1978). Further, AD includes different types of representations that are not relevant to written or spoken discourse. The world that the describer describes is focused on the visual field of a movie (video) screen, and description can cover both the surface appearance of elements inside that screen and other textual information. In fact, the majority of the AD utterances in this corpus were about actions rather than appearance; they were mostly descriptions of what was happening rather than what things look like. This bias towards action is likely connected to the type of text (movies) used in the study. The descriptions also contained information that was read, information about the production, the changes of scenes and shifts in time and focus, as well as names of and relationships between visual elements.
In section five, the data collected in the study of Audio Description was used to build on the structural and functional definitions that had been established earlier by using them as the object of sample quantitative and qualitative analysis. This section shows that when AD is viewed as linguistic data, it can be understood in terms familiar to spoken discourse analysis, including referring sequences (Schiffrin, 1994) and frames of experience (Tannen, 1993a; Tannen, 1993b). This linguistic data for AD can be used to compare different types of description to see evidence of the different choices that describers make in their creation of a text. And it showed how critical AD is to understanding movies, especially recent productions in which essential textual information is often contained in long sequences with little or no dialog.
As this study concludes by providing the first formal definition of the language system Visual Assistive Discourse (VAD) based on actual practices, it leaves much work to be done. VAD is today a concept. It was born as an analytical construct in this thesis and is constituted in principles that influence how visual descriptions are created, including the recognition that a description is not an isolated phenomenon that relates directly to visual information but is actually situated in a multimodal text. The practices and language that form the basis for this study have been addressed with evidence from interviews, published reports, research into language use and analysis, and a corpus of the productions of language drawn from one of many representative sources. This study concludes that Visual Assistive Discourse -- using language as a surrogate for vision to augment another text and make it accessible to the visually impaired -- is a language system that is distinct and can be described, measured, and taught.
7. FUTURE STEPS
This study has placed VAD in a linguistic framework and opened it up to the benefits of large bodies of linguistic research. The theoretical and empirical base developed in this study can now be used as a foundation to support further research of AD and VAD in ways more consistent with established language systems. Below are discussions of research paths that are either indicated by the results of this study or made more practical by its completion.
7.1 Consumer Study
Understanding the recipients of communication is an important part of understanding the effects of any communicative process. This aspect is especially challenging for VAD because there is no typical visually impaired individual and because, at the time a description is created, there is no knowledge of who the specific consumers will be or how they might fit into the wide range of potential combinations of age, history, and disability factors. A number of studies have been undertaken into the usability of the Internet (Gerber, 2001; Gerber, 2002a; Gerber, 2002b), the television viewing habits of the visually impaired (Kuhn, 1992b), and the benefits of Audio Description (Lovering, 1993; Packer, 1996) that consider the perspectives of consumers of described media. Additional study that builds upon this work to develop fuller profiles of members of the visually impaired community as users of electronic media would be helpful for both researchers and practitioners. While there is no typical member of this consumer community, perhaps there are prototypical types that could be constructed.
This study has not specifically supported this important research area other than to reinforce the need for it. Discussions with researchers and advocacy groups during this study identified many practical challenges to pursuing this line of inquiry, including such basic factors as the fact that many visually impaired individuals are not members of easily identifiable groups that can be contacted and may not use or have access to technology that would assist them (Kaye, 2000). This type of research is envisioned as best performed in concert with advocacy groups and description providers, those organizations with existing contacts in the consumer community. The methods could be ethnographic (if practical) and could be supported by surveys, interviews, and linguistic analysis of feedback and comments. It is important to recognize that the consumers of VAD are not in an empowered position with respect to VAD, and may be reluctant to offer what would seem to be criticism of a service that is so clearly important.
7.2 Supporting Further Developments of Audio Description
Audio Description is an evolving field and is growing in use. It is now practiced by more organizations in more countries than ever before. This study supports further investigation into AD by providing a baseline set of definitions. During the course of this study, certain questions about AD arose, including:
o Are there optimal structural patterns for AD insertions/utterances?
o Does the preponderance of grammaticality in these structures limit the amount of information that can be transmitted?
o Does the quantity of time that AD consumes, usually more than any one character, present issues in terms of ear fatigue, and would multiple voicing or different descriptive approaches be useful in optimizing long sequences?
o Is the style of Audio Description used in movies (and perhaps other recorded media) weak in areas such as special effects that were not relevant to the performance roots of the method?
o Are there techniques from one variant of AD that would help the others, such as the creation of program notes as in theatrical AD?
At the time of this study, there have been active discussions regarding standards and guidelines. While these discussions are the rights and responsibilities of the participants, especially the consumers, the structures developed in this study could be useful topics for facilitating them. Another type of follow-on would be to study other forms of Audio Description, such as television or live performances, to refine the taxonomy of representations this study produced and to provide a basis for comparison. There is also certainly much work to be done in furthering the descriptive study begun in this report, and a more thorough investigation of action representations, referential approaches, and experiential frames seems a useful path of inquiry, one with relevant literature from other linguistics studies.
7.3 Descriptive & Comparative Studies of Other Forms of VAD
The practices of providing audio textbooks and developing accessible websites are parts of VAD that have significant investments in the description process. But they have not been studied in depth, and it is unlikely that most of the organizations providing these services would consider describing visual content to be their primary activity; rather, it is a component (maybe a lesser one) of their efforts. Not surprisingly, the initial research into the descriptions in these areas revealed a wide range of approaches, even within the same text.

Descriptive studies of textbooks and the Internet would provide significant new information on the range of representations and approaches used in VAD and would complement the description of AD done in this research. These descriptive studies could be used for comparisons of style and approach to the role of the describer and would certainly yield important insights.
7.4 Human Subjects Studies with AD
Throughout the course of the research for this study, human subject studies to measure the effectiveness of AD were discussed. For a variety of reasons, not the least of which is that there was no baseline definition of AD that could be used to structure such a study, this type of research was not attempted. The definitions and analyses provided here can be used to structure these types of studies to compare styles of description, the descriptive language itself, and the limits and tolerances of listeners when exposed to long segments of description. There are fundamental questions, expressed by almost all describers interviewed for this study, about how description works inside the mind of the consumer and how those effects could be maximized.
7.5 Educational Materials Study
Educational materials are an especially difficult issue for the blind and visually impaired. At the time of this study, there are a number of initiatives to address educational materials using digital technology, technology that should allow visual information as well as language to be transmitted. These efforts include a national initiative to exchange textbook information electronically (CAST, 2002). This technology, and the human practices that include creating tactile and other alternative representations for visual information, should also be capable of supporting visual descriptions and the range of descriptions that exist in VAD. A study of the visual description issues associated with educational material, perhaps involving a subset of the consumer study mentioned above focusing on prototypical student types, would be an important contribution to the understanding of some of the challenges that exist in making accessible instructional materials usable to more students.
7.6 Assistive Technology Research
Currently, a number of organizations are involved in the development of assistive technology. This technology, both hardware and software, will create opportunities and may impose limits on VAD consumers in the future. While using language as a replacement for images is not the only way to provide some of this information to the visually impaired, it is the only solution for certain types of texts and can be a cost-effective solution for many others. An investigation into the current research and standards efforts for future assistive technology would provide an opportunity to inform them with an understanding of the linguistic possibilities of VAD. If these efforts to develop the next wave of assistive technologies were aware of and understood the linguistic dimensions of Visual Assistive Discourse, perhaps those technologies could be designed to optimize the language experience of the consumer. After all of the efforts for regulation and service delivery, and after all of the technology used in attempts to remove barriers and provide an equivalence of experience, these assistive practices are still fundamentally processes of human communication.
APPENDIX A: GLOSSARY

Accessibility: Refers to whether an individual with a disability can gain even minimal access to information. Does not imply that the information is in a form that is meaningful or relevant, only that it is not unavailable due to a barrier.

Audio Description: Traditionally refers to human voice description for live events and performances, including television, movies, and museum exhibits. Occasionally, this term is used to mean any description done through the human voice for visually impaired and blind individuals.

Consumer: A person who receives the visual description. Also called a description consumer.

Describer: The person or organization/group that is responsible for the modified text and the description.

Described Video: Term used by the Federal Communications Commission and others to refer to a video product that has Audio Description added to it.

Description Process: The events, including previewing, writing, editing, and narrating, that create the described text.

Descriptive Mass: Applies to the total set of descriptive content as a group.

Electronic Text Descriptions: Descriptions that go on Internet sites or other digital publications. Could be rendered in Braille or synthesized speech.

Experiential Equivalence: Concept that the experience that a disabled person has, while not the same as the experience the non-disabled enjoy, is equivalent to it in fundamental ways.

Insertion: A set of one or more utterances that are read or heard as a continuous stream by the description consumer. In Audio Description, insertions are usually positioned between actor dialog and other significant audio.

Modified Text: The text with visual description added, the one that a consumer would experience.

NTN: Narrative Television Network, a major describer for broadcast television.

Representation: A piece of information that can be inferred from a description segment. Can be of a type and have a focus.

Representation Focus: The part of the visual field, or an element not in the visual field, that the description relates to. Can be a person, place, or thing, or the entire visual field.

Representation Type: Category of information provided in the representation. For Audio Description, seven categories were found.

Restricted Description: A description that is limited in size by the text and/or the technology of the text.

Source Text: The material that is being described, before the description is added. This is the video/film/book/play/event that the describer sees.

Text: Text in this sense means a composed body of information such as a story, play, movie, or textbook.

Text Model: A term used in reading research to describe the mental image of information (propositions and concepts) contained in something that is being read, or, in the case of this study, listened to.

Textually Situated: Means that the visual elements, and the descriptions for them, are not independent entities without context. The context is the text, such as a movie or an Internet site, that they are placed in, and properties of the text and how they are placed will affect their meaning.

Transrepresentation: The creation of one modal artifact to represent another, as in words being used for visual information.

Unrestricted Description: A description that can be of any length.

Utterance: A unit of description that is provided as one continuous speech unit; similar to an utterance in spoken discourse. Usually is grammatical, but might not be. Can contain one or more representations.

Visual Assistive Discourse: A term introduced in this study to mean the process of providing visual information through language across the various text types that the practice is employed in.

Visual Description: An informal term to indicate either the practice of Visual Assistive Discourse or an instance of description.

Visual Description Practice: A term used in this study to denote the real-life practices that are currently based on a type of text or an organization. Internet textual equivalents for Section 508 and Audio Description are two examples of visual description practices.
APPENDIX B: TRANSCRIPTION & MULTIMODAL ISSUES

Transcription Conventions

The transcription conventions used in this investigation are based upon principles and approaches used in spoken discourse analysis (Tannen, 1989). In transcripts, the following notation is used:
.. Perceptible pause of less than ½ second
… Perceptible pause of ½ second or more
CAPS Indicates emphatic stress
[ ] Around overlapping speech (extremely rare)
→ Arrow indicates significant points
/ / Slashes indicate uncertain transcription (extremely rare)
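A convention set like this can also be scanned mechanically. The fragment below is a hypothetical illustration (no such tool was used in this study) of one subtlety of the notation: the long-pause marker "..." contains the short-pause marker "..", so a scanner must try the longer pattern first.

```python
import re

# Regex alternation is ordered: matching "..." before ".." prevents a
# long pause from being misread as a short pause plus a stray period.
PAUSE = re.compile(r"\.\.\.|\.\.")

def pauses(line):
    """Return the pause markers found in a transcribed utterance."""
    return PAUSE.findall(line)

line = "He kneels .. blood dripping from his mouth ... then falls"
print(pauses(line))  # ['..', '...']
```

The same ordering principle would apply to any tokenizer built over these conventions.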
Multimodal Issues

Many discourses, including face-to-face conversation, operate with concurrent communication modalities. The non-language parallel properties can include gestures, indexical references, and environmental information. In the texts that are used with visual assistive discourse, these parallel properties are often extremely rich in information that is key to understanding the nature of the description and how the descriptive statements interact with the regular source text statements. One option for dealing with the multimodal issues would be to publish this document with rich media so that the source text could be presented in the same forum as the description and analysis. The option chosen for this study takes a more economical approach. Since all of the material used for this study is published, all of the transcriptions will be connected to the published work by reference to the text and time position of the sequence transcribed.
Sub-second Timing

The technology used to record and play back the films used in this study did not show times below a second. As a result, the times in transcriptions were reported as the whole seconds shown, which causes some loss of precision. With a corpus of over 150 minutes of transcribed content, any cumulative loss of precision in the transcription of individual utterances is expected to be negligible.
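The effect of this rounding can be made concrete with a small sketch (an illustration of the arithmetic, not a procedure from the study). Because each timestamp is an absolute position in the film rather than a summed duration, truncating to the displayed whole second bounds every error to under one second, and the errors do not accumulate across the corpus.

```python
# True start times are truncated to the whole second the player shows.
def truncation_error(true_seconds):
    shown = int(true_seconds)  # displayed whole-second timestamp
    return true_seconds - shown

# Sample true times, including one deep into a film (7303.25 s ~ 2 h):
# the error depends only on the fractional part, not on position.
errors = [truncation_error(t) for t in (12.0, 12.4, 12.999, 7303.25)]
assert all(0 <= e < 1 for e in errors)
print(round(max(errors), 3))  # 0.999
```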
APPENDIX C: VERBAL DESCRIPTIONS FOR FIGURES

Below, the name of each figure is paired with a descriptive text.
Figure 1 –<br />
Different<br />
Practices of<br />
<strong>Visual</strong><br />
<strong>Description</strong><br />
in 2002<br />
Figure 2 -<br />
Conceptual<br />
View of<br />
Internet<br />
<strong>Description</strong>s<br />
Figure 3 -<br />
Overview of<br />
VAD and<br />
other<br />
prototypical<br />
communicati<br />
on processes<br />
This diagram has four circles/ovals to denote the four description areas of<br />
<strong>Audio</strong> <strong>Description</strong>, <strong>Audio</strong> Books, educational software/rich media, and<br />
Internet sites. <strong>Audio</strong> <strong>Description</strong> is larger than the others and has an oval<br />
where the others have circle. Inside the <strong>Audio</strong> <strong>Description</strong> oval are three<br />
circles titled Live <strong>Description</strong> 1981, Described Video Early 1970’s, and<br />
<strong>Audio</strong> Tours 1980s. The three circles to the right of the <strong>Audio</strong><br />
<strong>Description</strong> oval are titled <strong>Audio</strong> Books 1948/1971, Software &<br />
Interactive Media Late 1990s, and The Internet & MultiMedia Late 199s.<br />
This diagram is a set of nested ovals with no overlaps. The outer oval is<br />
titled Internet. Inside this are two ovals titled Multimedia Documents and<br />
Live (Simulcast) Events. The Multimedia oval is further broken down into<br />
four sub ovals titled Text and Hyper Text, Still Images, Moving Images,<br />
and Interactive Elements.<br />
This diagram contains four sub diagrams each with its own label.<br />
The first sub-diagram is labeled Face-to-face conversation and it has two<br />
circles both labeled conversant with a box in between labeled coconstructed<br />
text. Two-directional arrows connect each conversant to the<br />
text.<br />
The second sub-diagram is labeled Interactional Sign Language<br />
Interpretation and this features three circles labeled deaf conversant,<br />
hearing conversant, and interpreter. A box labeled hearing text is in<br />
between the hearing conversant and the interpreter and it is connected<br />
with bi-directional arrows to both. A box labeled signed text connects the<br />
deaf conversant and interpreter and is connected to both with bidirectional<br />
arrows. A third is in between all three and is labeled visual text<br />
and is connected to the interpreter, deaf conversant, and hearing<br />
conversant with bi-directional arrows.<br />
The third sub-diagram is labeled written communication and contains two<br />
circles labeled author and reader. In between is a box labeled composed<br />
text and one-way arrows go from the author to the text and then from the<br />
text to the reader.<br />
The fourth sub-diagram is labeled <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> and contains<br />
three circles labeled author, describer, and consumer. A box labeled source<br />
text is connected with a one-way arrow from the author. The source text<br />
has two arrows leading from it. One goes to the describer and one goes to<br />
another box labeled modified text. From the describer is an arrow and a<br />
94
Figure 4 -<br />
box labeled insertions that is connected to the modified text by a one-way<br />
arrow into the modified text. A one-way arrow goes from the modified<br />
text to the consumer.<br />
Figure 4 - Chafe's View of Immediate Mode<br />
This diagram has three boxes. A small one at the top is labeled<br />
environment. From it an arrow leads to a large box labeled<br />
EXTROVERTED CONSCIOUSNESS. Inside this box are two labels:<br />
represented and representing. The arrow has labels for perceiving, acting,<br />
and evaluating next to it. From the large box is an arrow to a small box<br />
labeled language. Next to this arrow is the label speaking. At the bottom<br />
is the label “speaking in the immediate mode.”<br />
Figure 5 - View of the description process<br />
This diagram is an expansion of the part of figure 3 dealing with the<br />
describer and is labeled team-based <strong>Description</strong> Process. It has three solid<br />
circles labeled Text Producer, Support Team, and Describers, and four<br />
boxes labeled Source Text, Working Documents, Manuals & Style Guides,<br />
and Modified Text. There is a large dotted circle that encloses the<br />
Describers, Support Team, Working Documents, and Manuals & Style<br />
Guides boxes. The Source Text box is half inside and half outside the<br />
dotted line.<br />
A one-way arrow goes from Text Producer to Source Text and a two-way<br />
arrow goes from Text Producer to Support Team. A one-way arrow goes<br />
from Support Team to Working Documents and then to Describers. One<br />
one-way arrow goes from Source Text to Modified Text and another to<br />
Describers. The Describers circle has arrows coming in from Source Text,<br />
Working Documents, and Manuals & Style Guides, and a double arrow<br />
connecting to Support Team. A one-way arrow goes from Describers to<br />
Modified Text.<br />
Figure 6 - Conceptual diagram of consumer's process<br />
This diagram is conceptually similar to figure 5, but it focuses on the<br />
consumer.<br />
A circle labeled Consumer is surrounded by four boxes labeled Text Model,<br />
Personal History, World Knowledge, and Purpose and Goals that sit<br />
across a dotted outer circle. Outside this outer circle is a box labeled<br />
Modified Text. An arrow comes in from the Modified Text to the<br />
Consumer. Arrows also come in to Consumer from Personal History,<br />
World Knowledge, and Purpose and Goals, and a double arrow connects<br />
Consumer to Text Model.<br />
Figure 7 - Length of utterances in corpus<br />
This bar chart contains the following data:<br />
Length of utterances in seconds<br />
Duration 0-.99 1-2 3-4 5-6 7-8<br />
Percentage 7.98% 59.56% 32.46% 8.15% 1.56% 0.25%<br />
Figure 8 - Chart version of table 3 data<br />
Percentage 7.98% 59.56% 32.46% 8.15% 1.56% 0.25%<br />
Number 195 1455 793 199 38 6<br />
This bar chart contains the following data:<br />
Number of Utterances in Insertion<br />
1-5 6-10 11-15 16-20 21+<br />
A Star is Born (1937) 60% 24% 4% 13% 0%<br />
LA Story (1991) 42% 21% 12% 5% 21%<br />
Gladiator (2001) 20% 14% 15% 4% 47%<br />
(Campbell et al., 1934) (Martin/DVS, 1991) (Franzoni, 2000) (NPS/Ear, 2000)<br />
REFERENCES<br />
AFB. 1991. A Picture is Worth a Thousand Words For Blind and <strong>Visual</strong>ly Impaired Persons<br />
Too: An Introduction to <strong>Audio</strong>description. New York: American Foundation for the<br />
Blind.<br />
AFB. 2000. Education: An Overview. New York: American Foundation for the Blind.<br />
AFB. 2001a. Quick Facts and Figures on Blindness and Low Vision. New York: American<br />
Foundation for the Blind.<br />
AFB. 2001b. Statistics for Professionals: American Foundation for the Blind.<br />
Alonzo, Adam. 2001. A Picture is Worth 300 Words: Writing <strong>Visual</strong> <strong>Description</strong>s for an Art<br />
Museum Web Site. Paper presented at Center On Disabilities: Technology And Persons<br />
With Disabilities Conference 2001, Northridge.<br />
Artic Technologies, Inc. 2002. What is a Speech Friendly Site?: Artic.<br />
ASTC. 2001. Best Practices: <strong>Audio</strong> <strong>Description</strong>. Association of Science-Technology<br />
Centers Incorporated.<br />
Baquis, David. 2002. Meetings and email conversations, with Phil Piety. Washington, DC.<br />
Barthes, Roland. 1957. Garbo's Face.<br />
Bateson, Gregory. 1972. Steps to an Ecology of Mind. Chicago: University of Chicago Press.<br />
Board, Access. 2001a. Web-based Intranet and Internet Information and Applications<br />
(1194.22): United States Access Board.<br />
Board, United States Government Architectural and Transportation Barriers Compliance.<br />
2001b. Electronic and Information Technology Accessibility Standards. In Section<br />
508 of the Rehabilitation Act Amendments of 1998. Washington DC: Architectural and<br />
Transportation Barriers Compliance Board.<br />
Burnham, Betsy. 2002. Email regarding APH descriptions for visual information, with Philip<br />
Piety. Washington, DC.<br />
Campbell et al., Narrative TV Network. 1934. A Star is Born, Described by Narrative TV<br />
Network, ed. William A. Wellman.<br />
Carpenter, Patricia A; Marcel Adam Just. 1986. Cognitive Processes in Reading. In Reading<br />
Comprehension: From Theory to Practice, ed. J Orasanu. Hillsdale, NJ: Lawrence Erlbaum.<br />
CAST. 2002. The National File Format Initiative at NCAC: Center for Applied Special<br />
technology.<br />
Chafe, Wallace. 1994. <strong>Discourse</strong>, Consciousness, and Time. Chicago: University of Chicago Press.<br />
Corn, Anne L.; Wall, Robert S. 2002. Access to Multimedia Presentations for Students with<br />
<strong>Visual</strong> Impairments. Journal of <strong>Visual</strong> Impairment and Blindness 96:197.<br />
Dwyer, Francis M. 1978. Strategies for Improved <strong>Visual</strong> Learning: A Handbook for the Effective<br />
Design, and Use of <strong>Visual</strong>ized Materials. State College, Pennsylvania: Learning Services.<br />
Elbers, Loekie; Loon-Vervoorn, Anita van. 1999. Lexical Relationships in Children Who Are<br />
Blind. Journal of <strong>Visual</strong> Impairment and Blindness 93:419.<br />
Franzoni, David et al/DVS. 2000. Gladiator, ed. Ridley Scott.<br />
Frazier, Gregory MA. 1975. The Autobiography of Miss Jane Pittman: An all-audio<br />
adaptation of the teleplay for the blind and visually handicapped, Film and<br />
<strong>Communication</strong>, San Francisco State University: Masters.<br />
Gamma, Erich, Richard Helm, Ralph Johnson, and John Vlissides. 1995. Design Patterns:<br />
Elements of Reusable Object-Oriented Software. New York: Addison-Wesley Longman, Inc.<br />
Gee, James Paul. 1999. An Introduction to <strong>Discourse</strong> Analysis Theory and Method. New York:<br />
Routledge.<br />
Gerber, Elaine Ph.D. 2002a. Surfing by Ear: Usability Concerns of Computer Users Who<br />
Are Blind or <strong>Visual</strong>ly Impaired. In Access World.<br />
Gerber, Elaine and Connie Kirchner. 2001. Who's Surfing? Internet Access and Computer<br />
Use by <strong>Visual</strong>ly Impaired Youths and Adults. New York City: American Foundation<br />
for the Blind.<br />
Gerber, Elaine Ph.D. 2002b. Conducting Usability Testing With Computer Users Who Are<br />
Blind or <strong>Visual</strong>ly Impaired. Paper presented at 17th Annual International Conference of<br />
California State University Northridge (CSUN) "Technology and Persons with Disabilities",<br />
March 18-23, 2002, New York.<br />
Goffman, Erving. 1963. Behavior in Public Places: Notes on Social Organization of Gatherings. New<br />
York: The Free Press.<br />
Goffman, Erving. 1974. Frame Analysis: An Essay on the Organization of Experience. Cambridge,<br />
Massachusetts: Harvard University Press.<br />
Goldberg, Larry. 2002. Email communication, with Phil Piety.<br />
Gould, Bryan. 2002. Conversation at WGBH, DVS, with Phil Piety. Boston, MA.<br />
Gregory, Michael & Susanne Carroll. 1978. Language and Situation: Language and Society.<br />
Boston: Routledge & Kegan Paul.<br />
Haberlandt, Karl. 1988. Component Processes in Reading Comprehension. In Reading<br />
Research: Advances in Theory and Practice, ed. M. Daneman. San Diego: Academic Press.<br />
Halliday, M.A.K. 1978. Language as a social semiotic. Baltimore, Maryland: University Park<br />
Press.<br />
Halliday, M.A.K. 1985. An Introduction to Functional Grammar. New York: Arnold.<br />
Hardy, Steven Thomas. 2000. Vygotsky's Contributions to Mentally Healthy Deaf Adults.<br />
Washington, DC: Gallaudet University.<br />
Harris, Helen. 2000. Reply Comments of Helen Harris, ed. Federal <strong>Communication</strong>s<br />
Commission. Washington DC.<br />
Harris, Zellig. 1951. Methods in Structural Linguistics. Chicago: University of Chicago Press.<br />
Hatim, Basil. 1997. <strong>Communication</strong> Across Cultures: Translation Theory and Contrastive Text<br />
Linguistics: Exeter Linguistics Studies. Exeter.<br />
Holsánová, Jana. 2001. Picture Viewing and Picture <strong>Description</strong>: Two Windows to the Mind. Lund,<br />
Sweden: Lund University Cognitive Science.<br />
Iedema, Rick. 2003. Multimodality, resemiotization: extending the analysis of discourse as<br />
multi-semiotic practice. <strong>Visual</strong> <strong>Communication</strong> 2:29-57.<br />
Kaye, H. Stephen. 2000. Disability and the Digital Divide. Washington, DC: U.S.<br />
Department of Education.<br />
Kerscher, George. 2001a. Converging Standards in Electronic Books: The Daisy<br />
Consortium.<br />
Kerscher, George. 2001b. Theory Behind the DTBook DTD: The Daisy Consortium.<br />
Knowlton, Marie and Robin Wetzel. 1996. Braille Reading Rates as a Function of Reading<br />
Task. Journal of <strong>Visual</strong> Impairment and Blindness.<br />
Kress, Gunther & Theo Van Leeuwen. 2001. Multimodal <strong>Discourse</strong>: The Modes and Media of<br />
Contemporary <strong>Communication</strong>. New York: Arnold/Oxford University Press.<br />
Kress, Gunther and Theo van Leeuwen. 1996. Reading Images: The Grammar of <strong>Visual</strong> Design:<br />
Routledge.<br />
Kuhn, David. 1992a. The Use of Descriptive Video in Science Programming. Boston:<br />
WGBH Educational Foundation.<br />
Kuhn, David; Corinne Kirchner. 1992b. Viewing Habits and Interests in Science<br />
Programming of the Blind and <strong>Visual</strong>ly Impaired Television Audience. New<br />
York/Boston: American Foundation for the Blind, WGBH Educational Foundation.<br />
Lemke, Jay. 2002. Travels in Hypermodality. <strong>Visual</strong> <strong>Communication</strong> 1:299-325.<br />
Lessig, Lawrence. 1999. CODE and Other Laws of Cyberspace. New York: Basic Books.<br />
Leeuwen, Theo van. 2002. Ten reasons why linguists should pay attention to visual<br />
communication. Paper presented at Georgetown University Roundtable, Georgetown<br />
University.<br />
Levie, W. Howard and Richard Lentz. 1982. Effects of Text Illustrations: A Review of<br />
Research. Educational <strong>Communication</strong> and Technology Journal 30.<br />
Levine, Barry. 2002. Digest Number 267: <strong>Audio</strong> <strong>Description</strong> International.<br />
Levinson, Stephen. 1983. Pragmatics: Cambridge Textbooks in Linguistics. Cambridge,<br />
England: Cambridge University Press.<br />
Lovering, Sharon. 1993. Video <strong>Description</strong> Brings Enjoyment to All. In The Braille Forum:<br />
American Council of the Blind.<br />
Lucas, Ceil. 1989. The Sociolinguistics of the Deaf Community: Academic Press, Inc.<br />
Martin/DVS, Steve. 1991. LA Story, ed. Mick Jackson: WGBH Descriptive Video Service.<br />
Metzger, Melanie. 1999. Sign Language Interpreting: Deconstructing the Myth of Neutrality.<br />
Washington, DC: Gallaudet University Press.<br />
Miller, Lori. 2002. Digest Number 267: <strong>Audio</strong> <strong>Description</strong> International.<br />
NCAM. 2002. Access to Rich Media: WGBH.<br />
NFB. 2000. Blindness Statistics: National Federation of the Blind.<br />
Norris, Sigrid. 2002. Multimodal <strong>Discourse</strong> Analysis: A Conceptual Framework. Paper<br />
presented at Georgetown University Round Table on Language and Linguistics, Georgetown<br />
University.<br />
NPS/Ear, National Park Service/Metropolitan Washington Ear. 2000. The Gift of Acadia:<br />
National Park Service.<br />
O'Grady, William, John Archibald, Mark Aronoff, and Janie Rees-Miller. 2001. Contemporary<br />
Linguistics. New York: Bedford/St. Martin's.<br />
Packer, Jaclyn PhD. 1996. Video <strong>Description</strong> in North America. In New Technologies in the<br />
Education of the <strong>Visual</strong>ly Handicapped, ed. Dominique Berger: John Libbey Eurotext.<br />
Packer, Jaclyn Ph.D. & Corinne Kirchner, Ph.D. 1997a. Who's Watching? A Profile of the<br />
Blind and <strong>Visual</strong>ly Impaired Audience for Television and Video. Journal of <strong>Visual</strong><br />
Impairment and Blindness.<br />
Packer, Jaclyn PhD & Barbara Gutierrez MA, Corinne Kirchner PhD. 1997b. Origins,<br />
Organizations, and Issues in Video <strong>Description</strong>: Results from In-depth Interviews<br />
with Major Players. New York: American Foundation for the Blind.<br />
Perfetti, Charles A. 1988. Verbal Efficiency in Reading Ability. In Reading Research: Advances in<br />
Theory and Practice, ed. M. Daneman. San Diego: Academic Press.<br />
Pfanstiehl, Cody. 2002a. Email <strong>Communication</strong>, with Phil Piety.<br />
Pfanstiehl, Margaret. 2002b. Founder, Washington Metropolitan Ear, with Phil Piety. Silver<br />
Spring, MD 20901.<br />
Pfanstiehl, Margaret R. Ed.D, and Cody. 1984. Unpublished Training Materials. Ms. Silver<br />
Spring, MD 20901.<br />
Pfanstiehl, Margaret R. EdD. 2002c. Discussions Regarding <strong>Audio</strong> <strong>Description</strong>, with Phil<br />
Piety.<br />
Piety, Philip. 2001. Thamus and Theuth are Dead: The impacts of digital communications on<br />
types of communication (Unpublished research paper), 31. Washington DC:<br />
Georgetown University.<br />
Raman, TV. 1994. <strong>Audio</strong> System for Technical Readings, Computer Science, Cornell<br />
University: PhD.<br />
RFB&D. 2001a. Annual Report. Princeton, New Jersey.<br />
RFB&D. 2001b. Recording for the Blind & Dyslexic Annual Report 2001: Recording for the<br />
Blind & Dyslexic.<br />
Rosch, Eleanor. 1978. Principles of categorization. In Cognition and Categorization, ed. Eleanor<br />
Rosch. Hillsdale, N.J.: Erlbaum Associates.<br />
Schank, Roger C. and Robert Abelson. 1977. Scripts, plans, goals, and understanding: An inquiry<br />
into human knowledge structures. Hillsdale, NJ: Erlbaum.<br />
Schiffrin, Deborah. 1987. <strong>Discourse</strong> Markers: Studies in Interactional Sociolinguistics.<br />
Cambridge UK: Cambridge University Press.<br />
Schiffrin, Deborah. 1994. Approaches to <strong>Discourse</strong>: Blackwell Textbooks in Linguistics. Malden<br />
MA: Blackwell Publishers.<br />
Schroeder, Fredric K. 1994. Braille Usage: Perspectives of Legally Blind Adults and Policy<br />
Implications for School Administrators, University of New Mexico.<br />
Scollon, Ron. 2001a. Mediated <strong>Discourse</strong>: The Nexus of Practice. New York: Routledge.<br />
Scollon, Ron and Suzanne Wong Scollon. 2001b. Intercultural <strong>Communication</strong>: A <strong>Discourse</strong><br />
Approach: Language in Society. Oxford: Blackwell.<br />
Simpson, John. 2001. Improved TV Access for Blind Viewers in the Digital Era. Paper<br />
presented at Radio, Television, and New Media, Canberra, Australia.<br />
Slatin, John PhD. 2002. A Review Of: "Beyond Alt Text: Making the Web Easy to Use for<br />
Users with Disabilities". Information Technology and Disabilities 8.<br />
Slatin, John PhD & Sharron Rush. 2001. Maximum Accessibility: Making Your Web Site More<br />
Usable for Everyone: Web Design. Boston, MA: Addison-Wesley.<br />
Smith, Chris. 2002. Personal <strong>Communication</strong>: Meeting @ RFB&D, with Phil Piety. Boston,<br />
MA.<br />
Snyder, Joel. 2002a. Discussion, with Phil Piety. McLean, VA.<br />
Snyder, Joel. 2002b. Fundamentals of <strong>Audio</strong> <strong>Description</strong>: <strong>Audio</strong> <strong>Description</strong> Associates.<br />
Stephens, Mitchell. 1998. The Rise of the Image, the Fall of the Word. Oxford: Oxford University<br />
Press.<br />
Stokoe, William. 1965. A Dictionary of American Sign Language on Linguistic Principles.<br />
Washington, DC: Gallaudet College Press.<br />
Stovall, Jim. 2002. Conversation, with Phil Piety.<br />
Tannen, Deborah. 1981. Introduction. Paper presented at Georgetown University Round Table<br />
1981: <strong>Discourse</strong> Analysis, Georgetown University.<br />
Tannen, Deborah. 1989. Talking Voices: Repetition, Dialogue, and Imagery in Conversational <strong>Discourse</strong>.<br />
Vol. 6: Studies in Interactional Sociolinguistics. New York: Cambridge University Press.<br />
Tannen, Deborah. 1993a. What's In a Frame?: Surface Evidence for Underlying<br />
Expectations. In Framing in <strong>Discourse</strong>, ed. Deborah Tannen. New York.<br />
Tannen, Deborah and Cynthia Wallat. 1993b. Interactive Frames and Knowledge Schemas<br />
in Interaction. In Framing in <strong>Discourse</strong>, ed. Deborah Tannen. New York.<br />
Townsend, David, Caroline Carrithers, and Thomas Bever. 1987. Listening and Reading<br />
Processes in College and Middle School-Age Readers. In Comprehending Oral and<br />
Written Language, ed. R. Horowitz & J. L. Samuels. New York: Academic Press.<br />
Valli, Clayton and Ceil Lucas. 2001. Sociolinguistic Variation in ASL. Washington DC: Gallaudet<br />
University Press.<br />
Vollmer, Judy. 2002. Personal <strong>Communication</strong>: Meeting, with Phil Piety. Boston, MA.<br />
Vygotsky, Lev. 1934. Thought and Language. Cambridge, MA: MIT Press.<br />
W3C, World Wide Web Consortium. 1999. Web Content Accessibility Guidelines 1.0.<br />
Wall, Robert S.; Corn, Anne L. 2002. Production of Textbooks and Instructional Materials in<br />
the United States. Journal of <strong>Visual</strong> Impairment and Blindness 96:212, 211.<br />
Warren, David H. 1994. Blindness and Children: An Individual Differences Approach.<br />
Melbourne Australia: Cambridge University Press.<br />
Weber, John. 2002. NPR Radio Producer, <strong>Audio</strong> <strong>Description</strong> Volunteer Washington Ear,<br />
with Phil Piety. Washington DC.<br />
Wilson, Paul T. and Richard Anderson. 1986. What They Don't Know Will Hurt Them: The<br />
Role of Prior Knowledge in Comprehension. In Reading Comprehension: From Theory to<br />
Practice, ed. J Orasanu. Hillsdale, NJ: Lawrence Erlbaum.<br />
Wlodkowski, Tom. 2002. Access to Convergent Media: Barriers to Convergent Media for<br />
Individuals Who Are Blind or Have Low Vision, 4. Boston: National Center For<br />
Accessible Media (NCAM).<br />
Wyver, Shirley R., Rosalyn Markham, and Sonia Hlavacek. 2000. Inferences and Word<br />
Associations of Children with <strong>Visual</strong> Impairments. Journal of <strong>Visual</strong> Impairment and<br />
Blindness:204-217.<br />
NOTES<br />
1 Statistics on blindness and visual impairment are a challenge because the condition<br />
often co-occurs with other conditions, such as diabetes or mental retardation, that<br />
might instead be used to characterize an individual.<br />
2 This is not to suggest that there are no differences that are related to language. There<br />
have been studies showing that there are substantial differences in development of<br />
concepts and prototypes.<br />
3 The Motion Picture Association of America (MPAA) and others recently challenged<br />
this ruling. The challenge was upheld on technical grounds and an appeal is in<br />
process at the time of this writing.<br />
4 Not all images are described. Prior to recording, someone marks up the text and may<br />
decide to exclude certain images.<br />
5 The act was originally passed in 1973. The 1998 amendment brought in section 508.<br />
6 A 1992 report, “The Use of Descriptive Video in Science Programming” (Kuhn<br />
1992a), revealed indications of benefit, but this researcher was not able to see an<br />
experimental method that could yield measured results.<br />
7 I am simplifying the characterizations of spoken and written text for the purposes of<br />
comparison.<br />
8 I should be clear that this is an area where I have extremely sketchy information,<br />
and one that may run counter to the concept of inclusion within the same speech<br />
community as sighted individuals. Scollon & Scollon, for example, describe four<br />
different definitions of culture that do not include perceptual information.<br />
9 The technology used in transcribing the movies represented time in whole seconds<br />
and no finer, so the determination of gaps and lengths was approximate.<br />
10 These data were recovered from Frazier’s thesis, so they are based on the timings he<br />
presented rather than on transcripts, as the other films are.<br />
11 This term is not intended to create a direct reference to a similarly named<br />
concept in software technology, although one could imagine a distant connection to<br />
technology in the future. The term is used here to represent the concept of real<br />
people, places, and things that exist over extended periods of time in a text.<br />
12 I am certainly simplifying this issue and basing this aspect of my work on the<br />
definition of utterance in Schiffrin 1987. This is a convenience for both author and<br />
reader, and it is expected that other approaches to spoken discourse analysis that<br />
focus on a unit of production analogous to the utterance could also be applied in<br />
this area.<br />