
AUDIO DESCRIPTION, A VISUAL ASSISTIVE DISCOURSE:
An Investigation into Language Used to
Provide the Visually Disabled
Access to Information in Electronic Texts

A Thesis
Submitted to the Faculty of the
Graduate School of Arts and Sciences
of Georgetown University
in partial fulfillment of the requirements for the
degree of
Master of Arts
in Communication, Culture and Technology

By
Philip J Piety, B.S.

Washington, DC
February 24, 2003


Copyright 2003 By Philip Piety
All Rights Reserved



ABSTRACT

Visually impaired and blind individuals face challenges in accessing many types of texts, including television, films, textbooks, software, and the Internet, because of the rich visual nature of these media. In order to provide these individuals with access to this visual information, special assistive technology allows descriptive language to be inserted into the text to represent the visual content. This study investigates this descriptive language. It looks at it as a system of human communication and investigates the process of creating descriptions, a process that includes an intermediary called a describer and the modifications the describer makes to a text in order to render it accessible and usable. There are different practices of creating these descriptive insertions, and many terms refer to them, including Audio Description, described video, and textual and verbal equivalents. This study considers these practices as variants of a type of communication called Visual Assistive Discourse that has specific and definable properties. This study is the first academic investigation into the language process since the practice was conceptually described as a technique for television by Frazier in 1975. It addresses broad questions about this unique communication form. Who does it? Why is it unique? What does it look like? And how can it be analyzed?

The approach taken is structured as a study within a study. The outer study looks at the theoretical issues of using language as a visual prosthetic and shows it having properties in common with both prototypical spoken and written discourse as well as with communication, like sign language interpretation, that relies upon an intermediary. The inner study uses a corpus of more than 23,000 words of Audio Description drawn from four movies described by three organizations proficient in the practice of describing film. Analysis of this data shows it to be a language system with distinctive constituent and discursive structures. This study shows that the fundamental nature of a unit of an inserted description is not an isolated representation of isolated visual information, but rather a semantic unit that is situated in several definable ways within a multimodal text.



ACKNOWLEDGMENTS

Producing this work has allowed me to learn about language, research, and some of the challenges that accrue to the study of nontraditional subjects. Many people both inside and outside the University have contributed to this effort, and I am forever grateful to them.

Within Georgetown University, I must first mention and thank my thesis advisor, Professor Randy Bass, whose help was invaluable in connecting my research interests to the completion of this process. Next, I am thankful for knowing and benefiting from the advice of Professor Ron Scollon, my second reader and an informal advisor almost since the time I considered this as a potential topic. Ron has provided constant encouragement and support and, as the process was nearing completion, I found myself recollecting many things he said about the choices that scholars make that related directly to my work. I am also grateful to Professors Jay Lemke and John Castellani of the University of Michigan and the Johns Hopkins University Center for Technology in Education. Professor Castellani was the teacher who exposed me to this area as an educational challenge, and Professor Lemke provided extremely valuable feedback on an early draft of this document.

Also within Georgetown, Professors Shukla, Tannen, Tinkcom, and Tyler were gracious in spending time discussing the challenges of this topic and provided me with encouragement and kind thoughts. I also appreciate the many email exchanges with Daniel Loehr and Kristin Mulrooney and the early discussions with Elisa Everts, all doctoral students at Georgetown. I also thank Professors Hamilton and Schiffrin, whose classes at Georgetown were memorable, enriching, and relevant to my work. And, of course, Dr. Suzanne Wong Scollon, who gave me the nudge in this direction and then advised me (correctly) of some of the challenges I would encounter in studying this as a language topic.

Outside of academia, the first person to thank is someone to whom everyone involved with the field of Audio Description owes much. Nobody, with the exception of her husband Cody Pfanstiehl, has made a larger contribution to the method and the advocacy of this emerging field than Dr. Margaret Pfanstiehl. The Pfanstiehls discussed their work many times, provided documents that relate to their methods, and referred me to others whose input was essential. My path was made smoother by being able to say, "the Pfanstiehls suggested I call." Among those who responded were Joel Snyder, Director of Described Media with the National Captioning Institute and himself a major contributor to the field of Audio Description, and Larry Goldberg, Director of Media Access at the WGBH Educational Foundation, who was helpful, as was his staff, including Theresa Maggiore and Bryan Gould. I also thank Jim Stovall, founder of the Narrative Television Network, who discussed in vivid detail how a person uses Audio Description.

Within the advocacy community, I owe much to the American Foundation for the Blind, the many supportive phone conversations with Dr. Elaine Gerber, and the support of Drs. Jaclyn Packer and Corrine Kirchner. I also thank Curtis Chong from the National Federation of the Blind, who was the first person in the visually impaired advocacy community to speak with me about my research, and Melanie Brunson of the American Council of the Blind.

At Recording for the Blind and Dyslexic, I am grateful to Judy Vollmer and the staff in the Boston studio, and to Chris Smith in Washington DC, who spent many hours with me and provided me with equipment, tapes, and insights into their process.



From the Center for Applied Special Technology, I appreciate the thoughts, time, and encouragement of Dr. Robert Dolan. And, from the United States Access Board, both David Baquis and Doug Wakefield were a tremendous help and spent a good deal of time, as did Dr. Judith Dixon from the Library of Congress. I also thank John Weber, a good personal friend, National Public Radio producer, and volunteer audio describer who is one of the people who sparked my interest in this area. Bob Regan, Product Manager for Macromedia Corporation and a student with the Trace Center at the University of Wisconsin at Madison, was also helpful as I began this investigation.

My largest thanks must go to the person I dedicate this work to. My wife Sarah has encouraged me to follow my intellectual pursuits for their value alone. From the beginning of this process, she has served as a constant reminder in times of doubt that my work is worthwhile.



TABLE OF CONTENTS

1. INTRODUCTION
1.1 Overview
1.2 Research Goals & Questions
1.3 Anticipated Benefits and Limitations of this Research
1.4 Organization of the Sections
1.5 Note for Visually Impaired Readers
2. BACKGROUND INFORMATION
2.1 The Primary Consumers of Visual Description
2.2 Practices of Describing Images for Assistive Technology
2.2.1 Audio Description
2.2.2 Audio Books
2.2.3 Software and Interactive Media
2.2.4 Multimedia and Internet Sites
2.3 The Goal of Accessible Media: Usability and Experiential Equivalence
2.4 Previous Qualitative Evaluations of Description
2.5 The Practical Case for a Unified Model
2.5.1 Reasons to Consider as Separate Practices
2.5.2 Reasons to Consider as a Single Process
3. VISUAL ASSISTIVE DISCOURSE
3.1 Visual Assistive Discourse Defined
3.1.1 The "Discourse" in Visual Assistive Discourse
3.1.2 Common Properties of Different Descriptive Practices
3.1.3 Comparison of VAD and Other Communication Systems
3.1.4 The Components of the VAD Communication System
3.2 Conceptual Issues with Words for Images
3.2.1 Sequential vs. Parallel
3.2.2 Raw vs. Processed Information
3.2.3 Schema Theory
3.2.4 Translation-Interpretation-Transrepresentation
3.3 Descriptions: Situated/Constrained in Multimodal Texts
3.3.1 Textually Situated Descriptions
3.3.2 Constraints: Detail vs. Interpretation
3.4 Discussion: The Role of the Describer
3.4.1 Describing as a Way of Thinking
3.4.2 Describer as Intermediary
3.4.3 Description as a Group Process
3.5 Discussion: The Role of the Consumer
3.5.1 The Consumers: Actively Building a Text Model
3.5.2 The Purposes and Goals of Consumers
3.6 Cultural Issues with Description?
4. STUDY OF AUDIO DESCRIPTION
4.1 The Study Corpus
4.2 Methodology
4.3 The Structural Components of Audio Description
4.3.1 Insertions
4.3.2 Utterances
4.3.3 Representations
4.3.4 Words
5. USING THE DEFINITIONS FOR ANALYSIS
5.1 Descriptive Mass
5.2 The Textual Role of Insertion Content
5.3 Sample Analysis: Persistent Entity Development
5.4 Sample Analysis: The Scene as Frame
5.5 Sample Analysis: Utterance Patterns
5.6 Sample Analysis: Representational Combinations
5.7 Analysis Challenges: Time, Reality, and Cultural Elements
6. SUMMARY: A LANGUAGE SYSTEM
7. FUTURE STEPS
7.1 Consumers Study
7.2 Supporting Further Developments of Audio Description
7.3 Descriptive & Comparative Studies of Other Forms of VAD
7.4 Human Subjects Studies with AD
7.5 Educational Materials Study
7.6 Assistive Technological Research
APPENDIX A: GLOSSARY
APPENDIX B: TRANSCRIPTION & MULTIMODAL ISSUES
Transcription Conventions
Multimodal Issues
Sub second timing
APPENDIX C: VERBAL DESCRIPTIONS FOR FIGURES
REFERENCES
NOTES


LIST OF FIGURES

Figure 1 - Different practices of visual description in 2002
Figure 2 - Conceptual view of Internet descriptions
Figure 3 - Overview of VAD and other prototypical communication processes
Figure 4 - Chafe's view of immediate mode
Figure 5 - Conceptual view of the description process
Figure 6 - Conceptual diagram of consumer's process
Figure 7 - Length of utterances in corpus
Figure 8 - Chart version of table 4 data

LIST OF TRANSCRIPTS

Transcript 1 - From "The Gift of Acadia" 1:06
Transcript 2 - From "A Star is Born" 16:20
Transcript 3 - From "Gladiator" 40:26
Transcript 4 - From "Gladiator" 1:42:30
Transcript 5 - From "Gladiator" 47:49
Transcript 6 - From "LA Story" 3:38
Transcript 7 - From "LA Story" 19:00
Transcript 8 - From "LA Story" 20:36
Transcript 9 - From "A Star is Born" 1:58
Transcript 10 - From "Gladiator" 6:10
Transcript 11 - From "Gladiator" 1:15:29
Transcript 12 - From "Gladiator" 12:00
Transcript 13 - From "LA Story" 1:50
Transcript 14 - From "Gladiator" 58:42
Transcript 15 - From "LA Story" 7:50
Transcript 16 - From "LA Story" 90:38
Transcript 17 - From "LA Story" 25:36


LIST OF TABLES

Table 1 - Study corpus material
Table 2 - Summary structural components of audio description
Table 3 - Comparison of description mass in four different texts
Table 4 - Distribution of description mass by insertion length
Table 5 - Referring terms for main character of "Gladiator"
Table 6 - Comparison of description styles


1. INTRODUCTION

1.1 Overview

The title of this work, "Audio Description, a Visual Assistive Discourse," may introduce one or two terms into the vocabulary of readers. The practices that form the basis for this study are recent, having become established only towards the end of the twentieth century. They are based on communication and digital technologies, and they rely upon language for their success.

Audio Description is a way to provide a visually impaired person access to visually rich productions including movies, television programming, plays, live events, and museum exhibitions. With Audio Description, a describer inserts spoken words to provide representations of information contained in the visual field of the production. The inserted description, when combined with existing audio content, including dialog from the original production, creates a new text that is more accessible than it would be without the addition of the description. This study looks at Audio Description as a language system, describes its features and structural components, and shows how it can be analyzed in a manner similar to other discourse types.
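The timing relationship described above, in which descriptive utterances are slotted into pauses in the original sound track, can be pictured as a simple merged timeline. The sketch below is only an illustration of that arrangement, not a tool or method used in this study; the Segment structure, the merge_track helper, and the timings and wording are all invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds from the beginning of the production
    end: float
    kind: str     # "dialog" (original audio) or "description" (inserted)
    words: str

def merge_track(dialog, descriptions):
    """Interleave original dialog with inserted descriptions by start time,
    yielding the combined text a listener of the described version hears."""
    return sorted(dialog + descriptions, key=lambda seg: seg.start)

# Hypothetical fragment: one description slotted into a pause between lines of dialog.
dialog = [
    Segment(0.0, 3.8, "dialog", "Where have you been all night?"),
    Segment(9.5, 12.0, "dialog", "I got held up at the station."),
]
descriptions = [
    Segment(4.2, 9.0, "description", "She sets down her suitcase and turns toward the door."),
]

for seg in merge_track(dialog, descriptions):
    print(f"{seg.start:5.1f}-{seg.end:5.1f}  {seg.kind:<11}  {seg.words}")
```

Printed in order, the merged segments show the new, more accessible text as a single stream in which description and original dialog alternate.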

Visual Assistive Discourse is a term introduced in this study to describe the type of communication process that Audio Description is. Audio Description and practices like it rely upon special assistive technology, and Visual Assistive Discourse is a linguistic view of the communication processes that operate over and through assistive technology but that are fundamentally human communication processes.


Specific productions and practices of language use can be viewed as parts of broader classes: a telephone conversation can be viewed as a type of conversation, electronic mail can be viewed as a form of correspondence, and a conversation that utilizes a sign language interpreter can be viewed in terms of its reliance upon a communicative intermediary. Further, these classifications are not purely hierarchical: a conversation interpreted with sign language is also a conversation, while an interpreted unidirectional event such as a political speech is not.

This study is descriptive. It is the first attempt to look at this area not as a service or accessibility issue but as a language system. There is an asymmetry to this domain because language and vision operate differently, and viewing this system within a language framework may require a conceptual shift. That shift is from the perspective of the sighted describer, who works with both vision and language, to the perspective of the consumer of this service, who often receives just the language. From that perspective, it is a language issue. While many human interactions, from buying coffee to expressing amity or hostility, can be accomplished with language in a secondary or nonexistent role, in this area language is the essential medium of exchange. The contents of this study could be expressed as theses, as propositions, or within the framework of problem statements. At the core of these different ways to frame this study is the belief that Audio Description is a type of Visual Assistive Discourse, which is in turn a type of language system. This language system is used by people who have different perceptual abilities but are members of the same speech communities. Understanding this language system can involve many different levels of analysis, including the participant structure, its environmental constraints, the conditions under which it is practiced, and its external form, from small parts such as words to larger discourse units.


The sister practices that, like Audio Description, are types of Visual Assistive Discourse are also recent and in formative stages, so this study may provide benefits to them by illuminating the techniques and methodologies that Audio Description employs. Because this study is situated at an early stage in the lives of these steadily growing practices, and because there is little academic work to build on, it takes a broad approach and tries to address, from several important perspectives, some fundamental questions about what this process is like.

Readers of this document should see what the author has become convinced of: that these practices are vast, both in the range of conceptual issues they raise and in their impacts on people's lives. This is a technical study, and it will not address many issues that people experienced in visual description consider important, such as the interpretive aspects of description, which content should be described, or describing under specific textual constraints. These are important areas to be sure, but the focus of this study is to help define the process characteristics and participants in a way that will support many types of ongoing qualitative discussions.

1.2 Research Goals & Questions

As a descriptive study, this research can potentially serve a range of purposes, from qualitative analysis to comparative studies that require a formal definition of the language use. For the purposes of this document, the study's intent can be formalized in two sets of research questions belonging to two different types of studies. The first study looks at all of these communication practices as variations of a type of process that is distinctive from other forms of language use. This study defines the process called Visual Assistive Discourse (VAD) through the following questions:

1. What properties are common to the practices of providing visual information through language in electronic texts?

2. How is this communication process similar to and distinct from other forms of language use in terms of participants and development of a text?

3. Other than providing a theoretical definition, what other reasons exist for viewing these practices as variations of a common process?

The method employed in the conceptual study is descriptive and logical. It describes the properties and components that are broadly at work in Visual Assistive Discourse and then draws in related literature and concepts to assemble a conceptual view that reflects what is common about the practices and the common implications of that structure.

The second, or inner, study looks at the language produced in one of the visual description practices. The practice chosen for this study is Audio Description. While Audio Description (AD) will not show every possible linguistic form of this descriptive process, it is a practice area with a strong methodological history. This second part of the study has two main research questions:

1. Is there a constituent structure that is different from other language uses?

2. What types of information are provided within Audio Description, and what patterns in representation exist?



These two types of studies support each other to provide a more comprehensive view of this unique communication process and thereby support the proposition that Audio Description is an example of a type of language process called Visual Assistive Discourse.

1.3 Anticipated Benefits and Limitations of this Research

This research is intended as an initial investigation into a way of using language that has largely been unexplored linguistically. An anticipated benefit of this research is an understanding of this process as a formal system with specific characteristics that can be observed, measured, and taught. At the time this study is being written, a number of discussions involving standards and guidelines are occurring in the community of those providing and those using visual description (Levine, 2002). Historically, these practices are in their earliest and formative stages, and since Audio Description and its sister language forms are like other language systems, they will evolve through negotiation among the participants. This research is therefore positioned at an important stage in this emerging language system and aims to support its development with a description of its process-level characteristics that can inform this ongoing negotiation.

As with other areas that benefit a disabled community, most of the previously available resources to support the blind and visually impaired in accessing visual information have been focused on service, delivery, and regulatory areas rather than research. Some may see parallels to the struggles of the deaf community for acceptance of sign language, which for years was banned as 'deviant.' Some may find that, compared to other fields, the amount of previous research in this area is small, especially considering the conceptual issues involved and that this study must cover a broad area without the ability to focus on specific and important components.

Will this study yield insights that help the larger population of language users? While making no claim that this area will have broad relevance, it is important to note the historical precedent for research into areas of communicative challenge, including the documentation of American Sign Language (Stokoe, 1965) and Vygotsky's work with deaf-blind children (Hardy, 2000), that has provided insights into general properties of language and cognition. One of the dimensions of this research is that forms of information such as pictures, actions, and gestures, which are not usually considered language or equivalent to language, are being replaced with language. The fact that these non-linguistic forms do not have the same properties as speech or writing could be a basis to consider this research as not a 'linguistic' effort, or alternatively it may reinforce the need to define the non-linguistic visual information with the type of structure often used to study language and language use.

Independent of the benefits for the study of language, there is a rich history of efforts originally intended to benefit a disabled group that resulted in benefits for the general population. Some examples include illuminated elevator floor lights, originally for the deaf; ramps and curb cuts, originally for wheelchairs, that are now used by baby strollers and bicyclists; and the secondary audio programming (SAP) channel, now used for multilingual support in television programming. Because Audio Description and its sister practices are so new, there may yet prove to be benefits for those challenged by language, those learning a new language, or other groups as yet unknown.


1.4 Organization of the Sections

This study is organized from the outside in: it starts with current practices and finishes by describing some paths for further study. Section two contains background information that discusses the blind and visually impaired communities, the existing practices of providing visual description, and some of the relevant literature. Section three looks specifically at Visual Assistive Discourse (VAD) as a formal system by defining it in terms of properties and a participant framework, as well as implications for the principal roles and processes. Section four looks at Audio Description (AD) with a review of several productions of movies described by different organizations; this section develops a set of terms for the structural and functional components based on examples from transcripts of AD. Section five builds upon the definitions in section four with a series of analytic perspectives that look at referring sequences (Schiffrin, 1994), experiential frames (Goffman, 1974; Tannen, 1993a), foci of consciousness (Chafe, 1994), and sites of engagement (Scollon, 2001a). Section six relates the findings from AD to the larger practices of VAD, and section seven describes some paths for additional research.

1.5 Note for Visually Impaired Readers

In order to make this document more usable if it is translated into an accessible form, the topical headings and subheadings use a numbering system rather than the standard American Psychological Association (APA) style. Also, all of the images used in this report have associated verbal descriptions contained in Appendix C. These descriptions will also be extracted as a separate file that can be accessed concurrently with this document.


2. BACKGROUND INFORMATION

The goal of this section is to provide some orientation to readers, who will likely have different experiences and understandings. Below is a discussion of the consumers of visual description, the main practices of visual description, the goals of accessible media, and some practical benefits of viewing these disparate practices as variations of a single process.

2.1 The Primary Consumers of Visual Description

Most, but not all, of the consumers of visual description are blind or visually impaired. Because of the variety of factors that impair vision, the effects that occur when vision loss comes at different ages, and the way statistics on visual impairments are collected, a simplified description of the communities of visually impaired individuals is not possible in this forum. Not only is there no typical blind person, there are numerous ways to categorize the population, including grouping those who have had an impairment from birth (congenital) or acquired it at some point in life (adventitious), or by looking at the nature of the impairment, such as focus, loss of visual field, or ability to discern details. Current estimates of the prevalence of visual impairment are that 1.3 million citizens in the United States are affected with some form of visual impairment, 98,000 of whom are students (AFB, 2001a). The National Federation of the Blind reports that 3.5% of the population over 65 years of age has some visual impairment (NFB, 2000), while the American Foundation for the Blind reports a much higher 16% for the same age group [1].

A lack of timely access to accessible educational information, including textbooks, multimedia, and educational videos, is considered a significant impediment to education and training. Many blind children are educated in mainstream schools, but less than half complete high school. Those who do complete high school are just as likely as sighted individuals to take some college courses but much less likely to graduate (AFB, 2001b). Nationally, unemployment among working-age visually impaired individuals is over 50%, with fewer than one in three of those who are of working age and legally blind being employed (AFB, 2000).

The visually impaired community does not have a separate language. Braille is a way to represent a natural language such as English in a tactile form, but most members of this community do not read Braille, or do not read it well. While Braille is an important mode of communication for the blind, it is not a universal solution for communicating information, and at present Braille literacy is in decline (Schroeder, 1994). Visually impaired and blind (VIB) individuals use language in essentially the same ways [2], including visual references, as do those with sight (Warren, 1994; Elbers, 1999). For those who lost sight after critical developmental ages, concepts grounded in vision such as colors, perspective, and visually based inferences (Wyver, 2000) are still relevant even if they cannot see. For the others, including the congenitally blind, there is also interest in visual information and in understanding the visual world. No research or other indications emerged in this study that descriptive language for the VIB should be different from language that would be used with sighted consumers of visual description who were not accessing the visual information. The differences between regular language use and the language used in VAD (discussed later in this document) are not related to visual impairments but to the fundamental nature of the visual descriptive process.



2.2 Practices of Describing Images for Assistive Technology

Using language in electronic texts to replace vision is practiced by a variety of organizations and with a variety of text types. This analysis divides these into four mostly unrelated areas, as illustrated in figure 1. Except for serving a common set of customers, there is little apparent integration of these activities at the time of this study.

[Figure 1 - Different practices of visual description in 2002. The figure's labeled elements include: Live Description (1981), Described Video, and Audio Tours under Audio Description; Audio Books (1948/1971); Software & Interactive Media (late 1990s); and The Internet & Multimedia (late 1990s).]

2.2.1 Audio Description

One of the two most developed areas of these practices is called Audio Description (AD). AD actually encompasses several different usages that have in common the use of a human voice rather than voice synthesis. They also work primarily with texts that are in motion rather than fixed, and they share a common methodological history.

Beginning with Live Description

Although it appears that the very first described performances were recorded on tape and broadcast through radio reading services (Packer, 1997b), the first sustained and standardized AD program began in 1981 with the work of Dr. Margaret Pfanstiehl and her organization, the Metropolitan Washington Ear. The "Ear" began providing live AD for plays at a Washington DC area theater using local FM transmitters and a describer who would insert narration between sections of dialog. The practices and techniques developed for live description were influential for all of the major areas of AD that followed.

The most significant difference between live description and the other forms discussed below is that, as a result of being live, the describer must be sensitive to events in real time, and the description cannot be pre-packaged, because even a theatrical performance varies (sometimes by design) in its events and timing. This is not to say that the description is fully spontaneous. Before a theatrical production is described, it is previewed by the describer, and in some cases by two describers, one for the program information and one for the performance, so there is some redundancy as well as preparation in the creation of the descriptions (Weber, 2002).

Live description is also now used in non-theatrical settings such as weddings and ceremonies. Occasionally, live events are broadcast simultaneously (simulcast) on the Internet.

Described Video

The next major development in Audio Description, and now the most prevalent form of AD, is called described video. Described video includes television, films, and streaming media, and this form of visual description reaches thousands of viewers/hearers daily. As stated earlier, some of the earliest described performances occurred experimentally in the 1970s, and in 1975 a theoretical approach was developed as part of a Master's thesis by Gregory Frazier (Frazier, 1975). But it was not until 1982, when the Washington Ear connected with WGBH, the public broadcasting station in Boston that had pioneered closed captioning and continues to pioneer accessible media, including the secondary audio program (SAP) channel, that described video became a broadcast reality (Packer, 1997b). WGBH, with consultation from "the Ear," began broadcasting described shows and later, with support from the U.S. Department of Education, launched the Descriptive Video Service (DVS) in 1990 and the National Center for Accessible Media (NCAM) in 1993. DVS is now one of the two largest providers of described video material in the U.S. and, in addition to providing broadcast products, sells a collection of described videos, while NCAM focuses on newer technologies such as interactive media and devices. WGBH is not the only organization providing described video products. In 1988, seemingly independent of these efforts on the East Coast, Jim Stovall, who lost his sight as a young adult, developed another descriptive approach. Rather than broadcast on a separate channel, his company used the existing audio channel, with description inserted in between dialog, so that all viewers receive the same audio content. His company, the Narrative Television Network (NTN), also began with an emphasis on television programming and was a commercial enterprise sustained by advertising as well as funding from the U.S. Department of Education. Initially, the descriptive style used at NTN was sparser than the original description style used at "The Ear" and WGBH, but today, Stovall indicates, their styles are fairly close (Stovall, 2002).

With a recent Federal Communications Commission (FCC) ruling [3] that requires several prime-time hours of television programming each day to be described, a number of other organizations have entered the Audio Description market space. Recently, the National Captioning Institute (NCI), known for real-time closed captioning, launched a program of described media led by Joel Snyder, who has been active in Audio Description since the early 1980s, including work with the Ear and the National Endowment for the Arts and in developing audio tours of museums.

A significant feature of all of the organizations that provide description for television and film is that their staffs are usually paid and they employ an extensive pre-description process including scripts, writers, and editors.

Recorded Tours

Starting in the mid-1980s, museums began offering Audio Description tours (Snyder, 2002a), and the practice has spread extensively (ASTC, 2001). Since this type of Audio Description is geared more towards fixed stimuli, such as paintings and museum exhibits, it would seem to be similar in some ways to the content found in books.

Methodological Ancestry

All of the major providers of described video products in the US had early and substantive consultations with, and training from, the Washington Ear, and these organizations readily attributed much of their understanding of the principles of description to the Ear and the Pfanstiehls (Goldberg, 2002; Snyder, 2002a; Stovall, 2002). In addition, at the time of this report, there are active AD programs throughout the world, and many of these programs have had substantive contact with those based in the US (Simpson, 2001), including with the Ear (Pfanstiehl, 2002a). While the practice of Audio Description is now spread over dozens of organizations, there is a common methodological history derived from the live performances that began in 1981.


2.2.2 Audio Books

Historically, printed material has been a very different medium from film and video, and so it follows that the description practices used for printed material would be unrelated to Audio Description. In the United States, there is only one organization, Recording for the Blind and Dyslexic (RFB&D), that seems to provide most of the descriptions of visual information in books. The American Printing House for the Blind (APH) may also do so, but repeated inquiries provided no confirmation. And, while the APH does provide material in audio form, providing material in tactile form (Braille and tactile images) seems to be its focus, so that today RFB&D is essentially the main supplier of audio textbooks (Burnham, 2002; Wall, 2002).

RFB&D began as Recording for the Blind (RFB), based in Princeton, New Jersey. It was chartered in 1948 to help visually impaired soldiers returning from World War II and is funded by government, private industry, and subscription services. This organization provides textbooks on a range of subjects in audio format to students with documented visual and/or learning disabilities. These textbooks are read, along with descriptions of images [4], into a digital recording system that allows page-by-page access. The material is then distributed either digitally or on cassette tape. RFB&D operates 32 recording studios nationally, using over 5,300 volunteers and serving over 25,000 blind members (RFB&D, 2001a). The RFB&D volunteers usually have deep experience in the subject areas that they read for and are often retired professionals.

The National Braille Association (NBA) produced a manual in 1971 for recording books on tape that included instructions on descriptions of images, including maps, diagrams, and charts. RFB&D uses the NBA guidelines and also has developed an extensive set of procedures for describing images in the range of disciplines taught in public schools and post-secondary institutions; subjects include chemistry, computer science, social studies, geography, and math. This process includes volunteer readers and staff who support those readers by selecting which images will be described and where in the audio stream the descriptions will be placed. The volunteer who is reading the text then creates the descriptions. RFB&D policy suggests that the image descriptions be written out prior to reading by the volunteer, but linguistic evidence indicates that some (and perhaps most) of the descriptions are spontaneous. Like WGBH, RFB&D has subject focus areas and specialists and often assigns material of certain types to studios in different cities (Smith, 2002; Vollmer, 2002).
Vollmer, 2002).<br />

There is also a worldwide consortium effort underway to develop a digital talking book (DTB) standard (Kerscher, 2001a). This effort includes publishers and library systems, with the goal of providing a standard document interchange format based on the World Wide Web Consortium (W3C) Extensible Markup Language (XML) (Kerscher, 2001b). Similar to the Internet guidelines, these standards use the term textual equivalent in relation to images.

2.2.3 Software and Interactive Media

Software and interactive media are relatively new areas for visual description. While the prototypical texts for visual description, plays and books, have been around for thousands of years, the software industry is less than fifty years old. And it is only within the last twenty years (and in a major way only more recently) that software and interactive media components have come in contact with the general population, which includes the visually impaired.


Interactivity is used here to denote a human-technology interchange in which the human is presented with options to direct the technology to alter its form and/or functions. The term "rich media" is used by NCAM to mean "elements on a web page (or in a separate player) which exhibit dynamic motion over time or in response to user interaction" (NCAM, 2002). Convergent media is another term used at NCAM for interactive elements, mostly in relation to digital television (Wlodkowski, 2002). The same regulations that cover the Internet also cover interactive elements on web pages, including navigational graphics (client and server maps), applets, and other dynamic media. This area is fairly new, and there is little history to study.

2.2.4 Multimedia and Internet Sites

The term multimedia is often used to describe (like multimodal) content that is conveyed through more than one representational form (Corn, 2002). The Internet can be viewed as an interconnected multimedia environment. The most important differentiator between Internet/multimedia and those mentioned before is that all of the earlier publication types, with the exception of live performances, can be included in multimedia texts. An Internet text can include live elements as well. The Internet, as illustrated in figure 2, is really the superset of all other descriptive practices.

[Figure 2 - Conceptual view of Internet descriptions. The figure's labeled elements include: The Internet; Text and Hypertext; Still Images; Moving Images; Interactive Elements; and Live (Simulcast) Events.]

Descriptions of non-textual components on the Internet are addressed through several standards and guidelines. The most widely known of these is section 508 of the United States Rehabilitation Act, as amended in 1998 [5], which covers many Federal government websites. Non-textual components (still images, moving images, and interactive elements) on executive branch websites are required under section 508 to have a description. Known commonly as the "508 standard," it is often used as a measure by other organizations, including local governments, non-governmental organizations, and universities, that wish to make their websites accessible but are not bound by the 508 mandate. There is also a voluntary guideline developed by the Worldwide Web Consortium (W3C, 1999) that covers essentially the same territory as section 508 but differs in some details. Several software tools check web sites for compliance with these standards, allowing sites to claim compliance with a standard. There is also a movement called "Speech Friendly Sites" that connects to the 508 and W3C guidelines as well (Artic Technologies, 2002).

There have also been many books published just in the last few years on developing accessible websites. None seems to provide more than a handful of pages related to the challenges associated with describing images and creating textual equivalents.

All of the standards, guidelines, and books reviewed in this study take a similar approach to non-textual components. They rely upon features inherent in the hypertext markup language (HTML) standard for descriptions to be placed on these non-textual elements, and they specify that these elements should have a "textual equivalent" description. The guidance for that equivalent specifies that it must convey "the meaning of the image" (Access Board, 2001a, Access Board, 2001b). For many non-textual elements, such as buttons and audio files, the description options are straightforward and formulaic. For images that carry essential communicative content rather than just decoration, for example the images found in textbooks, the descriptions are more challenging. None of these guidelines really addresses the range of descriptive options that might exist or how the person 'surfing' the web would cognitively process different descriptive approaches. In short, the standards and guidelines require the existence of descriptions but provide little detail as to what those descriptions should include or how they should be constructed.
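As an illustration of the kind of check the compliance tools mentioned above perform, the sketch below flags img elements whose HTML alt attribute, the usual carrier of a "textual equivalent," is missing or empty. This is a minimal sketch using only the Python standard library, not a reconstruction of any particular product, and the sample markup is invented.

```python
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Collect <img> tags whose alt attribute is missing or empty."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            alt = (attr_map.get("alt") or "").strip()
            if not alt:
                self.missing.append(attr_map.get("src", "(unknown source)"))

# Invented sample markup: one image with a textual equivalent, one without.
sample = (
    '<img src="enrollment.gif" alt="Bar chart of student enrollment, 1999 to 2001">'
    '<img src="photo.gif">'
)

checker = AltTextChecker()
checker.feed(sample)
print("Images lacking a textual equivalent:", checker.missing)
```

A checker of this kind can verify that a description exists, but, as noted above, it cannot judge whether the description actually conveys the meaning of the image.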

No estimates of the number of organizations attempting to make Internet sites accessible were available. And, since the guidelines are so open, it is likely that there are hundreds if not thousands of different approaches to describing images in these new media.

2.3 The Goal of Accessible Media: Usability and Experiential Equivalence

The goals of the accessibility practices discussed here have been expressed with two types of approaches: usability and experiential equivalence. Some distinguish usability from accessibility as follows: accessibility means the technology is able to access the information, while usability means the information is meaningful and has its intended effect upon the listener (Baquis, 2002). So accessibility can be viewed as a component of usability (Slatin, 2001). Experiential equivalence is a term that takes the concept of usability further, to consider the usable experience as one that is equivalent to the experience a non-disabled person would have. Section 508 requirements for the Internet specify that the description should "communicate the same information as its associated element" (Access Board, 2001). The NBA guidelines used at RFB&D have a similar aim: to "allow the author to make his own impression on the listener."

Each practice approaches this concept with different terminology, but the underlying principle is the same: detailed pictorial description is not required; rather, the goal of the process is to provide essential information that allows consumers to experience the original text in a manner similar to how a sighted consumer would have, including the opportunity to assign their own meanings to the texts rather than use the interpretations of others.

2.4 Previous Qualitative Evaluations of Description

It seems that little research has been done to measure the benefits of visual description. Some recent studies related to Audio Description, based mostly on subject self-reports, have indicated that described video and live performances are beneficial on a number of levels 6. In "Video Description in North America," Packer cites seven types of benefits from description (Packer, 1996):

1. Gaining knowledge about the visual world,
2. Gaining better understanding of televised materials,
3. Feeling independent,
4. Experiencing social connection,
5. Feeling equality with the fully sighted,
6. Experiencing enjoyment, and
7. Relief of burden on sighted viewers.

These findings were based on analysis of DVS customer feedback and are consistent with other reports that a production with Audio Description provides conceptual and cultural inclusion that is not possible otherwise (Lovering, 1993, Packer, 1997a, Packer, 1997b, Pfanstiehl, 2002c).

Outside the realm of Audio Description, no studies of the benefits of describing visual information were found by this investigation. An appreciable portion of textbook images carries substantive content, and studies have shown that images in texts often improve comprehension over the text alone (Levie, 1982); whether described images have similar benefits does not seem to have been studied. The Internet and interactive media areas are too new and perhaps too diverse to provide any generalized practices for comment. What little research has been done in this area from the consumer's perspective indicates that these areas are much less successful than Audio Description and audio books (Gerber, 2002a, Gerber, 2001, Gerber, 2002b).

No studies seem to have been conducted as to whether description of other types of graphics holds benefits for people who are not visually impaired, such as those with cognitive impairments (including Alzheimer's disease), learning disabilities, or second-language learners, all of whom could benefit from additional spoken descriptions to accompany visual information.

2.5 The Practical Case for a Unified Model

While there are clearly academic reasons to consider these practices as variations of a single process, doing so introduces a new layer of complexity into what are already challenging and little-studied fields. There are also practical benefits to viewing this area as a single process from the perspective of those providing these services. Below, considerations for and against this view are presented.


2.5.1 Reasons to Consider as Separate Practices

The two most important sets of facts supporting separate analyses of descriptive discourses are the differences in traditional media types and in consumer interests. Books and dynamic moving media (video, film) operate according to different audience dynamics. A book is read according to the schedule of the reader, while video productions such as television shows and movies are linear and have their own fixed timescales. Textbooks contain still images, including diagrams and illustrations, that convey specific and often conceptual messages, while cultural video and film rarely contain diagrams or illustrations. And while both books and dynamic moving media contain segments, scenes, and chapters, the organizational content of these segments is different. Books present subsections and topics with hierarchical structures (Lemke, 2002, Raman, 1994), while video productions contain characters, locations, action, and dialogue that interrelate on a continuous basis as the scene evolves. There are also substantial differences among types of consumers and their backgrounds. The congenitally or early adventitiously blind, who have little or no experience with the visual world, want different types of descriptions than those who have a memory of the visual world and usually want much more (Pfanstiehl, 2002b), raising questions about what areas of description are appropriate.

2.5.2 Reasons to Consider as a Single Process

The reasons for a more integrated model of this unique communication process fall into four areas. First, while more traditional media such as books and television shows can be viewed as very different interactional experiences for the consumer, these lines are not so neatly drawn with respect to newer digital media. Educational software and the Internet have characteristics common to books and videos. Furthermore, educational videos, while using the same medium as cultural productions, contain many of the structural characteristics of textbooks, including purposeful illustrations and the ability for the viewer to navigate the text's structure. Textbooks now also frequently come with digital media in the form of a compact disc or references to an accompanying website. In addition, Internet sites can and frequently do contain all of the characteristics of both books and video, as well as the additional characteristics of interactivity and hypermodality. In other words, the web can be like a book, or a movie, or both, and more.

Second, regarding the audience differences, one could also view this as an issue of relative proportions. Both the congenitally and the adventitiously blind of different age groups are consumers of books, movies, television, and the Internet, but perhaps in different ratios. It is more probable that the reader of a textbook will be an adolescent, but it is conceivable that adults returning to school or helping their children with homework might use the same instructional texts. Likewise, children may choose, for personal interest or family participation, to watch a film or video genre geared towards an older audience. The Internet as a media space is used by all ages. And while many government websites might be geared towards older (more likely to be adventitiously blind) populations, some websites, including those for museums and cultural collections, are often oriented to the needs of a young audience.

Third, digital technologies present new opportunities for the ways that information is managed and delivered. Digital technology allows the traditional boundaries between text types, based on publishing restrictions, to be blurred, creating new hybrid text types (Iedema, 2003, Piety, 2001). Descriptive technology is currently developing (along with almost all media technology) as a digital technology, with the same power of distribution and the same dependence on software as other digital media. It is likely that descriptive technology will evolve, as other information technologies have, to have not just a dependence upon software but an architecture that is governed by software (Lessig, 1999). Trends in information technology continue in the direction of special-purpose hardware being replaced by general-purpose hardware governed by special-purpose software and, more recently, general-purpose software governed by conceptual models (Gamma, 1995). The power of model-based technology is that we may see future assistive technology derived from a conceptual model of the fundamental human processes that the technology supports. If this is the case, the more general this model can be, the more widely it can be applied to different textual and technological situations.

Fourth, a dimension that will be discussed in more detail below is that of the describer. Descriptions come from sighted individuals who face a number of significant choices in constructing their descriptions. Once this unique communication system and the special and powerful role of the describer (a role similar in some ways to that of a sign-language interpreter) are understood, there may be reasons to consider describing a professional skill that transcends media type. There are already a number of organizations, such as NCAM and NTN discussed earlier, that work to make several types of media accessible. By focusing on the properties of this system that are common to all media types, perhaps an economy of scale can be achieved in future training and possibly accreditation.


3. VISUAL ASSISTIVE DISCOURSE

The practices described in the previous section share many properties. They are intended to support the same types of people, they all rely on technology, and they have essentially the same goals. This allows them, despite their differences, to be viewed as members of the same family of communication practices. This study introduces a term for these practices, Visual Assistive Discourse (VAD). Below, it is formally defined, including the characteristic properties that distinguish it and how it compares with other, better-known communication systems, along with a conceptual discussion that covers issues related to using words for images and implications for understanding the principal roles of describers and consumers of description.

3.1 Visual Assistive Discourse Defined

This process can be defined from a number of perspectives. Below, four different types of definitions are presented: 1) the meanings intended in the term discourse, 2) five distinctive properties of VAD, 3) how VAD compares as a process to other prototypical communication processes, and 4) the essential components of VAD.

3.1.1 The "Discourse" in Visual Assistive Discourse

The term "discourse" is often used in different ways (Scollon, 2001b), and several different meanings are intended by its use in VAD. First, some have used it to indicate units of language larger than a sentence and language in use (Schiffrin, 1994). These are two of the senses in which it is used here, because visual descriptive language often extends beyond words and clauses to larger text sub-units. Discourses have also been looked at as social practices in which language is but one element used in social activities that exist for specific purposes (Gee, 1999, Scollon, 2001a). From this perspective, the roles and activities of the describers and receivers are considered important parts of the discursive process. Discourse has also been defined as a multimodal process when it includes different forms of communicative content (Kress, 2001). And finally, discourse is used here as Tannen defined it, as 'language in context across all forms and modes' (Tannen, 1981); linguists in their study of discourse are, she said, concerned "with the central questions of structure, of meaning, and how these function to create coherence." The search for coherence is an important goal of studying this communication process because, in order for visual description to be meaningful and true to its intention, it must be coherent to the listener and coherent with the intention of the author(s) of the texts. Discourse, then, is used in Visual Assistive Discourse to include the linguistic and non-linguistic elements that interact to create meaning with accessible media using assistive technology.

3.1.2 Common Properties of Different Descriptive Practices

There are five basic properties that will be evident in all of the practices where VAD is employed:

1. Technology enables the reception of remote descriptions.
2. Descriptive messages are non-interactive.
3. The descriptions are subordinate to another text.
4. The description process is non-transparent.
5. The process is constrained by the text.

The first of these properties, that technology enables the reception of remote descriptions, means that descriptions are created in a different place, and possibly at a different time, from when they are received, and it is through technology that these descriptions are brought to the receiver. The second property is that descriptive messages are non-interactive: they are received as one-way messages without the listener being able to influence their form or ask for clarification. The third property is that descriptions are subordinate to another text. The inserted language is not an independent communication process but a component in a text that plays a very specific role. Fourth, the description process is non-transparent. The act of describing creates a new text, different from the original, that is influenced by the insertion of descriptive language, the choices reflected in that language, and the acoustic and prosodic properties of the insertions. Fifth, the process is constrained by the text. The nature of the descriptions can be constrained by the type of the text and the technology used to transmit it. And, for Audio Description only, because descriptions are inserted in between dialog and other audio content, the types of gaps in the specific text affect the description.

3.1.3 Comparison of VAD and Other Communication Systems

As a class or type of language, the system of VAD has specific components and roles. Figure 3 shows an abstracted representation of VAD compared to typical written, conversational, and sign language interpretation processes. Looking at the broad characteristics of visual description, it is possible to see both how it is a unique communicative process and how it is related to some other systems. It is like conversation in that it is received in a spoken form. It is similar to reading because the receiver cannot interact with the text, just as a reader cannot with a book. And there is an intermediary that facilitates the communication, as sign language interpretation provides. Yet, unlike any of these other processes, information is inserted to replace visual information only, creating a secondary or modified text.

Figure 3 - Overview of VAD and other prototypical communication processes (panels: Face-to-face Conversation, in which conversants co-construct a text; Written Communication, in which an author composes a text for a reader; Interactional Sign Language Interpretation, in which an interpreter mediates spoken and signed texts between hearing and Deaf conversants; Visual Assistive Discourse, in which a describer adds verbal insertions to an author's visual source text, creating a modified text for a consumer)

The similarities and differences extend beyond the reception of the text to the creation of texts. In conversation, the text is co-constructed and shared between the interlocutors (Schiffrin, 1987); it emerges rather than being static. Conversely, in most written communication, an author creates the text that he expects to be used by a reader, who then reads that work in the form the author produced it 7. With sign language interpretation (and all interpretations and translations), there are several texts.



The interpreter exchanges spoken and signed texts with the conversant, while all of the conversants share a common visual text that could also be called part of the context. With VAD, however, the describer is in a position, and in fact is required, to add to and modify the text created by the author. In order to make the text accessible, the describer must alter it by inserting language that reflects the describer's choices for representing the visual information. Where the sign language (SL) interpreter is responsible for converting discrete discourse units from sign to speech or vice versa, the visual describer must select certain elements from the visual field and determine which elements will be described, in what order, and using what terminology.

3.1.4 The Components of the VAD Communication System

Any instance of VAD consists of at least four components:

• Source text
• Modified text that includes insertions of descriptive content
• Describer
• Consumer.

The source text is a production that was developed by some individual or organization to communicate with an audience in a specific way and with specific messages. Generally, the audience anticipated in the development of a source text is a sighted audience, and the source text includes both visual and verbal content. In most cases, the source text is prewritten and recorded. The modified text contains both original verbal content produced by the author of the source text and descriptive insertions, which are anonymous and disembodied messages (Goffman, 1963) sent to a receiver who is unknown to the describer. The modified text is created either by a third-party describer or by the author of the source text. The describer is a conceptual role and in practice may be one or more individuals responsible for creating and placing descriptive insertions. In some cases, such as audio books, the describer also presents other content, while in other cases, such as described video, the describer presents only the descriptive content. The consumer is also a conceptual role and equates to a range of individuals. The consumer is generally not known to the describer, either specifically as the person who will be listening to the description or by known characteristics, because the consumer can be from any age group and possess variable visual impairments, as described earlier. There is no typical consumer, and there may be different preferences for the types of information to be provided. These four components will be present in any VAD event.

3.2 Conceptual Issues with Words for Images

At its core, VAD is a process of using words as a substitute for information available in visual form. The differences between the visual and acoustic channels, and between the types of information usually conveyed through language and through the visual field, are significant and worth at least a brief discussion. Language and vision are fundamentally different phenomena, with vision being a primary process and language a secondary process (Bateson, 1972) that reflects a level of interpretation. As Lemke says, "No text is an image. No text has the exact same set of meaning affordances as any image" (Lemke, 2002). These differences can be seen in two aspects: sequential vs. parallel processing, and the ways and volumes in which each provides information. The topic connects to schema theory, and the process is difficult to classify as either an interpretation or a translation activity.


3.2.1 Sequential vs. Parallel

Language is the result of thoughts being encoded and retrieved in a sequential form, whereas visual information is perceived in parallel, through a gestalt, as a dynamic whole. Encoding information in language involves specific representational choices, including the words to use and the sequence in which they are placed. When it is spoken, language becomes a linear medium. As Slatin says of receiving the description of a web page graphic, "it is an experience of time, not in space" (emphasis in original) (Slatin, 2001). While language becomes fixed when written or spoken, visual information can change as it is viewed. It changes in the environment and in moving media such as video and film, so that the person viewing sees multiple abstract elements (color, shape, line) and recognizable objects (people, places, things, actions) concurrently. This dynamic property can also be true of still images, because when viewing a static picture the eye moves and recognizes different information over time (Holsánová, 2001, Kress, 1996), and many still images have attentional foci or cues (Dwyer, 1978) and/or implied motion, called vectors, built into them (Kress, 1996).

3.2.2 Raw vs. Processed Information

The second main difference to discuss in relation to this topic is the amount of information a unit of language and a unit of image can provide and the way each provides it. In general, the visual field can provide vast amounts of information and detail, whereas language requires specific encoding into words that, by their very nature, summarize and categorize information, often into prototypical concepts (Rosch, 1978). The following hypothetical example of two statements of visual description is used to highlight what a description could be and some of the word/concept issues involved.

1. A hobo sits at the counter of a greasy spoon
2. A waitress approaches him

These two statements describe a scene and show certain choices a describer might make. One of these choices is vocabulary: the terms "greasy spoon" and "hobo" are specific to a type of location and person. For a listener who knows what these terms mean, they likely convey a clearer image with fewer words than the more generic alternatives "run down lower class dining establishment" and "poor traveler." Another decision the describer makes is to keep these characters anonymous. They could have names in a story that have yet to be revealed, but the description in this case only provides the externally known information. Yet another decision is the choice to focus on these details (waitress, hobo, counter, greasy spoon) over others that are present in the same image, such as the time of day or other characters. Clearly, these two statements do not equate to a visual scene, but they may provide the information necessary for a hearer to construct a visual scene.

3.2.3 Schema Theory

At the same time that words cannot provide all of the information available in a visual field, certain words and the concepts associated with them imply a whole range of related meanings that are often available visually as well. The word "restaurant," for example, will usually imply many physical and conceptual elements (kitchens, wait staff, menus, food) that do not have to be seen. Complexes of concepts and expectations are thought by some to be stored in mental structures called schemata (Schank, 1977). When schemata are activated, as by words and/or visual stimuli, expectations of other information, and perhaps assumed values for information not provided and events that might follow, become activated as well (Tannen, 1993a). If these schemas guide expectations in interaction and conversation (Tannen, 1993b) and are at work in the process of reading (Perfetti, 1988, Wilson, 1986), two activities that VAD is similar to, they may play a role in allowing a few words to provide a large amount of content in VAD.

3.2.4 Translation-Interpretation-Transrepresentation

As a language process, visual description could be seen as a member of a family of translation and interpretation activities. According to Metzger (Metzger, 1999), "Both translation and interpretation deal with the rendering of a given text into another language". A key issue here is that the text being rendered is made up of visual elements (scenes, signs, smiles, etc.) rather than formalized language. If translation is what happens when the written is converted into the written, and interpretation when the conversational is converted into the conversational (Hatim, 1997), then visual description might be viewed as transrepresentation, because it takes raw visual information and converts it into language. The transrepresentation is affected by the sequential/parallel, high detail/high concept, and author-encoded/receiver-determined meaning boundaries.

3.3 Descriptions: Situated/Constrained in Multimodal Texts

In addition to the general differences between language and visual information described above, visual descriptions are strongly affected by their environment. They are both situated in and constrained by where they are placed in a text and by the textual nature of the visual elements being represented. While it might seem that descriptions would relate to those elements that are the most salient visual components in a multimodal text (Norris, 2002), the nature of VAD imposes further effects on the nature of the description, in ways that constrain it.

3.3.1 Textually Situated Descriptions

The selection, meaning, and content of any descriptive insertion is influenced by its textual position, which can be viewed in three ways: 1) the textual environment it is within, 2) its position in the text in relation to other occurrences of the same visual stimuli, and 3) the relative importance of the stimuli independent of their salience. The hypothetical description of a hobo in the example above would probably be placed in between other salient textual information, as shown in the example below:

3. Audio: Clock chimes 5 times
4. Describer: A hobo sits at the counter of a greasy spoon
5. Describer: A waitress approaches him
6. Woman: Coffee?

In this example, the descriptions from lines 4 and 5 are contextualized with time and dialog. The information from the source text in lines 3 and 6 provides context for the description and, equally important, the description provides context for the source text elements. This mutual relationship between the descriptive insertions and the existing source text content is an essential aspect of the meaning of a description, because it is all of these elements that the consumer of a described production will receive. These same lines of description take on yet other meanings if part of a textbook, as the hypothetical example below illustrates:

7. Reader: During the depression, many people lost their jobs and traveled searching for work, see picture 1
8. Picture 1 shows a hobo sits at the counter of a greasy spoon
9. a waitress approaches him

These examples show one of the most important aspects of visual description, the textual environment: descriptions do not come 'naked,' with a seemingly unlimited set of meanings, as they appeared in lines 1-2 above. Rather, as lines 3-6 and 7-9 illustrate, descriptions are textually situated, and understanding their meanings involves an understanding of the information provided in the environment of the source text.

Descriptions are textually situated in a second way, by the sequential position of visual information in the source text. If the examples above were not the first time that the image or place is described, but a successive occurrence where the audience would be expected to recognize the place, then a different description could be called for entirely, even if the visual image was exactly the same. It would be different, if for no other reason than that the place is now known ("the greasy spoon" rather than "a greasy spoon"), and also because additional details could be provided if the previously supplied information is expected to be understood.

An additional factor in textually situating descriptions is the relevance that visual elements have to the text. If there were details about the scene that were important for future plot reasons but were not the most salient feature (as we assume the hobo and waitress are), yet were important for the viewer to know about, the description would become situated not only by its enclosing information or its position in the text but also by its significance to the internal structure of the text (Gould, 2002, Pfanstiehl, 1984, Snyder, 2002a), as judged by the describer.

3.3.2 Constraints: Detail vs. Interpretation

Each type of text creates conditions that affect the nature of the description. The limits or restrictions on visual description are imposed by the technology used in the modified text and by the methods of the describer. These restrictions can place pressures on the manner in which the description is rendered. A concept behind these constraints is that, as the language used in descriptions covers more information with fewer words, the more the scene might be evaluated and summarized by the describer, leaving the consumer of the description less opportunity to independently assign meaning (Baquis, 2002). Alternatively, providing the building blocks of information that allow the consumer more opportunities to infer meaning may take more time to describe and to hear. In the hypothetical examples of description above, there were 14 words and 19 syllables. If the text allowed only a smaller insertion, certain changes would need to be made that would either reduce the amount of detailed information or summarize the situation with less verbal content. For example, "hobo" might become "bum" or "man," and "greasy spoon" might become "diner." Each of these choices in language might have a slightly different effect on the hearer.

The books recorded by RFB&D can use either a special format audiocassette tape or a special digital format. In both cases, the image descriptions are placed in specific locations (usually after a paragraph where they are referenced) in the linear recording. While the consumer of the RFB&D media can scan forwards and backwards, they cannot access the images separately from the body of the text because both are combined into the same audio stream. The RFB&D technology does not impose any restrictions on the amount of description, so they would be called unrestricted descriptions.

Internet sites have built-in functions for describing pictures, either directly in a property attached to an image (Alt-text) or in a separate page (Long-Desc) that the user can navigate to. Assistive technology either renders the descriptions on refreshable Braille devices or uses voice synthesis. Some browsers place limits on the amount of Alt-text that can be displayed, so direct descriptions are potentially restricted while indirect descriptions are unrestricted. Slatin, in "Maximum Accessibility," recommends no more than 150 characters, including punctuation (Slatin, 2001), while Alonzo recommends no more than 300 words for a long description (Alonzo, 2001). Internet descriptions could thus be viewed as restricted by convention.
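As a rough illustration of these conventions, the sketch below checks hypothetical direct and indirect descriptions against the limits just cited. Only the 150-character and 300-word thresholds come from the sources above; the function names and example strings are invented for illustration.

```python
# A minimal sketch (not part of any cited guideline's tooling) that checks
# hypothetical image descriptions against the length conventions cited above:
# roughly 150 characters, including punctuation, for direct Alt-text (Slatin)
# and roughly 300 words for an indirect long description (Alonzo).

ALT_TEXT_MAX_CHARS = 150   # Slatin (2001)
LONG_DESC_MAX_WORDS = 300  # Alonzo (2001)

def alt_text_within_convention(alt_text: str) -> bool:
    """Return True if the direct description fits the 150-character convention."""
    return len(alt_text) <= ALT_TEXT_MAX_CHARS

def long_desc_within_convention(long_desc: str) -> bool:
    """Return True if the indirect description fits the 300-word convention."""
    return len(long_desc.split()) <= LONG_DESC_MAX_WORDS

if __name__ == "__main__":
    # Hypothetical textual equivalents for a single image.
    alt_text = "A hobo sits at the counter of a greasy spoon; a waitress approaches him."
    long_desc = ("A longer indirect description could elaborate on the setting, "
                 "the characters, and their positions without a fixed technical limit.")
    print("Alt-text within convention:", alt_text_within_convention(alt_text))
    print("Long description within convention:", long_desc_within_convention(long_desc))
```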

Restrictions for descriptions in Audio Description are more complicated. As Frazier had envisioned, description is mostly inserted into the gaps or bridges (Frazier, 1975) in the audio. A basic rule for all Audio Description practices is that descriptions do not impede dialog. Theoretically, then, the entire span between dialog segments could be used for description. But there are cases where non-dialog audio cues (crashes, bumps, etc.) require the description to be synchronized so that the consumer is able to make sense of the original audio material. Also, in some cases such as opera or musicals, music or parts of a soundtrack are considered more important than description, so visual description is further restricted by both the specific text and the conventions of the describer.
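To make the placement constraint concrete, the following sketch finds the gaps between hypothetical dialog segments and keeps only those long enough to speak a description of a given word count. It is a simplification for illustration, not any organization's actual workflow; the dialog timings and the assumed speaking rate are invented.

```python
# A simplified sketch of the placement constraint described above: descriptions
# may only occupy the gaps between dialog segments, and only gaps long enough
# for the description to be spoken. Times are in seconds; the dialog spans and
# the assumed speaking rate are hypothetical values chosen for illustration.

SPEAKING_RATE_WPS = 2.5  # assumed narrator speaking rate, words per second

def find_gaps(dialog_segments, program_length):
    """Return (start, end) spans of the audio not occupied by dialog."""
    gaps, cursor = [], 0.0
    for start, end in sorted(dialog_segments):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < program_length:
        gaps.append((cursor, program_length))
    return gaps

def usable_gaps(dialog_segments, program_length, description_words):
    """Keep only the gaps long enough to speak a description of the given length."""
    needed = description_words / SPEAKING_RATE_WPS
    return [gap for gap in find_gaps(dialog_segments, program_length)
            if gap[1] - gap[0] >= needed]

if __name__ == "__main__":
    dialog = [(0.0, 4.0), (10.0, 12.0), (13.0, 20.0)]  # hypothetical dialog spans
    # A 14-word insertion (like the hobo example) needs about 5.6 seconds here.
    print(usable_gaps(dialog, program_length=25.0, description_words=14))
```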

Descriptions of software and interactive elements tend to be short and functional, describing the interactive effect of a function associated with an image rather than an elaborate description of the image itself (Board, 2001b, Gerber, 2002a, Slatin, 2001, W3C, 1999). These descriptions are unrestricted, but usually brief.

3.4 Discussion: The Role of the Describer

Ultimately, communication, or a failure to communicate effectively, happens between people. In VAD, these people are primarily either describers or consumers. Having considered some of the conceptual issues with representing visual information in language, the different characteristics of language, and the effect of textual constraints, it is now appropriate to discuss the role of the describer again. Describers are the first human link in the unidirectional process of making media accessible. What the describer selects for description, the manner in which it is described, and how it is positioned in the modified text are final. The describer is a gatekeeper of information. It is a role that is both powerful and difficult. The describer must balance all of the visual and linguistic factors and must select which information is to be presented and how it will be presented within the textual constraints. Three dimensions of the describer role worth discussing are the cognitive process of describing, the role as intermediary, and the practice effects that add other dimensions to how descriptions are created.

3.4.1 Describing as a Way of Thinking

Practitioners of Audio Description, one of the most methodologically established forms of VAD, say that providing visual description does not seem to be a natural process for most people and often requires specific training (Gould, 2002, Pfanstiehl, 2002, Stovall, 2002). Snyder says, "We must learn to see the world anew" (Snyder, 2002b). Perhaps a reason why it requires special training is that, while descriptions are produced using the tools of everyday communication (speech and writing), the way language is used in visual description is not common.

Chafe describes two modes of conversation: immediate and displaced (Chafe, 1994). Immediate conversation deals with an extroverted consciousness in which the mind is focused on perceiving, acting on, and evaluating information that is in the present, while displaced conversation uses a consciousness that is introverted to remember and imagine. Chafe describes the majority of everyday talk as dealing with the introverted and displaced consciousness: events that have happened, did not happen, or might happen, as well as possible conditions and realities; much of the grammar of languages such as English is designed to accommodate these unreal (irrealis) situations.

Figure 4 - Chafe's view of immediate mode: an extroverted consciousness perceiving, acting on, and evaluating the environment, with what is represented passing through language in speaking (panel title: Speaking in the Immediate Mode)

As we can see in figure 4, Chafe's view of immediate consciousness places evaluating as a natural and integral part of the perceptual process. But this conflicts with the goals of VAD, so the describer is required to suppress (or control) the evaluative process so that the described information is in a form that allows the listener to do the evaluation from the raw informational materials. While describing in some ways may be as old as language itself, describing only what is actually occurring visually at the moment, and suppressing evaluation, is like asking the describer to pull an "end-run" around the natural thought process of extroverted consciousness by proceeding directly to speaking. Further, the environment of VAD is neither the environment that surrounds the describer nor the one that surrounds the consumer, but rather the environment of the text and its visible boundaries. The describer is also performing this conceptual end run while focusing on only a small window of the immediate experience, which is the world of the text.

3.4.2 Describer as Intermediary

The describer is in an intermediary position that is similar to both a sign language interpreter and an author. As an interpretive mediator, the describer is responsible for creating language that is to serve as an equivalent to visual information. This role places the describer in a position of relative power similar to the role that Metzger found interpreters may take in "Sign Language Interpreting: Deconstructing the Myth of Neutrality" (Metzger, 1999). Furthermore, since the words of the describer, unlike the words of conversational SL interpreters, are final, the role of describer has some qualities of an author (Harris, 2000). While the goals of VAD practices do not call for the describer to take a substantive role in the communication process, calling instead for the describer to act, as Pfanstiehl says, "as a color video camera" (1984), the fundamental nature of VAD places the describer in a non-transparent position that will affect the process through the choices they make.

3.4.3 Description as a Group Process

For many producers of description, creating the modified text is not a one-person task. It involves teams with formalized procedures, reference manuals, and working documents. While this study has not investigated the nature of description teams and their documents, illustrated in figure 5, it should follow that the process that gives birth to the modified text could have significant impacts upon the quality of the result.


Both of the Audio Description organizations interviewed for this study use teams in the creation of their descriptive content, but the teams seem configured differently. In the case of RFB&D, each book is pre-read, and which images are to be described, as well as where the descriptions will be positioned, is marked up for the readers assigned to it. The reader then describes only those pictures that have been pre-selected (Smith, 2002, Vollmer, 2002). Once an RFB&D reader encounters a picture notation, he or she describes it in a seemingly individual way. For RFB&D, the reader is also the author of the descriptions. Conversely, the established organizations doing recorded Audio Description use a script that can be written and edited by people other than the person who is the voice of description.

Figure 5 - Conceptual view of the description process: a text producer supplies the source text (the original information); describers, supported by a team, working documents, and manuals and style guides, carry out a team-based description process whose descriptions combine with the original information to form the modified text.

All of the organizations doing recorded Audio Description interviewed for this research use a prescreening process in which some members of the description team listen to just the audio to identify critical points of comprehension failure. And all of these organizations indicated that the script writing and editing are extensive processes that include evaluations of the amount and the type of description that can be inserted. As a result, the Audio Description insertions rarely show markers of spontaneity such as nonfluencies or hesitations.

3.5 Discussion: The Role of the Consumer

The role of the consumer is a disadvantaged position in VAD. Where readers of books and recipients of SL interpretation each have a different communicative challenge, using a non-interactive text and requiring a communicative intermediary respectively, the visually impaired face both of these challenges with described visual information. The consumer does not experience both a source text and a modified text; the text they receive is the modified one, an amalgam of original material and insertions of description. Throughout the investigations leading to this report, those involved with providing descriptions across all media types consistently expressed an interest in the internal mental process of consumers, in how the descriptions they provide are or are not effective in fulfilling the goals of description. In this way, VAD is similar to the other language systems described above, because there is no consensus on universal principles for how language and thought actually work inside someone's mind. There are, however, some indications of what type of activity listening to VAD could be similar to.

3.5.1 The Consumers: Actively Building a Text Model

Research indicates that the process of listening is very similar to reading in terms of basic perception and comprehension (Townsend, 1987). So a person listening to a text might be engaging in the same or similar cognitive processes as someone reading the same text. Reading is a process that involves both the decoding of the graphic symbols, or the tactile symbols in the case of Braille (Knowlton, 1996), and the cognitive processes of developing a set of mental representations called a "text model" (Carpenter, 1986, Haberlandt, 1988). Because of the structural similarities to reading, it can be deduced that the descriptive insertions provide information that contributes to the process of building a text model in the mind of the listener.

If the process of receiving a described text is similar to reading, then a number of factors will influence the text model that is built. The listener's personal history and world knowledge (schemata), as well as their goals, will all probably influence what type of mental representation is built from the text they receive (Wilson, 1986). Since it is the consumer who creates their own understanding as they experience the text, the descriptions should be viewed as cognitive tools (Vygotsky, 1934), rather than brush strokes in a painting painted by the describer: they are pieces of information that serve as the building blocks of mental structures in the mind of the consumer as she actively develops an understanding of the text.

Figure 6 - Conceptual diagram of the consumer's process: the consumer's process draws on the modified text (original information plus descriptions), together with the consumer's purposes and goals, personal history, and world knowledge, to build a text model.

Anecdotal evidence from those who practice Audio Description supports the view that receiving description is mentally active. Frazier (1975) describes how an individual listening to a production assembles an understanding of the action from audio clues. A quarter century later, DVS customers in feedback sessions reported that when the appearance of a character is described long after the character is introduced to the audience, the new descriptive information can clash with a "mental picture" that the listener has already created (Gould, 2002). Stovall, himself blind and the founder of the Narrative Television Network, one of the two largest U.S. producers of Audio Description, described how the listener has a mental picture in his or her mind; it may not be exactly what the person or place looks like on the film, but it is sufficient for understanding the production (Stovall, 2002).

3.5.2 The Purposes and Goals of Consumers

Another important dimension in understanding the role of the consumer is motivation and purpose. Why do people listen to Audio Described productions or listen to textbooks? While these questions may seem elementary, it is important to recognize that this area is probably the most important part of the study of VAD and also the one that suffers from the least real data. The formal studies of the consumption of VAD have been in Audio Description, with a small amount of work on visually impaired people using the Internet. There is little documentation on the larger practices that people listening to visual description are engaged in. The discussion below provides no more than a few selected examples that may adumbrate some of the larger issues that remain to be explored in individual motivations and uses for described material.

As part of the research for this publication, members of an online community for Audio Description were polled on this topic and encouraged to provide insight into this process. Most of the responses received indicated that the service of Audio Description is essential to providing access to productions, but there was little specificity as to the types of productions or information that were of interest. This general view is supported by publications from the American Foundation for the Blind which say that Audio Description is used both for the enriching, aesthetic experience of the content of the text and for cultural inclusion (AFB, 1991). Visually impaired individuals often report watching movies with sighted friends and family, and the understanding of culturally relevant texts, whether from individual or group viewing, is useful in social activities.

One post to an Audio Description online community that predated the poll just mentioned indicated that facial expressions were specifically interesting to a congenitally blind listener who said, "Even though Audio Description does not give me a concrete example of the various ways of smiling, it does provide me with very valuable information about what kind of expressions may be exchanged between people" (Miller, 2002). In another area, WGBH and the American Foundation for the Blind conducted a survey of audio description customers that indicated a strong interest in science programming (Kuhn, 1992b).

In the area of visual descriptions for textbooks, while it is logical that the consumers are visually impaired individuals using this material for educational purposes, the visually impaired now make up only 25% of the customers of RFB&D, with the remaining members having dyslexia and/or other learning disabilities (RFB&D, 2001b). It was also reported informally that the recorded RFB&D materials have been used in classes of reading-challenged students who are neither blind nor dyslexic. Naturally, these uses are not related to VAD, but they may impact the type of service provided to visually impaired students.

The Internet, being a broad publishing medium, can by definition be used for a range of situations from the purely informational to entertainment and education. Research from the American Foundation for the Blind indicates that much of the Internet is difficult for the visually impaired to use despite accessible technology (Gerber, 2002a, Gerber, 2001, Slatin, 2002). But all indications are that the visually impaired attempt to use the Internet for similar reasons as the rest of the population: eCommerce, information, entertainment, and so on.

3.6 Cultural Issues with Description?

Is there a cultural dimension to VAD? Communication between sighted describers and visually impaired consumers raises interesting challenges regarding traditional definitions of culture. Culture is often viewed as a phenomenon that both transmits and is transmitted through language. And culture can also be defined as a phenomenon that operates on non-linguistic levels (Scollon, 2001b), some of which are influenced by vision.

While deaf communities have distinct cultural boundaries with linguistically perceptible features (Lucas, 1989, Valli, 2001), members of the blind and visually impaired communities, not having a separate language, might not be viewed as a separate culture. Further, since the majority of blind individuals have had sight at one time and all presumably interact with sighted individuals daily, it is difficult to draw a cultural boundary around the consumer community. However, within the process of description, certain communication issues appear that are similar to the types of issues that arise when people of different cultures try to communicate. For example, if a describer uses language that encodes visual assumptions (e.g., perspective, color, etc.) and the receiver of that description would not understand the associated or implied meanings, then miscommunication similar to cross-cultural miscommunication, although not according to traditional definitions of culture 8, might occur. Further, as Scollon and Scollon state, one culture does not actually communicate with another culture; individuals from different cultures do (Scollon, 2001b). And when people communicate, they do so in places and with purposes that influence the nature of the communication produced (Scollon, 2001a). Within VAD, the places in which the describers and consumers participate, the sites of engagement, are very different and, unlike in face-to-face communication, are separated physically and usually by time. While they are at the outer edge of this study's focus, these types of questions are important to ask because, even if the communication span between describers and consumers cannot be classified as a cultural divide, there may be sufficient differences between the historical, locational, and perceptual orientations of these two groups to foster miscommunication similar to culturally influenced communication failures, and cross-cultural sensitivities may be important.


4. STUDY OF AUDIO DESCRIPTION

The previous section provided a top-down conceptual framework called Visual Assistive Discourse (VAD) with a discussion of specific types of roles and factors that might affect its success. This section provides a complementary bottom-up, data-driven analysis of one specific form of VAD called Audio Description. Of the different varieties of VAD, Audio Description (AD) presents the most practical one to study in this forum. Since its development as an active process in the early 1980s, AD has been practiced mostly with methods that stem from one source and that adhere to specific principles. The other established VAD practice, Audio Books, was also investigated as part of this study. But, for a variety of methodological and practical reasons, Audio Books was determined to be too large and to have too many complicating issues to make it a good candidate for detailed language analysis in the time frame of this study.

This study looks at the stream of descriptive statements in Audio Description as an example of language use and aims to answer two questions: 1) what is the constituent structure in AD, and 2) what types of information are provided and what patterns of representation exist within AD? Because this study is the first analysis of the language of AD (or of VAD, for that matter), the information presented will be broad, and many important opportunities for more research will remain.

4.1 The Study Corpus

Within the practice of Audio Description, there are a number of important sub-practices, each with its own specific challenges. The sub-practice chosen for this study is the description of films. The data for this study comes from four different video productions described by three different description organizations, as shown in table 1.

Table 1 - Study corpus material

Source Text             Producer                 Describer              Source Length   Description Words   Description Length
A Star Is Born (1937)   Selznick International   Narrative TV Network   114 Min         6110                37 Min
L.A. Story (1991)       Artisan Entertainment    WGBH/DVS               90 Min          4123                29 Min
Gladiator (2000)        DreamWorks SKG           WGBH/DVS               148 Min         12337               86 Min
Gift of Acadia (2000)   National Park Service    The Washington Ear     14 Min          763                 4 Min
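As a quick illustration of how the corpus is distributed, the sketch below derives each production's description density from the figures in Table 1. The numbers are taken directly from the table; the calculation itself is only illustrative and is not part of the study's analysis.

```python
# Description density computed from the Table 1 figures above; purely an
# illustrative calculation, not part of the study's analysis.
corpus = [
    # (title, source minutes, description words, description minutes)
    ("A Star Is Born (1937)", 114, 6110, 37),
    ("L.A. Story (1991)", 90, 4123, 29),
    ("Gladiator (2000)", 148, 12337, 86),
    ("Gift of Acadia (2000)", 14, 763, 4),
]

for title, source_min, words, desc_min in corpus:
    words_per_desc_minute = words / desc_min
    desc_share = desc_min / source_min
    print(f"{title}: {words_per_desc_minute:.0f} words per minute of description, "
          f"{desc_share:.0%} of the running time carries description")
```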

4.2 Methodology

The techniques used to analyze this data are based in large part on spoken discourse analysis, in which the descriptive language is transcribed and then analyzed for structural and functional properties. Much of traditional spoken discourse analysis deals with interactional conditions. While the process studied here is certainly not interactional, there are important reasons that these analysis techniques were used as the starting point for the analysis of AD. First, the messages the consumer receives are units of speech and so will display properties specific to speech. Second, the level of detail used in spoken discourse analysis, which focuses on words and utterances in larger contextualized units, is a useful way to view units of AD. Third, since the creators and consumers of visual description speak the same language and are members of the same types of speech communities, this language use can be considered a form of spoken discourse, although a very special one.
a form of spoken discourse, although a very special one.


This approach prioritizes the words of description and does not focus on the<br />

multimodal issues involved with movies. These multimodal properties are significant, but in<br />

the interests of space and for publishing concerns, they are subordinated in this analysis to<br />

the surface representations of the description and relevant dialog. Specific transcription<br />

conventions and technical issues related to the multimodal issues of these productions are<br />

discussed in Appendix B.<br />

4.3 The Structural Components of Audio Description

Some basic structural definitions are necessary to begin this study. These definitions have been derived from the analysis of this AD corpus and also from other corpora of textbooks and Internet sites that are not included in this publication, with the hope that the terms and definitions will be generalizable.

It would have been convenient and desirable to use the same structural definitions used in other areas of linguistic inquiry for VAD. And, in some ways, the language found in AD is similar to other language uses. Below the word, at the morphological and phonological (word parts and sound components) level, the constituents in AD are identical to those of other forms of spoken language. Above that level, however, at the level that can be considered the discourse, different types of structures clearly appear.

This study proposes a discourse constituent structure based on four components: insertions, utterances, representations, and words. Table 2, below, provides definitions for these components.


Table 2 - Summary structural components of audio description

Basic Structural Hierarchy

  Element          Definition
  Insertion:       A contiguous stretch of description (analogous to paragraphs and turns
                   in written and spoken discourse) uninterrupted by other significant
                   audio content such as dialog.
  Utterance:       A continuous stream of words (similar to a sentence) containing one or
                   more representations, separated by more than ½ second of time 9. Often
                   a gap of 1 second or more separated utterances.
  Representation:  An interpreted component of an utterance that conveys information
                   about the visual field. Representations have different properties,
                   including a focus and a type.
  Word:            The words used in Audio Description, presumably like other forms of
                   VAD, are a subset of the words used in normal spoken/written language.
                   They can provide content or function.
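The hierarchy in Table 2 can be thought of as nested data. The sketch below is one possible way to encode it for corpus work; the class and field names are illustrative choices made here, not part of any describing organization's practice.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Representation:
    """An interpreted unit of meaning about the visual field."""
    focus: str        # the person, place, or thing the representation is about
    rep_type: str     # one of: appearance, action, position, reading,
                      # indexical, viewpoint, state

@dataclass
class Utterance:
    """A continuous stream of words bounded by more than half a second of silence."""
    text: str
    start: float      # seconds from the start of the production
    end: float
    representations: List[Representation] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end - self.start

@dataclass
class Insertion:
    """A contiguous stretch of description bounded by dialog or other significant audio."""
    utterances: List[Utterance] = field(default_factory=list)

    @property
    def duration(self) -> float:
        if not self.utterances:
            return 0.0
        return self.utterances[-1].end - self.utterances[0].start
```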

4.3.1 Insertions

The largest contiguous unit of description is an insertion. This is what Frazier called a "bridge," and it is almost always bounded by dialog. The term insertion was chosen because this is fundamentally what is being done in VAD: descriptions are inserted into another text. As examples below will show, this unit could not be considered a paragraph in the traditional sense because it does not always express a consistent unit of thought. There are no topic sentences or summaries, and the only cohesion devices within them are based on common pronominal reference (Halliday, 1985). Neither could it be called a turn or another structure often considered part of spoken discourse. Without digressing into the issues associated with coherence in spoken and written discourse (well beyond the scope of this study), what is evident from the transcripts of AD in this study is that insertions are essentially collections of utterances. There are no differences in structure or function between the first utterance in an insertion and the last or those in between. The utterances inside an insertion are essentially interchangeable. Insertions can be of variable length. This corpus contains 842 insertions. The shortest are less than one second in length and the longest is over five minutes. The mean duration of an insertion in this corpus is 11.09 seconds.
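As a minimal illustration of how duration figures of this kind can be derived, the following sketch computes the count, minimum, maximum, and mean insertion durations from a list of (start, end) time pairs; the timing values shown are invented for the example, not taken from the study corpus.

```python
# Minimal sketch: summary statistics over insertion time spans.
# The sample data below is illustrative only.
insertion_spans = [(12.0, 14.5), (90.2, 101.3), (305.0, 310.8)]  # (start, end) in seconds

durations = [end - start for start, end in insertion_spans]

print(f"insertions: {len(durations)}")
print(f"shortest:   {min(durations):.2f} s")
print(f"longest:    {max(durations):.2f} s")
print(f"mean:       {sum(durations) / len(durations):.2f} s")
```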

Transcript 1 below shows these properties of an insertion. In this section, taken from one movie, there are three insertions (lines 2-5, 16-17, and 19-24). The only words in this production come from a narrator and a describer. The narrator is part of the original audio from the source text, and a describer is speaking the AD insertions. The describer and narrator alternate in a structure that appears similar to turn taking. But, unlike the turn structure of a conversation, these two voices are not speaking to each other; each is speaking to the audience independently. The describer does not address any of the topics in the narration and the narration does not respond to any of the information that is in the description. Further, within the descriptions, each utterance reflects an independent thought. An analysis of the narrator's words reveals an expository structure with elaboration, contrast, hypothetical constructions, and other features of language used to convey a range of ideas, while an analysis of the describer's words reveals a very different type of language use that will be the focus of later portions of this study.


Transcript 1 - From "The Gift of Acadia" 1:06

 1 Narrator:  The gifts of Acadia and they are many…are simple
 2 Describer: We move over the still waters of Jordan pond toward the twin bare domes of the bubble mountains
 3            Bold letters swing out of the screen toward us the gift of Acadia
 4            We continue across Jordan pond toward south bubble mountain
 5            At otter point a huge wave crashes onto a square granite rock .. white spray flying
 6 Narrator:  It is many ways a gift
 7            First a gift of NATURE .. crafted by the sea
 8            By 500 million years of sediment pressed into rock
 9            Rock rising then subsiding
10            Glaciers overwhelming and scouring the tops of that rock .. until today some scoured rock tops are held by the sea called islands
11            While some rise free as MOUNtains
12            Mountains .. westerners laugh .. but in fact that up thrust of granite called Cadillac .. its wrinkled bald head gazing out at the sea from 1500 and 30 feet up is the highest mountain on our nation's east coast.
13            But forget that .. because Acadia is not a place for superlatives
14            On the contrary .. Acadia reminds a society sated with superlatives … highest biggest fastest richest … that there are other BETTER values
15            The value of solitude .. and in solitude contemplation
16 Describer: A young woman in blue shirt and shorts lies on her back on a rocky ridge overlooking the sea below
17            She is reading a book
18 Narrator:  The value of diversity and in diversity harmony
19 Describer: A small brown fawn looks at us twitching his left ear
20            A black-headed loon drifts by
21            A thin black dragonfly on a green leaf opens and closes its wings
22            Two little orange-breasted baby robins wiggle their heads
23            Under water two white-sided dolphins swim smoothly side by side
24            On the quiet surface of the sea two black triangular dorsal fins emerge then curve back down under water
25 Narrator:  Acadia is a meeting ground


This transcript shows that the describer's language consists mostly of separate thoughts. There are only two places where the wording of one statement depends on another. The first occurs in line 4, which says "we continue" in reference to line 2, which says "we move." The second is the use of the pronoun "she" in line 17 to refer to the same woman shown in line 16.

4.3.2 Utterances

Once a descriptive insertion begins, the listener will encounter a series of one or more utterances. The term utterance was chosen because it is a unit of analysis that is relevant to the range of speech productions found in conversation (Schiffrin, 1987). Utterances can be, but need not be, grammatical and were initially defined by Harris as "any stretch of talk by one person, before and after which there is silence on the part of that person" (Harris, 1951).

Utterances in AD contain a set of representations about the visible field. Because they are independent structures and could usually be rearranged without becoming incoherent (in form, not in terms of actions), they can be considered like a series of snapshots. Utterances can usually be arranged in any way to fill the time available in the insertion, and they can be as long or as short as the describer chooses them to be within the insertion space. But, as figure 7 shows, most are very short. Almost 60% of utterances are between one and two seconds in length and more than 30% are between three and four seconds in length. The effect for the listener is that these snapshots of visual information are produced as if by a strobe effect, where the field is visible for a short period of time, then represented as language, then visible again and represented again until dialog or meaningful audio from the source text takes over.

Figure 7 - Length of utterances in corpus
(Bar chart of utterance durations in seconds. Values recovered from the chart: 0-.99: 7.98%; 1-2: 59.56%; 3-4: 32.46%; 5-6: 8.15%; 7-8: 1.56%; 10+: 0.25%.)
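A distribution like the one in figure 7 can be produced directly from utterance timings. The sketch below bins a list of utterance durations into the ranges used in the figure; the bin edges and sample durations here are illustrative assumptions, not the study's data.

```python
# Minimal sketch: bin utterance durations (in seconds) into the ranges used in figure 7.
from collections import Counter

def bin_label(duration: float) -> str:
    """Map a duration to a coarse length bin (bin edges are an assumption)."""
    if duration < 1:
        return "0-.99"
    if duration < 3:
        return "1-2"
    if duration < 5:
        return "3-4"
    if duration < 7:
        return "5-6"
    if duration < 9:
        return "7-8"
    return "10+"

durations = [0.8, 1.4, 1.9, 2.5, 3.2, 3.8, 5.5, 1.1]  # illustrative sample only

counts = Counter(bin_label(d) for d in durations)
total = len(durations)
for label in ["0-.99", "1-2", "3-4", "5-6", "7-8", "10+"]:
    share = 100 * counts.get(label, 0) / total
    print(f"{label:>5}: {share:5.1f}%")
```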

Much like spoken discourse, utterances are often grammatical, but need not always be, because context often makes their meanings clear when they are not. Below are two examples from transcript 1 that are not grammatical, but meaningful.

  3  Bold letters swing out of the screen toward us the gift of Acadia
  5  At otter point a huge wave crashes onto a square granite rock .. white spray flying

In line 3, the first part of the utterance describes that something is being read on the screen (a type of representation that will be covered below) with "Bold letters swing out of the screen towards us." The describer then continues with the content of what was read, "the gift of Acadia." If an introduction, for example "it reads," preceded the part that was read, the statement would become grammatical, and it would also then consume a few more syllables. The second example, from line 5, is similar. The first part describes a scene



with action, "At otter point a huge wave crashes onto a square granite rock," and the ungrammatical clause "white spray flying" is appended without any introduction. But here also the meaning is clear: the white spray relates to the wave crashing that precedes it. In these ways, much as in spoken discourse, the elimination of words that are unnecessary for meaning to be conveyed can result in more efficient but technically ungrammatical forms.

Furthermore, utterances can come in many patterns. They can contain a single visual feature or action or can include several pieces of information in a sequence. For example, line 2 in transcript 2 below indicates a simple action: one character makes a gesture. In line 3, however, there are two actions: 1) she takes the arm, 2) they stroll away. These two actions are joined by the connective "and," but the same effect could have been achieved with the use of "then." The combination of actions need not be sequential. For example, line 9 shows two actions that occur simultaneously. In this case, they are joined by "as," but simultaneous action is also indicated with "while" preceding the first action.

Transcript 2 - From "A Star is Born" 16:20

 1 Danny:     You're going to buy me a drink come on
 2 Describer: He holds out his arm
 3            she takes it and they stroll away
 4            Later in a bar
 5 Danny:     That's right George there's nothing like a little rum to take away that milk flavor
 6 Describer: The bartender pours two shots of rum into a glass of milk
 7            Later Esther and Danny are drinking the drinks
 8            She playfully punches him
 9            He punches back and catches her as she falls off her stool
10 Danny:     I beg your pardon
11 Esther:    Certainly


4.3.3 Representations

The previous discussion of utterances introduced the fact that an utterance can contain more than one unit of information. In the examples discussed above, the units of information were primarily actions presented in sequence or actions that occurred simultaneously; or, in the case of line 9, two actions envelop a third. But even within an utterance that is action based, there can be other types of information. For example, line 20 from transcript 1 reads:

  20  A black-headed loon drifts by

This single action is of an object in motion. It also contains a description of a visual appearance that tells the listener that this object (a loon) has a black head. Assessing the complete set of meanings that any unit of language provides is well beyond the scope of this study. That discussion includes semantics and lexical semantics, the studies of meanings encoded in sentences and words (O'Grady, 2001), as well as pragmatics, the study of meanings received by the listener that are not contained within the semantic content (Levinson, 1983), and further understanding that is communicated at the discourse level. Because the language used in AD is so specialized and does not include many of the structures found in the language forms that semantics and pragmatics often draw upon, a simplified classification of the different types of information is proposed in this study. While it departs somewhat from established linguistic meaning analysis, it is more appropriate to the restricted nature of AD. The term chosen for this classification of meaning is representation.

The term representation is used rather than other linguistic constructs such as "phrase" or "clause" because, as the data presented below will show, identifiable units of meaning can come from a range of linguistic forms, from words to sentences. The concept of representation presented here is the visual information that has been selected by the describer to be sent over the auditory channel to the consumer. Representations have both a focus and a type. The focus is the person, place, or thing that the representation is about, and the type will fall into one of the following seven categories:

1. Appearance: The external appearance of a person, place, or thing.
2. Action: Something in motion or changing.
3. Position: Location of description, location of characters.
4. Reading: Written or understood information being literally read, summarized, or paraphrased.
5. Indexical: Indicates who is speaking or what is making some sound.
6. Viewpoint: Relates to text-level information and the viewer as viewer.
7. State: Not always visible information, but known to the describer and conveyed in response to visual information.

Some examples of these types of representations have already been shown in the first two transcripts. Below, each of these types of representations will be discussed using additional transcripts.
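For corpus annotation, the focus/type pairing described above lends itself to a small typed structure. The following is a minimal sketch under the assumption that each representation is tagged by hand; the names are illustrative, not an established coding scheme.

```python
# Minimal sketch: a typed tag set for representations, following the seven
# categories proposed above. The example annotation is illustrative only.
from dataclasses import dataclass
from enum import Enum

class RepType(Enum):
    APPEARANCE = "appearance"
    ACTION = "action"
    POSITION = "position"
    READING = "reading"
    INDEXICAL = "indexical"
    VIEWPOINT = "viewpoint"
    STATE = "state"

@dataclass
class Representation:
    focus: str          # the person, place, or thing the representation is about
    rep_type: RepType

# Hand annotation of transcript 1, line 20: "A black-headed loon drifts by"
line_20 = [
    Representation(focus="loon", rep_type=RepType.ACTION),      # drifts by
    Representation(focus="loon", rep_type=RepType.APPEARANCE),  # black-headed
]
```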

Appearance

Appearance is in some ways the antecedent of all of the other types of representation because all representations require an appearance of something in the source text in order to be realized in the description. But the others, with the possible exception of actions, do not convey properties that are externally describable with accuracy.

Appearance representations are the subset of description that provides information about the direct visual properties of something in the source text, including luminance, color, size, and shape. Appearance is usually realized through adjectives and the nouns they modify, as some examples from transcript 1 illustrate.

From Transcript 1

  2  We move over the still waters of Jordan pond toward the twin bare domes of the bubble mountains
  3  Bold letters swing out of the screen toward us the gift of Acadia
  5  At otter point a huge wave crashes onto a square granite rock .. white spray flying
 16  A young woman in blue shirt and shorts lies on her back on a rocky ridge overlooking the sea below
 19  A small brown fawn looks at us twitching his left ear
 20  A black-headed loon drifts by
 21  A thin black dragonfly on a green leaf opens and closes its wings
 22  Two little orange-breasted baby robins wiggle their heads
 23  Under water two white-sided dolphins swim smoothly side by side
 24  On the quiet surface of the sea two black triangular dorsal fins emerge then curve back down under water

External appearance can also be conveyed with prepositional attachment, as line 16 shows, and also through adverbials. It follows that if a consumer is interested in visual information (in what things look like, normally or in certain situations), that information would often be provided through appearance representations.

Action

Consistent with the examples from transcript 1, shown above, most utterances are based on some form of action. Actions can include gestures, movement, and activities, and they can act as the core representation that other representations are clustered around. Transcript 3 contains a typical set of action-oriented sequences.

Transcript 3 - From "Gladiator" 40:26

 1 Maximus:   At least give me a clean death .. a soldier's death
 2 Describer: One guard moves behind Maximus
 3            Then rests his sword point on the back of his neck
 4            Maximus bows his head as the guard raises the sword
 5            Maximus leaps up and butts the guard off balance then catches the blade and spears him in the throat
 6            Spinning he chases the second guard whose blade sticks in its scabbard
 7 Maximus:   The frost .. sometimes it makes the blade stick
 8 Describer: With bound hands Maximus slices the sword across the guard's face
 9            Nearby two other praetorians sit on restless horses
10            One gallops into the clearing, then twists in his saddle
11            A sword flies at him end over end
12            It buries itself in his back
13            Maximus steps out from the trees glaring
14 Maximus:   Praetorian

The two insertions in transcript 3 contain action in every utterance. This is a representative sample because it shows some of the different ways that action is presented. Line 2 shows action that relates the position of one person to another. Line 3 shows an action with an object (sword point) and the location of the object. Line 4 is an example of simultaneous actions, and line 5 contains a sequence of actions presented in a list form. Line 6 represents one action (spinning) as part of another action (although it is likely that the meaning intended is two actions in sequence). Lines 8 and 13 describe the manner of an action (with bound hands, glaring), and lines 11 and 12 indicate an action where the agent is inanimate.

The scope of this study does not allow as full an analysis of the action representations as would be desirable. The types of meanings associated with different English verbs would be a good starting point for a more thorough analysis of the action in AD. But, while actions are represented with verbs, not all verbal forms are expected in AD. Further, a data-driven analysis focusing only on action, the largest part of the AD content pie, would probably yield a further subset of action types that are relevant to this domain.

Position

Another type of representation that is often associated with actions identifies the positions or locations for the information being described. Positional representations can act as action setters or scene shifters, as transcript 4 shows.

Transcript 4 - From "Gladiator" 1:42:30

 1 Cassius:   People of Rome … on the fourth day of Antioch .. we can celebrate .. the sixty-fourth day of the games
 2 Describer: In the crowd Maximus' servant Cicero looks around
 3 Cassius:   In his majestic charity the Emperor has deigned this day to favor the people of Rome with an historical final match
 4            Returning to the Coliseum after five years in retirement .. Caesar is pleased to bring you the only undefeated champion in Roman History .. the legendary Tigris the Gaul
 5 Describer: The crowd stands as four galloping horses draw a chariot into the arena
 6            Next to the driver a Gladiator salutes the crowd
 7            He wears leather straps across his stocky chest and a metal helmet shaped like a tiger's head
 8            On one of the underground ramps leading to an arena gate, Maximus swings a short sword
 9 Proximo:   He knows too well how to manipulate the mob
10 Maximus:   Marcus Aurelius .. had a dream that was Rome Proximo

Location information working as an action setter relates characters to each other and to the setting, as shown in lines 2, 5, and 6. Scene shifting occurs when a complex scene contains multiple perspectives that are alternately presented to the audience. The scene shifts the viewpoint of the audience but does not advance the action of the movie to a new scene. In some ways, it is similar to a flashback or dream sequence that allows for a suspension of action. Line 8 from transcript 4 above shows an example of this: the main scene is in the coliseum before a gladiator match, but attention has shifted to a quiet spot below the arena.

While all location information would seem important to viewers not accessing the visual component of the text, these scene-shifting descriptions would seem especially important in allowing viewers who cannot see the change in context to comprehend the action as a complex scene (typical of film climaxes) unfolds.

Reading

Reading occurs when some language or recognizable symbols come on the screen and are literally read "as is" by the describer. Line 3 from transcript 1 above is an example of information being read. Reading often comes at the beginnings and endings of movies, when there are credits and titles. It also appears quite frequently throughout some movies in various forms. In transcript 5, line 4, from "Gladiator," a set of words is introduced and read to indicate the location that the movie's action is now in.


Transcript 5 - From "Gladiator" 47:49

 1 Juba:      Better now? Clean you see
 2 Describer: Maximus lowers his lolling head back onto the wagon
 3            Later the caravan approaches a congested desert town
 4            Words appear Zucchabar Roman Province
 5            A crude amphitheater dwarfs the surrounding red clay buildings
 6            Now in a busy open air tavern an older man with a tough leathered face sits by himself swaddled in robes his head wrapped in a black turban
 7            He takes careful sips from a small brass cup
 8 Trader:    Proximo my old friend

Transcripts 6 and 7, from "LA Story," also show the describer reading signs that are part of the sets rather than just screen text.

Transcript 6 - From "LA Story" 3:38

 1 Describer: he rides in a park with other stationary bikers
 2            a sign reads "stationary bike riding park ...no running"

Transcript 7 - From "LA Story" 19:00

 1            large white-lettered signs reading "now" hang on the wall
 2            blue lights bathe the hip shoppers

Transcript 8 below shows a case where a sign in the movie becomes like a character (lines 7, 10) and the reading of it is like the speaking of a character. This same transcript (lines 13-15) shows a character reading from language that is visible in the movie. A sighted viewer who could read English would certainly understand what was being communicated in this case, but it is unclear if the same would always be true for an AD consumer without some descriptive support.


Transcript 8 - From "LA Story" 20:36

 1 Describer: The car's engine dies
 2            He glides off the road stopping behind a digital road sign which flashes freeway clear
 3            Harris climbs out of Trudy's Mercedes and lifts the hood while she stays seated in the car studying the tilt of her hat in the visor's mirror
 4            A wind shifts the leaves of a weeping willow behind them
 5            Suddenly the lighted sign goes black
 6            Noticing the darkness Harris slowly turns around
 7            The sign flashes "Hi ya"
 8            Harris frowns and returns his attention to the car engine
 9            Harris whips around as the light bulbs explode
10            Then miraculously regroup to spell "I said Hi ya"
11            bewildered, Harris points to himself eyebrows raised skeptically
12 Harris:    Hi
13            ruok?
14            don't make me waste letters
15            R .. U .. O .. K ?
16            Oh.. Are you OK? Yeah I'm Fine
17 Describer: The sign says "hug me"
18 Harris:    What?

In a manner very similar to the way that the speech of a person is reported/constructed in conversation (Tannen, 1989), the information that is being read is introduced through a verb of introduction, for example "read," "reading," "says," "flashes," and "appear," as transcripts 6-8 above illustrate.

Indexical

Indexical or deictic information is information whose meaning can only be determined from context (Levinson, 1983). In conversation, words such as "here" and "now" provide meanings for conversants, but understanding those meanings requires understanding the place and time in which the conversation is situated. In Audio Description, a few types of indexical representation were found. In line 5 of transcript 9, the describer indicates what object the character in line 4 had just mentioned. In this case, in order to recover the meaning of this piece of description, the prior dialog is required.

Transcript 9 - From "A Star is Born" 1:58

 1 Father:    Well daughter how was the movie tonight?
 2 Esther:    Lovely
 2            She takes off her coat
 3 Boy:       Mush that's what it was just a lot of mush .. there wasn't anybody killed in the whole thing
 4 Father:    Oh well then I'll stick to these.. these don't talk
 5 Describer: Looking at pictures
 6 Boy:       That big cluck Norman Main was in the picture tonight

Transcript 10 shows another form of indexing, where the describer indicates who the next speaker is. In line 2, the name Quintus is said by the describer, and from accessing the video portion of the source text, it is clear that this statement identifies a character as the speaker.

Transcript 10 - From "Gladiator" 6:10

 1 Describer: Across the battlefield at the edge of the forest hundreds of barbarians wave their swords
 2            Quintus
 3 Quintus:   Load the catapults
 4 Describer: On a hill through a light snow the elderly white bearded man watches the army prepare


Viewpoint

Viewpoint representations relate to what the viewer would perceive as affecting the entire visual field or text. These include scene changes/shifts and screen and special effects. Scene changes are commonly indicated with the marker "now" or "later," as discussed above. Transcript 13 is a kind of scene shifter because at this point in the movie a number of different screen effects were appearing in succession, and so "next" indicates a change, but in this case not necessarily a formal scene change.

Descriptions of camera effects were fairly rare in this corpus, but transcripts 11-13 below each show different ways that the viewer's total perspective can be represented in description. Another approach is reflected in the beginning of transcript 1 above, when the describer says "we move" and "we continue." A description that preceded the transcript began with "we are flying," which reflected what the camera effect was like. It should be noted that the only uses of "we" (interpreted to be inclusive of the listener and describer) occur at the beginnings of productions.

Transcript 11 - From "Gladiator" 1:15:29

 1 Describer: Now in the palace a blurred face comes into focus

Transcript 12 - From "Gladiator" 12:00

 1 Describer: Surrounded by flames hundreds of men battle in a blur of muted color

Transcript 13 - From "LA Story" 1:50

 1 Describer: Next a montage of funky LA architecture


State

Descriptions sometimes provide information that is not visually evident but is available through the describer's knowledge of the text. Some of the ways this happens are through providing identity or naming, providing relational information about entities that are visible, providing internal states including emotions and intention, and specifying time.

Transcript 14 shows the naming of places. While in the movie "Gladiator" locations can be named with screen text, as shown in transcript 5 above, in transcript 14 the location "Imperial Rome" was not named by the movie producers in this way. This information was added by the describers. Also, the buildings were not named in the movie; the describer added this information as well.

Transcript 14 - From "Gladiator" 58:42

 1 Describer: As they look at the stands that encircle them the arena seems to spin like a carousel blurring the cheering crowd
 2            Now imperial Rome stretches far below
 3            A flock of birds soars over the circus maximus and the coliseum

Transcript 15 below shows both the naming of a character and the relationship of the character to another character in the same utterance. This type of naming appeared to occur more with minor characters than main ones. Transcript 15 also shows a common signaling of a time shift with "Later," which indicates that it is later in the story. Because movies can contain flashbacks, when a scene changes, viewers may not always know immediately that the scene has changed. The use of "later" identifies the change as one that is farther forward in time.

Transcript 15 - From "LA Story" 7:50

 1 Describer: Later in his girlfriend Trudy's apartment


Transcripts 16 and 17 show examples of a character's internal state being evaluated and described. Transcript 16 is an example of the state baldly described, while transcript 17 shows it embedded in an action.

Transcript 16 - From "LA Story" 90:38

 1 Describer: Slowly conditions clear is spelled over the screen
 2            Content Harris smiles
 3            In an aerial view, other digital road signs along the highways echo the same message

Transcript 17 - From "LA Story" 25:36

 1 Describer: Now a deluge of mail shoots through the letter slot in Harris' front door
 2            from the kitchen he irritatedly kicks wastebasket underneath the opening where it catches the streaming mail

A variation of the description of a character's internal state is to have "appears..." precede the evaluative phrase.

4.3.4 Words

The words used in AD are the same as the words used elsewhere, but they are a subset of the lexical items normally employed in these other language systems. Because AD only allows descriptions of what is immediately available in the visual field, large amounts of normal vocabulary should never be seen in AD, or in VAD for that matter, unless being read. For example, there should be no negation, modals, conditionals, past or future tense, anything that is hypothetical, or any references outside of the text at the time it is being described. Further, because the audience can include members with different disability experiences and different backgrounds, a widely accessible vocabulary is used.

Some words in AD seem to be taking on special roles. For example, because the reference time in AD is understood to always mean the current time of the Source Text, words such as "now" and "later" can serve new functions as scene changers. Also, words such as "as" and "while" are markers of parallel action.


5. USING THE DEFINITIONS FOR ANALYSIS

The previous section provided some structural and functional definitions of Audio Description (AD). These definitions and the way they are organized should be viewed as an initial, proposed framework for understanding a stream of description. This section adds another dimension, opening new territory and also tying together aspects of the two previous sections, as these structural and functional definitions are presented within the context of analysis. Naturally, the analysis of any language system is a large conceptual field, and this study has already addressed two significant aspects of this type of communication, so this section is an abbreviation of what it might have been if the definitions of the conceptual framework of VAD and the definition of AD had existed prior to this research. In addition to connecting concepts raised in the two previous sections, it also relates the analysis of this language system to analyses used with other discourse types. In essence, this section also connects AD (and VAD) to other systems of language use.

5.1 Descriptive Mass

As described in section 4, the insertion is largely a collection of independent utterances, and insertions are placed mostly where the text allows them. Because of this role, they provide a window on the total impact of AD on a text. As table 1 above illustrated, the amount of time AD takes up in these texts is significant: not less than 20 percent, and in one case almost 60 percent, of the total span of the text. Because insertions appear where dialog and other audio cues are not, these figures can conversely be seen as representing the dialogue-free portions of these texts, the negative space, the portions that allow for and may require description. The insertions can be viewed as a quantity of representations that are distributed where the text allows. This descriptive mass can then be analyzed according to types and patterns of representations to see the impacts on the production. If, for example, this descriptive mass does not contain any descriptions of scenery or facial expressions, consumers wanting this information will probably not get it.

The descriptive mass can also be analyzed in terms of how it occupies the mind of the consumer compared to the dialog and other content. Frazier (1975) described periods in a performance he called "low audio" periods, where dialog and other clues were lacking and where different types of character, setting, or continuity information (his terms) could be inserted. He described how the insertions would provide essential information, mostly at the beginnings of scenes. This study reveals a much more pervasive use of description than Frazier presented. While Frazier's 90-minute production, "The Autobiography of Miss Jane Pittman," had 34 insertions or bridges, contemporary described productions have comparatively more, as table 3 illustrates, and the insertions are distributed throughout scenes.

Table 3 - Comparison of description mass in four different texts

  Text                    Length     Insertions   Utterances   Description Length   Amount Described
  A Star is Born (1937)   111 min    382          692          37 min               20%
  Jane Pittman (1976) 10  109 min    34           Unknown      27 min               25%
  LA Story (1991)         90 min     156          451          29 min               32%
  Gladiator (2000)        148 min    269          1391         86 min               58%

Taking these feature films as data points, a trend towards less dialog and more description is evident: the newer films have a greater need for description, and the role of the describer is increasing historically. Using these films as representative samples, not only is the amount of description increasing, but the amount of description in each insertion is increasing as well. The 1937 film had more than half of its description in short insertions of 1-5 utterances in length, while the 2000 production had almost half of its description mass allocated to sequences with more than 20 utterances, and several of these insertions contained over 70 utterances.

Table 4 - Distribution of description mass by insertion length

                          Number of utterances per insertion
                          21+    16-20   11-15   6-10   1-5
  Star is Born (1937)     0%     13%     4%      24%    60%
  LA Story (1991)         21%    5%      12%     21%    42%
  Gladiator (2000)        47%    4%      15%     14%    20%

Figure 8 - Chart version of table 4 data
(Stacked bar chart: allocation of description mass by utterances per insertion for "Star is Born" (1937), "LA Story" (1991), and "Gladiator" (2000); legend categories: 21+, 16-20, 11-15, 6-10, 1-5.)
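The distribution in table 4 can be reproduced from a per-insertion utterance count. The sketch below is a minimal, illustrative calculation; the sample counts are invented, and the bin boundaries simply follow the table's column headings.

```python
# Minimal sketch: share of description mass (measured in utterances) falling into
# insertions of different lengths, mirroring the columns of table 4.
# The sample data is illustrative only.
utterances_per_insertion = [2, 1, 5, 8, 14, 23, 3, 70, 6]

bins = [("21+", 21, None), ("16-20", 16, 20), ("11-15", 11, 15),
        ("6-10", 6, 10), ("1-5", 1, 5)]

total_utterances = sum(utterances_per_insertion)
for label, lo, hi in bins:
    mass = sum(n for n in utterances_per_insertion
               if n >= lo and (hi is None or n <= hi))
    print(f"{label:>5}: {100 * mass / total_utterances:5.1f}%")
```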


5.2 The Textual Role of Insertion Content

In addition to looking broadly and quantitatively at the descriptive mass, individual insertions can be analyzed qualitatively to look at the portion of a text and the roles in the text that the insertion plays. It is clear that long descriptive sequences occur in these texts. But are they conveying information that is supplemental or essential to the text?

In several parts of this corpus, it is clear that insertions do contain essential plot information that is conveyed without dialog. For example, in "Gladiator," there is a scene (35:50) where the Emperor's son murders his father by suffocation. The dialog and sound effects do not make clear that this has happened, and when the dead man is discovered in the following scene, there is no indication that the son is responsible. In "LA Story," there is a restaurant scene (15:23) where the main romantic characters in the film meet and exchange a number of non-verbal signals that are described in detail in the AD. Without the AD to fill in the gaps, it is quite possible that visually impaired viewers would not have access to this information, which is critical and essential to understanding the plot, until much later in the production, if at all.

This aspect of the textual significance of insertions has been integrated into both NTN and DVS practices, as they will review a movie using only audio cues to determine areas where the text is not clear without description (Gould, 2002, Stovall, 2002).

5.3 Sample Analysis: Persistent Entity Development 11

Most of the content of AD is about people and things. Many of these entities will exist over extended parts of a text. How and when information is presented about these entities, when they get named and when they are referred to as new and given, is a potentially important aspect of modeling the consumer experience. By whatever term is chosen, whether a text model (Wilson, 1986) or the information state that has been applied to conversation (Schiffrin, 1987, Schiffrin, 1994), there exists in the mind of the consumer a set of mental representations that reflect their understanding; and one of the main sources of these representations is the contents of the streams of description in AD. The streams of description and the order in which information is presented then become potentially important topics of analysis. For example, the main character in "Gladiator" is referred to by five different terms in the first fifteen minutes of the production. Table 5 shows these different terms and the location (time) in the text where they appear. This type of approach seems designed to reflect the revelation of information that the authors of the text intended sighted viewers to experience, because it is only after fifteen minutes that this character is referred to by name in the text.

Table 5 - Referring terms for main character of "Gladiator"

  Time    Utterance
  2.32    Now a warrior lifts his head and blinks as if waking from a dream
  3.34    Now the scruffy warrior mounts the earthworks
  3.44    The scruffy faced warrior general smiles and nods to his men
  3.59    Under a heavy mist the general makes his way through hundreds of soldiers taking positions in the mud
  15.51   Maximus looks away his eyes searching the field

This use of different terms for the same individual, called referring sequences (Schiffrin, 1994), is also found in spoken discourse. In spoken discourse, new entities that are being introduced into a conversation are often marked with a specific introduction such as "there is" (Schiffrin, 1994). In AD, however, new entities are not marked this way; they are usually introduced with "a x," as "a warrior" from the utterance at 2:32 above shows. Subsequently they can be referred to with the definite article "the," as illustrated in the utterance at 3:34. It follows that there may be similar patterns for things and places, and that the ways they are referred to repeatedly might provide a baseline for how referents can be referred to in Audio Description.
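One way to examine referring sequences across a corpus is to group hand-annotated mentions by entity and list the referring terms in order of appearance. The sketch below assumes such annotations exist (the tuples shown are illustrative, loosely following table 5); it is not a tool used by the describing organizations.

```python
# Minimal sketch: collect the referring terms used for each annotated entity,
# in order of appearance. Annotations are illustrative only.
from collections import defaultdict

# (time in minutes, entity id, referring term as it appears in the description)
mentions = [
    (2.32, "maximus", "a warrior"),
    (3.34, "maximus", "the scruffy warrior"),
    (3.44, "maximus", "the scruffy faced warrior general"),
    (3.59, "maximus", "the general"),
    (15.51, "maximus", "Maximus"),
]

sequences = defaultdict(list)
for time, entity, term in sorted(mentions):
    sequences[entity].append(term)

for entity, terms in sequences.items():
    print(f"{entity}: " + " -> ".join(terms))
```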

5.4 Sample Analysis: The Scene as Frame

While the structures of insertions and utterances are significant to understanding AD, the productions that AD makes accessible (films, television, plays) are structured according to scenes and shots. Frazier had envisioned that description would be inserted at the beginning of scenes, but this study reveals a different pattern, where insertions occur throughout the scene. AD insertions occur before and during scenes, and scene changes are often contained within an insertion. Analyzing AD in terms of scenes is a potentially useful perspective because it is through scenes that the author intended the audience to perceive the text.

An important concept in understanding what a scene is experientially is the concept of a frame of experience (Goffman, 1974, Tannen, 1993a); see also (Bateson, 1972). Tannen connects Goffman's frame of experience to the concept of schema that was described in section 3. For the sighted viewer of a production, much of the information associated with a scene or frame will likely come mostly with visual cues: they see that the scene has changed and that different characters are present or not. For the consumer of AD, the cues need to be embedded in the description. The following example from transcript 2 shows a scene change that occurs at the end of an insertion, as they sometimes, but not always, do:

 1 Describer:  He holds out his arm
 2             she takes it and they stroll away
 3  ---->      Later in a bar
 --------      New Scene/Frame
 4 Danny:      That's right George there's nothing like a little rum to take away that milk flavor
 5 Describer:  The bartender pours two shots of rum into a glass of milk
 6             Later Esther and Danny are drinking the drinks

Similar to the restaurant example often used to illustrate schema theory (Schank, 1977), the fact that the action has shifted into a bar allows the reference to a bartender as an existing entity ("the bartender") even though it is the first reference to him. As a consumer is told about a scene or frame change, it may be important to provide additional cues to support the conceptual transition the consumer should make, including when new characters or objects become relevant. A challenge for consumers may arise when scenes change in the middle of dialog and there is no opportunity for an insertion to indicate the scene change.

The sequence of information that matters is not only that which relates to persistent entities, as described above, but also to frames of experience, as shown here.

5.5 Sample Analysis: Utterance Patterns

It may be of consequence to consumers how information is presented. Just as in spoken discourse, the utterance is a unit that is perceived by the listener as a coherent thought 12. The utterances in AD and their form can be analyzed to create a picture of the manner in which information is being presented. One such analysis that may be useful is to look at the patterns of representation structure. Table 6 shows short sections from movies by two different describers: "A Star is Born" by the Narrative Television Network (NTN) and "Gladiator" by the Descriptive Video Service (DVS).

Table 6 - Comparison of description styles

  "A Star is Born" 14:46 (NTN)
    Danny gives Mr. Randall a confused glance and smiles at Ester
    She steps closer and Danny takes a step back
    Danny and Mr. Randall nod at each other
    Esther turns and runs up the stairs in tears
    Danny follows her
    He stops her as she begins to enter her room

  "Gladiator" 132:54 (DVS)
    Underground Maximus runs through a narrow passageway
    In the yard gladiators ram a praetorian with a table
    Arrows pierce Hagen in the back then the chest
    Two soldiers stab him
    He kneels blood dripping from his mouth then falls
    In the passageway Maximus turns a corner
    He stops and tosses aside his torch
    Directly in front of him a stone archway leads outside

The structures of the utterances reveal different approaches to describing what are essentially sequences of actions. In the NTN description, each utterance begins with the subject (the representational focus) and then the action is presented. The theme of each of these utterances is the person or group performing the action (Halliday, 1985), while the DVS selection shows a wider range of representational combinations. The DVS style has a more varied structure, with the theme or focus often different from the actor. A question for consumers might be whether this variety of representation and utterance structure is helpful and interesting or confusing and distracting.

5.6 Sample Analysis: Representational Combinations

In the example above showing a change of scenes, two representations were used in combination, the state representation "later" and the location representation "in a bar," to signal the new scene. Within this corpus, a location by itself rarely started a scene; it usually shifts focus to a different viewpoint of the same scene. Also, "later" by itself rarely starts a scene; it indicates that time has shifted. While a detailed analysis of these combinations is beyond the scope of this study, this is an area that deserves more attention, because these combinations may be important for the development of standards that allow a few key words to operate as markers on a number of levels in AD, as they can in conversation (Schiffrin, 1987).

Representational combinations may also be important because they may reflect a type of filtering or implicit encoding on the part of the describer that may or may not be optimal for the consumer. For example, co-occurrences of location and appearance representations in the same utterance were extremely rare (less than .5%) in this corpus. Either seemed to exist more easily in the same utterance as an action, but not together. Is this related to the concept of consciousness having primary and peripheral foci (Chafe, 1994), such that attention to location reduces the attention to visual appearance? Or is it possible that this is a result of the fact that the describers are able to see the visual image of the location as they are describing (something consumers cannot do), and so the need to describe the appearance of a place is not evident to describers when descriptions are created?
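Co-occurrence figures like the one above can be computed from representation-tagged utterances. The sketch below counts how often pairs of representation types appear in the same utterance; the tagged data is an illustrative stand-in for corpus annotations.

```python
# Minimal sketch: count co-occurrences of representation types within utterances.
# Each utterance is represented by the set of types tagged in it (illustrative data).
from collections import Counter
from itertools import combinations

tagged_utterances = [
    {"action", "appearance"},
    {"action", "position"},
    {"action"},
    {"position", "state"},
    {"action", "appearance"},
]

pair_counts = Counter()
for types in tagged_utterances:
    for pair in combinations(sorted(types), 2):
        pair_counts[pair] += 1

total = len(tagged_utterances)
for (a, b), count in pair_counts.most_common():
    print(f"{a} + {b}: {count} utterances ({100 * count / total:.1f}%)")
```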

5.7 Analysis Challenges: Time, Reality, and Cultural Elements

There are three other areas that will be briefly discussed as analytical challenges. All three are significant and cannot properly be addressed in the scope of this study. The first topic is time, and it affects AD in a number of ways. First, it can constrain the amount of space for description and the locations in the text where descriptions can be inserted. For some productions this is a major issue for describers. Another time aspect arises within an insertion when certain audio cues (for example, a glass falls) require part of the insertion to reflect the short audio element, further constraining the arrangement of utterances. The second of these is reality, which presents other challenges for describers because, while everything that is described is supposed to be described in a direct and non-evaluative manner, certain visual effects are intended by the producers to reflect an imaginary or dreamlike state. Language and visual presentations have different approaches to treating reality (Leeuwen, 2002). Time and reality intersect with flash-backs and flash-forwards that suspend the time and reality that are active in the text and present another reality and time, like a special frame inserted into some other text frame. Third is the cultural significance of certain scenes. As many have discussed, images of various celebrities (Barthes, 1957) and culturally recognized images can be used as independent symbols, and their appearance in a cultural production is often expected by the author of a text to have specific significance (Stephens, 1998). All three of these aspects of AD (time, reality, and cultural significance) are important factors relating to the decisions that describers must make.


6. SUMMARY: A LANGUAGE SYSTEM

This study began with the assertion that the practice of using language as a substitute for visual information in electronic textual settings is a specific type of language use. This study has provided evidence from four different perspectives, and in each of these perspectives this language system, Visual Assistive Discourse (VAD), and its variant Audio Description (AD) have been shown to have characteristics in common with, and distinct from, other uses of language.

VAD was explored initially in section two as a set of practices that exist for social reasons: to provide access to visually dependent electronic texts through assistive technology. These existing social practices act as the foundation for this language system because language is a socially constituted phenomenon (Gee, 1999, Halliday, 1978, Scollon, 2001b). The four practice areas that were discussed in this study are all new additions to human existence, far newer than the roughly 100,000 years and 5,000 years over which speech and writing, respectively, are thought to have developed. Even though these practices are performed by different types of organizations and based on different methods, they serve a common set of consumers and have the same goals: to provide access to information that without the use of language would be largely inaccessible; and to provide that information using language in a way that allows the recipient the maximum opportunity to use the text in the manner that the authors of the text intended.

In section three, these practices are abstracted to present a definition of this language system, including role types and textual components. The common communicative properties and participant structure, including a describer, a consumer, and source and modified texts, were presented and compared to other language processes. Similar to spoken discourse, the product of VAD is usually received as speech. Similar to written communication, the communication process of VAD is, from author to consumer (through the describer), unidirectional. Similar to interpreted sign language/speech events, VAD requires an intermediary to enable the process. But, unlike any of these other language systems, in VAD the source information is not language but visual information, which is a different phenomenon, and the describer, rather than just converting visual information to language, is making decisions that are affected by the multimodal texts in which the visual information sits. The describer is making additions that are like informational prosthetics to create a new text that is accessible for the consumer, who will process it to create in his own mind a representation influenced by both the descriptive language and residual information including dialog and auditory cues (Frazier, 1975).

Section four is data driven. It takes one form of VAD, Audio Description (AD), as the subject of a descriptive study. A corpus of four films with more than 150 minutes of AD was transcribed and used in a study that looked at the structural and functional components of the streams of description. This data provides clear evidence of a type of language that is different in form and function from much of written and spoken discourse. While the words in AD are drawn from a select group of the common language, they represent a restricted set. Because the words used in AD are restricted based on tense, type, and modality, reflecting only real rather than any unreal (irrealis) states, the language system of AD cannot be considered simply a dialect or a register (Gregory, 1978). Further, AD includes different types of representations that are not relevant to written or spoken discourse. The world that the describer describes is focused on the visual field of a movie (video) screen, and description can cover both the surface appearance of elements inside that screen and other textual information. In fact, the majority of the AD utterances in this corpus were about actions rather than appearance; they were mostly descriptions of what was happening rather than what things look like. This bias towards action is likely connected to the type of text (movies) used in the study. The descriptions also contained information that was read, information about the production, the changes of scenes and shifts in time and focus, as well as names of and relationships between visual elements.

In section 5, the data collected in the study of Audio Description was used to build on the structural and functional definitions that had been established earlier, by using them as the object of sample quantitative and qualitative analyses. This section shows that when AD is viewed as linguistic data, it can be understood in terms familiar to spoken discourse analysis, including referring sequences (Schiffrin, 1994) and frames of experience (Tannen, 1993a, Tannen, 1993b). This linguistic data for AD can be used to compare different types of description to see evidence of the different choices that describers make in their creation of a text. And it showed how critical AD is to understanding movies, especially those produced recently, which contain long periods with little or no dialog and where essential textual information is often contained in long sequences with no dialog at all.

As this study concludes by providing the first formal definition of the language system Visual Assistive Discourse (VAD) based on actual practices, it leaves much work to be done. VAD is today a concept. It is born as an analytical construct in this thesis and is constituted in principles that influence how visual descriptions are created, including the recognition that a description is not an isolated phenomenon that relates directly to visual information, but is actually situated in a multimodal text. The practices and language that form the basis for this study have been addressed with evidence from interviews, published reports, research into language use and analysis, and a corpus of the productions of language drawn from one of many representative sources. This study concludes that Visual Assistive Discourse, the use of language as a surrogate for vision to augment another text and make it accessible to the visually impaired, is a language system that is distinct and can be described, measured, and taught.


7. FUTURE STEPS

This study has placed VAD in a linguistic framework and opened it up to the benefits of large bodies of linguistics research. The theoretical and empirical base developed in this study can now be used as a foundation to support further research of AD and VAD in ways more consistent with established language systems. Below are discussions of research paths that are either indicated by the results of this study or are more practical now that it has been completed.

7.1 Consumers Study

Understanding the recipients of communication is an important part of understanding the effects of any communicative process. This aspect is especially challenging for VAD because there is no typical visually impaired individual and because, at the time a description is created, there is no knowledge of who the specific consumers will be or how they might fit into the wide range of potential combinations of age, history, and disability factors. A number of studies have been undertaken into the usability of the Internet (Gerber, 2002a; Gerber, 2001; Gerber, 2002b), the television viewing habits of the visually impaired (Kuhn, 1992b), and the benefits of Audio Description (Lovering, 1993; Packer, 1996) that consider the perspectives of consumers of described media. Additional study that builds upon this work to develop fuller profiles of members of the visually impaired community as users of electronic media would be helpful for both researchers and practitioners. While there is no typical member of this consumer community, perhaps there are prototypical types that could be constructed.


This study has not specifically supported this important research area other than to reinforce the need for it. Discussions with researchers and advocacy groups during this study identified many practical challenges to pursuing this line of inquiry, including such basic factors as the fact that many visually impaired individuals are not members of easily identifiable groups that could be contacted and may not use or have access to technology that would assist them (Kaye, 2000). This type of research is envisioned as best performed in concert with advocacy groups and description providers, the organizations with existing contacts in the consumer community. The methods could be ethnographic (if practical) and could be supported by surveys, interviews, and linguistic analysis of feedback and comments. It is important to recognize that the consumers of VAD are not in an empowered position with respect to VAD and may be reluctant to offer what would seem to be criticism of a service that is so clearly important.

7.2 Supporting Further Developments of Audio Description

Audio Description is an evolving field and is growing in use. It is now practiced by more organizations in more countries than before. This study supports further investigation into AD by providing a baseline set of definitions. During the course of this study, certain questions about AD arose, including:

o Are there optimal structural patterns for AD insertions/utterances?

o Does the preponderance of grammaticality in these structures limit the amount of information that can be transmitted?

o Does the quantity of time that AD consumes, usually more speaking time than any single character, present issues of ear fatigue, and would multiple voicing or different descriptive approaches be useful in optimizing long sequences?

o Is the style of Audio Description used in movies (and perhaps other recorded media) weak in areas such as special effects that were not relevant to the performance roots of the method?

o Are there techniques from one variant of AD that would help the others, such as the creation of program notes as in theatrical AD?

At the time of this study, there have been active discussions regarding standards and guidelines. While these discussions are the right and responsibility of the participants, especially the consumers, the structures developed in this study could provide useful topics for facilitating them. Another type of follow-on would be to study other forms of Audio Description, such as television or live performances, to refine the taxonomy of representations this study produced and to provide a basis for comparison. There is also certainly much work to be done in furthering the descriptive study begun in this report; more thorough investigations of action representations, referential approaches, and experiential frames seem useful paths of inquiry, and ones with relevant literature from other linguistic studies.

7.3 Descriptive & Comparative Studies of Other Forms of VAD

The practices of providing audio textbooks and developing accessible websites are parts of VAD that have significant investments in the description process. But they have not been studied in depth, and most of the organizations providing these services would be unlikely to consider describing visual content their primary activity; rather, it is a component (maybe a lesser one) of their efforts. Not surprisingly, the initial research into the descriptions in these areas revealed a wide range of approaches, even within the same text.

Descriptive studies of textbooks and the Internet would provide significant new information about the range of representations and approaches used in VAD and would complement the description of AD done in this research. These descriptive studies could be used to compare styles of and approaches to the role of the describer and would certainly yield important insights.

7.4 Human Subjects Studies with AD

Throughout the course of the research for this study, human subjects studies to measure the effectiveness of AD were discussed. For a variety of reasons, not the least of which is that there was no baseline definition of AD that could be used to structure such a study, this type of research was not attempted. The definitions and analyses provided here can be used to structure these types of studies to compare styles of description, the descriptive language itself, and the limits and tolerances of listeners exposed to long segments of description. Almost all of the describers interviewed for this study expressed fundamental questions about how description works inside the mind of the consumer and how its effects could be maximized.

7.5 Educational Materials Study

Educational materials are an especially difficult issue for the blind and visually impaired. At the time of this study, there are a number of initiatives to address educational materials using digital technology that should allow visual information as well as language to be transmitted. These efforts include a national initiative to exchange textbook information electronically (CAST, 2002). This technology, and the human practices that include creating tactile and other alternative representations for visual information, should also be capable of supporting visual descriptions and the range of descriptions that exist in VAD. A study of the visual description issues associated with educational material, perhaps involving a subset of the consumer study mentioned above and focusing on prototypical student types, would be an important contribution to understanding the challenges that exist in making accessible instructional materials usable to more students.

7.6 Assistive Technology Research

Currently, a number of organizations are involved in the development of assistive technology. This technology, both hardware and software, will create opportunities and may impose limits on VAD consumers in the future. While using language as a replacement for images is not the only way to provide some of this information to the visually impaired, it is the only solution for certain types of texts and can be a cost-effective solution for many others. An investigation into the current research and standards efforts around future assistive technology would provide an opportunity to inform them with an understanding of the linguistic possibilities of VAD. If the efforts to develop the next wave of assistive technologies were aware of and understood the linguistic dimensions of Visual Assistive Discourse, perhaps those technologies could be designed to optimize the language experience of the consumer. After all of the efforts for regulation and service delivery, and after all of the technology used in attempts to remove barriers and provide an equivalence of experience, these assistive practices are still fundamentally processes of human communication.


APPENDIX A: GLOSSARY

Accessibility: Refers to whether an individual with a disability can gain even minimal access to information. Does not imply that the information is in a form that is meaningful or relevant, only that it is not unavailable due to a barrier.

Audio Description: Traditionally refers to human voice description for live events and performances, including television, movies, and museum exhibits. Occasionally, this term is used to mean any description done through the human voice for visually impaired and blind individuals.

Consumer: A person who receives the visual description. Also called a description consumer.

Describer: The person or organization/group that is responsible for the modified text and the description.

Described Video: Term used by the Federal Communications Commission and others to refer to a video product that has Audio Description added to it.

Description Process: The events, including previewing, writing, editing, and narrating, that create the described text.

Descriptive Mass: Applies to the total set of descriptive content as a group.

Electronic Text Descriptions: Descriptions that go on Internet sites or other digital publications. Could be rendered in Braille or synthesized speech.

Experiential Equivalence: The concept that the experience a disabled person has, while not the same as the experience the non-disabled enjoy, is equivalent to it in fundamental ways.


Insertion: A set of one or more utterances that are read or heard as a continuous stream by the description consumer. In Audio Description, insertions are usually positioned between actor dialog and other significant audio. (A schematic sketch of how insertions, utterances, and representations nest follows this glossary.)

Modified Text: The modified text is the one with visual description, the one that a consumer would experience.

NTN: Narrative Television Network, a major describer for broadcast television.

Representation: A piece of information that can be inferred from a description segment. Can be of a type and have a focus.

Representation Focus: The part of the visual field, or an element not in the visual field, that the description relates to. Can be a person, place, or thing, or the entire visual field.

Representation Type: The category of information provided in the representation. For Audio Description, seven categories were found.

Restricted Description: A description that is limited in size by the text and/or the technology of the text.

Source Text: The material that is being described, before the description is added. This is the video/film/book/play/event that the describer sees.

Text: Text in this sense means a composed body of information such as a story, play, movie, or textbook.

Text Model: A term used in reading research to describe the mental image of information (propositions and concepts) contained in something that is being read or, in the case of this study, listened to.

Textually Situated: Means that the visual elements, and the descriptions for them, are not independent entities without context. The context is the text, such as a movie or an Internet site, in which they are placed, and properties of the text and how they are placed will affect their meaning.

Transrepresentation: The creation of one modal artifact to represent another, as in words being used for visual information.

Unrestricted Description: A description that can be of any length.

Utterance: A unit of description that is provided as one continuous speech unit; similar to an utterance in spoken discourse. Usually grammatical, but might not be. Can contain one or more representations.

Visual Assistive Discourse: A term introduced in this study to mean the process of providing visual information through language across the various text types in which the practice is employed.

Visual Description: An informal term to indicate either the practice of Visual Assistive Discourse or an instance of description.

Visual Description Practice: A term used in this study to denote the real-life practices that are currently based on a type of text or an organization. Internet textual equivalents for Section 508 and Audio Description are two examples of visual description practices.
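The nesting of insertion, utterance, and representation defined above can also be shown as a small schematic data structure, refining the placeholder Insertion used in the earlier sketch. The sketch below is illustrative only; the class and field names are hypothetical and simply mirror the glossary definitions, and the seven Audio Description representation types are not enumerated here.

from dataclasses import dataclass, field
from typing import List

# Illustrative only: a schematic of how the glossary terms nest.

@dataclass
class Representation:
    # A piece of information inferable from a description segment.
    rep_type: str   # category of information (seven types were found for AD); placeholder values below
    focus: str      # a person, place, or thing, or the entire visual field

@dataclass
class Utterance:
    # One continuous speech unit; usually grammatical, may carry several representations.
    words: str
    representations: List[Representation] = field(default_factory=list)

@dataclass
class Insertion:
    # One or more utterances heard as a continuous stream, placed between
    # actor dialog and other significant audio.
    utterances: List[Utterance] = field(default_factory=list)

# Example: one short insertion with a single utterance and two representations.
# The rep_type strings here are invented placeholders, not the study's taxonomy.
example = Insertion(utterances=[
    Utterance(words="A man in a dark coat hurries down the steps.",
              representations=[Representation(rep_type="action", focus="the man"),
                               Representation(rep_type="appearance", focus="his coat")])
])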


APPENDIX B: TRANSCRIPTION & MULTIMODAL ISSUES

Transcription Conventions

The transcription conventions used in this investigation are based upon principles and approaches used in spoken discourse analysis (Tannen, 1989). In transcripts, the following notation is used:

..    Perceptible pause of less than ½ second
…    Perceptible pause of ½ second or more
CAPS  Indicates emphatic stress
[ ]   Around overlapping speech (extremely rare)
→    Arrow indicates significant points
/ /   Slashes indicate uncertain transcription (extremely rare)
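These conventions are simple enough to detect mechanically. The following is a minimal sketch, assuming plain-text transcript lines; the pattern names and the annotate function are hypothetical and were not part of the transcription work done for this study.

import re

# Minimal sketch: flag the notational devices defined above in one transcript line.
# The patterns and names here are hypothetical, not tools used in this study.
NOTATION = {
    "short_pause": re.compile(r"(?<!\.)\.\.(?!\.)"),   # .. pause of less than 1/2 second
    "long_pause":  re.compile(r"…|\.\.\."),            # … or ... pause of 1/2 second or more
    "emphasis":    re.compile(r"\b[A-Z]{2,}\b"),       # CAPS marks emphatic stress
    "overlap":     re.compile(r"\[[^\]]*\]"),          # [ ] encloses overlapping speech
    "arrow":       re.compile(r"→"),                   # arrow marks significant points
    "uncertain":   re.compile(r"/[^/]*/"),             # / / encloses uncertain transcription
}

def annotate(line: str) -> dict:
    # Report which notational devices appear in the line.
    return {name: bool(pattern.search(line)) for name, pattern in NOTATION.items()}

print(annotate("He turns .. and STARES at the /empty/ doorway ..."))
# {'short_pause': True, 'long_pause': True, 'emphasis': True,
#  'overlap': False, 'arrow': False, 'uncertain': True}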

Multimodal Issues

Many discourses, including face-to-face conversation, operate with concurrent communication modalities. The non-language parallel properties can include gestures, indexical references, and environmental information. In the texts that are used with visual assistive discourse, these parallel properties are often extremely rich in information that is key to understanding the nature of the description and how the descriptive statements interact with the regular source text statements. One option for dealing with the multimodal issues would be to publish this document with rich media so that the source text could be presented in the same forum as the description and analysis. The option chosen for this study takes a more economical approach. Since all of the material used for this study is published, all of the transcriptions are connected to the published work by reference to the text and the time position of the sequence transcribed.

Sub-Second Timing

The technology used to record and play back the films used in this study did not show times below a second. As a result, the times in the transcriptions were recorded using only the whole seconds displayed, which causes some loss of precision. With a corpus of over 150 minutes of transcribed content, any cumulative loss of precision from the transcription of individual utterances is expected to be negligible.
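As a rough illustration of why whole-second timing is tolerable here, the short sketch below simulates truncating utterance boundaries to the displayed second. The counts and timings are invented for the illustration and are not data from this study; the point is only that each reported length is off by less than a second and that, because utterances are timed independently against the source, the errors do not compound.

import random

# Rough illustration (invented data): error introduced by whole-second timing.
# Each utterance is timed independently against the source, so truncation error
# stays under one second per utterance and does not accumulate across the corpus.
random.seed(0)
errors = []
for _ in range(2500):                    # a corpus-sized number of utterances (invented)
    start = random.uniform(0, 9000)      # true start time in seconds (invented)
    length = random.uniform(0.5, 8.0)    # true utterance length (invented)
    reported = int(start + length) - int(start)   # length as read from a whole-second display
    errors.append(reported - length)

print(f"mean error:  {sum(errors) / len(errors):+.3f} s")
print(f"worst error: {max(abs(e) for e in errors):.3f} s")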



APPENDIX C: VERBAL DESCRIPTIONS FOR FIGURES

Below, each figure is listed by name, followed by its descriptive text.

Figure 1 – Different Practices of Visual Description in 2002: This diagram has four circles/ovals to denote the four description areas of Audio Description, Audio Books, educational software/rich media, and Internet sites. Audio Description is larger than the others and has an oval where the others have circles. Inside the Audio Description oval are three circles titled Live Description 1981, Described Video Early 1970s, and Audio Tours 1980s. The three circles to the right of the Audio Description oval are titled Audio Books 1948/1971, Software & Interactive Media Late 1990s, and The Internet & MultiMedia Late 1990s.

Figure 2 – Conceptual View of Internet Descriptions: This diagram is a set of nested ovals with no overlaps. The outer oval is titled Internet. Inside it are two ovals titled Multimedia Documents and Live (Simulcast) Events. The Multimedia Documents oval is further broken down into four sub-ovals titled Text and Hyper Text, Still Images, Moving Images, and Interactive Elements.

Figure 3 – Overview of VAD and Other Prototypical Communication Processes: This diagram contains four sub-diagrams, each with its own label. The first sub-diagram is labeled Face-to-face Conversation and has two circles, both labeled conversant, with a box in between labeled co-constructed text. Two-directional arrows connect each conversant to the text. The second sub-diagram is labeled Interactional Sign Language Interpretation and features three circles labeled deaf conversant, hearing conversant, and interpreter. A box labeled hearing text sits between the hearing conversant and the interpreter and is connected to both with bi-directional arrows. A box labeled signed text connects the deaf conversant and the interpreter and is connected to both with bi-directional arrows. A third box sits among all three, is labeled visual text, and is connected to the interpreter, the deaf conversant, and the hearing conversant with bi-directional arrows. The third sub-diagram is labeled Written Communication and contains two circles labeled author and reader. In between is a box labeled composed text, and one-way arrows go from the author to the text and then from the text to the reader. The fourth sub-diagram is labeled Visual Assistive Discourse and contains three circles labeled author, describer, and consumer. A box labeled source text is connected with a one-way arrow from the author. The source text has two arrows leading from it: one goes to the describer and one goes to another box labeled modified text. From the describer, an arrow leads to a box labeled insertions, which is connected to the modified text by a one-way arrow into the modified text. A one-way arrow goes from the modified text to the consumer.

Figure 4 – Chafe's View of Immediate Mode: This diagram has three boxes. A small one at the top is labeled environment. From it an arrow leads to a large box labeled EXTROVERTED CONSCIOUSNESS. Inside this box are two labels: represented and representing. The arrow has labels for perceiving, acting, and evaluating next to it. From the large box an arrow leads to a small box labeled language. Next to this arrow is the label speaking. At the bottom is the label "speaking in the immediate mode."

Figure 5 – View of the Description Process: This diagram is an expansion of the part of Figure 3 dealing with the describer and is labeled Team-based Description Process. It has three solid circles labeled Text Producer, Support Team, and Describers, and four boxes labeled Source Text, Working Documents, Manuals & Style Guides, and Modified Text. A large dotted circle encloses the Describers and Support Team circles and the Working Documents and Manuals & Style Guides boxes; the Source Text box is half inside and half outside the dotted line. A one-way arrow goes from Text Producer to Source Text, and a two-way arrow goes from Text Producer to Support Team. A one-way arrow goes from Support Team to Working Documents and then to Describers. One-way arrows go from Source Text to Modified Text and to Describers. The Describers have arrows coming in from Source Text, Working Documents, and Manuals & Style Guides, and a double arrow connecting to the Support Team. A one-way arrow goes from Describers to Modified Text.

Figure 6 – Conceptual Diagram of Consumer's Process: This diagram is conceptually similar to Figure 5, but it focuses on the consumer. A circle labeled Consumer is surrounded by four boxes labeled Text Model, Personal History, World Knowledge, and Purpose and Goals, which sit across a dotted outer circle. Outside this outer circle is a box labeled Modified Text. An arrow comes in from the Modified Text to the Consumer. Arrows also come in to the Consumer from Personal History, World Knowledge, and Purpose and Goals, and a double arrow connects the Consumer to the Text Model.

Figure 7 – Length of Utterances in Corpus: This bar chart contains the following data on the length of utterances in seconds.
Duration (seconds): 0–.99, 1–2, 3–4, 5–6, 7–8
Percentage: 7.98%, 59.56%, 32.46%, 8.15%, 1.56%, 0.25%
Number: 195, 1455, 793, 199, 38, 6

Figure 8 – Chart Version of Table 3 Data: This bar chart contains the following data on the number of utterances in each insertion.
Number of utterances in insertion: 1–5, 6–10, 11–15, 16–20, 21+
A Star is Born (1937): 60%, 24%, 4%, 13%, 0%
LA Story (1991): 42%, 21%, 12%, 5%, 21%
Gladiator (2001): 20%, 14%, 15%, 4%, 47%
(Campbell et al., 1934) (Martin/DVS, 1991) (Franzoni, 2000) (NPS/Ear, 2000)


REFERENCES

Access Board, United States Government Architectural and Transportation Barriers Compliance Board. 2001. Electronic and Information Technology Accessibility Standards. In Section 508 of the Rehabilitation Act Amendments of 1998. Washington DC: Architectural and Transportation Barriers Compliance Board.
AFB. 1991. A Picture is Worth a Thousand Words For Blind and Visually Impaired Persons Too: An Introduction to Audiodescription. New York: American Foundation for the Blind.
AFB. 2000. Education: An Overview. New York: American Foundation for the Blind.
AFB. 2001a. Quick Facts and Figures on Blindness and Low Vision. New York: American Foundation for the Blind.
AFB. 2001b. Statistics for Professionals: American Foundation for the Blind.
Alonzo, Adam. 2001. A Picture is Worth 300 Words: Writing Visual Descriptions for an Art Museum Web Site. Paper presented at Center On Disabilities: Technology And Persons With Disabilities Conference 2001, Northridge.
Artic Technologies, Inc. 2002. What is a Speech Friendly Site?: Artic.
ASTC. 2001. Best Practices: Audio Description. Association of Science-Technology Centers Incorporated.
Baquis, David. 2002. Meetings and email conversations, with Phil Piety. Washington, DC.
Barthes, Roland. 1957. Garbo's Face.
Bateson, Gregory. 1972. Steps to an Ecology of Mind. Chicago: University of Chicago Press.
Board, Access. 2001a. Web-based Intranet and Internet Information and Applications (1194.22): United States Access Board.
Board, United States Government Architectural and Transportation Barriers Compliance. 2001b. Electronic and Information Technology Accessibility Standards. In Section 508 of the Rehabilitation Act Amendments of 1998. Washington DC: Architectural and Transportation Barriers Compliance Board.
Burnham, Betsy. 2002. Email regarding APH descriptions for visual information, with Philip Piety. Washington, DC.


Campbell et al., Narrative TV Network. 1934. A Star is Born, Described by Narrative TV Network, ed. William A. Wellman.
Carpenter, Patricia A.; Marcel Adam Just. 1986. Cognitive Processes in Reading. In Reading Comprehension: From Theory to Practice, ed. J. Orasanu. Hillsdale, NJ: Lawrence Erlbaum.
CAST. 2002. The National File Format Initiative at NCAC: Center for Applied Special Technology.
Chafe, Wallace. 1994. Discourse, Consciousness, and Time. Chicago: University of Chicago Press.
Corn, Anne L.; Wall, Robert S. 2002. Access to Multimedia Presentations for Students with Visual Impairments. Journal of Visual Impairment and Blindness 96:197.
Dwyer, Francis M. 1978. Strategies for Improved Visual Learning: A Handbook for the Effective Design and Use of Visualized Materials. State College, Pennsylvania: Learning Services.
Elbers, Loekie; Loon-Vervoorn, Anita van. 1999. Lexical Relationships in Children Who Are Blind. Journal of Visual Impairment and Blindness 93:419.
Franzoni, David et al./DVS. 2000. Gladiator, ed. Ridley Scott.
Frazier, Gregory MA. 1975. The Autobiography of Miss Jane Pittman: An all-audio adaptation of the teleplay for the blind and visually handicapped, Film and Communication, San Francisco State University: Masters.
Gamma, Erich and Richard Helm, Ralph Johnson, John Vlissides. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. New York: Addison-Wesley Longman, Inc.
Gee, James Paul. 1999. An Introduction to Discourse Analysis: Theory and Method. New York: Routledge.
Gerber, Elaine Ph.D. 2002a. Surfing by Ear: Usability Concerns of Computer Users Who Are Blind or Visually Impaired. In Access World.
Gerber, Elaine and Connie Kirchner. 2001. Who's Surfing? Internet Access and Computer Use by Visually Impaired Youths and Adults. New York City: American Foundation for the Blind.
Gerber, Elaine Ph.D. 2002b. Conducting Usability Testing With Computer Users Who Are Blind or Visually Impaired. Paper presented at 17th Annual International Conference of California State University Northridge (CSUN) "Technology and Persons with Disabilities", March 18-23, 200, New York.
Goffman, Erving. 1963. Behavior in Public Places: Notes on Social Organization of Gatherings. New York: The Free Press.


Goffman, Erving. 1974. Frame Analysis: An Essay on the Organization of Experience. Cambridge, Massachusetts: Harvard University Press.
Goldberg, Larry. 2002. Email communication, with Phil Piety.
Gould, Bryan. 2002. Conversation at WGBH, DVS, with Phil Piety. Boston, MA.
Gregory, Michael & Susanne Carroll. 1978. Language and Situation: Language and Society. Boston: Routledge & Kegan Paul.
Haberlandt, Karl. 1988. Component Processes in Reading Comprehension. In Reading Research: Advances in Theory and Practice, ed. M. Daneman. San Diego: Academic Press.
Halliday, M.A.K. 1978. Language as a Social Semiotic. Baltimore, Maryland: University Park Press.
Halliday, M.A.K. 1985. An Introduction to Functional Grammar. New York: Arnold.
Hardy, Steven Thomas. 2000. Vygotsky's Contributions to Mentally Healthy Deaf Adults. Washington, DC: Gallaudet University.
Harris, Helen. 2000. Reply Comments of Helen Harris, ed. Federal Communications Commission. Washington DC.
Harris, Zellig. 1951. Methods in Structural Linguistics. Chicago: University of Chicago Press.
Hatim, Basil. 1997. Communication Across Cultures: Translation Theory and Contrastive Text Linguistics: Exeter Linguistics Studies. Exeter.
Holsánová, Jana. 2001. Picture Viewing and Picture Description: Two Windows to the Mind. Lund, Sweden: Lund University Cognitive Science.
Iedema, Rick. 2003. Multimodality, resemiotization: extending the analysis of discourse as multi-semiotic practice. Visual Communication 2:29-57.
Kaye, H. Stephen. 2000. Disability and the Digital Divide. Washington, DC: U.S. Department of Education.
Kerscher, George. 2001a. Converging Standards in Electronic Books: The Daisy Consortium.
Kerscher, George. 2001b. Theory Behind the DTBook DTD: The Daisy Consortium.
Knowlton, Marie and Robin Wetzel. 1996. Braille Reading Rates as a Function of Reading Task. Journal of Visual Impairment and Blindness.


Kress, Gunther & Theo Van Leeuwen. 2001. Multimodal Discourse: The Modes and Media of Contemporary Communication. New York: Arnold/Oxford University Press.
Kress, Gunther, Theo van Leeuwen. 1996. Reading Images: The Grammar of Visual Design: Routledge.
Kuhn, David. 1992a. The Use of Descriptive Video in Science Programming. Boston: WGBH Educational Foundation.
Kuhn, David; Corinne Kirchner. 1992b. Viewing Habits and Interests in Science Programming of the Blind and Visually Impaired Television Audience. New York/Boston: American Foundation for the Blind, WGBH Educational Foundation.
Lemke, Jay. 2002. Travels in Hypermodality. Visual Communication 1:299-325.
Lessig, Lawrence. 1999. CODE and Other Laws of Cyberspace. New York: Basic Books.
Leeuwen, Theo Van. 2002. Ten reasons why linguists should pay attention to visual communication. Paper presented at Georgetown University Roundtable, Georgetown University.
Levie, W. Howard and Richard Lentz. 1982. Effects of Text Illustrations: A Review of Research. Educational Communication and Technology Journal 30.
Levine, Barry. 2002. Digest Number 267: Audio Description International.
Levinson, Stephen. 1983. Pragmatics: Cambridge Textbooks in Linguistics. Cambridge, England: Cambridge University Press.
Lovering, Sharon. 1993. Video Description Brings Enjoyment to All. In The Braille Forum: American Council of the Blind.
Lucas, Ceil. 1989. The Sociolinguistics of the Deaf Community: Academic Press, Inc.
Martin/DVS, Steve. 1991. LA Story, ed. Mick Jackson: WGBH Descriptive Video Service.
Metzger, Melanie. 1999. Sign Language Interpreting: Deconstructing the Myth of Neutrality. Washington, DC: Gallaudet University Press.
Miller, Lori. 2002. Digest Number 267: Audio Description International.
NCAM. 2002. Access to Rich Media: WGBH.
NFB. 2000. Blindness Statistics: National Federation of the Blind.


Norris, Sigrid. 2002. Multimodal Discourse Analysis: A Conceptual Framework. Paper presented at Georgetown University Round Table on Language and Linguistics, Georgetown University.
NPS/Ear, National Park Service/Metropolitan Washington Ear. 2000. The Gift of Acadia: National Park Service.
O'Grady, William and John Archibald, Mark Aronoff, Janie Rees-Miller. 2001. Contemporary Linguistics. New York: Bedford/St. Martins.
Packer, Jaclyn PhD. 1996. Video Description in North America. In New Technologies in the Education of the Visually Handicapped, ed. Dominique Berger: John Libbey Eurotext.
Packer, Jaclyn PhD & Corinne Kirchner, PhD. 1997a. Who's Watching? A Profile of the Blind and Visually Impaired Audience for Television and Video. Journal of Visual Impairment and Blindness.
Packer, Jaclyn PhD & Barbara Gutierrez MA, Corinne Kirchner PhD. 1997b. Origins, Organizations, and Issues in Video Description: Results from In-depth Interviews with Major Players. New York: American Foundation for the Blind.
Perfetti, Charles A. 1988. Verbal Efficiency in Reading Ability. In Reading Research: Advances in Theory and Practice, ed. M. Daneman. San Diego: Academic Press.
Pfanstiehl, Cody. 2002a. Email communication, with Phil Piety.
Pfanstiehl, Margaret. 2002b. Founder, Washington Metropolitan Ear, with Phil Piety. Silver Spring, MD 20901.
Pfanstiehl, Margaret R. EdD, and Cody. 1984. Unpublished Training Materials. Ms. Silver Spring, MD 20901.
Pfanstiehl, Margaret R. EdD. 2002c. Discussions Regarding Audio Description, with Phil Piety.
Pfanstiehl, Margaret. 2002. Discussions Regarding Audio Description, with Phil Piety.
Piety, Philip. 2001. Thamus and Theuth are Dead: The impacts of digital communications on types of communication (Unpublished research paper), 31. Washington DC: Georgetown University.
Raman, TV. 1994. Audio System for Technical Readings, Computer Science, Cornell University: PhD.
RFB&D. 2001a. Annual Report. Princeton, New Jersey.


RFB&D. 2001b. Recording for the Blind & Dyslexic Annual Report 2001: Recording for the Blind & Dyslexic.
Rosch, Eleanor. 1978. Principles of categorization. In Cognition and Categorization, ed. Eleanor Rosch. Hillsdale, N.J.: Erlbaum Associates.
Schank, Roger C. and Robert Abelson. 1977. Scripts, plans, goals, and understanding: An inquiry into human knowledge structures. Hillsdale, NJ: Erlbaum.
Schiffrin, Deborah. 1987. Discourse Markers: Studies in Interactional Sociolinguistics. Cambridge, UK: Cambridge University Press.
Schiffrin, Deborah. 1994. Approaches to Discourse: Blackwell Textbooks in Linguistics. Malden, MA: Blackwell Publishers.
Schroeder, Fredric K. 1994. Braille Usage: Perspectives of Legally Blind Adults and Policy Implications for School Administrators, University of New Mexico.
Scollon, Ron. 2001a. Mediated Discourse: The Nexus of Practice. New York: Routledge.
Scollon, Ron and Suzanne Wong Scollon. 2001b. Intercultural Communication: A Discourse Approach: Language in Society. Oxford: Blackwell.
Simpson, John. 2001. Improved TV Access for Blind Viewers in the Digital Era. Paper presented at Radio, Television, and New Media, Canberra, Australia.
Slatin, John PhD. 2002. A Review Of: "Beyond Alt Text: Making the Web Easy to Use for Users with Disabilities". Information Technology and Disabilities 8.
Slatin, John PhD & Sharron Rush. 2001. Maximum Accessibility: Making Your Web Site More Usable for Everyone: Web Design. Boston, MA: Addison-Wesley.
Smith, Chris. 2002. Personal Communication: Meeting @ RFB&D, with Phil Piety. Boston, MA.
Snyder, Joel. 2002a. Discussion, with Phil Piety. McLean, VA.
Snyder, Joel. 2002b. Fundamentals of Audio Description: Audio Description Associates.
Stephens, Mitchell. 1998. The Rise of the Image, the Fall of the Word. Oxford: Oxford University Press.
Stokoe, William. 1965. A Dictionary of American Sign Language on Linguistic Principles. Cambridge: Cambridge University Press.
Stovall, Jim. 2002. Conversation, with Phil Piety.


Tannen, Deborah. 1981. Introduction. Paper presented at Georgetown University Round Table 1981: Discourse Analysis, Georgetown University.
Tannen, Deborah. 1989. Talking Voices: Repetition, Dialogue, and Imagery in Conversational Discourse. Vol. 6: Studies in Interactional Sociolinguistics. New York: Cambridge University Press.
Tannen, Deborah. 1993a. What's In a Frame?: Surface Evidence for Underlying Expectations. In Framing in Discourse, ed. Deborah Tannen. New York.
Tannen, Deborah and Cynthia Wallat. 1993b. Interactive Frames and Knowledge Schemas in Interaction. In Framing in Discourse, ed. Deborah Tannen. New York.
Townsend, David & Caroline Carrithers, Thomas Bever. 1987. Listening and Reading Processes in College and Middle School-Age Readers. In Comprehending Oral and Written Languages, ed. R. Horowitz & J. L. Samuel. New York: Academic Press.
Valli, Clayton & Ceil Lucas. 2001. Sociolinguistic Variation in ASL. Washington DC: Gallaudet University Press.
Vollmer, Judy. 2002. Personal Communication: Meeting, with Phil Piety. Boston, MA.
Vygotsky, Lev. 1934. Thought and Language. Boston, MA: MIT Press.
W3C, World Wide Web Consortium. 1999. Web Content Accessibility Guidelines 1.0.
Wall, Robert S.; Corn, Anne L. 2002. Production of Textbooks and Instructional Materials in the United States. Journal of Visual Impairment and Blindness 96:212, 211.
Warren, David H. 1994. Blindness and Children: An Individual Differences Approach. Melbourne, Australia: Cambridge University Press.
Weber, John. 2002. NPR Radio Producer, Audio Description Volunteer, Washington Ear, with Phil Piety. Washington DC.
Wilson, Paul T. and Richard Anderson. 1986. What They Don't Know Will Hurt Them: The Role of Prior Knowledge in Comprehension. In Reading Comprehension: From Theory to Practice, ed. J. Orasanu. Hillsdale, NJ: Lawrence Erlbaum.
Wlodkowski, Tom. 2002. Access to Convergent Media: Barriers to Convergent Media for Individuals Who Are Blind or Have Low Vision, 4. Boston: National Center For Accessible Media (NCAM).
Wyver, Shirly R. and Rosaliyn Markham, and Sonia Hlavacek. 2000. Inferences and Word Associations of Children with Visual Impairments. Journal of Visual Impairment and Blindness: 204-217.


NOTES

1. Statistics on blindness and visual impairment are a challenge because the condition often co-occurs with other conditions that might be used to characterize an individual, such as diabetes or mental retardation.

2. This is not to suggest that there are no differences that are related to language. There have been studies showing that there are substantial differences in the development of concepts and prototypes.

3. The Motion Picture Association of America (MPAA) and others recently challenged this ruling. The challenge was upheld on technical grounds and an appeal is in process at the time of this writing.

4. Not all images are described. Prior to recording, someone marks up the text and may decide to exclude certain images.

5. The act was originally passed in 1973. The 1998 amendment brought in Section 508.

6. A 1992 report, "The Use of Descriptive Video in Science Programming" (Kuhn, 1992a), revealed indications of benefit, but this researcher was not able to see an experimental method that could yield measured results.

7. I am simplifying the characterizations of spoken and written text for the purposes of comparison.

8. I should be clear that this is an area where I have extremely sketchy information, and one that may run counter to the concept of inclusion within the same speaking community as sighted individuals. Scollon & Scollon, for example, describe four different definitions of culture that do not include perceptive information.

9. The technology used in transcribing the movies represented time in terms of seconds and not lower, so the determination of gaps and lengths was approximate.

10. This data was recovered from Frazier's thesis, and so it is based on the timings he presented and not on transcripts as the other films are.

11. This term is not intended to create a direct reference to a similarly named concept in software technology, although one could imagine a distant connection to technology in the future. The term is used here to represent the concept of real people, places, and things that exist over extended periods of time in a text.

12. I am certainly simplifying this issue and basing this aspect of my work on the definition of utterance in Schiffrin 1987. This is a convenience for both author and reader, and it is expected that other approaches to spoken discourse analysis that focus on the unit of production analogous to the utterance could also be applied in this area.
