AUDIO DESCRIPTION, A VISUAL ASSISTIVE DISCOURSE:<br />
An Investigation into Language Used to<br />
Provide the <strong>Visual</strong>ly Disabled<br />
Access to Information in Electronic Texts<br />
A Thesis<br />
Submitted to the Faculty of the<br />
Graduate School of Arts and Sciences<br />
of Georgetown University<br />
in partial fulfillment of the requirements for the<br />
degree of<br />
Master of Arts<br />
in <strong>Communication</strong>, Culture and Technology<br />
By<br />
Philip J Piety, B.S.<br />
Washington, DC<br />
February 24, 2003
Copyright 2003 By Philip Piety<br />
All Rights Reserved<br />
ABSTRACT<br />
<strong>Visual</strong>ly impaired and blind individuals face challenges in accessing many types of<br />
texts including television, films, textbooks, software, and the Internet because of the rich<br />
visual nature of these media. In order to provide these individuals with access to this visual<br />
information, special assistive technology allows descriptive language to be inserted into the<br />
text to represent the visual content. This study investigates this descriptive language,<br />
viewing it as a system of human communication, and examines the process of creating<br />
descriptions, a process that involves an intermediary called a describer and the modifications<br />
the describer makes to a text in order to render it accessible and usable. There are different<br />
practices of creating these descriptive insertions, and many terms refer to them, including<br />
<strong>Audio</strong> <strong>Description</strong>,<br />
described video, and textual and verbal equivalents. This study considers these practices as<br />
variants of a type of communication called <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> that has specific and<br />
definable properties. This study is the first academic investigation into the language process<br />
since the practice was conceptually described as a technique for television by Frazier in 1975.<br />
It addresses broad questions about this unique communication form. Who does it? Why is<br />
it unique? What does it look like? And how can it be analyzed?<br />
The approach taken is structured as a study within a study. The outer study looks at<br />
the theoretical issues of using language as a visual prosthetic and shows it having properties<br />
in common with both prototypical spoken and written discourse as well as communication<br />
like sign language interpretation that relies upon an intermediary. The inner study uses a<br />
corpus of more than 23,000 words of <strong>Audio</strong> <strong>Description</strong> drawn from four movies described<br />
by three organizations proficient in the practice of describing film. Analysis of this data<br />
shows it to be a language system with distinctive constituent and discursive structures. This<br />
study shows that the fundamental nature of a unit of an inserted description is not an<br />
isolated representation of isolated visual information, but rather, a semantic unit that is<br />
situated in several definable ways within a multimodal text.<br />
ACKNOWLEDGMENTS<br />
Producing this work has allowed me to learn about language, research, and some of<br />
the challenges that accrue to the study of nontraditional subjects. Many people both inside<br />
and outside the University have contributed to this effort, and I am forever grateful to them.<br />
Within Georgetown University, I must first mention and thank my thesis advisor,<br />
Professor Randy Bass, whose help was invaluable in connecting my research interests to the<br />
completion of this process. Next, I am thankful for knowing and benefiting from the advice<br />
of Professor Ron Scollon, my second reader and an informal advisor almost since the time I<br />
considered this as a potential topic. Ron has provided constant encouragement and support<br />
and, as the process was nearing completion, I found myself recollecting many things he said<br />
about the choices that scholars make that directly related to my work. I am also grateful to<br />
Professors Jay Lemke and John Castellani of the University of Michigan and Johns Hopkins<br />
University Center for Technology in Education. Professor Castellani was the teacher who<br />
exposed me to this area as an educational challenge and Professor Lemke provided<br />
extremely valuable feedback on an early draft of this document.<br />
Also within Georgetown, Professors Shukla, Tannen, Tinkcom, and Tyler were<br />
gracious in spending time discussing the challenges of this topic and provided me with<br />
encouragement and kind thoughts. I also appreciate the many email exchanges with Daniel<br />
Loehr and Kristin Mulrooney and the early discussions with Elisa Everts: all doctoral<br />
students at Georgetown. I also thank Professors Hamilton and Schiffrin whose classes at<br />
Georgetown were memorable, enriching, and relevant to my work. And, of course, Dr.<br />
Suzanne Wong Scollon who gave me the nudge in this direction and then advised me<br />
(correctly) of some of the challenges I would encounter in studying this as a language topic.<br />
Outside of academia, the first person to thank is someone to whom everyone involved<br />
with the field of <strong>Audio</strong> <strong>Description</strong> owes much. Nobody, with the exception of her<br />
husband Cody Pfanstiehl, has made a larger contribution to the method and the advocacy of<br />
this emerging field than Dr. Margaret Pfanstiehl. The Pfanstiehls discussed their work with me many<br />
times, provided documents relating to their methods, and referred me to others whose<br />
input was essential. My path was made smoother by being able to say, “the Pfanstiehls<br />
suggested I call.” Among those who responded were Joel Snyder, Director of Described<br />
Media with the National Captioning Institute, himself a major contributor to the field of<br />
<strong>Audio</strong> <strong>Description</strong>, and Larry Goldberg, Director of Media Access at the WGBH<br />
Educational Foundation who was helpful as was his staff, including Theresa Maggiore and<br />
Bryan Gould. I also thank Jim Stovall, founder of the Narrative Television Network who<br />
discussed in vivid detail how a person uses <strong>Audio</strong> <strong>Description</strong>.<br />
Within the advocacy community, I owe much to the American Foundation for the<br />
Blind and the many supportive phone conversations with Dr. Elaine Gerber; and the<br />
support of Drs. Jaclyn Packer and Corrine Kirchner. I also thank Curtis Chong from<br />
the National Federation of the Blind, who was the first person in the visually impaired<br />
advocacy community to speak with me about my research, and Melanie Brunson of the<br />
American Council of the Blind.<br />
At Recording for the Blind and Dyslexic, I am grateful to Judy Vollmer and the<br />
staff in the Boston studio; and Chris Smith in Washington, DC, who spent many hours with<br />
me and provided me with equipment, tapes, and insights into their process.<br />
From the Center for Applied Special Technology, I appreciate the thoughts, time,<br />
and encouragement of Dr. Robert Dolan. And, from the United States Access Board, both<br />
David Baquis and Doug Wakefield were a tremendous help and spent a good deal of time as<br />
did Dr. Judith Dixon from the Library of Congress. I also thank John Weber, a good<br />
personal friend, National Public Radio producer, and volunteer audio describer who is one<br />
of the people who sparked my interest in this area. Bob Regan, Product Manager for<br />
Macromedia Corporation and a student with the Trace Center at the University of Wisconsin<br />
at Madison, was also helpful as I began this investigation.<br />
My largest thanks must go to the person I dedicate this work to. My wife Sarah has<br />
encouraged me to follow my intellectual pursuits for their value alone. From the beginning<br />
of this process, she has served as a constant reminder in times of doubt that my work is<br />
worthwhile.<br />
TABLE OF CONTENTS<br />
1. INTRODUCTION......................................................................................................1<br />
1.1 Overview............................................................................................................. 1<br />
1.2 Research Goals & Questions............................................................................... 3<br />
1.3 Anticipated Benefits and Limitations of this Research....................................... 5<br />
1.4 Organization of the Sections............................................................................... 7<br />
1.5 Note for <strong>Visual</strong>ly Impaired Readers ................................................................... 7<br />
2. BACKGROUND INFORMATION...........................................................................8<br />
2.1 The Primary Consumers of <strong>Visual</strong> <strong>Description</strong>.................................................. 8<br />
2.2 Practices of Describing Images for <strong>Assistive</strong> Technology ............................... 10<br />
2.2.1 <strong>Audio</strong> <strong>Description</strong>..................................................................................... 10<br />
2.2.2 <strong>Audio</strong> Books ............................................................................................. 14<br />
2.2.3 Software and Interactive Media ................................................................ 15<br />
2.2.4 Multimedia and Internet Sites................................................................... 16<br />
2.3 The Goal of Accessible Media: Usability and Experiential Equivalence......... 18<br />
2.4 Previous Qualitative Evaluations of <strong>Description</strong>.............................................. 19<br />
2.5 The Practical Case for a Unified Model............................................................ 20<br />
2.5.1 Reasons to Consider as Separate Practices ............................................... 21<br />
2.5.2 Reasons to Consider as a Single Process .................................................. 21<br />
3. VISUAL ASSISTIVE DISCOURSE.......................................................................24<br />
3.1 <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> Defined.................................................................. 24<br />
3.1.1 The “<strong>Discourse</strong>” in <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong>........................................ 24<br />
3.1.2 Common Properties of Different Descriptive Practices............................ 25<br />
3.1.3 Comparison of VAD and Other <strong>Communication</strong> Systems ....................... 26<br />
3.1.4 The Components of the VAD <strong>Communication</strong> System............................ 28<br />
3.2 Conceptual Issues with Words for Images........................................................ 29<br />
3.2.1 Sequential vs. Parallel............................................................................... 30<br />
3.2.2 Raw vs. Processed Information................................................................. 30<br />
3.2.3 Schema Theory......................................................................................... 31<br />
3.2.4 Translation-Interpretation-Transrepresentation........................................ 32<br />
3.3 <strong>Description</strong>s: Situated/Constrained in Multimodal Texts................................. 32<br />
3.3.1 Textually Situated <strong>Description</strong>s................................................................ 33<br />
3.3.2 Constraints: Detail vs. Interpretation........................................................ 35<br />
3.4 Discussion: The Role of the Describer ............................................................. 37<br />
3.4.1 Describing as a Way of Thinking ............................................................. 37<br />
3.4.2 Describer as Intermediary......................................................................... 39<br />
3.4.3 <strong>Description</strong> as a Group Process ................................................................ 39<br />
3.5 Discussion: The Role of the Consumer ............................................................ 41<br />
3.5.1 The Consumers: Actively Building a Text Model.................................... 41<br />
3.5.2 The Purposes and Goals of Consumers .................................................... 43<br />
3.6 Cultural Issues with <strong>Description</strong>? ..................................................................... 45<br />
4. STUDY OF AUDIO DESCRIPTION......................................................................47<br />
4.1 The Study Corpus ............................................................................................. 47<br />
4.2 Methodology..................................................................................................... 48<br />
4.3 The Structural Components of <strong>Audio</strong> <strong>Description</strong>........................................... 49<br />
4.3.1 Insertions ................................................................................................... 50<br />
4.3.2 Utterances.................................................................................................. 53<br />
4.3.3 Representations ......................................................................................... 56<br />
4.3.4 Words........................................................................................................ 67<br />
5. USING THE DEFINITIONS FOR ANALYSIS......................................................69<br />
5.1 Descriptive Mass............................................................................................... 69<br />
5.2 The Textual Role of Insertion Content ............................................................. 72<br />
5.3 Sample Analysis: Persistent Entity Development............................................. 72<br />
5.4 Sample Analysis: The Scene as Frame ............................................................. 74<br />
5.5 Sample Analysis: Utterance Patterns ................................................................ 75<br />
5.6 Sample Analysis: Representational Combinations ........................................... 77<br />
5.7 Analysis Challenges: Time, Reality, and Cultural Elements............................ 78<br />
6. SUMMARY: A LANGUAGE SYSTEM ................................................................79<br />
7. FUTURE STEPS......................................................................................................83<br />
7.1 Consumers Study .............................................................................................. 83<br />
7.2 Supporting Further Developments of <strong>Audio</strong> <strong>Description</strong>................................. 84<br />
7.3 Descriptive & Comparative Studies of Other Forms of VAD.......................... 85<br />
7.4 Human Subjects Studies with AD..................................................................... 86<br />
7.5 Educational Materials Study............................................................................. 86<br />
7.6 <strong>Assistive</strong> Technological Research .................................................................... 87<br />
APPENDIX A: GLOSSARY.............................................................................................89<br />
APPENDIX B: TRANSCRIPTION & MULTIMODAL ISSUES...................................92<br />
Transcription Conventions ........................................................................................... 92<br />
Multimodal Issues........................................................................................................ 92<br />
Sub-second timing ........................................................................................................ 93<br />
APPENDIX C: VERBAL DESCRIPTIONS FOR FIGURES .........................................94<br />
REFERENCES ..................................................................................................................97<br />
NOTES.............................................................................................................................104<br />
LIST OF FIGURES<br />
Figure 1 – Different practices of visual description in 2002................................................ 10<br />
Figure 2 - Conceptual view of Internet descriptions ........................................................... 16<br />
Figure 3 - Overview of VAD and other prototypical communication processes................. 27<br />
Figure 4 - Chafe's view of immediate mode........................................................................ 38<br />
Figure 5 – Conceptual view of the description process.................................................... 40<br />
Figure 6 - Conceptual diagram of consumer's process........................................................ 42<br />
Figure 7 - Length of utterances in corpus........................................................................... 54<br />
Figure 8 - Chart version of Table 4 data.............................................................. 71<br />
LIST OF TRANSCRIPTS<br />
Transcript 1- From “The Gift of Acadia” 1:06................................................................... 52<br />
Transcript 2 - From "A Star is Born" 16:20........................................................................ 55<br />
Transcript 3 - From "Gladiator" 40:26............................................................................... 59<br />
Transcript 4- From "Gladiator" 1:42:30.............................................................................. 60<br />
Transcript 5 - From "Gladiator" 47:49............................................................................... 62<br />
Transcript 6 - From "LA Story" 3:38................................................................... 62<br />
Transcript 7 - From "LA Story" 19:00................................................................................ 62<br />
Transcript 8 - From "LA Story" 20:36................................................................................ 63<br />
Transcript 9 - From "A Star is Born" 1:58......................................................................... 64<br />
Transcript 10 - From "Gladiator" 6:10............................................................................... 64<br />
Transcript 11- From "Gladiator" 1:15:29............................................................................ 65<br />
Transcript 12- From "Gladiator" 12:00............................................................................... 65<br />
Transcript 13 - From "LA Story" 1:50................................................................................ 65<br />
Transcript 14 - From "Gladiator" 58:42.............................................................................. 66<br />
Transcript 15 - From "LA Story" 7:50................................................................................ 66<br />
Transcript 16 - From "LA Story" 90:38............................................................................... 67<br />
Transcript 17- From "LA Story" 25:36................................................................................ 67<br />
LIST OF TABLES<br />
Table 1 - Study corpus material.......................................................................................... 48<br />
Table 2 - Summary of structural components of audio description......................................... 50<br />
Table 3 - Comparison of description mass in four different texts....................................... 70<br />
Table 4- Distribution of description mass by insertion length............................................. 71<br />
Table 5 - Referring terms for main character of "Gladiator"............................................... 73<br />
Table 6- Comparison of description styles.......................................................................... 76<br />
1. INTRODUCTION<br />
1.1 Overview<br />
The title of this work, “<strong>Audio</strong> <strong>Description</strong>, a <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong>,” may<br />
introduce one or two terms into the vocabulary of the readers. The practices that form the<br />
basis for this study are recent, having only become established towards the end of the<br />
twentieth century. They are based on communication and digital technologies, and rely<br />
upon language for their success.<br />
<strong>Audio</strong> <strong>Description</strong> is a way to provide a visually impaired person access to visually<br />
rich productions including movies, television programming, plays, live events, and museum<br />
exhibitions. With <strong>Audio</strong> <strong>Description</strong>, a describer inserts spoken words to provide<br />
representations of information contained in the visual field of the production. The inserted<br />
description, when combined with existing audio content, including dialog from the original<br />
production, creates a new text that is more accessible than it would be without the addition<br />
of the description. This study looks at <strong>Audio</strong> <strong>Description</strong> as a language system and<br />
describes its features and structural components and shows how it can be analyzed in a<br />
manner similar to other discourse types.<br />
<strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> is a term introduced in this study to name the type of<br />
communication process to which <strong>Audio</strong> <strong>Description</strong> belongs. <strong>Audio</strong> <strong>Description</strong> and practices like it<br />
rely upon special assistive technology; <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> is a linguistic view of the<br />
communication processes that operate over and through that technology but remain<br />
fundamentally human communication processes.<br />
Specific productions and practices of language use can be viewed as parts of broader<br />
classes: a telephone conversation can be viewed as a type of conversation, electronic mail can<br />
be viewed as a form of correspondence, and conversation that utilizes a sign language<br />
interpreter can be characterized by its reliance upon a communicative intermediary. Further,<br />
these classifications are not purely hierarchical: a conversation interpreted with sign language<br />
is also a conversation, while an interpreted unidirectional event such as a political speech is<br />
not.<br />
This study is descriptive. It is the first attempt to look at this area, not as a service or<br />
accessibility issue, but as a language system. There is an asymmetry to this domain because<br />
language and vision operate differently, and viewing this system within a language framework<br />
may require a conceptual shift. That shift is from the perspective of the sighted describer<br />
who works with both vision and language to the perspective of the consumer of this service<br />
who often receives just the language. From that perspective, it is a language issue. While<br />
many human interactions from buying coffee to expressing amity or hostility can be<br />
accomplished with language in a secondary or non-existent role, in this area, language is the<br />
essential medium of exchange. The contents of this study could be expressed as theses, as<br />
propositions, or within the framework of problem statements. At the core of these different<br />
ways to frame this study is the belief that <strong>Audio</strong> <strong>Description</strong> is a type of <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong><br />
that is, in turn, a type of language system. This language system is used by people who have different<br />
perceptual abilities, but are members of the same speech communities. Understanding this<br />
language system can involve many different levels of analysis including the participant<br />
structure, its environmental constraints, the conditions under which it is practiced, and its<br />
external form from small parts such as words to larger discourse units.<br />
The sister practices that are also types of <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong>, like <strong>Audio</strong><br />
<strong>Description</strong>, are recent and in their formative stages, so this study may provide benefits to these<br />
practices by illuminating the techniques and methodologies that <strong>Audio</strong> <strong>Description</strong> employs.<br />
Because this study comes at an early stage in the lives of these steadily growing practices,<br />
and because there is little academic work to build on, it takes a broad approach and tries to<br />
address, from several important perspectives, some fundamental questions about what this<br />
process is like.<br />
Readers of this document should see what the author has become convinced of: that<br />
these practices are vast, both in the range of conceptual issues they raise and in their impacts on people’s<br />
lives. This is a technical study, and it will not address many issues that people who are<br />
experienced in visual description consider important, such as the interpretive aspects of<br />
description, which content should be described, or describing under specific textual<br />
constraints. These are important areas to be sure, but the focus of this study is to help<br />
define the process characteristics and participants in a way that will support many types of<br />
ongoing qualitative discussions.<br />
1.2 Research Goals & Questions<br />
As a descriptive study, this research can potentially serve a range of purposes from<br />
qualitative analysis to comparative studies that require a formal definition of the language<br />
use. For the purposes of this document, this study’s intent can be formalized in two sets of<br />
research questions in two different types of studies. The first study looks at all of these<br />
communication practices as variations of a type of process that is distinctive from other<br />
forms of language use. This study defines the process called <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong><br />
(VAD) through the following questions:<br />
1. What properties are common to the practices of providing visual information<br />
through language in electronic texts?<br />
2. How is this communication process similar to and distinct from other forms<br />
of language use in terms of participants and development of a text?<br />
3. Other than providing a theoretical definition, what other reasons exist for<br />
viewing these practices as variations of a common process?<br />
The method employed in the conceptual study is descriptive and logical. It describes<br />
the properties and components that are broadly at work in <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> and<br />
then draws in related literature and concepts to assemble a conceptual view that reflects<br />
what the practices have in common and the shared implications of their structure.<br />
The second or inner study looks at the language produced in one of the visual<br />
description practices. The practice chosen for this study is <strong>Audio</strong> <strong>Description</strong>. While <strong>Audio</strong><br />
<strong>Description</strong> (AD) will not show every possible linguistic form of this descriptive process, it<br />
is a practice area with a strong methodological history. This second study has two main<br />
research questions:<br />
1. Is there a constituent structure that is different from other language uses?<br />
2. What types of information are provided within <strong>Audio</strong> <strong>Description</strong> and what<br />
patterns in representation exist?<br />
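As an illustration of the kind of measurement the second question invites, one can count the words in each inserted description and the share of the combined text that description occupies, the sort of distributional measure reported later in tables 3 and 4. The functions and sample insertions below are invented for this sketch and are not taken from the study corpus.<br />

```python
def insertion_lengths(insertions):
    """Word count of each description insertion."""
    return [len(text.split()) for text in insertions]

def descriptive_mass(insertions, total_words_in_text):
    """Fraction of the combined text's words contributed by description."""
    return sum(insertion_lengths(insertions)) / total_words_in_text

# Invented sample insertions (not from the corpus).
sample = ["He turns away.",
          "She lifts the letter to the light and reads."]

print(insertion_lengths(sample))
print(descriptive_mass(sample, 100))
```

Counts of this kind let different described texts be compared on the same scale regardless of their length or genre.<br />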
These two types of studies support each other to provide a more comprehensive<br />
view of this unique communication process and thereby support the proposition that <strong>Audio</strong><br />
<strong>Description</strong> is an example of a type of language process called <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong>.<br />
1.3 Anticipated Benefits and Limitations of this Research<br />
This research is intended as an initial investigation into a way of using language that has<br />
largely gone unexplored linguistically. An anticipated benefit of this research is an<br />
understanding of this process as a formal system with specific characteristics that can be<br />
observed, measured, and taught. At the time of writing, a number of discussions involving<br />
standards and guidelines are occurring in the community of those who provide and those<br />
who use visual description (Levine, 2002). Historically, these practices<br />
are in their earliest and formative stages. And, as <strong>Audio</strong> <strong>Description</strong> and its sister language<br />
forms are like other language systems, they will evolve through negotiation of the<br />
participants. This research is thus positioned at an important stage in this emerging language<br />
system and aims to support its development through a description of its process-level<br />
characteristics that can inform this ongoing negotiation.<br />
As with other areas that benefit a disabled community, most of the resources<br />
previously available to support the blind and visually impaired in accessing visual information<br />
have been focused on service, delivery, and regulatory areas rather than research. Some may<br />
see parallels to the struggles of the deaf community to gain acceptance for sign language,<br />
which for years was banned as ‘deviant.’ Some may find that, compared to other fields, the<br />
amount of previous research in this area is small, especially considering the conceptual issues<br />
involved and that this study must cover a broad area without the ability to focus on specific<br />
and important components.<br />
Will this study yield insights that help the larger population of language users? While<br />
making no claim that this area will have broad relevance, it is important to note the historical<br />
precedent for research into areas of communicative challenge, including the documentation<br />
of American Sign Language (Stokoe, 1965) and Vygotsky’s work with deaf-blind children<br />
(Hardy, 2000) that have provided insights into general properties of language and cognition.<br />
One of the dimensions of this research is that forms of information such as pictures, actions,<br />
and gestures that are not usually considered language or equivalent to language are being<br />
replaced with language. The fact that these non-linguistic forms do not have the same<br />
properties as speech or writing could be a basis for considering this research as not a ‘linguistic’<br />
effort, or alternatively may reinforce the need to define the non-linguistic visual information<br />
with the type of structure often used to study language and language use.<br />
Independent of the benefits for the study of language, there is a rich history of<br />
efforts originally intended to benefit a disabled group that resulted in benefits for the general<br />
population. Some examples include illuminated elevator floor lights, originally for the deaf;<br />
ramps and curb cuts originally for wheelchairs that are now used by baby strollers and<br />
bicyclists; and the secondary audio programming (SAP) channel, now used for multilingual support in television programming. Because Audio Description and its sister practices are so new, perhaps they will also prove to benefit those challenged by language, learners of a new language, or other groups as yet unknown.
1.4 Organization of the Sections<br />
This study is organized from the outside in: it starts with current practices and ends by describing some paths for further study. Section two contains background information that discusses the blind and visually impaired communities, the existing practices of providing visual description, and some of the relevant literature. Section three looks specifically at Visual Assistive Discourse (VAD) as a formal system by defining it in terms of properties and a participant framework, as well as implications for the principal roles and processes. Section four looks at Audio Description (AD) with a review of several productions of movies described by different organizations. This section develops a set of terms for the structural and functional components based on examples from transcripts of AD. Section five builds upon the definitions in section four with a series of analytic perspectives that look at referring sequences (Schiffrin, 1994), experiential frames (Goffman, 1974, Tannen, 1993a), foci of consciousness (Chafe, 1994), and sites of engagement (Scollon, 2001a). Section six relates the findings from AD to the larger practices of VAD, and section seven describes some paths for additional research.
1.5 Note for Visually Impaired Readers
In order to make this document more usable if it is translated into an accessible<br />
form, the topical headings and subheadings use a numbering system rather than the standard<br />
American Psychological Association (APA) style. Also, all of the images used in this report<br />
have associated verbal descriptions contained in Appendix C. These descriptions will also be<br />
extracted as a separate file that can be accessed concurrently with this document.<br />
2. BACKGROUND INFORMATION<br />
The goal of this section is to provide some orientation to readers who will likely have<br />
different experiences and understandings. Below is a discussion of the consumers of visual description, the main practices of visual description, the goals of accessible media, and some practical benefits of viewing these disparate practices as variations of a single process.
2.1 The Primary Consumers of Visual Description
Most, but not all, of the consumers of visual description are the blind or visually<br />
impaired. Because of the variety of factors that impair vision, the effects that occur when<br />
vision loss comes at different ages, and the way statistics on visual impairments are collected,<br />
a simplified description of the communities of visually impaired individuals is not possible in<br />
this forum. Not only is there no typical blind person, there are numerous ways to categorize<br />
the population, including grouping those who have had an impairment from birth<br />
(congenital) or acquired it at some point in life (adventitious), or by looking at the nature of<br />
the impairment, such as focus, loss of visual field, or ability to discern details. Current estimates are that 1.3 million citizens in the United States are affected with some form of visual impairment, 98,000 of whom are students (AFB, 2001a). The National Federation of the Blind reports that 3.5% of the population over 65 years of age has some visual impairment (NFB, 2000), while the American Foundation for the Blind reports a much higher 16% for the same age group¹.
The lack of timely access to accessible educational information, including textbooks, multimedia, and educational videos, is considered a significant impediment to education and training. Many blind children are educated in mainstream schools, but less than half complete high school. Those who do complete high school are just as likely as sighted individuals to take some college courses, but much less likely to graduate (AFB, 2001b). Nationally, unemployment among working-age visually impaired individuals is over 50%, with fewer than one in three of those who are of working age and legally blind being employed (AFB, 2000).
The visually impaired community does not have a separate language. Braille is a way<br />
to represent a natural language such as English in a tactile form, but most members of this<br />
community do not read Braille, or do not read it well. While Braille is an important mode of<br />
communication for the blind, it is not a universal solution for communicating information.<br />
At present, Braille literacy is in decline (Schroeder, 1994). Visually impaired and blind (VIB) individuals use language in essentially the same ways², including visual references, as do those with sight (Warren, 1994; Elbers, 1999). For those who lost sight after critical developmental ages, concepts grounded in vision, such as colors, perspective, and visually based inferences (Wyver, 2000), remain relevant even though they cannot see. For the others, including the congenitally blind, there is
also interest in visual information and in understanding the visual world. No research or<br />
other indications emerged in this study that descriptive language for the VIB should be<br />
different from language that would be used with sighted consumers of visual description<br />
who were not accessing the visual information. The differences between regular language<br />
use and the language used in VAD (discussed later in this document) are not related to visual<br />
impairments but to the fundamental nature of the visual descriptive process.<br />
2.2 Practices of Describing Images for Assistive Technology
Using language in electronic texts to replace vision is practiced by a variety of<br />
organizations and with a variety of text types. This analysis divides these into four mostly<br />
unrelated areas, as illustrated in figure 1. Except for serving a common set of customers, there was little apparent integration of these activities at the time of this study.
[Figure 1 – Different practices of visual description in 2002: Audio Description (beginning with live description in 1981, then described video and audio tours), Audio Books (1948/1971), Software & Interactive Media (late 1990s), and The Internet & Multimedia (late 1990s)]
2.2.1 Audio Description
One of the two most developed areas of these practices is called Audio Description (AD). AD actually encompasses several different usages that share the use of a human voice rather than voice synthesis. They also work primarily with texts that are in motion rather than fixed. Further, they share a common methodological history.
Beginning with Live Description
Although it appears that the very first described performances were recorded on tape<br />
and broadcast through radio reading services (Packer, 1997b), the first sustained and<br />
standardized AD program began in 1981 with the work of Dr. Margaret Pfanstiehl and her<br />
organization called the Metropolitan Washington Ear. The “Ear” began providing live AD<br />
for plays at a Washington DC area theater using local FM transmitters and a describer who<br />
would insert narration between sections of dialog. The practices and techniques developed<br />
for live description were influential for all of the major areas of AD that followed.<br />
The most significant difference between live description and the other forms<br />
discussed below is that, as a result of being live, the describer must be sensitive to events in<br />
real time and the description cannot be pre-packaged because even theatrical performance<br />
(sometimes by design) varies in terms of events and timing. This is not to indicate that the<br />
description is fully spontaneous. Before a theatrical production is described, it is previewed<br />
by the describer and in some cases by two describers, one for the program information and<br />
one for the performance, so there is some redundancy as well as preparation in the creation<br />
of the descriptions (Weber, 2002).<br />
Live description is also now used in non-theatrical settings such as weddings and<br />
ceremonies. Occasionally, live events are broadcast simultaneously (simulcast) on the<br />
Internet.<br />
Described Video<br />
The next major development in Audio Description, and now its most prevalent form, is called described video. Described video includes television, films, and streaming media, and this form of visual description reaches thousands of viewers/hearers daily. As stated earlier, some of the earliest described performances occurred experimentally in the 1970s. In 1975, a theoretical approach was developed as part of a Master’s thesis by Gregory Frazier (Frazier, 1975). But it was not until 1982, when the Washington Ear connected with WGBH, the public broadcasting station in Boston that had pioneered closed captioning and
continues to pioneer accessible media including the secondary audio program (SAP) channel,<br />
that described video became a broadcast reality (Packer, 1997b). WGBH, with consultation from “the Ear,” began broadcasting described shows and later, with support from the U.S. Department of Education, launched the Descriptive Video Service (DVS) in 1990 and the
National Center for Accessible Media (NCAM) in 1993. DVS is now one of the two largest<br />
providers of described video material in the U.S. and, in addition to providing broadcast<br />
products, sells a collection of described videos while NCAM focuses on newer technologies<br />
such as interactive media and devices. WGBH is not the only organization providing<br />
described video products. In 1988, seemingly independent of these efforts on the East<br />
Coast, Jim Stovall, who lost his sight as a young adult, developed another descriptive<br />
approach. Rather than broadcast on a separate channel, his company used the existing audio<br />
channel with description inserted in between dialog so that all viewers receive the same<br />
audio content. His company, the Narrative Television Network (NTN), also began with an<br />
emphasis on television programming and was a commercial enterprise that was sustained by<br />
advertising as well as funding from the U.S. Department of Education. Initially, the descriptive style used at NTN was sparser than the original style used at “the Ear” and WGBH, but today, Stovall indicates, their styles are fairly close (Stovall, 2002).
With a recent Federal Communications Commission (FCC) ruling³ that requires several prime-time hours of television programming each day to be described, a number of other organizations have entered the Audio Description market. Recently, the National Captioning Institute (NCI), known for real-time closed captioning, launched a program of described media led by Joel Snyder, who has been active in Audio Description since the early 1980s, including work with the Ear and the National Endowment for the Arts, and in developing audio tours of museums.
A significant feature of all of the organizations that provide description for television<br />
and film is that their staffs are usually paid and they employ an extensive pre-description<br />
process including scripts, writers, and editors.<br />
Recorded Tours<br />
Starting in the mid-1980s, museums began offering Audio Description tours (Snyder, 2002a), and the practice has spread extensively (ASTC, 2001). Since this type of Audio Description is geared more towards fixed stimuli such as paintings and museum exhibits, it would seem similar in some ways to the description of content found in books.
Methodological Ancestry<br />
All of the major providers of described video products in the US had early and substantive consultations with, and training from, the Washington Ear, and these organizations readily attribute much of their understanding of the principles of description to the Ear and the Pfanstiehls (Goldberg, 2002, Snyder, 2002a, Stovall, 2002). In addition, at the time of this report, there are active AD programs throughout the world, and many of these have had substantive contact with those based in the US (Simpson, 2001), including with the Ear (Pfanstiehl, 2002a). While the practice of Audio Description is now spread over dozens of organizations, there is a common methodological history derived from the live performances that began in 1981.
2.2.2 Audio Books
Historically, printed material has been a very different medium from film and video<br />
and so it follows that the description practices used for printed material would be unrelated<br />
to Audio Description. In the United States, a single organization, Recording for the Blind and Dyslexic (RFB&D), appears to provide most of the descriptions of visual information in books. The American Printing House for the Blind (APH) may also do so, but repeated inquiries provided no confirmation. And while APH does provide material in audio form, material in tactile form (Braille and tactile images) seems to be its focus, so that today RFB&D is essentially the main supplier of audio textbooks (Burnham, 2002, Wall, 2002).
RFB&D began as Recording for the Blind (RFB) based in Princeton, New Jersey. It<br />
was chartered in 1948 to help visually impaired soldiers returning from World War II and is<br />
funded by government, private industry, and subscription services. This organization<br />
provides textbooks on a range of subjects in audio format to students with documented<br />
visual and/or learning disabilities. These textbooks are read along with descriptions of<br />
images⁴ into a digital recording system that allows page-by-page access. The material is then distributed either digitally or on cassette tape. RFB&D operates 32 recording studios nationally, uses over 5,300 volunteers, and serves over 25,000 blind members (RFB&D, 2001a). The RFB&D volunteers usually have deep experience in the subject areas they read for and are often retired professionals.
The National Braille Association (NBA) produced a manual in 1971 for recording<br />
books on tape that included instructions on descriptions of images, including maps,<br />
diagrams, and charts. RFB&D uses the NBA guidelines and also has developed an extensive<br />
set of procedures for describing images in the range of disciplines taught in public schools<br />
and post-secondary institutions. Subjects include chemistry, computer science, social<br />
studies, geography, and math. This process includes volunteer readers and staff that support<br />
those readers by selecting which images will be described and where in the audio stream the<br />
descriptions will be placed. The volunteer who is reading the text then creates the<br />
descriptions. RFB&D policy suggests that the image descriptions be written out prior to<br />
reading by the volunteer, but linguistic evidence indicates that some (and perhaps most) of<br />
the descriptions are spontaneous. Like WGBH, RFB&D has subject focus areas and specialists, and often assigns material of certain types to studios in different cities (Smith, 2002, Vollmer, 2002).
There is also a worldwide consortium effort underway to develop a digital talking book (DTB) standard (Kerscher, 2001a). This effort includes publishers and library systems, with the goal of providing a standard document interchange format based on the World Wide Web Consortium (W3C) Extensible Markup Language (XML) (Kerscher, 2001b). As on the Internet, these standards use the term textual equivalent in relation to images.
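To make the idea of an in-document textual equivalent concrete, the sketch below builds a small talking-book fragment and extracts its image description using only Python's standard library. The element names (`imggroup`, `prodnote`) are modeled loosely on the DAISY/DTBook vocabulary and should be read as illustrative, not as the actual standard; the sample content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical talking-book fragment: the image and its textual
# equivalent travel together inside one grouping element, so any
# playback system can render the description in place of the image.
fragment = """
<imggroup>
  <img src="fig04.png"/>
  <prodnote render="optional">A line graph showing Braille
  literacy declining between 1960 and 1990.</prodnote>
</imggroup>
"""

root = ET.fromstring(fragment)

# Pull every description back out of the document structure.
descriptions = [note.text.strip() for note in root.iter("prodnote")]
print(descriptions[0][:12])  # → "A line graph"
```

The design point is that the description is part of the document's markup rather than a separate audio channel, which is what lets a single source file serve Braille, audio, and large-print renderings alike.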
2.2.3 Software and Interactive Media<br />
Software and interactive media are relatively new areas for visual description. While the prototypical texts for visual description, plays and books, have been around for thousands of years, the software industry is less than fifty years old. And it is only within the last twenty years (more recently in a major way) that software and interactive media components have come into contact with the general population, which includes the visually impaired.
Interactivity is used here to denote a human-technology interchange where the<br />
human is presented with options to direct the technology to alter its form and/or functions.<br />
The term “rich media” is used by NCAM to mean “elements on a web page (or in a separate player) which exhibit dynamic motion over time or in response to user interaction” (NCAM, 2002). Convergent media is another term used at NCAM for interactive elements, mostly in relation to digital television (Wlodkowski, 2002). The same regulations that cover the Internet also cover interactive elements on web pages, including navigational graphics (client- and server-side maps), applets, and other dynamic media. This area is fairly new and there is
little history to study.<br />
2.2.4 Multimedia and Internet Sites<br />
[Figure 2 - Conceptual view of Internet descriptions: the Internet encompasses text and hypertext, still images, moving images, interactive elements, and live (simulcast) events]
The term multimedia is often used (like multimodal) to describe content that is conveyed through more than one representational form (Corn, 2002). The Internet can be viewed as an interconnected multimedia environment. The most important differentiator between the Internet/multimedia and the practices mentioned before is that all of the earlier publication types, with the exception of live performances, can be included in multimedia
texts. An Internet text can include live elements as well. The Internet, as illustrated in figure<br />
2, is really the superset of all other descriptive practices.<br />
Descriptions of non-textual components on the Internet are addressed through several standards and guidelines. The most widely known of these is section 508 of the United States Rehabilitation Act, as amended in 1998⁵, which covers many Federal government websites. Non-textual components (still images, moving images, and interactive elements) on executive branch websites are required under section 508 to have a description. Known commonly as the “508 standard,” it is often used as a measure by other organizations, including local governments, non-governmental organizations, and universities, who wish to make their websites accessible but are not bound by the 508 mandate. There is also a voluntary guideline developed by the World Wide Web Consortium (W3C, 1999) that covers essentially the same territory as section 508 but differs in some details. There are several software tools that check websites for compliance with these standards, allowing websites to claim conformance. There is also a movement called “Speech Friendly Sites” that connects to the 508 and W3C guidelines as well (Artic Technologies, 2002).
There have also been many books published just in the last few years on developing<br />
accessible websites. None seems to provide more than a handful of pages related to the<br />
challenges associated with describing images and creating textual equivalents.<br />
All of the standards, guidelines, and books reviewed in this study take a similar<br />
approach to non-textual components. They rely upon features inherent in the hypertext<br />
markup language (HTML) standard for descriptions to be placed on these non-textual<br />
elements. And, they specify these elements should have a “textual equivalent” description.<br />
The guidance for that equivalent specifies that it must convey “the meaning of the image”<br />
(Board, 2001a, Board, 2001b). For many non-textual elements such as buttons and audio<br />
files, the description options are straightforward and formulaic. For images that carry<br />
essential communicative content rather than just decoration, for example the images found<br />
in textbooks, the descriptions are more challenging. None of these guidelines really<br />
addresses the range of descriptive options that might exist or how the person ‘surfing’ the<br />
web would cognitively process different descriptive approaches. In short, the standards and<br />
guidelines require the existence of descriptions, but provide little detail as to what those<br />
descriptions should include or how they should be constructed.<br />
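The gap is easy to see if one sketches what such a compliance check involves. The simplified illustration below (not any actual tool used by the guidelines; the sample markup is hypothetical) uses Python's standard html.parser to flag image elements that lack a textual equivalent. Note that a check of this kind can only detect a missing `alt` attribute; it cannot judge whether a present description actually conveys “the meaning of the image.”

```python
from html.parser import HTMLParser

# Elements commonly expected to carry a textual equivalent via the
# "alt" attribute; img is the central case in the guidelines.
DESCRIBABLE = {"img", "area"}

class AltTextAuditor(HTMLParser):
    """Collects describable elements that lack an alt attribute."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag in DESCRIBABLE and "alt" not in dict(attrs):
            self.missing.append(tag)

# Hypothetical page fragment: the first image has a textual
# equivalent, the second does not.
sample = (
    '<img src="chart.png" alt="Bar chart of employment rates">'
    '<img src="logo.gif">'
)

auditor = AltTextAuditor()
auditor.feed(sample)
print(len(auditor.missing))  # → 1
```

This is exactly the kind of mechanical test the existing checkers perform: presence or absence of a description, with nothing to say about its descriptive quality or how a listener would process it.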
No estimates of the number of organizations attempting to make Internet sites<br />
accessible were available. And, since the guidelines are so open, it is likely that there are<br />
hundreds if not thousands of different approaches to describing images in these new media.<br />
2.3 The Goal of Accessible Media: Usability and Experiential Equivalence<br />
The goals of the accessibility practices discussed here have been expressed through two types of approaches: usability and experiential equivalence. Some distinguish accessibility from usability as follows: accessibility means the technology can access the information, while usability means the information is meaningful and has its intended effect upon the listener (Baquis, 2002). Accessibility can thus be viewed as a component of usability (Slatin, 2001). Experiential equivalence takes the concept of usability further, to consider a usable experience as one equivalent to the experience the non-disabled would have. Section 508 requirements for the Internet specify that a description should “communicate the same information as its associated element” (Access Board, 2001). The NBA guidelines used at RFB&D similarly direct describers to “allow the author to make his own impression on the listener.”
Each practice approaches this concept with different terminology, but the underlying principle is the same: detailed pictorial description is not required; rather, the goal of the process is to provide essential information that allows consumers to experience the original text much as a sighted consumer would, including the opportunity to assign their own meanings to the texts rather than relying on the interpretations of others.
2.4 Previous Qualitative Evaluations of Description
It seems that little research has been done to measure the benefits of visual description. Some studies related to Audio Description in recent years, based mostly on subjects’ self-reports, have indicated that described video and live performances are beneficial on a number of levels⁶. In “Video Description in North America,” Packer cites seven types of benefits from description (Packer, 1996):
1. Gaining knowledge about the visual world,<br />
2. Gaining better understanding of televised materials,<br />
3. Feeling independent,<br />
4. Experiencing social connection,<br />
5. Feeling equality with fully sighted,<br />
6. Experiencing enjoyment, and<br />
7. Relief of burden on sighted viewers<br />
These findings were based on analysis of DVS customer feedback and are consistent<br />
with other reports that a production with Audio Description provides conceptual and
cultural inclusion that is not possible otherwise (Lovering, 1993, Packer, 1997a, Packer,<br />
1997b, Pfanstiehl, 2002c).<br />
Outside the realm of Audio Description, no studies of the benefits of describing visual information were found by this investigation. An appreciable portion of textbook images contain substantive content, and studies have certainly shown that images in texts often improve comprehension over the text alone (Levie, 1982). Whether described images deliver similar benefits does not seem to have been studied. The Internet and interactive media areas are too new, and perhaps too diverse, to provide any generalized practices for comment. And what little research has been done in this area from the consumer’s perspective indicates that these areas are much less successful than Audio Description and audio books (Gerber, 2002a, Gerber, 2001, Gerber, 2002b).
No studies seem to have been conducted as to whether description of other types of graphics holds benefits for people who are not visually impaired, such as those with cognitive impairments (including Alzheimer’s disease), those with learning disabilities, and second-language learners, all of whom could benefit from additional spoken descriptions to accompany visual information.
2.5 The Practical Case for a Unified Model<br />
While there are clearly academic reasons to consider these practices as variations of a single process, doing so introduces a new layer of complexity into what are already challenging and little-studied fields. There are also practical benefits to viewing this area as a single process from the perspective of those providing these services. Below, considerations for and against this view are presented.
2.5.1 Reasons to Consider as Separate Practices<br />
The two most important sets of facts to support separate analyses of descriptive<br />
discourses are the differences in traditional media types and consumer interests. Books and<br />
dynamic moving media (video, film) operate according to different audience dynamics. A<br />
book is read according to the schedule of the reader and video productions such as<br />
television shows and movies are linear and have their own fixed timescales. Textbooks<br />
contain still images including diagrams and illustrations that convey specific and often<br />
conceptual messages, while cultural video and film rarely contain diagrams or illustrations.<br />
And while both books and dynamic moving media contain segments, scenes and chapters,<br />
the organizational content of these segments is different. Books present subsections/topics<br />
with hierarchical structures (Lemke, 2002, Raman, 1994) and video productions contain<br />
characters, locations, action, and dialogue that interrelate on a continuous basis as the scene<br />
evolves. There are also substantial differences among types of consumers and their backgrounds. The congenitally or early adventitiously blind, who have little or no experience of the visual world, want different types of descriptions than those who have a memory of the visual world and usually want much more (Pfanstiehl, 2002b), raising questions about what kinds of description are appropriate.
2.5.2 Reasons to Consider as a Single Process<br />
The reasons for a more integrated model of this unique communication process fall<br />
into four areas. First, while more traditional media such as books and television shows can<br />
be viewed as very different interactional experiences for the consumer, these lines are not so<br />
neatly drawn with respect to newer digital media. Educational software and the Internet<br />
have characteristics common to books and videos. Furthermore, educational videos, while<br />
using the same media as cultural productions, contain many of the structural characteristics<br />
of textbooks including purposeful illustrations and the ability for the viewer to navigate the<br />
text’s structure. Textbooks now also frequently come with digital media in the form of a<br />
compact disc or references to an accompanying website. In addition, Internet sites can and<br />
frequently do contain all of the characteristics of both books and video as well as additional<br />
characteristics of interactivity and hypermodality. In other words, the web can be like a<br />
book, or a movie, or both, and more.<br />
Second, regarding the audience differences, one could view this as an issue of relative proportions. Both the congenitally and the adventitiously blind of different age groups are consumers of books, movies, television, and the Internet, but perhaps in different ratios. It is more probable that the reader of a textbook will be an adolescent, but it is conceivable that adults returning to school or helping their children with homework might use the same instructional texts. Likewise, children may choose, for personal interest or family participation, to watch a film or video genre geared towards an older audience. The Internet
as a media space is used by all ages. And, while many government websites might be geared<br />
towards older (more likely to be adventitiously blind) populations, some websites, including<br />
those for museums and cultural collections, are often oriented to the needs of a young<br />
audience.<br />
Third, digital technologies present new opportunities for the ways that information is managed and delivered. Digital technology allows the traditional boundaries between text types, based on publishing restrictions, to be blurred, creating new hybrid text types (Iedema, 2003; Piety, 2001). Currently, descriptive technology is developing (along with almost all media technology) as a digital technology, with the same power of distribution and dependence on software as other digital media. It is likely that descriptive technology will evolve, as other information technologies have, to have not just a dependence upon software but an architecture that is governed by software (Lessig, 1999). Trends in information technology continue in the direction of special-purpose hardware being replaced by general-purpose hardware governed by special-purpose software and, more recently, by general-purpose software governed by conceptual models (Gamma, 1995). The power of model-based technology is that we may see future assistive technology derive its behavior from a conceptual model of the fundamental human processes the technology supports. If this is the case, the more general this model can be, the more widely it can be applied to different textual and technological situations.
Fourth, a dimension that will be discussed in more detail below is that of the<br />
describer. <strong>Description</strong>s come from sighted individuals who face a number of significant<br />
choices in constructing their descriptions. Once this unique communication system and the<br />
special and powerful role of the describer are understood, a role that is similar in some ways<br />
to that of a sign-language interpreter, there may be reasons to consider this a professional skill that<br />
transcends media type. There are already a number of organizations such as NCAM and<br />
NTN discussed earlier that work to make several types of media accessible. By focusing on<br />
the properties of this system that are common to all media types, perhaps an economy of<br />
scale can be achieved in future training and possibly accreditation.<br />
3. VISUAL ASSISTIVE DISCOURSE<br />
The practices described in the previous section share many properties. They are<br />
intended to support the same types of people, they all rely on technology, and they have<br />
essentially the same goals. This allows them, despite their differences, to be viewed as<br />
members of the same family of communication practices. This study introduces a term for<br />
these practices, <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> (VAD). Below, it is formally defined, including<br />
the characteristic properties that distinguish it, how it compares with other better known<br />
communication systems, as well as a conceptual discussion that covers issues related to using<br />
words for images and implications for understanding the principal roles of describers and<br />
consumers of description.<br />
3.1 <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> Defined<br />
This process can be defined from a number of perspectives. Below, four different<br />
types of definitions are presented: 1) by the meanings intended in the term discourse, 2) five<br />
distinctive properties of VAD, 3) how VAD compares as a process to other prototypical<br />
communications processes, and 4) the essential components of VAD.<br />
3.1.1 The “<strong>Discourse</strong>” in <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong><br />
The term “discourse” is often used in different ways (Scollon, 2001b) and several<br />
different meanings are intended with the use of it in VAD. First, some have used it to<br />
indicate units of language larger than a sentence and language in use (Schiffrin, 1994). These<br />
are two of the senses in which it is used here, because visual descriptive language is often extended<br />
beyond words and clauses to larger text sub-units. <strong>Discourse</strong>s have also been looked at as<br />
social practices where language is but one element used in social activities that exist for<br />
specific purposes (Gee, 1999; Scollon, 2001a). From this perspective the roles and activities<br />
of the describers and receivers are considered to be important parts of the discursive<br />
process. <strong>Discourse</strong> has also been defined as a multimodal process when it includes different<br />
forms of communicative content (Kress, 2001). And finally, discourse is used here as<br />
Tannen defined it as ‘language in context across all forms and modes’ (Tannen, 1981), and<br />
she said, linguists in their study of discourse are concerned “with the central questions of<br />
structure, of meaning, and how these function to create coherence.” The search for<br />
coherence is an important goal of studying this communication process because in order for<br />
visual description to be meaningful and true to its intention, it must be coherent to the<br />
listener and coherent with the intention of the author(s) of the texts. <strong>Discourse</strong> then is used<br />
in <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> to include the linguistic and non-linguistic elements that<br />
interact to create meaning with accessible media using assistive technology.<br />
3.1.2 Common Properties of Different Descriptive Practices<br />
There are five basic properties that will be evident in all of the practices where VAD<br />
is employed.<br />
1. Technology enables the reception of remote descriptions.<br />
2. Descriptive messages are non-interactive.<br />
3. The descriptions are subordinate to another text.<br />
4. The description process is non-transparent.<br />
5. The process is constrained by the text.<br />
The first of these properties, that technology enables the reception of remote descriptions,<br />
means that descriptions are created in a different place and possibly different time from<br />
when they are received and it is through technology that these descriptions are brought to<br />
the receiver. The second property is that descriptive messages are non-interactive. This<br />
means that they are received as one-way messages without the listener being able to<br />
influence the form or ask for clarification. The third property is that descriptions are<br />
subordinate to another text. The inserted language is not an independent communication<br />
process, but is a component in a text that plays a very specific role. Fourth, the description<br />
process is non-transparent. The act of describing creates a new text, different from the<br />
original, that is influenced by the insertion of descriptive language, the choices reflected in<br />
that language, and the acoustic and prosodic properties of the insertions. Fifth, the process is<br />
constrained by the text. The nature of the descriptions can be constrained by the type of the<br />
text and the technology used to transmit it. And, for <strong>Audio</strong> <strong>Description</strong> only, because<br />
descriptions are inserted in between dialog and other audio content, the types of gaps in the<br />
specific text affect the description.<br />
3.1.3 Comparison of VAD and Other <strong>Communication</strong> Systems<br />
As a class or type of language, the system of VAD has specific components and<br />
roles. Figure 3 shows an abstracted representation of VAD compared to typical written,<br />
conversational, and sign language interpretation processes. Looking at the broad<br />
characteristics of visual description, it is possible to see both how it is a unique communicative<br />
process and how it is related to some other systems. It is like conversation in that it is<br />
received in a spoken form. It is similar to reading because the receiver of it cannot interact<br />
with the text as they cannot with a book. And there is an intermediary that facilitates the<br />
communication, as in sign language interpretation. Unlike any of these other<br />
processes, however, information is inserted to replace visual information only, creating a secondary or<br />
modified text.<br />
Figure 3 - Overview of VAD and other prototypical communication processes (diagram: face-to-face conversation between conversants via a co-constructed text; written communication from author to reader via a composed text; interactional sign language interpretation between hearing and deaf conversants via spoken, signed, and visual texts with an interpreter; and <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> from author to consumer via a source text, describer, verbal insertions, and modified text)<br />
The similarities and differences extend beyond the reception of the text to the<br />
creation of texts. In conversation, the text is co-constructed and shared between the<br />
interlocutors (Schiffrin, 1987). It emerges rather than being static. Conversely, in most<br />
written communication, an author creates the text that he expects to be used by a reader<br />
who then reads that work in the form the author produced it. 7 With sign language<br />
interpretation (and all interpretations and translations), there are several texts. The<br />
interpreter exchanges spoken and signed texts with the conversant, while all of the<br />
conversants share a common visual text that could also be called part of the context. With<br />
VAD, however, the describer is in a position to, in fact is required to, add to and modify the text<br />
created by the author. In order to make the text accessible, the describer must alter it by<br />
inserting language that reflects the describer’s choices for representing the visual<br />
information. Where the sign language (SL) interpreter is responsible for converting discrete<br />
discourse units from sign to speech or vice versa, the visual describer must select certain<br />
elements from the visual field and determine which elements will be described, in what<br />
order, and using what terminology.<br />
3.1.4 The Components of the VAD <strong>Communication</strong> System<br />
Any instance of VAD consists of at least four components:<br />
• Source text<br />
• Modified text that includes insertions of descriptive content<br />
• Describer<br />
• Consumer.<br />
The source text is a production that was developed by some individual or<br />
organization to communicate with an audience in a specific way and with specific messages.<br />
Generally, the audience that is anticipated in the development of a source text is a sighted<br />
audience and the source text includes both visual and verbal content. In most cases, the<br />
source text is prewritten and recorded. The modified text contains both original verbal<br />
content produced by the author of the source text and descriptive insertions that are<br />
anonymous and disembodied messages (Goffman, 1963) sent to a receiver who is unknown<br />
to the describer. The modified text is created by either a third party describer or by the<br />
author of the source text. The describer is a conceptual role and in practice may be one or<br />
more individuals responsible for creating and placing descriptive insertions. In some cases,<br />
such as audio books, the describer is also presenting other content, while in other cases, such<br />
as described video, the describer presents only the descriptive content. The consumer is also<br />
a conceptual role and equates to a range of individuals. The consumer is generally not<br />
known to the describer, either specifically as the person who will be listening to the<br />
description or by known characteristics, because the consumer can be from any age<br />
group and may possess varying visual impairments, as described earlier. There is no typical<br />
consumer and there may be different preferences for types of information to be provided.<br />
These four components will be present in any VAD event.<br />
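Because the describer and consumer are conceptual roles and the two texts are distinct artifacts, the four components lend themselves to a simple data model. The following Python sketch is purely illustrative: the class and field names are taken from the component list above and do not correspond to any existing assistive technology implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Insertion:
    """One descriptive insertion placed into the text by a describer."""
    position: int   # word position in the source text where it is placed
    wording: str    # the language chosen to represent the visual content

@dataclass
class VADEvent:
    """Minimal model of the four components present in any VAD event."""
    source_text: str            # the author's original production
    describer: str              # conceptual role: one or more individuals
    consumer: str               # conceptual role: the (unknown) receiver
    insertions: List[Insertion] = field(default_factory=list)

    def modified_text(self) -> str:
        """The modified text: original verbal content interleaved with
        descriptive insertions (a crude concatenation, for illustration)."""
        words = self.source_text.split()
        for ins in sorted(self.insertions, key=lambda i: i.position,
                          reverse=True):
            words.insert(ins.position, f"[{ins.wording}]")
        return " ".join(words)
```

For example, inserting the hypothetical description before a line of dialog, `VADEvent("Coffee?", "describer", "consumer", [Insertion(0, "A waitress approaches him")]).modified_text()` yields `"[A waitress approaches him] Coffee?"`, the amalgam of original material and insertion that the consumer actually receives.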
3.2 Conceptual Issues with Words for Images<br />
At its core, VAD is a process of using words as a substitute for information available<br />
in visual form. The differences between the visual and acoustic channels and the types of<br />
information usually conveyed through language and the visual field are significant and worth<br />
at least a brief discussion. Language and vision are fundamentally different phenomena with<br />
vision being a primary process and language a secondary process (Bateson, 1972) that<br />
reflects a level of interpretation. As Lemke says, “No text is an image. No text has the exact<br />
same set of meaning affordances as any image” (Lemke, 2002). These differences can be<br />
seen in two aspects: sequential vs. parallel processing, and the ways and volumes that each<br />
provides information. This topic connects to schema theory and it is difficult to classify as<br />
either an interpretation or a translation activity.<br />
3.2.1 Sequential vs. Parallel<br />
Language is the result of thoughts being encoded and retrieved in a sequential form,<br />
whereas visual information is perceived in parallel, through a gestalt, as a dynamic whole.<br />
Encoding information in language involves specific representational choices including the<br />
words to use and the sequence in which they are placed. When it is spoken, language<br />
becomes a linear medium. As Slatin says of receiving the description of a web page graphic<br />
“it is an experience of time, not in space” (emphasis in original) (Slatin, 2001). While<br />
language becomes fixed when written or spoken, visual information can change as it is<br />
viewed. It changes in the environment and in moving media such as video and film, so that the<br />
person viewing is seeing multiple abstract elements (color, shape, line) and recognizable<br />
objects (people, places, things, actions) concurrently. This dynamic property can also be true<br />
of still images because when viewing a static picture, the eye moves and recognizes different<br />
information over time (Holsánová, 2001; Kress, 1996) and many still images have attentional<br />
foci or cues (Dwyer, 1978) and/or motion called vectors built into them (Kress, 1996).<br />
3.2.2 Raw vs. Processed Information<br />
The second main difference to discuss in relation to this topic is the amount of<br />
information a unit of language and image can provide and the way each provides it. In<br />
general, the visual field can provide vast amounts of information and detail where language<br />
requires specific encoding into words that, by their very nature, summarize and categorize<br />
information, often into prototypical concepts (Rosch, 1978). The following hypothetical<br />
example of two statements of visual description is used to highlight what a description could<br />
be and some of the word/concept issues presented.<br />
1. A hobo sits at the counter of a greasy spoon<br />
2. A waitress approaches him<br />
These two statements describe a scene and they show certain choices a describer<br />
might make. One of these choices is vocabulary: the terms “greasy spoon” and “hobo” are<br />
specific to a type of location and person. For a listener who knows what these terms mean,<br />
they likely convey a clearer image with fewer words than the more generic alternatives “run<br />
down lower class dining establishment” and “poor traveler.” Another decision the<br />
describer makes is to keep these characters anonymous. They could have names in a story<br />
that have yet to be revealed, but the description in this case only provides the externally<br />
known information. Yet another decision is the choice to focus on these details (waitress,<br />
hobo, counter, greasy spoon) over others that are present in the same image such as time of<br />
day or other characters. Clearly, these two statements do not equate to a visual scene, but<br />
they may provide the information necessary for a hearer to construct a visual scene.<br />
3.2.3 Schema Theory<br />
At the same time that words cannot provide all of the information available in a<br />
visual field, certain words and concepts associated with them imply a whole range of related<br />
meanings that are often available visually as well. The word “restaurant” for example will<br />
usually imply many physical and conceptual elements (kitchens, wait staff, menus, food) that<br />
do not have to be seen. Complexes of concepts and expectations are thought by some to be<br />
stored in mental structures called schemata (Schank, 1977). When schemata are activated, as<br />
by words and/or visual stimuli, expectations of other information and perhaps assumed<br />
values for information not provided and events that might follow become activated as well<br />
(Tannen, 1993a). If these schemata guide expectations in interaction and conversation<br />
(Tannen, 1993b) and are at work in the process of reading (Perfetti, 1988; Wilson, 1986),<br />
two activities that VAD is similar to, then they may play a role in allowing a few words to provide<br />
a large amount of content in VAD.<br />
3.2.4 Translation-Interpretation-Transrepresentation<br />
As a language process, visual description could be seen as a member of a family of<br />
translation and interpretation activities. According to Metzger (1999), “Both<br />
translation and interpretation deal with the rendering of a given text into another language”.<br />
A key issue here is that the text that is being rendered is made up of visual elements (scenes,<br />
signs, smiles, etc.) rather than formalized language. If translation is what happens when the<br />
written is converted into the written and interpretation when the conversational is converted<br />
into the conversational (Hatim, 1997), then visual description might be viewed as<br />
transrepresentation because it takes raw visual information and converts it into language. The<br />
transrepresentation is affected by the sequential/parallel, high detail/high concept, and<br />
author encoded/receiver determined meaning boundaries.<br />
3.3 <strong>Description</strong>s: Situated/Constrained in Multimodal Texts<br />
In addition to the general differences between language and visual information<br />
described above, visual descriptions are strongly affected by their environment. They are<br />
both situated in and constrained by where they are placed in a text and the textual nature of<br />
the visual elements that are being represented. While it might seem that descriptions would<br />
relate to those elements that are the most salient visual components in a multimodal text<br />
(Norris, 2002), the nature of VAD imposes further effects on the nature of the description,<br />
in ways constraining it.<br />
3.3.1 Textually Situated <strong>Description</strong>s<br />
The selection, meaning, and content of any descriptive insertion are influenced by its<br />
textual position, which can be viewed in three ways: 1) as the textual environment it is<br />
within, 2) its position in the text in relation to other occurrences of the same visual stimuli,<br />
and 3) the relative importance of the stimuli independent of its salience. The hypothetical<br />
description of a hobo in the example above would probably be placed in between other<br />
salient textual information as shown in the example below:<br />
3. <strong>Audio</strong>: Clock chimes 5 times<br />
4. Describer: A hobo sits at the counter of a greasy spoon<br />
5. Describer: A waitress approaches him<br />
6. Woman: Coffee?<br />
In this example, the descriptions in lines 4 and 5 are contextualized with time and<br />
dialog. The information from the source text in lines 3 and 6 provides context for the<br />
description and equally important, the description provides context for the source text<br />
elements. This mutual relationship between the descriptive insertions and existing source<br />
text content is an essential aspect to the meaning of a description because it is all these<br />
elements that the consumer of a described production will receive. These same lines of<br />
description take on yet other meanings if part of a textbook, as the hypothetical example<br />
below illustrates:<br />
7. Reader: During the depression, many people lost their jobs and<br />
traveled searching for work, see picture 1<br />
8. Picture 1 shows a hobo sitting at the counter of a greasy spoon<br />
9. a waitress approaches him<br />
These examples should show one of the most important aspects of visual<br />
description: the textual environment. Descriptions do not come ‘naked’, with a<br />
seemingly unlimited set of meanings, as they appear in lines 1-2. Rather, as lines 3-6 and 7-9 illustrate,<br />
descriptions are textually situated and understanding their meanings involves an understanding<br />
of the information provided in the environment of the source text.<br />
The descriptions are textually situated in a second way by the sequential position of<br />
visual information in the source text. If the examples above were not the first time that the<br />
image or place is described, but a successive occurrence where the audience would be<br />
expected to recognize the place, then a different description could be called for entirely, even<br />
if the visual image was exactly the same. It would be different, if for no other reason than that it<br />
was known (“the greasy spoon” rather than “a greasy spoon”) and also because additional<br />
details could be provided if the previously supplied information is expected to be<br />
understood.<br />
An additional factor in textually situating descriptions is the relevance that<br />
visual elements have to the text. If there were details about the scene that were important for<br />
future plot reasons but might not be the most salient feature (as we assume the hobo and<br />
waitress are), the description becomes<br />
situated not only by its enclosing information or its appearance in the text but also because<br />
of its significance to the internal structure of the text (Gould, 2002; Pfanstiehl, 1984; Snyder,<br />
2002a), as judged by the describer.<br />
3.3.2 Constraints: Detail vs. Interpretation<br />
Each type of text creates conditions that affect the nature of the description. The<br />
limits or restrictions on visual description are imposed by the technology used in the modified<br />
text and by the methods of the describer. These restrictions can place pressures on the<br />
manner in which the description is rendered. A concept behind these constraints is that,<br />
as the language used in descriptions covers more information with fewer words, the more the<br />
scene might be evaluated and summarized by the describer, leaving the consumer of the<br />
description less opportunity to independently assign meaning (Baquis, 2002). Alternatively,<br />
providing the building blocks of information to allow the consumer more opportunities to<br />
infer meaning may take more time to describe and hear. In the hypothetical examples of<br />
description from above, there were 14 words and 19 syllables. If the text allowed a smaller<br />
insertion, certain changes would need to be made that would either reduce the amount of<br />
detailed information or summarize the situation with less verbal content. For example,<br />
“hobo” might become “bum” or “man” and “greasy spoon” might become “diner.” Each<br />
of these choices in language might have a slightly different effect on the hearer.<br />
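The trade-off between time and detail can be made concrete with a rough calculation. The sketch below is a hypothetical illustration only: the vowel-group syllable heuristic is crude (hand counts, such as the 19 syllables cited above, may differ slightly from what it reports), and the delivery rate of 3.5 syllables per second is an assumed figure, not a measured norm of description practice.

```python
import re

def estimate_syllables(text: str) -> int:
    """Rough syllable count: one syllable per vowel group in each word.
    A crude heuristic, adequate only for illustration."""
    words = re.findall(r"[A-Za-z]+", text)
    return sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
               for w in words)

def fits_gap(description: str, gap_seconds: float,
             rate: float = 3.5) -> bool:
    """Would this description fit in a gap between dialog segments,
    at an assumed delivery rate in syllables per second?"""
    return estimate_syllables(description) / rate <= gap_seconds

desc = ("A hobo sits at the counter of a greasy spoon. "
        "A waitress approaches him.")
print(estimate_syllables(desc))   # heuristic count for the example
print(fits_gap(desc, 6.0))        # fits a six-second gap
print(fits_gap(desc, 3.0))        # too long for a three-second gap
```

Shortening “hobo” to “man” or “greasy spoon” to “diner” lowers the syllable count, which is exactly the kind of substitution a describer makes when the available gap shrinks.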
The books recorded by RFB&D can use either a special format audiocassette tape or<br />
a special digital format. In both cases, the image descriptions are placed in specific locations<br />
(usually after a paragraph where they are referenced) in the linear recording. While the<br />
consumer of the RFB&D media can scan forwards and backwards, they cannot access the<br />
images separately from the body of the text because both are combined into the same audio<br />
stream. The RFB&D technology does not impose any restrictions on the amount of<br />
description, so they would be called unrestricted descriptions.<br />
Internet sites have built-in functions for describing pictures, either directly in a property<br />
attached to an image (Alt-text) or in a separate page (Long-Desc) that the user can navigate<br />
to. <strong>Assistive</strong> technology either renders the descriptions into refreshable Braille devices or<br />
uses voice synthesis. Some browsers place limits on the amount of Alt-text that can be<br />
displayed, so direct descriptions are potentially restricted while indirect descriptions are<br />
unrestricted. Slatin, in “Maximum Accessibility,” recommends no more than 150 characters<br />
(Slatin, 2001), including punctuation, while Alonzo recommends no more than 300 words<br />
for long description (Alonzo, 2001). Internet descriptions could be viewed as restricted by<br />
convention.<br />
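The conventional limits just cited can be illustrated with a small check. This is a hypothetical sketch applying Slatin's 150-character and Alonzo's 300-word recommendations, not a validator used by any of the organizations discussed:

```python
def within_alt_text_limit(alt: str, max_chars: int = 150) -> bool:
    """Check a direct (Alt-text) description against Slatin's
    150-character recommendation, punctuation included."""
    return len(alt) <= max_chars

def within_long_desc_limit(desc: str, max_words: int = 300) -> bool:
    """Check an indirect (Long-Desc) description against Alonzo's
    300-word recommendation."""
    return len(desc.split()) <= max_words

short = "A hobo sits at the counter of a greasy spoon."
print(within_alt_text_limit(short))      # True: 45 characters
print(within_alt_text_limit(short * 4))  # False: exceeds 150 characters
```

A description that fails the direct-description check would, under these conventions, be moved to the unrestricted indirect form that the user navigates to separately.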
Restrictions for descriptions in <strong>Audio</strong> <strong>Description</strong> are more complicated. As Frazier<br />
had envisioned, description is mostly inserted into the gaps or bridges (Frazier, 1975) in the<br />
audio. A basic rule for all <strong>Audio</strong> <strong>Description</strong> practices is that descriptions do not impede<br />
dialog. Theoretically, then, the entire span between dialog segments could be used for<br />
description. But, there are cases where non-dialog audio cues (crashes, bumps, etc.) require<br />
the description to be synchronized so that the consumer is able to make sense out of the<br />
original audio material. Also, in some cases such as opera or musicals, music or parts of a<br />
soundtrack are considered more important than description, so visual description is further<br />
restricted by both the specific text and the conventions of the describer.<br />
<strong>Description</strong>s of software and interactive elements tend to be short and functional,<br />
describing the interactive effect of a function associated with an image rather than an<br />
elaborate description of the image itself (Board, 2001b; Gerber, 2002a; Slatin, 2001; W3C,<br />
1999). These descriptions are unrestricted, but usually brief.<br />
3.4 Discussion: The Role of the Describer<br />
Ultimately, communication, or a failure to communicate effectively, happens between<br />
people. In VAD, these people are primarily either describers or consumers. Having<br />
considered some of the conceptual issues with representing visual information in language,<br />
the different characteristics of language, and the effect of textual constraints, it is now<br />
appropriate to discuss the role of describer again. Describers are the first human link in the<br />
unidirectional process of making media accessible. What the describer selects for<br />
description, the manner in which it is described, and how it is positioned in the modified text are<br />
final. The describer is a gatekeeper of information. It is a role that is both powerful and difficult.<br />
The describer must balance all of the visual and linguistic factors, must select which<br />
information is to be presented and how it will be presented within the textual constraints.<br />
Three dimensions of the describer role worth discussing are the cognitive process for<br />
describing, the role as intermediary, and the practice effects that add other dimensions to<br />
how descriptions are created.<br />
3.4.1 Describing as a Way of Thinking<br />
Practitioners of <strong>Audio</strong> <strong>Description</strong>, one of the most methodologically established<br />
forms of VAD, say that providing visual description does not seem to be a natural process<br />
for most people and often requires specific training (Gould, 2002; Pfanstiehl, 2002; Stovall,<br />
2002). Snyder says, “We must learn to see the world anew” (Snyder, 2002b). Perhaps a<br />
reason why it requires special training is that, while descriptions are produced using the tools<br />
of everyday communication (speech and writing), the way language is used in visual<br />
description is not common.<br />
Figure 4 - Chafe's view of speaking in the immediate mode (diagram: within extroverted<br />
consciousness, the mind perceives, acts on, and evaluates the environment; the represented is<br />
then turned, through representing and speaking, into language)<br />
Chafe describes two modes of conversation: immediate and displaced (Chafe,<br />
1994). Immediate conversation deals with an extroverted consciousness where the mind is<br />
focused on perceiving, acting on, and evaluating information that is in the present; while<br />
displaced conversation uses a consciousness that is introverted to remember and imagine.<br />
Chafe describes the majority of everyday talk as<br />
dealing with the introverted and displaced consciousness: events that have happened, did not<br />
happen, or might happen, as well as possible conditions and realities. Much of the<br />
grammar of languages such as English is designed to accommodate these unreal (irrealis)<br />
situations. As we can see in figure 4, Chafe’s view of immediate consciousness places<br />
evaluating as a natural and integral part of the perceptual process. But, this conflicts with the<br />
goals of VAD, so the describer is required to suppress (or control) the evaluative process so<br />
that the described information is in such a form as to allow the listener to do evaluation<br />
from the raw informational materials. While describing in some ways may be as old as<br />
language itself, only describing what is actually occurring visually at the moment, and<br />
suppressing evaluation, is like asking the describer to pull an “end-run” around the natural<br />
thought process of extroverted consciousness by proceeding directly to speaking. Further,
the environment of VAD is neither the environment that surrounds the describer nor the<br />
consumer, but rather the environment of the text and its visible boundaries. The describer is<br />
also performing this conceptual end run while only focusing on a small window of the<br />
immediate experience, which is the world of the text.<br />
3.4.2 Describer as Intermediary<br />
The describer is in an intermediary position that is similar to both a sign language<br />
interpreter and an author. As an interpretive mediator, the describer is responsible for<br />
creating language that is to serve as an equivalent to visual information. This role places<br />
the describer in a position of relative power similar to the role that Metzger found<br />
interpreters may take in “Sign Language Interpreting: Deconstructing the Myth of<br />
Neutrality” (Metzger, 1999). Furthermore, since the words of the describer, unlike the<br />
words of conversational SL interpreters, are final, the role of describer has some qualities of<br />
an author (Harris, 2000). While the goals of VAD practices do not call for the describer to<br />
take a substantive role in the communication process, but rather to act, as Pfanstiehl says, “as a color<br />
video camera” (1984), the fundamental nature of VAD places the describer in a non-transparent<br />
position that will affect the process through the choices they make.<br />
3.4.3 <strong>Description</strong> as a Group Process<br />
For many producers of description, creating the modified text is not a one-person<br />
task. It involves teams with formalized procedures, reference manuals, and working<br />
documents. While this study has not investigated the nature of description teams and their<br />
documents, as illustrated in figure 5, it should follow that the process that gives birth to the<br />
modified text could have significant impacts upon the quality of the result.<br />
Both of the <strong>Audio</strong> <strong>Description</strong> organizations interviewed for this study use teams in<br />
the creation of their descriptive content, but the teams seem configured differently. In the<br />
case of RFB&D, each book is pre-read, and which images are to be described, as well as<br />
where the descriptions will be positioned, is marked up for the readers assigned to it. The<br />
reader then only describes those pictures that have been pre-selected (Smith, 2002; Vollmer,<br />
2002). Once an RFB&D reader encounters a picture notation, he or she describes it in a<br />
seemingly individual way. For RFB&D the reader is also the author of the descriptions.<br />
Conversely, the established organizations doing recorded <strong>Audio</strong> <strong>Description</strong><br />
use a script that can be written and edited by people other than the person who is the voice<br />
of description.<br />
Figure 5 – Conceptual view of the description process (diagram: a text producer's source text enters a team-based description process in which describers, supported by a team, working documents, and manuals and style guides, produce the descriptions that are combined with the original information to form the modified text)<br />
All of the organizations doing recorded <strong>Audio</strong> <strong>Description</strong> interviewed for this<br />
research use a prescreening process where some members of the description team listen to<br />
just the audio to identify critical points of comprehension failure. And, all of these<br />
organizations indicated that the script writing and editing are extensive processes that<br />
include evaluations about the amount and the type of description that can be inserted. As a<br />
result, the <strong>Audio</strong> <strong>Description</strong> insertions rarely show markers of spontaneity such as<br />
nonfluencies or hesitations.<br />
3.5 Discussion: The Role of the Consumer<br />
The role of the consumer is a disadvantaged position in VAD. Where readers of<br />
books and recipients of SL interpretation each have a different communicative challenge,<br />
using a non-interactive text and requiring a communicative intermediary respectively, the<br />
visually impaired face both of these challenges with the described visual information.<br />
Consumers do not experience both a source text and a modified text; the text they<br />
receive is the modified one, an amalgam of original material and inserted<br />
descriptions. Throughout the investigations leading to this report, those involved with<br />
providing descriptions across all media types consistently expressed an interest in the internal<br />
mental process of consumers, in how the descriptions they provide are effective or not in<br />
fulfilling the goals of description. In this way, VAD is similar to the other language systems<br />
described above because there is no consensus on universal principles for how language and<br />
thought actually work inside someone’s mind. There are, however, some indications of what<br />
type of activity listening to VAD could be similar to.<br />
3.5.1 The Consumers: Actively Building a Text Model<br />
Research indicates that, in terms of basic perception and comprehension, the process of<br />
listening is very similar to reading (Townsend, 1987). So, a person listening to a text<br />
might be engaging in the same or similar cognitive processes as would someone reading the<br />
same text. Reading is a process that involves both the decoding of the graphic, or tactile,<br />
symbols for Braille (Knowlton, 1996), and cognitive processes of developing a set of mental<br />
representations called a "text model" (Carpenter, 1986, Haberlandt, 1988). Because of the<br />
structural similarities to reading, it can be deduced that the descriptive insertions are<br />
providing information that contributes to the process of building a text model in the mind of<br />
the listener.<br />
If the process of receiving a described text is similar to reading, then a number of<br />
factors will influence the text model that is built. The listener’s personal history and world<br />
knowledge (schemata), as well as their goals, will all probably influence what type of mental<br />
representation is built from the text they receive (Wilson, 1986). Since consumers<br />
create their own understanding as they experience the text, the descriptions should<br />
be viewed not as brush strokes in a painting made by the describer but as cognitive tools<br />
(Vygotsky, 1934): pieces of information that are the building blocks of mental structures in<br />
the mind of the consumer as he or she actively develops an understanding of the text.<br />
Figure 6 - Conceptual diagram of consumer's process (diagram: the modified text, made up of original information and descriptions, feeds the consumer's process, which draws on the consumer's purposes and goals, personal history, and world knowledge to build a text model)<br />
Anecdotal evidence from those who practice <strong>Audio</strong> <strong>Description</strong> supports the view that<br />
receiving description is mentally active. Frazier (1975) describes how an<br />
individual listening to a production assembles an understanding of the action from audio<br />
clues. A quarter century later, DVS customers in feedback sessions reported that when the<br />
appearance of a character is described long after the character is introduced to the audience,<br />
the new descriptive information can clash with a "mental picture" that the listener has<br />
already created (Gould, 2002). Stovall, himself blind and the founder of the Narrative<br />
Television Network, one of the two largest U.S. producers of <strong>Audio</strong> <strong>Description</strong>, explained<br />
that the listener has a mental picture in his or her mind – it may not be exactly what the<br />
person or place looks like on the film, but it is sufficient for understanding the production<br />
(Stovall, 2002).<br />
3.5.2 The Purposes and Goals of Consumers<br />
Another important dimension in understanding the role of the consumer is<br />
motivation and purpose. Why do people listen to <strong>Audio</strong> Described productions or listen to<br />
textbooks? While these questions may seem elementary, it is important to recognize that this<br />
area is probably the most important part of the study of VAD and also the one with the least<br />
real data. The few formal studies of the consumption of VAD have been in <strong>Audio</strong> <strong>Description</strong>, with<br />
a small amount on visually impaired people using the Internet. There is little documentation<br />
on the larger practices that people listening to visual description are engaged in. And, the<br />
discussion below provides no more than a few selected examples that may adumbrate some<br />
of the larger issues that remain to be explored in individual motivations and uses for<br />
described material.<br />
As part of the research for this publication, members of an online community for<br />
<strong>Audio</strong> <strong>Description</strong> were polled on this topic and encouraged to provide insight into this<br />
process. Most of the responses received indicated that the service of <strong>Audio</strong> <strong>Description</strong> is<br />
essential to providing access to productions, but there was little specificity as to the types of<br />
productions or information that were of interest. This general view is supported by<br />
publications from the American Foundation for the Blind that say that <strong>Audio</strong> <strong>Description</strong> is<br />
used both for the enriching, aesthetic experience of the content of the text and for cultural<br />
inclusion (AFB, 1991). <strong>Visual</strong>ly impaired individuals often report watching movies with<br />
sighted friends and family, and they find that understanding culturally relevant texts, whether<br />
from individual or group viewing, is useful in social activities.<br />
One post to an <strong>Audio</strong> <strong>Description</strong> online community that predated the poll just<br />
mentioned indicated that facial expressions were specifically interesting to a congenitally<br />
blind listener who said “Even though <strong>Audio</strong> <strong>Description</strong> does not give me a concrete<br />
example of the various ways of smiling, it does provide me with very valuable information<br />
about what kind of expressions may be exchanged between people” (Miller, 2002). In<br />
another area, WGBH and the American Foundation for the Blind conducted a survey of<br />
audio description customers that indicated a strong interest in science programming (Kuhn,<br />
1992b).<br />
In the area of visual descriptions for textbooks, while it is logical that the consumers<br />
are visually impaired individuals using this material for educational purposes, the visually<br />
impaired now make up only 25% of the customers of RFB&D, with the remaining members<br />
having dyslexia and/or other learning disabilities (RFB&D, 2001b). It was also reported<br />
informally that the recorded RFB&D materials have been used in classes of reading-<br />
challenged students who are neither blind nor dyslexic. Naturally, these uses are not related<br />
to VAD, but may impact the type of service provided to the visually impaired students.<br />
The Internet, being a broad publishing medium, can by definition be used for a range<br />
of situations from purely informational to entertainment and education. Research from the<br />
American Foundation for the Blind indicates that much of the Internet is difficult to use for<br />
the visually impaired despite the accessible technology (Gerber, 2002a, Gerber, 2001, Slatin,<br />
2002). But, all indications are that the visually impaired attempt to use the Internet for<br />
similar reasons as the rest of the population: eCommerce, information, entertainment, etc.<br />
3.6 Cultural Issues with <strong>Description</strong>?<br />
Is there a cultural dimension to VAD? <strong>Communication</strong> between the sighted<br />
describers and visually impaired consumers raises interesting challenges regarding traditional<br />
definitions of culture. Culture is often viewed as a phenomenon that both transmits and is<br />
transmitted through language. And, culture can be defined as a phenomenon that operates<br />
on non-linguistic levels (Scollon, 2001b), some of which are influenced by vision.<br />
And while deaf communities have distinct cultural boundaries with linguistically<br />
perceptible features (Lucas, 1989, Valli, 2001), members of the blind and visually impaired<br />
communities, not having a separate language, might not be viewed as a separate culture.<br />
Further, since the majority of blind individuals have had sight at one time and all presumably<br />
interact with sighted individuals daily, it is difficult to draw a cultural boundary around the<br />
consumer community. However, within the process of description, certain communication<br />
issues appear that are similar to the types of issues that appear when people of different<br />
cultures try to communicate. For example, if a describer uses language that encodes visual<br />
assumptions (e.g., perspective, color) and the receiver of that description does not<br />
understand the associated or implied meanings, then miscommunication similar to cross-<br />
cultural miscommunication, although not according to traditional definitions of culture 8 ,<br />
might occur. Further, as Scollon and Scollon state: one culture does not actually<br />
communicate with another culture; individuals from different cultures do (Scollon, 2001b).<br />
And, when people communicate they do so in places and with purposes that influence the<br />
nature of the communication produced (Scollon, 2001a). Within VAD, the places that the<br />
describers and consumers participate in – the sites of engagement – are very different and, unlike<br />
in face-to-face communication, these sites of engagement are separated physically, and<br />
usually by time. While they are at the outer edge of this study’s focus, these types of<br />
questions are important to ask because, even if the communication span between describers<br />
and consumers cannot be classified as a cultural divide, there may be sufficient differences<br />
between the historical, locational, and perceptual orientations of these two groups to<br />
foster miscommunication similar to culturally influenced communication failures, where<br />
cross-cultural sensitivities may be important.<br />
4. STUDY OF AUDIO DESCRIPTION<br />
The previous section provided a top-down conceptual framework called <strong>Visual</strong><br />
<strong>Assistive</strong> <strong>Discourse</strong> (VAD) with a discussion of specific types of roles and factors that might<br />
affect its success. This section provides a complementary bottom-up and data-driven<br />
analysis of one specific form of VAD called <strong>Audio</strong> <strong>Description</strong>. Of the different varieties of<br />
VAD, <strong>Audio</strong> <strong>Description</strong> (AD) is the most practical to study in this forum. Since<br />
its development as an active process in the early 1980s, AD has been practiced mostly with<br />
methods that stem from one source and that adhere to specific principles. The other<br />
established VAD practice, <strong>Audio</strong> Books, was also investigated as part of this study. But, for<br />
a variety of methodological and practical reasons, <strong>Audio</strong> Books was determined to be too<br />
large and to have too many complicating issues to make it a good candidate for the detailed<br />
language analysis in the time frame for this study.<br />
This study looks at the stream of descriptive statements in <strong>Audio</strong> <strong>Description</strong> as an<br />
example of language use and aims to answer two questions: 1) what is the constituent<br />
structure in AD and 2) what types of information are provided and what patterns in<br />
representation exist within AD? Because this study is the first analysis of the language of<br />
AD (or VAD, for that matter), the information presented will be broad and many important<br />
opportunities for more research will remain.<br />
4.1 The Study Corpus<br />
Within the practice of <strong>Audio</strong> <strong>Description</strong>, there are a number of important sub-practices,<br />
each with its own specific challenges. The sub-practice chosen for this study is the<br />
description of films. The data for this study comes from four different video productions<br />
described by three different description organizations as shown in table 1.<br />
Table 1 - Study corpus material<br />
Source Text | Producer | Describer | Length | Description Words | Description Length<br />
A Star Is Born (1937) | Selznick International | Narrative TV Network | 114 Min | 6110 | 37 Min<br />
L.A. Story (1991) | Artisan Entertainment | WGBH/DVS | 90 Min | 4123 | 29 Min<br />
Gladiator (2000) | DreamWorks SKG | WGBH/DVS | 148 Min | 12337 | 86 Min<br />
Gift of Acadia (2000) | National Parks Service | The Washington Ear | 14 Min | 763 | 4 Min<br />
4.2 Methodology<br />
The techniques used to analyze this data are based in large part on spoken discourse<br />
analysis where the descriptive language is transcribed and then analyzed for structural and<br />
functional properties. Much of traditional spoken discourse analysis deals with interactional<br />
conditions. While the process studied is certainly not interactional, there are important<br />
reasons that these analysis techniques were used as the starting point for the analysis of AD.<br />
First, the messages the consumer receives are units of speech and so will display properties<br />
specific to speech. Second, the level of detail used in spoken discourse analysis, which focuses<br />
on words and utterances in larger contextualized units, is a useful way to view units of AD.<br />
Third, since the creators and consumers of visual description speak the same language and<br />
are members of the same types of speech communities, this language use can be considered<br />
a form of spoken discourse, although a very special one.
This approach prioritizes the words of description and does not focus on the<br />
multimodal issues involved with movies. These multimodal properties are significant, but in<br />
the interests of space and for publishing concerns, they are subordinated in this analysis to<br />
the surface representations of the description and relevant dialog. Specific transcription<br />
conventions and technical issues related to the multimodal issues of these productions are<br />
discussed in Appendix B.<br />
4.3 The Structural Components of <strong>Audio</strong> <strong>Description</strong><br />
Some basic structural definitions are necessary to begin this study. These definitions<br />
have been derived from the analysis of this AD corpus and also other corpora of textbooks<br />
and Internet sites that are not included in this publication, with the hope that the terms and<br />
definitions would be generalizable.<br />
It would have been convenient and desirable to use the same structural definitions<br />
used in other areas of linguistic inquiry for VAD. And, in some ways, the language found in<br />
AD is similar to other language uses. Below the word, at the morphological and<br />
phonological (word-part and sound-component) levels, the constituents in AD are identical<br />
to other forms of spoken language. Above that level, however, at the level that can be<br />
considered the discourse, different types of structures clearly appear.<br />
This study proposes a discourse constituent structure based on four components:<br />
insertions, utterances, representations, and words. Table 2, below, provides some definitions for<br />
these components.<br />
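The four-level hierarchy of insertions, utterances, representations, and words can be pictured as a nested data structure. The following Python sketch is an editor's illustration only: the class names, fields, and timing values are invented, and the example tagging of transcript 1, line 20 reflects one possible reading rather than an annotation from the study.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of the four-level constituent structure proposed in
# this study: insertions contain utterances, utterances contain
# representations, and representations are realized by words.

@dataclass
class Representation:
    focus: str         # the person, place, or thing the representation is about
    rep_type: str      # one of the seven types, e.g. "appearance", "action"
    words: List[str]   # the words that realize this representation

@dataclass
class Utterance:
    representations: List[Representation] = field(default_factory=list)

    def word_count(self) -> int:
        # Count only the words that realize representations.
        return sum(len(r.words) for r in self.representations)

@dataclass
class Insertion:
    start: float       # seconds into the production (invented values below)
    end: float
    utterances: List[Utterance] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end - self.start

# Line 20 of transcript 1, "A black-headed loon drifts by", carries both
# an appearance and an action representation focused on the same object.
loon = Utterance([
    Representation("loon", "appearance", ["black-headed"]),
    Representation("loon", "action", ["drifts", "by"]),
])
ins = Insertion(start=66.0, end=70.5, utterances=[loon])
print(ins.duration)       # 4.5
print(loon.word_count())  # 3
```

The nesting mirrors table 2 directly: each level is simply a container for the level below it, plus the timing properties used in the duration figures reported later.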
Table 2 - Summary structural components of audio description<br />
Basic Structural Hierarchy<br />
Element Definition<br />
Insertion: A contiguous stretch of description (analogous to paragraphs and turns in written and spoken discourse) uninterrupted by other significant audio content such as dialog.<br />
Utterance: A continuous stream of words (similar to a sentence) containing one or more representations, separated from other utterances by more than ½ second of time 9 . Often a gap of 1 second or more separates utterances.<br />
Representation: An interpreted component of an utterance that conveys information about the visual field. Representations have different properties, including a focus and a type.<br />
Word: The words used in <strong>Audio</strong> <strong>Description</strong>, presumably like other forms of VAD, are a subset of words used in normal spoken/written language. They can provide content or function.<br />
4.3.1 Insertions<br />
The largest contiguous unit of description is an insertion. This is what Frazier called<br />
a “bridge” and is almost always bounded by dialog. The term insertion was chosen because<br />
this is fundamentally what is being done in VAD: descriptions are inserted into another text.<br />
As examples below will show, this unit could not be considered a paragraph in the traditional<br />
sense because it does not always express a consistent unit of thought. There are no topic<br />
sentences or summaries, and the only cohesion devices within them are based on common<br />
pronominal reference (Halliday, 1985). Neither could it be called a turn or other structure<br />
50
often considered part of spoken discourse. Without digressing into the issues associated<br />
with coherence in spoken and written discourse (well beyond the scope of this study), what<br />
is evident from the transcripts of AD in this study is that insertions are essentially collections of<br />
utterances. There are no differences in structure or function between the first utterance in an<br />
insertion and the last or those in between. The utterances inside an insertion are essentially<br />
interchangeable. Insertions can be of variable length. This corpus contains 842 insertions.<br />
The shortest are less than one second in length and the longest is over five minutes. The<br />
mean duration of an insertion in this corpus is 11.09 seconds.<br />
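Corpus figures such as these are simple aggregates over timed insertion spans. As a minimal sketch of that arithmetic (the (start, end) pairs below are invented for illustration and are not taken from the study corpus, which contains 842 insertions):

```python
# Illustrative only: three invented insertion spans, in seconds.
insertions = [(12.0, 14.5), (40.0, 51.0), (90.0, 95.5)]

durations = [end - start for start, end in insertions]
mean_duration = sum(durations) / len(durations)

print(len(insertions))           # 3 insertions in this toy sample
print(round(mean_duration, 2))   # 6.33
```

Applied to the real corpus timings, the same computation would yield the 11.09-second mean reported above.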
Transcript 1 below shows these properties of an insertion. In this section taken<br />
from one movie, there are three insertions (lines 2-5, 16-17, and 19-24). The only words<br />
in this production come from a narrator and a describer. The narrator is part of the original<br />
audio from the source text and a describer is speaking the AD insertions. The describer and<br />
narrator alternate in a structure that appears similar to turn taking. But, unlike the turn<br />
structure of a conversation, these two voices are not speaking to each other but each is<br />
speaking to the audience independently. The describer does not address any of the topics<br />
in the narration and the narration does not respond to any of the information that is in the<br />
description. Further, within the descriptions, each utterance reflects an independent<br />
thought. An analysis of the narrator’s words reveals an expository structure with<br />
elaboration, contrast, hypothetical constructions and other features of language used to<br />
convey a range of ideas, while an analysis of the describer's words reveals a very different<br />
type of language use that will be the focus of later portions of this study.<br />
Transcript 1- From “The Gift of Acadia” 1:06<br />
1 Narrator: The gifts of Acadia and they are many…are simple<br />
2 Describer: We move over the still waters of Jordan pond toward the twin bare<br />
domes of the bubble mountains<br />
3 Bold letters swing out of the screen toward us the gift of Acadia<br />
4 We continue across Jordan pond toward south bubble mountain<br />
5 At otter point a huge wave crashes onto a square granite rock ..<br />
white spray flying<br />
6 Narrator: It is many ways a gift<br />
7 First a gift of NATURE .. crafted by the sea<br />
8 By 500 million years of sediment pressed into rock<br />
9 Rock rising then subsiding<br />
10 Glaciers overwhelming and scouring the tops of that rock .. until today<br />
some scoured rock tops are held by the sea called islands<br />
11 While some rise free as MOUNtains<br />
12 Mountains .. westerners laugh .. but in fact that up thrust of granite called<br />
Cadillac .. its wrinkled bald head gazing out at the sea from 1500 and 30<br />
feet up is the highest mountain on our nation’s east coast.<br />
13 But forget that .. because Acadia is not a place for superlatives<br />
14 On the contrary .. Acadia reminds a society sated with superlatives …<br />
highest biggest fastest richest … that there are other BETTER values<br />
15 The value of solitude .. and in solitude contemplation<br />
16 Describer: A young woman in blue shirt and shorts lies on her back on a rocky<br />
ridge overlooking the sea below<br />
17 She is reading a book<br />
18 Narrator: The value of diversity and in diversity harmony<br />
19 Describer: A small brown fawn looks at us twitching his left ear<br />
20 A black-headed loon drifts by<br />
21 A thin black dragonfly on a green leaf opens and closes its wings<br />
22 Two little orange-breasted baby robins wiggle their heads<br />
23 Under water two white-sided dolphins swim smoothly side by side<br />
24 On the quiet surface of the sea two black triangular dorsal fins<br />
emerge then curve back down under water<br />
25 Narrator: Acadia is a meeting ground<br />
This transcript shows that the describer’s language consists mostly of separate<br />
thoughts. There are only two places where the wording of one statement depends on another.<br />
The first one occurs in line 4, which says "we continue" in reference to line 2, which says "we<br />
move." The second is the use of the pronoun "she" in line 17 to refer to the same woman<br />
shown in line 16.<br />
4.3.2 Utterances<br />
Once a descriptive insertion begins, the listener will encounter a series of one or<br />
more utterances. The term utterance was chosen because it is a unit of analysis that is<br />
relevant to the range of speech productions found in conversation (Schiffrin, 1987).<br />
Utterances can be, but need not be, grammatical and were initially defined by Harris as "any<br />
stretch of talk by one person, before and after which there is silence on the part of that<br />
person” (Harris, 1951).<br />
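Harris's silence-based definition, combined with the ½-second threshold given in table 2, suggests a straightforward segmentation rule: group timed words into one utterance until a gap longer than the threshold appears. The sketch below is a hypothetical illustration of that rule, not the transcription procedure actually used for this corpus, and the word timings are invented.

```python
# Hypothetical gap-based utterance segmentation: words separated by more
# than 0.5 s of silence begin a new utterance (see table 2).

def segment_utterances(timed_words, max_gap=0.5):
    """timed_words: list of (word, start, end) tuples in temporal order."""
    utterances = []
    current = []
    prev_end = None
    for word, start, end in timed_words:
        if prev_end is not None and start - prev_end > max_gap:
            utterances.append(current)   # gap exceeded: close the utterance
            current = []
        current.append(word)
        prev_end = end
    if current:
        utterances.append(current)
    return utterances

# Invented timings for two utterances from transcript 1 (lines 19 and 20).
words = [("A", 0.0, 0.1), ("small", 0.15, 0.4), ("brown", 0.45, 0.7),
         ("fawn", 0.75, 1.0), ("looks", 1.05, 1.3), ("at", 1.32, 1.4),
         ("us", 1.45, 1.6),
         # a 1.2-second gap: a new utterance begins here
         ("A", 2.8, 2.9), ("black-headed", 2.95, 3.5), ("loon", 3.55, 3.8),
         ("drifts", 3.85, 4.1), ("by", 4.15, 4.3)]

for u in segment_utterances(words):
    print(" ".join(u))
```

Run on these toy timings, the rule recovers the two utterances, splitting at the 1.2-second gap.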
Utterances in AD convey a set of representations about the visible field. Because<br />
they are independent structures and could usually be rearranged without becoming<br />
incoherent (in form not in terms of actions), they can be considered like a series of<br />
snapshots. Utterances can usually be arranged in any way to fill the time available in the<br />
insertion, and they can be as long or as short as the describer chooses them to be within the<br />
insertion space. But, as figure 7 shows, most are very short. Almost 60% of utterances are<br />
between one and two seconds in length and more than 30% are between three and four<br />
seconds in length. The effect for the listener is that these snapshots of visual information are<br />
produced as if by a strobe effect where the field is visible for a short period of time and then<br />
represented as language and then visible again and represented again until dialog or<br />
meaningful audio from the source text takes over.<br />
Figure 7 - Length of utterances in corpus (bar chart of utterance durations in seconds: 0-.99, 7.98%; 1-2, 59.56%; 3-4, 32.46%; 5-6, 8.15%; 7-8, 1.56%; 10+, 0.25%)<br />
Much like spoken discourse, utterances are often grammatical, but need not always<br />
be because context often makes their meanings clear when they are not. Below are two<br />
examples from transcript 1 that are not grammatical, but meaningful.<br />
3 Bold letters swing out of the screen toward us the gift of Acadia<br />
5 At otter point a huge wave crashes onto a square granite rock<br />
white spray flying<br />
In line 3, the first part of the utterance describes that something is being read on the<br />
screen (a type of representation that will be covered below) with “Bold letters swing out of<br />
the screen toward us." The describer then continues with the content of what was read<br />
with "the gift of Acadia." If an introduction, for example "it reads," preceded the part that<br />
was read, the statement would become grammatical and it would also then consume a few<br />
more syllables. The second example from line 5 is similar. The first part describes a scene<br />
with action “At otter point a huge wave crashes onto a square granite rock” and the<br />
ungrammatical clause “white spray flying” is appended without any introduction. But, here<br />
also the meaning is clear that the white spray relates to the wave crashing that precedes it. In<br />
these ways, much as in spoken discourse, the elimination of words that are unnecessary for<br />
meaning to be conveyed can result in more efficient but technically ungrammatical forms.<br />
Furthermore, utterances can come in many patterns. They can contain a single visual<br />
feature or action or can include several pieces of information a sequence. For example line 2<br />
in transcript 2 below indicates a simple action: one character makes a gesture. In line 3<br />
however, there are two actions: 1) she takes the arm 2) they stroll away. These two actions<br />
are joined by the connective “and,” but the same effect could have been achieved with the<br />
use of “then.” The combination of actions need not be sequential. For example, line 9<br />
shows two actions that occur simultaneously. In this case, they are joined by “as,” but<br />
simultaneous action is also indicated with “while” preceding the first action.<br />
Transcript 2 - From "A Star is Born" 16:20<br />
1 Danny: You’re going to buy me a drink come on<br />
2 Describer: He holds out his arm<br />
3 she takes it and they stroll away<br />
4 Later in a bar<br />
5 Danny: That’s right George there’s nothing like a little rum to take away that<br />
milk flavor<br />
6 Describer: The bartender pours two shots of rum into a glass of milk<br />
7 Later Esther and Danny are drinking the drinks<br />
8 She playfully punches him<br />
9 He punches back and catches her as she falls off her stool<br />
10 Danny: I beg your pardon<br />
11 Esther: Certainly<br />
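As a rough illustration of the connective patterns just described, a trivial classifier might label a two-action utterance by the connective joining its clauses. This is an editor's sketch, not an analysis tool from the study; keyword matching is a deliberate simplification, and real description would require proper parsing.

```python
# Toy classifier for the connectives discussed above: "and"/"then" can join
# sequential actions, while "as"/"while" signal simultaneous ones.
# Simultaneity markers are checked first, since an utterance such as
# transcript 2, line 9 contains both "and" and "as".

SEQUENTIAL = {"and", "then"}
SIMULTANEOUS = {"as", "while"}

def classify_connective(utterance: str) -> str:
    tokens = utterance.lower().split()
    if any(t in SIMULTANEOUS for t in tokens):
        return "simultaneous"
    if any(t in SEQUENTIAL for t in tokens):
        return "sequential"
    return "single action"

print(classify_connective("she takes it and they stroll away"))   # sequential
print(classify_connective(
    "He punches back and catches her as she falls off her stool"))  # simultaneous
print(classify_connective("He holds out his arm"))                 # single action
```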
4.3.3 Representations<br />
The previous discussion of utterances introduced the fact that an utterance can<br />
contain more than one unit of information. In the examples discussed above, the units of<br />
information were primarily actions presented in sequence or actions that occurred<br />
simultaneously; or in the case of line 9, two actions envelop a third. But even within an<br />
utterance that is action based, there can be other types of information. For example, line 20<br />
from transcript 1 reads:<br />
20 A black-headed loon drifts by<br />
This single action is of an object in motion. It also contains a description of a visual<br />
appearance that tells the listener that this object (a loon) has a black head. Assessing the<br />
complete set of meanings any unit of language provides is a task much larger than the scope of<br />
this study. That discussion includes semantics and lexical semantics, the studies of<br />
meanings encoded in sentences and words (O'Grady, 2001), as well as pragmatics, the study<br />
of meanings received by the listener that are not contained within the semantic content<br />
(Levinson, 1983), and further understanding that is communicated at the discourse level.<br />
Because the language used in AD is so specialized and does not include many of the<br />
structures found in the language forms that semantics and pragmatics often draw upon, a<br />
simplified classification of the different types of information is proposed in this study.<br />
While it departs somewhat from established linguistic meaning analysis, it is more<br />
appropriate to the restricted nature of AD. The term chosen for this classification of<br />
meaning is representation.<br />
The term representation, rather than other linguistic constructs such as "phrase" or<br />
"clause," is used because, as the data presented below will show, identifiable units of meaning<br />
can come from a range of linguistic forms from words to sentences. The concept of<br />
representation presented here is the visual information that has been selected by the<br />
describer to be sent over the auditory channel to the consumer. Representations have both a<br />
focus and a type. The focus is the person, place, or thing that the representation is about, and<br />
the type will fall into one of the following seven categories:<br />
1. Appearance: The external appearance of a person, place, or thing.<br />
2. Action: Something in motion or changing.<br />
3. Position: Location of description, location of characters.<br />
4. Reading: Written or understood information being literally read,<br />
summarized, or paraphrased.<br />
5. Indexical: Indicates who is speaking or what is making some sound.<br />
6. Viewpoint: Relates to text-level information and the viewer as viewer.<br />
7. State: Not always visible information, but known to the describer<br />
and conveyed in response to visual information.<br />
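The seven-way classification can be captured as a small enumeration paired with a focus, matching the definition of a representation given above. The sketch below is illustrative, and the example tagging of transcript 1, line 3 is this editor's reading, not an annotation from the study.

```python
from enum import Enum
from typing import NamedTuple

# The seven representation types proposed in this study.
class RepType(Enum):
    APPEARANCE = "appearance"
    ACTION = "action"
    POSITION = "position"
    READING = "reading"
    INDEXICAL = "indexical"
    VIEWPOINT = "viewpoint"
    STATE = "state"

# A representation has a focus (what it is about) and a type.
class Representation(NamedTuple):
    focus: str
    rep_type: RepType

# Illustrative tagging of transcript 1, line 3,
# "Bold letters swing out of the screen toward us the gift of Acadia":
line3 = [
    Representation("letters", RepType.APPEARANCE),  # "Bold letters"
    Representation("letters", RepType.ACTION),      # "swing out of the screen toward us"
    Representation("letters", RepType.READING),     # "the gift of Acadia"
]
print([r.rep_type.value for r in line3])
```

The single-utterance example shows how one focus ("letters") can carry several representation types at once, which is the pattern the following subsections examine.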
Some examples of these types of representations have already been shown in the first<br />
two transcripts. Below, each of these types of representations will be discussed using<br />
additional transcripts.<br />
Appearance<br />
Appearance is in some ways the antecedent of all of the other types of representation<br />
because all representations require an appearance of something in the source text in order to be<br />
realized in the description. But the others, with the possible exception of actions, do not<br />
convey properties that are externally describable with accuracy.<br />
Appearance representations are the subset of description that provides information about the direct visual properties of something in the source text, including luminance, color,
size, and shape. Appearance is usually realized through adjectives and the nouns they modify,
as some examples from transcript 1 illustrate.<br />
From Transcript 1<br />
2 We move over the still waters of Jordan pond toward the twin bare<br />
domes of the bubble mountains<br />
3 Bold letters swing out of the screen toward us the gift of Acadia<br />
5 At otter point a huge wave crashes onto a square granite rock .. white<br />
spray flying<br />
16 A young woman in blue shirt and shorts lies on her back on a rocky<br />
ridge overlooking the sea below<br />
19 A small brown fawn looks at us twitching his left ear<br />
20 A black-headed loon drifts by<br />
21 A thin black dragonfly on a green leaf opens and closes its wings<br />
22 Two little orange-breasted baby robins wiggle their heads<br />
23 Under water two white-sided dolphins swim smoothly side by side<br />
24 On the quiet surface of the sea two black triangular dorsal fins emerge<br />
then curve back down under water<br />
External appearance can also be conveyed with prepositional attachment, as line 16 shows, and also through adverbials. It follows that if a consumer is interested in visual information -- in what things look like normally or in certain situations -- the information would often be provided through appearance representations.
Action<br />
Consistent with the examples from transcript 1, shown above, most utterances are<br />
based on some form of action. Actions can include gestures, movement, and activities, and they
can act as the core representation that other representations are clustered around. Transcript<br />
3 contains a typical set of action-oriented sequences.<br />
Transcript 3 - From "Gladiator" 40:26<br />
1 Maximus: At least give me a clean death .. a soldier’s death<br />
2 Describer: One guard moves behind Maximus<br />
3 Then rests his sword point on the back of his neck<br />
4 Maximus bows his head as the guard raises the sword<br />
5 Maximus leaps up and butts the guard off balance then catches the
blade and spears him in the throat<br />
6 Spinning he chases the second guard whose blade sticks in its<br />
scabbard<br />
7 Maximus: The frost .. sometimes it makes the blade stick<br />
8 Describer: With bound hands Maximus slices the sword across the guard's face<br />
9 Nearby two other praetorians sit on restless horses<br />
10 One gallops into the clearing, then twists in his saddle<br />
11 A sword flies at him end over end<br />
12 It buries itself in his back<br />
13 Maximus steps out from the trees glaring<br />
14 Maximus: Praetorian<br />
The two insertions in transcript 3 contain action in every utterance. This is a<br />
representative sample because it shows some of the different ways that action is presented.<br />
Line 2 shows action that relates the position of one person to another. Line 3 shows an<br />
action with an object (sword point) and location of the object. Line 4 is an example of<br />
simultaneous actions and line 5 contains a sequence of actions presented in a list form. Line<br />
6 represents one action (spinning) as part of another action (although it is likely that the<br />
meaning intended is two actions in sequence). Lines 8 and 13 describe the manner of an
action (with bound hands, glaring), and lines 11 and 12 indicate an action where the agent is<br />
inanimate.<br />
The scope of this study does not allow as full an analysis of the action<br />
representations as would be desirable. The types of meanings associated with different<br />
English verbs would be a good starting point for a more thorough analysis of the action in<br />
AD. But, while actions are represented with verbs, not all verbal forms are expected in AD.<br />
Further, a data-driven analysis focusing only on action, the largest part of the AD content
pie, would probably yield a further subset of action types that are relevant to this domain.<br />
Position<br />
Another type of representation that is often associated with actions identifies the<br />
positions or locations for information being described. Positional representations can act as<br />
action setters or scene-shifters, as transcript 4 shows.
Transcript 4- From "Gladiator" 1:42:30<br />
1. Cassius: People of Rome … on the fourth day of Antioch .. we can celebrate<br />
.. the sixty-fourth day of the games
2 Describer: In the crowd Maximus’ servant Cicero looks around<br />
3 Cassius: In his majestic charity the Emperor has deigned this day to favor the<br />
people of Rome with an historical final match<br />
4 Returning to the Coliseum after five years in retirement .. Caesar is<br />
pleased to bring you the only undefeated champion in Roman<br />
History .. the legendary Tigris the Gaul<br />
5 Describer The crowd stands as four galloping horses draw a chariot into the<br />
arena<br />
6 Next to the driver a Gladiator salutes the crowd<br />
7 He wears leather straps across his stocky chest and a metal helmet<br />
shaped like a tiger's head<br />
8 On one of the underground ramps leading to an arena gate,<br />
Maximus swings a short sword<br />
9 Proximo: He knows too well how to manipulate the mob<br />
10 Maximus: Marcus Aurelius .. had a dream that was Rome Proximo<br />
Location information working as an action setter relates characters to each other and to the setting, as shown in lines 2, 5, and 6. Scene shifting occurs when a complex scene
contains multiple perspectives that are alternately presented to the audience. Scene shifting changes the viewpoint of the audience but does not advance the action of the movie to a
new scene. In some ways, it is similar to a flashback or dream sequence that allows for a<br />
suspension of action. Line 8 from transcript 4 above shows an example of this, as the main<br />
scene is in the coliseum before a gladiator match, but attention has shifted to a quiet spot<br />
below the arena.<br />
While all location information would seem important to viewers not accessing the visual component of the text, these scene-shifting descriptions would seem especially important in allowing viewers who cannot see the change in context to comprehend the action as a complex scene (typical of film climaxes) unfolds.
Reading<br />
Reading occurs when some language or recognizable symbols come on the screen<br />
and are literally read “as is” by the describer. Line 3 from transcript 1 above is an example<br />
of information being read. Reading often comes at the beginnings and endings of movies<br />
when there are credits and titles. It also appears quite frequently throughout some movies in<br />
various forms. In transcript 5, line 4 from “Gladiator,” a set of words is introduced and read to indicate the location where the movie’s action now takes place.
Transcript 5 - From "Gladiator" 47:49<br />
1 Juba: Better now? Clean you see<br />
2 Describer: Maximus lowers his lolling head back onto the wagon<br />
3 Later the caravan approaches a congested desert town<br />
4 Words appear Zucchabar Roman Province<br />
5 A crude amphitheater dwarfs the surrounding red clay buildings<br />
6 Now in a busy open air tavern an older man with a tough leathered<br />
face sits by himself swaddled in robes his head wrapped in a black<br />
turban<br />
7 He takes careful sips from a small brass cup<br />
8 Trader: Proximo my old friend<br />
Transcripts 6 and 7, from “LA Story,” also show the describer reading signs that are
part of the sets rather than just screen text.<br />
Transcript 6 - From "LA Story" 3:38
1 Describer he rides in a park with other stationary bikers<br />
2 a sign reads "stationary bike riding park ...no running”<br />
Transcript 7 - From "LA Story" 19:00<br />
1 large white-lettered signs reading "now" hang on the wall<br />
2. blue lights bathe the hip shoppers<br />
Transcript 8 below shows a case where a sign in the movie becomes like a character<br />
(lines 7, 10) and the reading of it is like the speaking of a character. This same transcript<br />
(lines 13-15) shows that a character is reading from language that is visible in the movie.<br />
A sighted viewer who could read English would certainly understand what was being<br />
communicated in this case, but it is unclear if the same would always be true for an AD<br />
consumer without some descriptive support.<br />
Transcript 8 - From "LA Story" 20:36<br />
1 Describer: The car's engine dies<br />
2 He glides off the road stopping behind a digital road sign which<br />
flashes freeway clear<br />
3 Harris climbs out of Trudy's Mercedes and lifts the hood while she<br />
stays seated in the car studying the tilt of her hat in the visor's mirror<br />
4 A wind shifts the leaves of a weeping willow behind them<br />
5 Suddenly the lighted sign goes black<br />
6 Noticing the darkness Harris slowly turns around<br />
7 The sign flashes "Hi ya"<br />
8 Harris frowns and returns his attention to the car engine<br />
9 Harris whips around as the light bulbs explode<br />
10 Then miraculously regroup to spell "I said Hi ya"<br />
11 bewildered, Harris points to himself eyebrows raised skeptically<br />
12 Harris: Hi<br />
13 ruok?<br />
14 don’t make me waste letters<br />
15 R .. U .. O .. K ?<br />
16 Oh.. Are you OK? Yeah I’m Fine<br />
17 Describer: The sign says "hug me"<br />
18 Harris: What?<br />
In a manner very similar to the way that the speech of a person is reported/constructed in conversation (Tannen, 1989), the information that is being read is introduced through a verb of introduction, for example “read,” “reading,” “says,” “flashes,” and “appear,” as the examples in transcripts 6-8 above show.
Indexical<br />
Indexical or deictic information is information whose meaning can only be<br />
determined from context (Levinson, 1983). In conversation, words such as “here” and<br />
“now” provide meanings for conversants, but understanding those meanings requires understanding the place and time in which the conversation is situated. In Audio Description, a few types of indexical representation were found. In line 5 of transcript 9, the describer indicates what object the character in line 4 had just mentioned. In this case, in order to recover the meaning of this piece of description, the prior dialog is required.
Transcript 9 - From "A Star is Born" 1:58
1 Father: Well daughter how was the movie tonight?<br />
2 Esther: Lovely<br />
2 She takes off her coat<br />
3 Boy: Mush that’s what it was just a lot of mush .. there wasn’t anybody<br />
killed in the whole thing<br />
4 Father: Oh well then I’ll stick to these.. these don’t talk<br />
5 Describer: Looking at pictures<br />
6 Boy That big cluck Norman Maine was in the picture tonight
Transcript 10 shows another form of indexing where the describer indicates who the<br />
next speaker is. In line 2, the name Quintus is said by the describer and from accessing the<br />
video portion of the source text, it is clear that this statement identifies a character as the<br />
speaker.<br />
Transcript 10 - From "Gladiator" 6:10<br />
1 Describer: Across the battlefield at the edge of the forest hundreds of barbarians<br />
wave their swords<br />
2 Quintus<br />
3 Quintus: Load the catapults<br />
4 Describer: On a hill through a light snow the elderly white bearded man watches<br />
the army prepare<br />
Viewpoint<br />
Viewpoint representations relate to what the viewer would perceive as affecting the<br />
entire visual field or text. These include scene changes/shifts and screen and special effects. Scene
changes are commonly indicated with the marker “now” or “later” as discussed above.<br />
Transcript 13 is a kind of scene shifter because at this point in the movie, a number of<br />
different screen effects were appearing in succession and so “next” indicates a change, but in<br />
this case not necessarily a formal scene change.<br />
Descriptions of camera effects were fairly rare in this corpus, but transcripts 11-13 below each show different ways that the viewer’s total perspective can be represented in
description. Another approach is reflected in the beginning of transcript 1 above when the<br />
describer says, “we are moving” and “we continue.” A description that preceded the<br />
transcript began with “we are flying” which reflected what the camera effect was like. It<br />
should be noted that the only use of “we” (interpreted to be inclusive of the listener and describer) occurs at the beginnings of productions.
Transcript 11- From "Gladiator" 1:15:29<br />
1 Describer Now in the palace a blurred face comes into focus<br />
Transcript 12- From "Gladiator" 12:00<br />
1. Describer Surrounded by flames hundreds of men battle in a blur of muted<br />
color<br />
Transcript 13 - From "LA Story" 1:50<br />
1 Describer: Next a montage of funky LA architecture<br />
State<br />
Descriptions sometimes provide information that is not visually evident but is available through the describer’s knowledge of the text. This happens through providing identity or naming, providing relational information about visible entities, providing internal states including emotions and intentions, and specifying time.
Transcript 14 shows the naming of places. While in the movie “Gladiator” locations
can be named with screen text as shown in transcript 5 above, in transcript 14, the location<br />
“Imperial Rome” was not named by the movie producers in this way. This information was<br />
added by the describers. Also, the buildings were not named in the movie; the describer<br />
added this information as well.<br />
Transcript 14 - From "Gladiator" 58:42<br />
1 Describer: As they look at the stands that encircle them the arena seems to spin<br />
like a carousel blurring the cheering crowd<br />
2: Now imperial Rome stretches far below<br />
3: A flock of birds soars over the Circus Maximus and the Coliseum
Transcript 15 below shows both the naming of a character and the relationship of<br />
the character to another character in the same utterance. This type of naming appeared to<br />
occur more with minor characters than main ones. Transcript 15 also shows a common indication of a time shift, “Later,” which signals that the action is later in the story. Because movies can contain flashbacks, when a scene changes, viewers may not always know immediately that the scene has changed. The use of “later” identifies it as a change that is farther in time.
Transcript 15 - From "LA Story" 7:50<br />
1 Describer: Later in his girlfriend Trudy's apartment<br />
Transcripts 16 and 17 show examples of a character’s internal state being evaluated and described. Transcript 16 is an example of the state baldly described, while transcript 17
shows it embedded in an action.<br />
Transcript 16 - From "LA Story" 90:38
1. Describer: Slowly “conditions clear” is spelled over the screen
2 Content Harris smiles<br />
3 In an aerial view, other digital road signs along the highways echo the<br />
same message<br />
Transcript 17 - From "LA Story" 25:36
1. Describer: Now a deluge of mail shoots through the letter slot in Harris' front<br />
door<br />
2. from the kitchen he irritatedly kicks wastebasket underneath the<br />
opening where it catches the streaming mail<br />
A variation on describing a character’s internal state is to have “appears” precede the evaluative phrase.
4.3.4 Words<br />
The words used in AD are the same as the words used elsewhere, but they are a subset of the lexical items normally employed in other language systems. Because AD only allows descriptions of what is immediately available in the visual field, large portions of normal vocabulary should never appear in AD, or in VAD for that matter, unless being read. For
example, there should be no negation, modals, conditionals, past or future tense, anything<br />
that is hypothetical, or any references outside of the text at the time it is being described.<br />
Further, because the audience can include members of different disability experiences and<br />
different backgrounds, a widely accessible vocabulary is used.<br />
Some words in AD seem to be taking on special roles. For example, because the<br />
reference time in AD is understood to always mean the current time of the Source Text,<br />
words such as “now” and “later” can serve new functions as scene changers. Also, words<br />
such as “as” and “while” are markers of parallel action.<br />
5. USING THE DEFINITIONS FOR ANALYSIS<br />
The previous section provided some structural and functional definitions of <strong>Audio</strong><br />
<strong>Description</strong> (AD). These definitions and the way they are organized should be viewed as an<br />
initial and proposed framework for understanding a stream of description. This section adds<br />
another dimension, opening new territory and also tying together aspects of the two<br />
previous sections when these structural and functional definitions are presented within the<br />
context of analysis. Naturally, the analysis of any language system is a large conceptual field, and this study has already addressed two significant aspects of this type of communication, so this section is an abbreviated version of what it might have been had the definitions of the conceptual framework of VAD and the definition of AD existed prior to this research. In
addition to connecting concepts raised in the two previous sections, it also relates the<br />
analysis of this language system to analyses used with other discourse types. In essence, this<br />
section also connects AD (and VAD) to other systems of language use.<br />
5.1 Descriptive Mass<br />
As described in section 4, the insertion is largely a collection of independent utterances, and insertions are placed mostly where the text allows them. Because of this role, they provide a window on the total impact of AD on a text. As table 1 above illustrated, the amount of time AD takes up in texts is significant, not less than 20 percent and in one case almost 60 percent of the total span of the text. Because insertions appear
where dialog and other audio cues are not, these figures can conversely be seen as<br />
representing the dialogue free portions of these texts, as the negative space, the portions that<br />
allow for and may require description. The insertions can be viewed as a quantity of representations that are distributed where the text allows. This descriptive mass could then be
analyzed according to types and patterns of representations to see the impacts on the<br />
production. If, for example, this descriptive mass does not contain any descriptions of<br />
scenery or facial expressions, consumers wanting this information will probably not get it.<br />
The descriptive mass can also be analyzed in terms of how it occupies the mind of<br />
the consumer compared to the dialog and other content. Frazier (1975) identified what he called “low audio” periods in a performance, where dialog and other clues were lacking and where different types of character, setting, or continuity information (his terms)
could be inserted. He described how the insertions would provide essential information,<br />
mostly at the beginnings of scenes. This study reveals a much more pervasive use of<br />
description than Frazier presented. While Frazier’s 90-minute production, “The Autobiography of Miss Jane Pittman,” had 34 insertions or bridges, contemporary described productions have comparatively more, as table 3 illustrates, and their insertions are distributed throughout scenes.
Table 3 - Comparison of description mass in four different texts

Text                     Length    Insertions  Utterances  Length Described  Amount Described
A Star is Born (1937)    111 min.  382         692         37 min.           20%
Jane Pittman (1976) 10   109 min.  34          Unknown     27 min.           25%
LA Story (1991)          90 min.   156         451         29 min.           32%
Gladiator (2000)         148 min.  269         1391        86 min.           58%
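The “Amount Described” figure is simply described time divided by total running time: for “Gladiator,” 86 of 148 minutes is roughly 58%, and for “LA Story,” 29 of 90 minutes is roughly 32%. A minimal sketch of that arithmetic follows; this is my own illustration, not the study’s tooling:

```python
def descriptive_mass(described_minutes: float, total_minutes: float) -> float:
    """Share of a text's running time occupied by description."""
    return described_minutes / total_minutes

# Two rows from Table 3 that the ratio reproduces directly
print(f"Gladiator (2000): {descriptive_mass(86, 148):.0%}")
print(f"LA Story (1991): {descriptive_mass(29, 90):.0%}")
```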
Taking these feature films as data points, a trend towards less dialog and more description is evident: the newer films have a greater need for description, and the role of the describer is increasing historically. Using these films as representative samples, not only is
the amount of description increasing, but the amount of description in each insertion is<br />
increasing as well. The 1937 film had more than half of its description in short insertions of<br />
1-5 utterances in length, while the 2000 production had almost half of its description mass<br />
allocated to sequences with more than 20 utterances and several of these insertions<br />
contained over 70 utterances.<br />
Table 4 - Distribution of description mass by insertion length

                      Number of utterances per insertion
                      21+    16-20   11-15   6-10   1-5
Star is Born (1937)    0%     13%     4%     24%    60%
LA Story (1991)       21%      5%    12%     21%    42%
Gladiator (2000)      47%      4%    15%     14%    20%
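Percentages like those in table 4 can be computed by weighting each insertion by its utterance count and summing within length bins. The sketch below uses invented insertion sizes, since the per-insertion counts themselves are not published here:

```python
from collections import Counter

def mass_distribution(insertion_sizes: list[int]) -> dict[str, float]:
    """Share of total description mass (utterances) per insertion-length bin."""
    def bin_label(n: int) -> str:
        if n >= 21:
            return "21+"
        if n >= 16:
            return "16-20"
        if n >= 11:
            return "11-15"
        if n >= 6:
            return "6-10"
        return "1-5"

    mass = Counter()
    for n in insertion_sizes:
        mass[bin_label(n)] += n  # weight by utterances, not by insertion count
    total = sum(insertion_sizes)
    return {label: count / total for label, count in mass.items()}

# Invented data: five insertions of 2, 3, 7, 12, and 25 utterances
print(mass_distribution([2, 3, 7, 12, 25]))
```

Note that the shares are weighted by utterances rather than by number of insertions, matching the text’s framing of “description mass allocated to” each length class.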
Figure 8 - Chart version of table 4 data: allocation of description mass by utterances per insertion (21+, 16-20, 11-15, 6-10, 1-5) for “Star is Born” (1937), “LA Story” (1991), and “Gladiator” (2000). [chart not reproduced]
5.2 The Textual Role of Insertion Content<br />
In addition to looking broadly and quantitatively at the descriptive mass, individual insertions can be analyzed qualitatively to examine the portion of a text they occupy and the roles they play in the text. It is clear that long descriptive sequences occur in these texts. But are they conveying information that is supplemental or essential to the text?
In several parts of this corpus, it is clear that insertions do contain essential plot<br />
information that is conveyed without dialog. For example, in “Gladiator,” there is a scene<br />
(35:50) where the Emperor’s son murders his father by suffocation. The dialog and sound<br />
effects do not make clear that this has happened and when the dead man is discovered in the<br />
following scene, there is no indication that the son is responsible. In “LA Story,” there is a<br />
restaurant scene (15:23) where the main romantic characters in the film meet and exchange a<br />
number of non-verbal signals that are described in detail in the AD. Without the AD to fill<br />
in the gaps, it is quite possible that visually impaired viewers would not have access to this information, which is essential to understanding the plot, until much later in the production, if at all.
This aspect of the textual significance of insertions has been integrated into both<br />
NTN and DVS practices, as they will review a movie using only audio cues to determine<br />
areas where the text is not clear without description (Gould, 2002; Stovall, 2002).
5.3 Sample Analysis: Persistent Entity Development 11<br />
Most of the content of AD is about people and things. Many of these entities will<br />
exist over extended parts of a text. How and when information is presented about these entities, when they get named, and when they are referred to as new or given, is a
potentially important aspect to modeling the consumer experience. By whatever term is<br />
chosen, whether a text model (Wilson, 1986) or information state that has been applied to<br />
conversation (Schiffrin, 1987; Schiffrin, 1994), there exists in the mind of the consumer a set
of mental representations that reflect their understanding; and one of the main sources of<br />
these representations is the contents of the streams of description in AD. The streams of<br />
description and the order that information is presented then become potentially important<br />
topics of analysis. For example, the main character in “Gladiator” is referred to by five<br />
different terms in the first fifteen minutes of the production. Table 5 shows these different terms and the locations (times) in the text where they appear. This type of approach seems designed
to reflect the revelation of information that the authors of the text intended sighted viewers<br />
to experience because it is only after fifteen minutes that this character is referred to by<br />
name in the text.<br />
Table 5 - Referring terms for main character of "Gladiator"<br />
Time Utterance<br />
2:32   Now a warrior lifts his head and blinks as if waking from a dream
3:34   Now the scruffy warrior mounts the earthworks
3:44   The scruffy faced warrior general smiles and nods to his men
3:59   Under a heavy mist the general makes his way through hundreds of soldiers taking positions in the mud
15:51  Maximus looks away his eyes searching the field
This use of different terms for the same individual, called a referring sequence (Schiffrin, 1994), is also found in spoken discourse. In spoken discourse, new entities that are being
introduced into a conversation are often marked with a specific introduction such as “there<br />
is” (Schiffrin, 1994). In AD, however, new entities are not marked this way; they are usually introduced with “a x,” as “a warrior” from the utterance at 2:32 above shows. Subsequently they can be referred to definitely with “the,” as illustrated in the utterance at 3:34. It follows that there may be similar patterns for things and places, and that the ways they are repeatedly referred to might provide a baseline for how referents can be handled in Audio Description.
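Tracking this indefinite-then-definite pattern is mechanically simple once article-plus-noun mentions have been extracted from the description stream. In the sketch below, the (article, noun) pairs are hand-extracted from the “Gladiator” referring sequence in table 5; the function and its “new”/“given” labels are my own illustration:

```python
def classify_mentions(mentions):
    """Label each (article, head_noun) mention 'new' on the first
    occurrence of the noun and 'given' thereafter."""
    seen, labeled = set(), []
    for article, noun in mentions:
        labeled.append((article, noun, "given" if noun in seen else "new"))
        seen.add(noun)
    return labeled

# Hand-extracted mentions, in text order
mentions = [("a", "warrior"), ("the", "warrior"), ("the", "general")]
for article, noun, status in classify_mentions(mentions):
    print(article, noun, "->", status)
```

A corpus-scale version would need real noun-phrase extraction and coreference handling (“the warrior” and “the general” are the same person), which this toy deliberately ignores.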
5.4 Sample Analysis: The Scene as Frame<br />
While the structures of insertions and utterances are significant to understanding<br />
AD, the productions that AD makes accessible (films, television, plays) are structured<br />
according to scenes and shots. Frazier had envisioned that description would be inserted at the beginnings of scenes, but this study reveals a different pattern in which insertions occur throughout the scene: they appear before and during scenes, and scene changes are often contained within an insertion. Analyzing AD in terms of scenes is a potentially
useful perspective because it is through scenes that the author intended their audience to<br />
perceive the text.<br />
An important concept in understanding what a scene is experientially is the frame of experience (Goffman, 1974; Tannen, 1993a; see also Bateson, 1972). Tannen
connects Goffman’s frame of experience to the concept of schema that was described in<br />
section 3. For the sighted viewer of a production, much of the information associated with a<br />
scene or frame will likely come through visual cues: they see that the scene has changed and that different characters are present or absent. For the consumer of AD, the cues need to be
embedded in the description. The following example from transcript 2 shows a scene<br />
change that occurs at the end of an insertion as they sometimes, but not always, do:<br />
1. Describer: He holds out his arm<br />
2 she takes it and they stroll away<br />
3 ----> Later in a bar<br />
-------- New Scene/Frame<br />
4 Danny: That’s right George there’s nothing like a little rum to take away that<br />
milk flavor<br />
5 Describer: The bartender pours two shots of rum into a glass of milk<br />
6 Later Esther and Danny are drinking the drinks<br />
Similar to the restaurant example often used to illustrate schema theory (Schank,<br />
1977), the fact that the action has been shifted into a bar allows the reference to a bartender<br />
as an existing entity (“the bartender”) even though it is the first reference to him. As a<br />
consumer is told about a scene or frame change, it may be important to provide additional<br />
cues to support the conceptual transition the consumer should make, including when new characters or objects become relevant. A challenge for consumers may arise when scenes change in the middle of dialog and there is no opportunity for an insertion to indicate the scene change.
The sequence in which information is presented matters not only for persistent entities, as described above, but also for frames of experience, as shown here.
5.5 Sample Analysis: Utterance Patterns<br />
It may be of consequence to consumers how information is presented. Just as in<br />
spoken discourse, the utterance is a unit that is perceived by the listener as a coherent<br />
thought. 12 The utterances in AD and their form can be analyzed to create a picture of the manner in which information is being presented. One such analysis that may be useful is to look
at the patterns of representation structure. Table 6 shows short sections from movies by<br />
two different describers: “A Star is Born” by Narrative Television Network (NTN) and<br />
“Gladiator” by the Descriptive Video Service (DVS).<br />
Table 6 - Comparison of description styles

“A Star is Born” 14:46
Danny gives Mr. Randall a confused glance and smiles at Esther
She steps closer and Danny takes a step back
Danny and Mr. Randall nod at each other
Esther turns and runs up the stairs in tears
Danny follows her
He stops her as she begins to enter her room

“Gladiator” 132:54
Underground Maximus runs through a narrow passageway
In the yard gladiators ram a praetorian with a table
Arrows pierce Hagen in the back then the chest
Two soldiers stab him
He kneels blood dripping from his mouth then falls
In the passageway Maximus turns a corner
He stops and tosses aside his torch
Directly in front of him a stone archway leads outside
The structures of the utterances reveal different approaches to describing what are<br />
essentially sequences of actions. In the NTN description, each utterance begins with the<br />
subject (representational focus) and then the action is presented. The theme of each of these<br />
utterances is the person or group performing the action (Halliday, 1985), while the DVS<br />
selection shows a wider range of representational combinations. The DVS style has a more varied structure, with the theme or focus often differing from the actor. A question for consumers might be whether this variety in representation and utterance structure is helpful and interesting or confusing and distracting.
5.6 Sample Analysis: Representational Combinations<br />
In the example above showing a change of scenes, two representations were used in<br />
combination, the state representation “later” and the location representation “in a bar,” to<br />
signal the new scene. Within this corpus, a location by itself rarely started a scene; it usually shifted focus to a different viewpoint of the same scene. Also, “later” by itself rarely started a scene; it indicated that time had shifted. While a detailed analysis of these combinations is
beyond the scope of this study, this is an area that deserves more attention because these<br />
combinations may be important for the development of standards that allow a few key<br />
words to operate as markers on a number of levels in AD as they can in conversation<br />
(Schiffrin, 1987).<br />
Representational combinations may also be important because they may reflect a type of filtering or implicit encoding on the part of the describer that may or may not be optimal for the consumer. For example, co-occurrences of location and appearance representations in the same utterance were extremely rare (less than .5%) in this corpus. Either seemed to co-occur easily with an action representation, but they rarely occurred together. Is this related to the concept of consciousness having primary and peripheral foci (Chafe, 1994), such that attention to location reduces attention to visual appearance? Or is it possible that this is a result of the fact that describers are able to see the visual image of the location as they are describing (something consumers cannot do), and so the need to describe the appearance of a place is not evident to describers when descriptions are created?
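Such co-occurrence rates can be computed mechanically once utterances are tagged with representation types. The sketch below is purely illustrative: the set-based tagging scheme and the toy counts are hypothetical, not the coding instrument or data used in this study.

```python
from itertools import combinations
from collections import Counter

def cooccurrence_rates(utterances):
    """Count how often pairs of representation types appear in the
    same utterance, as a fraction of all utterances.

    `utterances` is a list of sets of representation-type labels,
    e.g. {"action", "location"} -- a hypothetical tagging scheme."""
    pair_counts = Counter()
    for types in utterances:
        for pair in combinations(sorted(types), 2):
            pair_counts[pair] += 1
    total = len(utterances)
    return {pair: n / total for pair, n in pair_counts.items()}

# Toy corpus of 200 tagged utterances: location and appearance each
# co-occur freely with action, but rarely with each other.
corpus = (
    [{"action", "location"}] * 40
    + [{"action", "appearance"}] * 30
    + [{"action"}] * 129
    + [{"location", "appearance"}] * 1
)
rates = cooccurrence_rates(corpus)
print(rates[("appearance", "location")])  # 1/200 = 0.005, i.e. 0.5%
```

With a real corpus, a rate like the 0.5% reported above would fall out of exactly this kind of pairwise count.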
5.7 Analysis Challenges: Time, Reality, and Cultural Elements
There are three other areas that will be briefly discussed as analytical challenges. All three are significant and cannot properly be addressed in the scope of this study. The first topic is time, and it affects AD in a number of ways. First, it can constrain the amount of space for description and the location in the text where the descriptions can be inserted. For some productions this is a major issue for describers. Another time aspect arises within an insertion when certain audio cues (e.g., a glass falls) require part of the insertion to reflect the short audio element, further constraining the arrangement of utterances. The second of these is reality, which presents other challenges for describers because, while everything is supposed to be described in a direct and non-evaluative manner, certain visual effects are intended by the producers to reflect an imaginary or dreamlike state. Language and visual presentations have different approaches to treating reality (Leuween, 2002). Time and reality intersect with flash-backs and flash-forwards that suspend the time and reality that is active in the text and present another reality and time that is like a special frame inserted into some other text frame. Third is the cultural significance of certain scenes. As many have discussed, images of various celebrities (Barthes, 1957) and culturally recognized images can be used as independent symbols, and their appearance in a cultural production is often expected by the author of a text to have specific significance (Stephens, 1998). All three of these aspects of AD -- time, reality, and cultural significance -- are important factors relating to the decisions that describers must make.
6. SUMMARY: A LANGUAGE SYSTEM
This study began with the assertion that the practice of using language as a substitute for visual information in electronic textual settings is a specific type of language use. This study has provided evidence from four different perspectives, and in each of these perspectives this language system, Visual Assistive Discourse (VAD), and its variant, Audio Description (AD), have been shown to have characteristics both in common with and distinct from other uses of language.
VAD was explored initially in section two as a set of practices that exist for social reasons: to provide access to visually dependent electronic texts through assistive technology. These existing social practices act as the foundation for this language system because language is a socially constituted phenomenon (Gee, 1999; Halliday, 1978; Scollon, 2001b). The four practice areas that were discussed in this study are all new additions to human existence, far newer than speech and writing, which are thought to have developed roughly 100,000 and 5,000 years ago, respectively. Even though these practices are performed by different types of organizations and based on different methods, they serve a common set of consumers and have the same goals: to provide access to information that, without the use of language, would be largely inaccessible; and to provide that information using language in a way that allows the recipient the maximum opportunity to use the text in a manner that the authors of the text intended.
In section three, these practices are abstracted to present a definition of this language system, including role types and textual components. The common communicative properties and participant structure, including a describer, consumer, and source and modified texts, were presented and compared to other language processes. Similar to spoken discourse, the product of VAD is usually received as speech. Similar to written communication, the communication process of VAD is unidirectional, from author to consumer (through the describer). Similar to interpreted sign language/speech events, VAD requires an intermediary to enable the process. But, unlike any of these other language systems, in VAD the source information is not language but visual information, which is a different phenomenon, and the describer, rather than just converting visual information to language, is making decisions that are affected by the multimodal texts that the visual information sits in. The describer is making additions that are like informational prosthetics to create a new text that is accessible for the consumer, who will process it to create in his own mind a representation influenced by both descriptive language and residual information including dialog and auditory cues (Frazier, 1975).
Section four is data driven. It takes one form of VAD, Audio Description (AD), as the subject of a descriptive study. A corpus of four films with more than 150 minutes of AD was transcribed and used in a study that looked at the structural and functional components of the streams of description. This data provides clear evidence of a type of language that is different in form and function from much of written and spoken discourse. While the words in AD are drawn from a select group of the common language, they represent a restricted set. Because the words used in AD are restricted based on tense, type, and modality -- reflecting only real rather than any unreal (irrealis) states -- the language system of AD cannot be considered simply a dialect or a register (Gregory, 1978). Further, AD includes different types of representations that are not relevant to written or spoken discourse. The world that the describer describes is focused on the visual field of a movie (video) screen, and description can cover both the surface appearance of elements inside that screen and other textual information. In fact, the majority of the AD utterances in this corpus were about actions rather than appearance; they were mostly descriptions of what was happening rather than what things look like. This bias towards action is likely connected to the type of text (movies) used in the study. The descriptions also contained information that was read, information about the production, the changes of scenes and shifts in time and focus, as well as names of and relationships between visual elements.
In section five, the data collected in the study of Audio Description was used to build on the structural and functional definitions that had been established earlier by using them as the object of sample quantitative and qualitative analysis. This section shows that when AD is viewed as linguistic data, it can be understood in terms familiar to spoken discourse analysis, including referring sequences (Schiffrin, 1994) and frames of experience (Tannen, 1993a; Tannen, 1993b). This linguistic data for AD can be used to compare different types of description to see evidence of the different choices that describers make in their creation of a text. And it showed how critical AD is to understanding movies, especially recent productions in which essential textual information is often contained in long sequences with little or no dialog.
As this study concludes by providing the first formal definition of the language system Visual Assistive Discourse (VAD) based on actual practices, it leaves much work to be done. VAD is today a concept. It was born as an analytical construct in this thesis and is constituted in principles that influence how visual descriptions are created, including the recognition that a description is not an isolated phenomenon that relates directly to visual information but is actually situated in a multimodal text. The practices and language that form the basis for this study have been addressed with evidence from interviews, published reports, research into language use and analysis, and a corpus of the productions of language drawn from one of many representative sources. This study concludes that Visual Assistive Discourse -- using language as a surrogate for vision to augment another text and make it accessible to the visually impaired -- is a language system that is distinct and can be described, measured, and taught.
7. FUTURE STEPS
This study has placed VAD in a linguistic framework and opened it up to the benefits of large bodies of linguistic research. The theoretical and empirical base developed in this study can now be used as a foundation to support further research of AD and VAD in ways more consistent with established language systems. Below are discussions of research paths that are either indicated by the results of this study or made more practical by its completion.
7.1 Consumer Study
Understanding the recipients of communication is an important part of understanding the effects of any communicative process. This aspect is especially challenging for VAD because there is no typical visually impaired individual and because, at the time a description is created, there is no knowledge of who the specific consumers will be or how they might fit into the wide range of potential combinations of age, history, and disability factors. A number of studies have been undertaken into the usability of the Internet (Gerber, 2001; Gerber, 2002a; Gerber, 2002b), the television viewing habits of the visually impaired (Kuhn, 1992b), and the benefits of Audio Description (Lovering, 1993; Packer, 1996) that consider the perspectives of consumers of described media. Additional study that builds upon this work to develop fuller profiles of members of the visually impaired community as users of electronic media would be helpful for both researchers and practitioners. While there is no typical member of this consumer community, perhaps there are prototypical types that could be constructed.
This study has not specifically supported this important research area other than to reinforce the need for it. Discussions with researchers and advocacy groups during this study identified many practical challenges to pursuing this line of inquiry, including such basic factors as the fact that many visually impaired individuals are not members of easily identifiable groups that can be contacted and may not use or have access to technology that would assist them (Kaye, 2000). This type of research is envisioned as best performed in concert with advocacy groups and description providers, those organizations with existing contacts in the consumer community. The methods could be ethnographic (if practical) and could be supported by surveys, interviews, and linguistic analysis of feedback and comments. It is important to recognize that the consumers of VAD are not in an empowered position with respect to VAD, and may be reluctant to offer what would seem to be criticism of a service that is so clearly important.
7.2 Supporting Further Developments of Audio Description
Audio Description is an evolving field and is growing in use. It is now practiced by more organizations in more countries than ever before. This study supports further investigation into AD by providing a baseline set of definitions. During the course of this study, certain questions about AD arose, including:
o Are there optimal structural patterns for AD insertions/utterances?
o Does the preponderance of grammaticality in these structures limit the amount of information that can be transmitted?
o Does the quantity of time that AD consumes, usually more than any one character, present issues in terms of ear fatigue, and would multiple voicing or different descriptive approaches be useful in optimizing long sequences?
o Is the style of Audio Description used in movies (and perhaps other recorded media) weak in areas such as special effects that were not relevant to the performance roots of the method?
o Are there techniques from one variant of AD that would help the others, such as the creation of program notes as in theatrical AD?
At the time of this study, there have been active discussions regarding standards and guidelines. While these discussions are the rights and responsibilities of the participants, especially the consumers, the structures developed in this study could be useful topics for facilitating them. Another type of follow-on would be to study other forms of Audio Description, such as television or live performances, to refine the taxonomy of representations this study produced and to provide a basis for comparison. There is also certainly much work to be done in furthering the descriptive study begun in this report, and a more thorough investigation of action representations, referential approaches, and experiential frames seems a useful path of inquiry, one with relevant literature from other linguistics studies.
7.3 Descriptive & Comparative Studies of Other Forms of VAD
The practices of providing audio textbooks and developing accessible websites are parts of VAD that have significant investments in the description process. But they have not been studied in depth, and it is unlikely that most of the organizations providing these services would consider describing visual content to be their primary activity; rather, it is a component (maybe a lesser one) of their efforts. Not surprisingly, the initial research into the descriptions in these areas revealed a wide range of approaches, even within the same text.

Descriptive studies of textbooks and the Internet would provide significant new information on the range of representations and approaches used in VAD and would complement the description of AD done in this research. These descriptive studies could be used for comparisons of style and approach to the role of the describer and would certainly yield important insights.
7.4 Human Subjects Studies with AD
Throughout the course of the research for this study, human subject studies to measure the effectiveness of AD were discussed. For a variety of reasons, not the least of which is that there was no baseline definition of AD that could be used to structure such a study, this type of research was not attempted. The definitions and analyses provided here can be used to structure these types of studies to compare styles of description, the descriptive language itself, and the limits and tolerances of listeners when exposed to long segments of description. There are fundamental questions, expressed by almost all describers interviewed for this study, about how description works inside the mind of the consumer and how those effects could be maximized.
7.5 Educational Materials Study
Educational materials are an especially difficult issue for the blind and visually impaired. At the time of this study, there are a number of initiatives to address educational materials using digital technology, technology that should allow visual information as well as language to be transmitted. These efforts include a national initiative to exchange textbook information electronically (CAST, 2002). This technology, and the human practices that include creating tactile and other alternative representations for visual information, should also be capable of supporting visual descriptions and the range of descriptions that exist in VAD. A study of the visual description issues associated with educational material, perhaps involving a subset of the consumer study mentioned above focusing on prototypical student types, would be an important contribution to the understanding of some of the challenges that exist in making accessible instructional materials usable to more students.
7.6 Assistive Technology Research
Currently, a number of organizations are involved in the development of assistive technology. This technology, both hardware and software, will create opportunities and may impose limits on VAD consumers in the future. While using language as a replacement for images is not the only way to provide some of this information to the visually impaired, it is the only solution for certain types of texts and can be a cost-effective solution for many others. An investigation into the current research and standards efforts for future assistive technology would provide an opportunity to inform them with an understanding of the linguistic possibilities of VAD. If these efforts to develop the next wave of assistive technologies were aware of and understood the linguistic dimensions of Visual Assistive Discourse, perhaps those technologies could be designed to optimize the language experience of the consumer. After all of the efforts for regulation and service delivery, and after all of the technology used in attempts to remove barriers and provide an equivalence of experience, these assistive practices are still fundamentally processes of human communication.
APPENDIX A: GLOSSARY

Accessibility: Refers to whether an individual with a disability can gain even minimal access to information. Does not imply that the information is in a form that is meaningful or relevant, only that it is not unavailable due to a barrier.

Audio Description: Traditionally refers to human voice description for live events and performances, including television, movies, and museum exhibits. Occasionally, this term is used to mean any description done through the human voice for visually impaired and blind individuals.

Consumer: A person who receives the visual description. Also called a description consumer.

Describer: The person or organization/group that is responsible for the modified text and the description.

Described Video: Term used by the Federal Communications Commission and others to refer to a video product that has Audio Description added to it.

Description Process: The events, including previewing, writing, editing, and narrating, that create the described text.

Descriptive Mass: Applies to the total set of descriptive content as a group.

Electronic Text Descriptions: Descriptions that go on Internet sites or other digital publications. Could be rendered in Braille or synthesized speech.

Experiential Equivalence: Concept that the experience that a disabled person has, while not the same as the experience the non-disabled enjoy, is equivalent to it in fundamental ways.

Insertion: A set of one or more utterances that are read or heard as a continuous stream by the description consumer. In Audio Description, insertions are usually positioned between actor dialog and other significant audio.

Modified Text: The text with visual description added, the one that a consumer would experience.

NTN: Narrative Television Network, a major describer for broadcast television.

Representation: A piece of information that can be inferred from a description segment. Can be of a type and have a focus.

Representation Focus: The part of the visual field, or an element not in the visual field, that the description relates to. Can be a person, place, or thing, or the entire visual field.

Representation Type: Category of information provided in the representation. For Audio Description, seven categories were found.

Restricted Description: A description that is limited in size by the text and/or the technology of the text.

Source Text: The material that is being described, before the description is added. This is the video/film/book/play/event that the describer sees.

Text: Text in this sense means a composed body of information such as a story, play, movie, or textbook.

Text Model: A term used in reading research to describe the mental image of information (propositions and concepts) contained in something that is being read, or, in the case of this study, listened to.

Textually Situated: Means that the visual elements, and the descriptions for them, are not independent entities without context. The context is the text, such as a movie or an Internet site, that they are placed in, and properties of the text and how they are placed will affect their meaning.

Transrepresentation: The creation of one modal artifact to represent another, as in words being used for visual information.

Unrestricted Description: A description that can be of any length.

Utterance: A unit of description that is provided as one continuous speech unit; similar to an utterance in spoken discourse. Usually is grammatical, but might not be. Can contain one or more representations.

Visual Assistive Discourse: A term introduced in this study to mean the process of providing visual information through language across the various text types that the practice is employed in.

Visual Description: An informal term to indicate either the practice of Visual Assistive Discourse or an instance of description.

Visual Description Practice: A term used in this study to denote the real-life practices that are currently based on a type of text or an organization. Internet textual equivalents for Section 508 and Audio Description are two examples of visual description practices.
APPENDIX B: TRANSCRIPTION & MULTIMODAL ISSUES

Transcription Conventions

The transcription conventions used in this investigation are based upon principles and approaches used in spoken discourse analysis (Tannen, 1989). In transcripts, the following notation is used:
.. Perceptible pause of less than ½ second
… Perceptible pause of ½ second or more
CAPS Indicates emphatic stress
[ ] Around overlapping speech (extremely rare)
→ Arrow indicates significant points
/ / Slashes indicate uncertain transcription (extremely rare)
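A convention set like this can also be scanned mechanically. The fragment below is a hypothetical illustration (no such tool was used in this study) of one subtlety of the notation: the long-pause marker "..." contains the short-pause marker "..", so a scanner must try the longer pattern first.

```python
import re

# Regex alternation is ordered: matching "..." before ".." prevents a
# long pause from being misread as a short pause plus a stray period.
PAUSE = re.compile(r"\.\.\.|\.\.")

def pauses(line):
    """Return the pause markers found in a transcribed utterance."""
    return PAUSE.findall(line)

line = "He kneels .. blood dripping from his mouth ... then falls"
print(pauses(line))  # ['..', '...']
```

The same ordering principle would apply to any tokenizer built over these conventions.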
Multimodal Issues

Many discourses, including face-to-face conversation, operate with concurrent communication modalities. The non-language parallel properties can include gestures, indexical references, and environmental information. In the texts that are used with visual assistive discourse, these parallel properties are often extremely rich in information that is key to understanding the nature of the description and how the descriptive statements interact with the regular source text statements. One option for dealing with the multimodal issues would be to publish this document with rich media so that the source text could be presented in the same forum as the description and analysis. The option chosen for this study takes a more economical approach. Since all of the material used for this study is published, all of the transcriptions will be connected to the published work by reference to the text and time position of the sequence transcribed.
Sub-second Timing

The technology used to record and play back the films used in this study did not show times below a second. As a result, the times in transcriptions were reported as the whole seconds shown, which causes some loss of precision. With a corpus of over 150 minutes of transcribed content, any cumulative loss of precision in the transcription of individual utterances is expected to be negligible.
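The effect of this rounding can be made concrete with a small sketch (an illustration of the arithmetic, not a procedure from the study). Because each timestamp is an absolute position in the film rather than a summed duration, truncating to the displayed whole second bounds every error to under one second, and the errors do not accumulate across the corpus.

```python
# True start times are truncated to the whole second the player shows.
def truncation_error(true_seconds):
    shown = int(true_seconds)  # displayed whole-second timestamp
    return true_seconds - shown

# Sample true times, including one deep into a film (7303.25 s ~ 2 h):
# the error depends only on the fractional part, not on position.
errors = [truncation_error(t) for t in (12.0, 12.4, 12.999, 7303.25)]
assert all(0 <= e < 1 for e in errors)
print(round(max(errors), 3))  # 0.999
```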
APPENDIX C: VERBAL DESCRIPTIONS FOR FIGURES

Below, the name of each figure is paired with a descriptive text.
Figure 1 –<br />
Different<br />
Practices of<br />
<strong>Visual</strong><br />
<strong>Description</strong><br />
in 2002<br />
Figure 2 -<br />
Conceptual<br />
View of<br />
Internet<br />
<strong>Description</strong>s<br />
Figure 3 -<br />
Overview of<br />
VAD and<br />
other<br />
prototypical<br />
communicati<br />
on processes<br />
This diagram has four circles/ovals to denote the four description areas of<br />
<strong>Audio</strong> <strong>Description</strong>, <strong>Audio</strong> Books, educational software/rich media, and<br />
Internet sites. <strong>Audio</strong> <strong>Description</strong> is larger than the others and has an oval<br />
where the others have circle. Inside the <strong>Audio</strong> <strong>Description</strong> oval are three<br />
circles titled Live <strong>Description</strong> 1981, Described Video Early 1970’s, and<br />
<strong>Audio</strong> Tours 1980s. The three circles to the right of the <strong>Audio</strong><br />
<strong>Description</strong> oval are titled <strong>Audio</strong> Books 1948/1971, Software &<br />
Interactive Media Late 1990s, and The Internet & MultiMedia Late 199s.<br />
This diagram is a set of nested ovals with no overlaps. The outer oval is<br />
titled Internet. Inside this are two ovals titled Multimedia Documents and<br />
Live (Simulcast) Events. The Multimedia oval is further broken down into<br />
four sub ovals titled Text and Hyper Text, Still Images, Moving Images,<br />
and Interactive Elements.<br />
This diagram contains four sub diagrams each with its own label.<br />
The first sub-diagram is labeled Face-to-face conversation and it has two<br />
circles both labeled conversant with a box in between labeled coconstructed<br />
text. Two-directional arrows connect each conversant to the<br />
text.<br />
The second sub-diagram is labeled Interactional Sign Language<br />
Interpretation and this features three circles labeled deaf conversant,<br />
hearing conversant, and interpreter. A box labeled hearing text is in<br />
between the hearing conversant and the interpreter and it is connected<br />
with bi-directional arrows to both. A box labeled signed text connects the<br />
deaf conversant and interpreter and is connected to both with bidirectional<br />
arrows. A third is in between all three and is labeled visual text<br />
and is connected to the interpreter, deaf conversant, and hearing<br />
conversant with bi-directional arrows.<br />
The third sub-diagram is labeled written communication and contains two<br />
circles labeled author and reader. In between is a box labeled composed<br />
text and one-way arrows go from the author to the text and then from the<br />
text to the reader.<br />
The fourth sub-diagram is labeled <strong>Visual</strong> <strong>Assistive</strong> <strong>Discourse</strong> and contains<br />
three circles labeled author, describer, and consumer. A box labeled source<br />
text is connected with a one-way arrow from the author. The source text<br />
has two arrows leading from it. One goes to the describer and one goes to<br />
another box labeled modified text. From the describer is an arrow and a<br />
94
Figure 4 -<br />
box labeled insertions that is connected to the modified text by a one-way<br />
arrow into the modified text. A one-way arrow goes from the modified<br />
text to the consumer.<br />
Figure 4 - Chafe's View of Immediate Mode<br />
This diagram has three boxes. A small one at the top is labeled<br />
environment. From it an arrow leads to a large box labeled<br />
EXTROVERTED CONSCIOUSNESS. Inside this box are two labels:<br />
represented and representing. The arrow has labels for perceiving, acting,<br />
and evaluating next to it. From the large box is an arrow to a small box<br />
labeled language. Next to this arrow is the label speaking. At the bottom<br />
is the label “speaking in the immediate mode.”<br />
Figure 5 - View of the description process<br />
This diagram is an expansion of the part of figure 3 dealing with the<br />
describer and is labeled team-based <strong>Description</strong> Process. It has three solid<br />
circles labeled Text Producer, Support Team, and Describers, and four<br />
boxes labeled Source Text, Working Documents, Manuals & Style Guides,<br />
and Modified Text. There is a large dotted circle that encloses the<br />
Describers, Support Team, Working Documents, and Manuals & Style<br />
Guides boxes. The Source Text box is half inside and half outside the<br />
dotted line.<br />
A one-way arrow goes from Text Producer to Source Text and a two-way<br />
arrow goes from Text Producer to Support Team. A one-way arrow goes<br />
from Support Team to Working Documents and then to Describers. One<br />
one-way arrow goes from Source Text to Modified Text and another to<br />
Describers. The Describers circle has arrows coming in from Source Text,<br />
Working Documents, and Manuals & Style Guides, and a double arrow<br />
connecting to Support Team. A one-way arrow goes from Describers to<br />
Modified Text.<br />
Figure 6 - Conceptual diagram of consumer's process<br />
This diagram is conceptually similar to figure 5, but it focuses on the<br />
consumer.<br />
A circle labeled Consumer is surrounded by four boxes labeled Text Model,<br />
Personal History, World Knowledge, and Purpose and Goals that sit<br />
across a dotted outer circle. Outside this outer circle is a box labeled<br />
Modified Text. An arrow comes in from the Modified Text to the<br />
Consumer. Arrows also come in to Consumer from Personal History,<br />
World Knowledge, and Purpose and Goals, and a double arrow connects<br />
Consumer to Text Model.<br />
Figure 7 - Length of utterances in corpus<br />
This bar chart contains the following data:<br />
Length of utterances in seconds<br />
Duration 0-.99 1-2 3-4 5-6 7-8<br />
Percentage 7.98% 59.56% 32.46% 8.15% 1.56% 0.25%<br />
Figure 8 - Chart version of table 3 data<br />
Percentage 7.98% 59.56% 32.46% 8.15% 1.56% 0.25%<br />
Number 195 1455 793 199 38 6<br />
This bar chart contains the following data:<br />
Number of Utterances in Insertion<br />
1-5 6-10 11-15 16-20 21+<br />
A Star is Born (1937) 60% 24% 4% 13% 0%<br />
LA Story (1991) 42% 21% 12% 5% 21%<br />
Gladiator (2001) 20% 14% 15% 4% 47%<br />
(Campbell et al., 1934) (Martin/DVS, 1991) (Franzoni, 2000) (NPS/Ear, 2000)<br />
REFERENCES<br />
AFB. 1991. A Picture is Worth a Thousand Words For Blind and <strong>Visual</strong>ly Impaired Persons<br />
Too: An Introduction to <strong>Audio</strong>description. New York: American Foundation for the<br />
Blind.<br />
AFB. 2000. Education: An Overview. New York: American Foundation for the Blind.<br />
AFB. 2001a. Quick Facts and Figures on Blindness and Low Vision. New York: American<br />
Foundation for the Blind.<br />
AFB. 2001b. Statistics for Professionals: American Foundation for the Blind.<br />
Alonzo, Adam. 2001. A Picture is Worth 300 Words: Writing <strong>Visual</strong> <strong>Description</strong>s for an Art<br />
Museum Web Site. Paper presented at Center On Disabilities: Technology And Persons<br />
With Disabilities Conference 2001, Northridge.<br />
Artic Technologies, Inc. 2002. What is a Speech Friendly Site?: Artic.<br />
ASTC. 2001. Best Practices: <strong>Audio</strong> <strong>Description</strong>. Association of Science-Technology<br />
Centers Incorporated.<br />
Baquis, David. 2002. Meetings and email conversations, with Phil Piety. Washington, DC.<br />
Barthes, Roland. 1957. Garbo's Face.<br />
Bateson, Gregory. 1972. Steps to an Ecology of Mind. Chicago: University of Chicago Press.<br />
Board, Access. 2001a. Web-based Intranet and Internet Information and Applications<br />
(1194.22): United States Access Board.<br />
Board, United States Government Architectural and Transportation Barriers Compliance.<br />
2001b. Electronic and Information Technology Accessibility Standards. In Section<br />
508 of the Rehabilitation Act Amendments of 1998. Washington DC: Architectural and<br />
Transportation Barriers Compliance Board.<br />
Burnham, Betsy. 2002. Email regarding APH descriptions for visual information, with Philip<br />
Piety. Washington, DC.<br />
Campbell et al., Narrative TV Network. 1934. A Star is Born, Described by Narrative TV<br />
Network, ed. William A. Wellman.<br />
Carpenter, Patricia A; Marcel Adam Just. 1986. Cognitive Processes in Reading. In Reading<br />
Comprehension: From Theory to Practice, ed. J Orasanu. Hillsdale, NJ: Lawrence Erlbaum.<br />
CAST. 2002. The National File Format Initiative at NCAC: Center for Applied Special<br />
technology.<br />
Chafe, Wallace. 1994. <strong>Discourse</strong>, Consciousness, and Time. Chicago: University of Chicago Press.<br />
Corn, Anne L.; Wall, Robert S. 2002. Access to Multimedia Presentations for Students with<br />
<strong>Visual</strong> Impairments. Journal of <strong>Visual</strong> Impairment and Blindness 96:197.<br />
Dwyer, Francis M. 1978. Strategies for Improved <strong>Visual</strong> Learning: A Handbook for the Effective<br />
Design, and Use of <strong>Visual</strong>ized Materials. State College, Pennsylvania: Learning Services.<br />
Elbers, Loekie; Loon-Vervoorn, Anita van. 1999. Lexical Relationships in Children Who Are<br />
Blind. Journal of <strong>Visual</strong> Impairment and Blindness 93:419.<br />
Franzoni, David et al/DVS. 2000. Gladiator, ed. Ridley Scott.<br />
Frazier, Gregory MA. 1975. The Autobiography of Miss Jane Pittman: An all-audio<br />
adaptation of the teleplay for the blind and visually handicapped, Film and<br />
<strong>Communication</strong>, San Francisco State University: Masters.<br />
Gamma, Erich, Richard Helm, Ralph Johnson, and John Vlissides. 1995. Design Patterns:<br />
Elements of Reusable Object-Oriented Software. New York: Addison-Wesley Longman, Inc.<br />
Gee, James Paul. 1999. An Introduction to <strong>Discourse</strong> Analysis Theory and Method. New York:<br />
Routledge.<br />
Gerber, Elaine Ph.D. 2002a. Surfing by Ear: Usability Concerns of Computer Users Who<br />
Are Blind or <strong>Visual</strong>ly Impaired. In Access World.<br />
Gerber, Elaine and Connie Kirchner. 2001. Who's Surfing? Internet Access and Computer<br />
Use by <strong>Visual</strong>ly Impaired Youths and Adults. New York City: American Foundation<br />
for the Blind.<br />
Gerber, Elaine Ph.D. 2002b. Conducting Usability Testing With Computer Users Who Are<br />
Blind or <strong>Visual</strong>ly Impaired. Paper presented at 17th Annual International Conference of<br />
California State University Northridge (CSUN) "Technology and Persons with Disabilities",<br />
March 18-23, 2002, New York.<br />
Goffman, Erving. 1963. Behavior in Public Places: Notes on Social Organization of Gatherings. New<br />
York: The Free Press.<br />
Goffman, Erving. 1974. Frame Analysis: An Essay on the Organization of Experience. Cambridge,<br />
Massachusetts: Harvard University Press.<br />
Goldberg, Larry. 2002. Email communication, with Phil Piety.<br />
Gould, Bryan. 2002. Conversation at WGBH, DVS, with Phil Piety. Boston, MA.<br />
Gregory, Michael & Susanne Carroll. 1978. Language and Situation: Language and Society.<br />
Boston: Routledge & Kegan Paul.<br />
Haberlandt, Karl. 1988. Component Processes in Reading Comprehension. In Reading<br />
Research: Advances in Theory and Practice, ed. M. Daneman. San Diego: Academic Press.<br />
Halliday, M.A.K. 1978. Language as a social semiotic. Baltimore, Maryland: University Park<br />
Press.<br />
Halliday, M.A.K. 1985. An Introduction to Functional Grammar. New York: Arnold.<br />
Hardy, Steven Thomas. 2000. Vygotsky's Contributions to Mentally Healthy Deaf Adults.<br />
Washington, DC: Gallaudet University.<br />
Harris, Helen. 2000. Reply Comments of Helen Harris, ed. Federal <strong>Communication</strong>s<br />
Commission. Washington DC.<br />
Harris, Zellig. 1951. Methods in Structural Linguistics. Chicago: University of Chicago Press.<br />
Hatim, Basil. 1997. <strong>Communication</strong> Across Cultures: Translation Theory and Contrastive Text<br />
Linguistics: Exeter Linguistics Studies. Exeter.<br />
Holsánová, Jana. 2001. Picture Viewing and Picture <strong>Description</strong>: Two Windows to the Mind. Lund,<br />
Sweden: Lund University Cognitive Science.<br />
Iedema, Rick. 2003. Multimodality, resemiotization: extending the analysis of discourse as<br />
multi-semiotic practice. <strong>Visual</strong> <strong>Communication</strong> 2:29-57.<br />
Kaye, H. Stephen. 2000. Disability and the Digital Divide. Washington, DC: U.S.<br />
Department of Education.<br />
Kerscher, George. 2001a. Converging Standards in Electronic Books: The Daisy<br />
Consortium.<br />
Kerscher, George. 2001b. Theory Behind the DTBook DTD: The Daisy Consortium.<br />
Knowlton, Marie and Robin Wetzel. 1996. Braille Reading Rates as a Function of Reading<br />
Task. Journal of <strong>Visual</strong> Impairment and Blindness.<br />
Kress, Gunther & Theo Van Leeuwen. 2001. Multimodal <strong>Discourse</strong>: The Modes and Media of<br />
Contemporary <strong>Communication</strong>. New York: Arnold/Oxford University Press.<br />
Kress, Gunther and Theo van Leeuwen. 1996. Reading Images: The Grammar of <strong>Visual</strong> Design:<br />
Routledge.<br />
Kuhn, David. 1992a. The Use of Descriptive Video in Science Programming. Boston:<br />
WGBH Educational Foundation.<br />
Kuhn, David; Corinne Kirchner. 1992b. Viewing Habits and Interests in Science<br />
Programming of the Blind and <strong>Visual</strong>ly Impaired Television Audience. New<br />
York/Boston: American Foundation for the Blind, WGBH Educational Foundation.<br />
Lemke, Jay. 2002. Travels in Hypermodality. <strong>Visual</strong> <strong>Communication</strong> 1:299-325.<br />
Lessig, Lawrence. 1999. CODE and Other Laws of Cyberspace. New York: Basic Books.<br />
Leeuwen, Theo van. 2002. Ten reasons why linguists should pay attention to visual<br />
communication. Paper presented at Georgetown University Roundtable, Georgetown<br />
University.<br />
Levie, W. Howard and Richard Lentz. 1982. Effects of Text Illustrations: A Review of<br />
Research. Educational <strong>Communication</strong> and Technology Journal 30.<br />
Levine, Barry. 2002. Digest Number 267: <strong>Audio</strong> <strong>Description</strong> International.<br />
Levinson, Stephen. 1983. Pragmatics: Cambridge Textbooks in Linguistics. Cambridge,<br />
England: Cambridge University Press.<br />
Lovering, Sharon. 1993. Video <strong>Description</strong> Brings Enjoyment to All. In The Braille Forum:<br />
American Council of the Blind.<br />
Lucas, Ceil. 1989. The Sociolinguistics of the Deaf Community: Academic Press, Inc.<br />
Martin/DVS, Steve. 1991. LA Story, ed. Mick Jackson: WGBH Descriptive Video Service.<br />
Metzger, Melanie. 1999. Sign Language Interpreting: Deconstructing the Myth of Neutrality.<br />
Washington, DC: Gallaudet University Press.<br />
Miller, Lori. 2002. Digest Number 267: <strong>Audio</strong> <strong>Description</strong> International.<br />
NCAM. 2002. Access to Rich Media: WGBH.<br />
NFB. 2000. Blindness Statistics: National Federation of the Blind.<br />
Norris, Sigrid. 2002. Multimodal <strong>Discourse</strong> Analysis: A Conceptual Framework. Paper<br />
presented at Georgetown University Round Table on Language and Linguistics, Georgetown<br />
University.<br />
NPS/Ear, National Park Service/Metropolitan Washington Ear. 2000. The Gift of Acadia:<br />
National Park Service.<br />
O'Grady, William, John Archibald, Mark Aronoff, and Janie Rees-Miller. 2001. Contemporary<br />
Linguistics. New York: Bedford/St. Martin's.<br />
Packer, Jaclyn PhD. 1996. Video <strong>Description</strong> in North America. In New Technologies in the<br />
Education of the <strong>Visual</strong>ly Handicapped, ed. Dominique Berger: John Libbey Eurotext.<br />
Packer, Jaclyn Ph.D. & Corinne Kirchner, Ph.D. 1997a. Who's Watching? A Profile of the<br />
Blind and <strong>Visual</strong>ly Impaired Audience for Television and Video. Journal of <strong>Visual</strong><br />
Impairment and Blindness.<br />
Packer, Jaclyn PhD & Barbara Gutierrez MA, Corinne Kirchner PhD. 1997b. Origins,<br />
Organizations, and Issues in Video <strong>Description</strong>: Results from In-depth Interviews<br />
with Major Players. New York: American Foundation for the Blind.<br />
Perfetti, Charles A. 1988. Verbal Efficiency in Reading Ability. In Reading Research: Advances in<br />
Theory and Practice, ed. M. Daneman. San Diego: Academic Press.<br />
Pfanstiehl, Cody. 2002a. Email <strong>Communication</strong>, with Phil Piety.<br />
Pfanstiehl, Margaret. 2002b. Founder, Washington Metropolitan Ear, with Phil Piety. Silver<br />
Spring, MD 20901.<br />
Pfanstiehl, Margaret R. Ed.D, and Cody. 1984. Unpublished Training Materials. Ms. Silver<br />
Spring, MD 20901.<br />
Pfanstiehl, Margaret R. EdD. 2002c. Discussions Regarding <strong>Audio</strong> <strong>Description</strong>, with Phil<br />
Piety.<br />
Piety, Philip. 2001. Thamus and Theuth are Dead: The impacts of digital communications on<br />
types of communication (Unpublished research paper), 31. Washington DC:<br />
Georgetown University.<br />
Raman, TV. 1994. <strong>Audio</strong> System for Technical Readings, Computer Science, Cornell<br />
University: PhD.<br />
RFB&D. 2001a. Annual Report. Princeton, New Jersey.<br />
RFB&D. 2001b. Recording for the Blind & Dyslexic Annual Report 2001: Recording for the<br />
Blind & Dyslexic.<br />
Rosch, Eleanor. 1978. Principles of categorization. In Cognition and Categorization, ed. Eleanor<br />
Rosch. Hillsdale, N.J.: Erlbaum Associates.<br />
Schank, Roger C. and Robert Abelson. 1977. Scripts, plans, goals, and understanding: An inquiry<br />
into human knowledge structures. Hillsdale, NJ: Erlbaum.<br />
Schiffrin, Deborah. 1987. <strong>Discourse</strong> Markers: Studies in Interactional Sociolinguistics.<br />
Cambridge UK: Cambridge University Press.<br />
Schiffrin, Deborah. 1994. Approaches to <strong>Discourse</strong>: Blackwell Textbooks in Linguistics. Malden<br />
MA: Blackwell Publishers.<br />
Schroeder, Fredric K. 1994. Braille Usage: Perspectives of Legally Blind Adults and Policy<br />
Implications for School Administrators, University of New Mexico.<br />
Scollon, Ron. 2001a. Mediated <strong>Discourse</strong>: The Nexus of Practice. New York: Routledge.<br />
Scollon, Ron and Suzanne Wong Scollon. 2001b. Intercultural <strong>Communication</strong>: A <strong>Discourse</strong><br />
Approach: Language in Society. Oxford: Blackwell.<br />
Simpson, John. 2001. Improved TV Access for Blind Viewers in the Digital Era. Paper<br />
presented at Radio, Television, and New Media, Canberra, Australia.<br />
Slatin, John PhD. 2002. A Review Of: "Beyond Alt Text: Making the Web Easy to Use for<br />
Users with Disabilities". Information Technology and Disabilities 8.<br />
Slatin, John PhD & Sharron Rush. 2001. Maximum Accessibility: Making Your Web Site More<br />
Usable for Everyone: Web Design. Boston, MA: Addison-Wesley.<br />
Smith, Chris. 2002. Personal <strong>Communication</strong>: Meeting @ RFB&D, with Phil Piety. Boston,<br />
MA.<br />
Snyder, Joel. 2002a. Discussion, with Phil Piety. McLean, VA.<br />
Snyder, Joel. 2002b. Fundamentals of <strong>Audio</strong> <strong>Description</strong>: <strong>Audio</strong> <strong>Description</strong> Associates.<br />
Stephens, Mitchell. 1998. The Rise of the Image, the Fall of the Word. Oxford: Oxford University<br />
Press.<br />
Stokoe, William. 1965. A Dictionary of American Sign Language on Linguistic Principles.<br />
Washington, DC: Gallaudet College Press.<br />
Stovall, Jim. 2002. Conversation, with Phil Piety.<br />
Tannen, Deborah. 1981. Introduction. Paper presented at Georgetown University Round Table<br />
1981: <strong>Discourse</strong> Analysis, Georgetown University.<br />
Tannen, Deborah. 1989. Talking Voices: Repetition, Dialogue, and Imagery in Conversational <strong>Discourse</strong>.<br />
Vol. 6: Studies in Interactional Sociolinguistics. New York: Cambridge University Press.<br />
Tannen, Deborah. 1993a. What's In a Frame?: Surface Evidence for Underlying<br />
Expectations. In Framing in <strong>Discourse</strong>, ed. Deborah Tannen. New York.<br />
Tannen, Deborah and Cynthia Wallat. 1993b. Interactive Frames and Knowledge Schemas<br />
in Interaction. In Framing in <strong>Discourse</strong>, ed. Deborah Tannen. New York.<br />
Townsend, David, Caroline Carrithers, and Thomas Bever. 1987. Listening and Reading<br />
Processes in College and Middle School-Age Readers. In Comprehending Oral and<br />
Written Language, ed. R. Horowitz & J. L. Samuels. New York: Academic Press.<br />
Valli, Clayton and Ceil Lucas. 2001. Sociolinguistic Variation in ASL. Washington DC: Gallaudet<br />
University Press.<br />
Vollmer, Judy. 2002. Personal <strong>Communication</strong>: Meeting, with Phil Piety. Boston, MA.<br />
Vygotsky, Lev. 1934. Thought and Language. Cambridge, MA: MIT Press.<br />
W3C, World Wide Web Consortium. 1999. Web Content Accessibility Guidelines 1.0.<br />
Wall, Robert S.; Corn, Anne L. 2002. Production of Textbooks and Instructional Materials in<br />
the United States. Journal of <strong>Visual</strong> Impairment and Blindness 96:212, 211.<br />
Warren, David H. 1994. Blindness and Children: An Individual Differences Approach.<br />
Melbourne Australia: Cambridge University Press.<br />
Weber, John. 2002. NPR Radio Producer, <strong>Audio</strong> <strong>Description</strong> Volunteer Washington Ear,<br />
with Phil Piety. Washington DC.<br />
Wilson, Paul T. and Richard Anderson. 1986. What They Don't Know Will Hurt Them: The<br />
Role of Prior Knowledge in Comprehension. In Reading Comprehension: From Theory to<br />
Practice, ed. J Orasanu. Hillsdale, NJ: Lawrence Erlbaum.<br />
Wlodkowski, Tom. 2002. Access to Convergent Media: Barriers to Convergent Media for<br />
Individuals Who Are Blind or Have Low Vision, 4. Boston: National Center For<br />
Accessible Media (NCAM).<br />
Wyver, Shirley R., Rosalyn Markham, and Sonia Hlavacek. 2000. Inferences and Word<br />
Associations of Children with <strong>Visual</strong> Impairments. Journal of <strong>Visual</strong> Impairment and<br />
Blindness:204-217.<br />
NOTES<br />
1 Statistics on blindness and visual impairment are a challenge because the condition<br />
often co-occurs with other conditions, such as diabetes or mental retardation, that<br />
might instead be used to characterize an individual.<br />
2 This is not to suggest that there are no differences that are related to language. There<br />
have been studies showing that there are substantial differences in development of<br />
concepts and prototypes.<br />
3 The Motion Picture Association of America (MPAA) and others recently challenged<br />
this ruling. The challenge was upheld on technical grounds and an appeal is in<br />
process at the time of this writing.<br />
4 Not all images are described. Prior to recording, someone marks up the text and may<br />
decide to exclude certain images.<br />
5 The act was originally passed in 1973. The 1998 amendment brought in section 508.<br />
6 A 1992 report, “The Use of Descriptive Video in Science Programming” (Kuhn<br />
1992a), revealed indications of benefit, but this researcher was not able to see an<br />
experimental method that could yield measured results.<br />
7 I am simplifying the characterizations of spoken and written text for the purposes of<br />
comparison.<br />
8 I should be clear that this is an area where I have extremely sketchy information,<br />
and one that may run counter to the concept of inclusion within the same speech<br />
community as sighted individuals. Scollon & Scollon, for example, describe four<br />
different definitions of culture that do not include perceptual information.<br />
9 The technology used in transcribing the movies represented time in whole seconds<br />
and no finer, so the determination of gaps and lengths was approximate.<br />
10 These data were recovered from Frazier’s thesis, so they are based on the timings he<br />
presented rather than on transcripts, as the other films are.<br />
11 This term is not intended to create a direct reference to a similarly named<br />
concept in software technology, although one could imagine a distant connection to<br />
technology in the future. The term is used here to represent the concept of real<br />
people, places, and things that exist over extended periods of time in a text.<br />
12 I am certainly simplifying this issue and basing this aspect of my work on the<br />
definition of utterance in Schiffrin 1987. This is a convenience for both author and<br />
reader, and it is expected that other approaches to spoken discourse analysis that<br />
focus on a unit of production analogous to the utterance could also be applied in<br />
this area.<br />