
Introduction to the Special Issue on Summarization

Dragomir R. Radev ∗, University of Michigan

Eduard Hovy †, USC/ISI

Kathleen McKeown ‡, Columbia University

1. Introduction and Definitions

As the amount of on-line information increases, systems that can automatically summarize one or more documents become increasingly desirable. Recent research has investigated types of summaries, methods to create them, and methods to evaluate them. Several evaluation competitions (in the style of the National Institute of Standards and Technology’s [NIST’s] Text Retrieval Conference [TREC]) have helped determine baseline performance levels and provide a limited set of training material. Frequent workshops and symposia reflect the ongoing interest of researchers around the world. The volume of papers edited by Mani and Maybury (1999) and a book (Mani 2001) provide good introductions to the state of the art in this rapidly evolving subfield.

A summary can be loosely defined as a text that is produced from one or more texts, that conveys important information in the original text(s), and that is no longer than half of the original text(s) and usually significantly less than that. Text here is used rather loosely and can refer to speech, multimedia documents, hypertext, etc.

The main goal of a summary is to present the main ideas in a document in less space. If all sentences in a text document were of equal importance, producing a summary would not be very effective, as any reduction in the size of a document would carry a proportional decrease in its informativeness. Luckily, information content in a document appears in bursts, and one can therefore distinguish between more and less informative segments. Identifying the informative segments at the expense of the rest is the main challenge in summarization.

Of the many types of summary that have been identified (Borko and Bernier 1975; Cremmins 1996; Sparck Jones 1999; Hovy and Lin 1999), indicative summaries provide an idea of what the text is about without conveying specific content, and informative ones provide some shortened version of the content. Topic-oriented summaries concentrate on the reader’s desired topic(s) of interest, whereas generic summaries reflect the author’s point of view. Extracts are summaries created by reusing portions (words, sentences, etc.) of the input text verbatim, while abstracts are created by regenerating the extracted content. Extraction is the process of identifying important material in the text, abstraction the process of reformulating it in novel terms, fusion the process of combining extracted portions, and compression the process of squeezing out unimportant material. The need to maintain some degree of grammaticality and coherence plays a role in all four processes.

∗ Assistant Professor, School of Information, Department of Electrical Engineering and Computer Science and Department of Linguistics, University of Michigan, Ann Arbor. E-mail: radev@umich.edu.

† ISI Fellow and Senior Project Leader, Information Sciences Institute of the University of Southern California, Marina del Rey, CA. E-mail: hovy@isi.edu.

‡ Professor, Department of Computer Science, Columbia University, New York, NY. E-mail: kathy@cs.columbia.edu.

© 2002 Association for Computational Linguistics

Computational Linguistics Volume 28, Number 4

The obvious overlap of text summarization with information extraction, and connections from summarization to both automated question answering and natural language generation, suggest that summarization is actually a part of a larger picture. In fact, whereas early approaches drew more from information retrieval, more recent approaches draw from the natural language field. Natural language generation techniques have been adapted to work with typed textual phrases, in place of semantics, as input, and this allows researchers to experiment with approaches to abstraction. Techniques that have been developed for topic-oriented summaries are now being pushed further so that they can be applied to the production of long answers for the question-answering task. However, as the articles in this special issue show, domain-independent summarization has several specific, difficult aspects that make it a research topic in its own right.

2. Major Approaches

We provide a sketch of the current state of the art of summarization by describing the general areas of research, including single-document summarization through extraction, the beginnings of abstractive approaches to single-document summarization, and a variety of approaches to multidocument summarization.

2.1 Single-Document Summarization through Extraction

Despite the beginnings of research on alternatives to extraction, most work today still relies on extraction of sentences from the original document to form a summary. The majority of early extraction research focused on the development of relatively simple surface-level techniques that tend to signal important passages in the source text. Although most systems use sentences as units, some work with larger passages, typically paragraphs. Typically, a set of features is computed for each passage, and ultimately these features are normalized and summed. The passages with the highest resulting scores are sorted and returned as the extract.

Early techniques for sentence extraction computed a score for each sentence based on features such as position in the text (Baxendale 1958; Edmundson 1969), word and phrase frequency (Luhn 1958), and key phrases (e.g., “it is important to note”) (Edmundson 1969). Recent extraction approaches use more sophisticated techniques for deciding which sentences to extract; these techniques often rely on machine learning to identify important features, on natural language analysis to identify key passages, or on relations between words rather than bags of words.
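The feature-based scoring scheme described above can be illustrated with a minimal Python sketch. It is not a reconstruction of any cited system: the cue-phrase list, the stopword list, the naive sentence splitter, and the equal weighting of the three features (position, word frequency, cue phrase) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical cue phrases and stopwords, chosen only for this illustration.
CUE_PHRASES = ("it is important to note", "in conclusion")
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "that"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def normalize(values):
    # Scale raw feature values so that the largest becomes 1.0.
    top = max(values, default=0)
    return [v / top if top else 0.0 for v in values]

def extract(text, k=2):
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    # Feature 1: earlier sentences receive a higher position score.
    position = [1.0 / (i + 1) for i in range(len(sentences))]
    # Feature 2: average document frequency of the sentence's content words.
    frequency = [
        sum(freq[w] for w in tokenize(s)) / max(len(tokenize(s)), 1)
        for s in sentences
    ]
    # Feature 3: 1.0 if the sentence contains a cue phrase, else 0.0.
    cue = [float(any(c in s.lower() for c in CUE_PHRASES)) for s in sentences]
    features = [normalize(f) for f in (position, frequency, cue)]
    scores = [sum(column) for column in zip(*features)]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(top)]
```

Calling `extract(text, k=2)` returns the two highest-scoring sentences in document order; a real system would typically weight the features rather than sum them equally.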

The application of machine learning to summarization was pioneered by Kupiec, Pedersen, and Chen (1995), who developed a summarizer using a Bayesian classifier to combine features from a corpus of scientific articles and their abstracts. Aone et al. (1999) and Lin (1999) experimented with other forms of machine learning and its effectiveness. Machine learning has also been applied to learning individual features; for example, Lin and Hovy (1997) applied machine learning to the problem of determining how sentence position affects the selection of sentences, and Witbrock and Mittal (1999) used statistical approaches to choose important words and phrases and their syntactic context.
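To make the idea of a Bayesian classifier over sentence features concrete, the following is a small naive Bayes sketch in Python. It assumes binary features, add-one smoothing, and conditional independence of features; the function names and the toy training data are hypothetical, and the sketch should not be read as the feature set or training procedure of any cited work.

```python
import math

def train(examples):
    # examples: list of (features, in_summary) pairs, where features is a
    # tuple of 0/1 values and in_summary is True or False.
    n_feats = len(examples[0][0])
    counts = {True: [0] * n_feats, False: [0] * n_feats}
    totals = {True: 0, False: 0}
    for feats, label in examples:
        totals[label] += 1
        for j, value in enumerate(feats):
            counts[label][j] += value
    return counts, totals

def log_joint(model, feats, label):
    # log P(label) + sum_j log P(feature_j | label), with add-one smoothing.
    counts, totals = model
    n = totals[True] + totals[False]
    logp = math.log((totals[label] + 1) / (n + 2))
    for j, value in enumerate(feats):
        p_on = (counts[label][j] + 1) / (totals[label] + 2)
        logp += math.log(p_on if value else 1 - p_on)
    return logp

def prob_in_summary(model, feats):
    # Posterior probability that a sentence with these features belongs
    # in the summary, obtained by normalizing over the two classes.
    a = log_joint(model, feats, True)
    b = log_joint(model, feats, False)
    return 1 / (1 + math.exp(b - a))
```

Sentences would then be ranked by `prob_in_summary` and the top-ranked ones extracted.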



Radev, Hovy, and McKeown Summarization: Introduction

Approaches involving more sophisticated natural language analysis to identify key passages rely on analysis either of word relatedness or of discourse structure. Some research uses the degree of lexical connectedness between potential passages and the remainder of the text; connectedness may be measured by the number of shared words, synonyms, or anaphora (e.g., Salton et al. 1997; Mani and Bloedorn 1997; Barzilay and Elhadad 1999). Other research rewards passages that include topic words, that is, words that have been determined to correlate well with the topic of interest to the user (for topic-oriented summaries) or with the general theme of the source text (Buckley and Cardie 1997; Strzalkowski et al. 1999; Radev, Jing, and Budzikowska 2000).
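As a rough illustration of lexical connectedness measured by shared words, the Python sketch below scores each passage by the number of distinct words it shares with the rest of the text. It deliberately ignores synonyms and anaphora, and it performs no stopword filtering; both simplifications are assumptions of this sketch rather than properties of the cited systems.

```python
import re

def words(text):
    # Lowercased set of alphabetic word types in the text.
    return set(re.findall(r"[a-z]+", text.lower()))

def connectedness(passages):
    # For each passage, count the word types it shares with all other passages.
    scores = []
    for i, passage in enumerate(passages):
        rest = set()
        for j, other in enumerate(passages):
            if j != i:
                rest |= words(other)
        scores.append(len(words(passage) & rest))
    return scores
```

A passage with a score of zero shares no vocabulary with the remainder of the text and would be a weak candidate under this criterion.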

Alternatively, a summarizer may reward passages that occupy important positions in the discourse structure of the text (Ono, Sumita, and Miike 1994; Marcu 1997b). This method requires a system to compute discourse structure reliably, which is not possible in all genres. This technique is the focus of one of the articles in this special issue (Teufel and Moens 2002), which shows how particular types of rhetorical relations in the genre of scientific journal articles can be reliably identified through the use of classification. An open-source summarization environment, MEAD, was recently developed at the Johns Hopkins summer workshop (Radev et al. 2002). MEAD allows researchers to experiment with different features and methods for combination.

Some recent work (Conroy and O’Leary 2001) has turned to the use of hidden Markov models (HMMs) and pivoted QR decomposition to reflect the fact that the probability of inclusion of a sentence in an extract depends on whether the previous sentence has been included as well.

2.2 Single-Document Summarization through Abstraction

At this early stage in research on summarization, we categorize any approach that does not use extraction as an abstractive approach. Abstractive approaches have used information extraction, ontological information, information fusion, and compression.

Information extraction approaches can be characterized as “top-down,” since they look for a set of predefined information types to include in the summary (in contrast, extractive approaches are more data-driven). For each topic, the user predefines frames of expected information types, together with recognition criteria. For example, an earthquake frame may contain slots for location, earthquake magnitude, number of casualties, etc. The summarization engine must then locate the desired pieces of information, fill them in, and generate a summary with the results (DeJong 1978; Rau and Jacobs 1991). This method can produce high-quality and accurate summaries, albeit in restricted domains only.
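The earthquake frame mentioned above can be sketched in Python as a small data structure with one recognition pattern per slot, followed by template-based generation. The regular expressions, the class and function names, and the output template are all hypothetical; real systems of this kind used far richer recognition criteria.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class EarthquakeFrame:
    location: Optional[str] = None
    magnitude: Optional[float] = None
    casualties: Optional[int] = None

# Hypothetical recognition criteria: one regular expression per slot.
PATTERNS = {
    "location": re.compile(r"earthquake (?:struck|hit) ([A-Z][a-z]+)"),
    "magnitude": re.compile(r"magnitude (\d+(?:\.\d+)?)"),
    "casualties": re.compile(r"(\d+) people (?:were killed|died)"),
}
CASTERS = {"location": str, "magnitude": float, "casualties": int}

def fill_frame(text):
    # Locate each predefined piece of information and fill its slot.
    frame = EarthquakeFrame()
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            setattr(frame, slot, CASTERS[slot](match.group(1)))
    return frame

def generate(frame):
    # Produce a one-sentence summary from whichever slots were filled.
    where = frame.location or "an unknown location"
    text = f"An earthquake struck {where}"
    if frame.magnitude is not None:
        text += f" with magnitude {frame.magnitude}"
    if frame.casualties is not None:
        text += f", killing {frame.casualties} people"
    return text + "."
```

Because only the predefined slots are ever filled, anything in the source that falls outside the frame is ignored, which is why such summaries are accurate but limited to the domain the frame was written for.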

Compressive summarization results from approaching the problem from the point of view of language generation. Using the smallest units from the original document, Witbrock and Mittal (1999) extract a set of words from the input document and then order the words into sentences using a bigram language model. Jing and McKeown (1999) point out that human summaries are often constructed from the source document by a process of cutting and pasting document fragments that are then combined and regenerated as summary sentences. Hence a summarizer can be developed to extract sentences, reduce them by dropping unimportant fragments, and then use information fusion and generation to combine the remaining fragments. In this special issue, Jing (2002) reports on automated techniques to build a corpus representing the cut-and-paste process used by humans; such a corpus can then be used to train an automated summarizer.

Other researchers focus on the reduction process. In an attempt to learn rules for reduction, Knight and Marcu (2000) use expectation maximization to train a system to compress the syntactic parse tree of a sentence in order to produce a shorter but still maximally grammatical version. Ultimately, this approach can likely be used for shortening two sentences into one, three into two (or one), and so on.

Of course, true abstraction involves taking the process one step further. Abstraction involves recognizing that a set of extracted passages together constitute something new, something that is not explicitly mentioned in the source, and then replacing them in the summary with the (ideally more concise) new concept(s). The requirement that the new material not be in the text explicitly means that the system must have access to external information of some kind, such as an ontology or a knowledge base, and be able to perform combinatory inference (Hahn and Reimer 1997). Since no large-scale resources of this kind yet exist, abstractive summarization has not progressed beyond the proof-of-concept stage (although top-down information extraction can be seen as one variant).

2.3 Multidocument Summarization

Multidocument summarization, the process of producing a single summary of a set of related source documents, is relatively new. The three major problems introduced by having to handle multiple input documents are (1) recognizing and coping with redundancy, (2) identifying important differences among documents, and (3) ensuring summary coherence, even when material stems from different source documents.

In an early approach to multidocument summarization, information extraction was used to facilitate the identification of similarities and differences (McKeown and Radev 1995). As for single-document summarization, this approach produces more of a briefing than a summary, as it contains only preidentified information types. Identity of slot values is used to determine when information is reliable enough to include in the summary. Later work merged information extraction approaches with regeneration of extracted text to improve summary generation (Radev and McKeown 1998). Important differences (e.g., updates, trends, direct contradictions) are identified through a set of discourse rules. Recent work also follows this approach, using enhanced information extraction and additional forms of contrasts (White and Cardie 2002).

To identify redundancy in text documents, various similarity measures are used. A common approach is to measure similarity between all pairs of sentences and then use clustering to identify themes of common information (McKeown et al. 1999; Radev, Jing, and Budzikowska 2000; Marcu and Gerber 2001). Alternatively, systems measure the similarity of a candidate passage to the already-selected passages and retain it only if it contains enough new (dissimilar) information. A popular such measure is maximal marginal relevance (MMR) (Carbonell, Geng, and Goldstein 1997; Carbonell and Goldstein 1998).
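The greedy selection behind MMR can be sketched as follows. At each step the candidate maximizing λ · sim(candidate, query) − (1 − λ) · max sim(candidate, selected) is chosen. The use of Jaccard word overlap as the similarity function, the default λ of 0.5, and the function names are assumptions of this sketch; other similarity measures, such as cosine similarity over term vectors, could be substituted.

```python
import re

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    # Word-overlap similarity in [0, 1]; an assumed stand-in for any
    # passage similarity measure.
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def mmr_select(query, passages, k=2, lam=0.5):
    selected = []
    candidates = list(passages)
    while candidates and len(selected) < k:
        def mmr(p):
            # Penalize similarity to anything already in the summary.
            redundancy = max((jaccard(p, s) for s in selected), default=0.0)
            return lam * jaccard(p, query) - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With λ close to 1 the selection reduces to ranking by relevance alone; lowering λ trades relevance for novelty.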

Once similar passages in the input documents have been identified, the information they contain must be included in the summary. Rather than simply listing all similar sentences (a lengthy solution), some approaches will select a representative passage to convey information in each cluster (Radev, Jing, and Budzikowska 2000), whereas other approaches use information fusion techniques to identify repetitive phrases from the clusters and combine the phrases into the summary (Barzilay, McKeown, and Elhadad 1999). Mani, Gates, and Bloedorn (1999) describe the use of human-generated compression and reformulation rules.

Ensuring coherence is difficult, because this in principle requires some understanding of the content of each passage and knowledge about the structure of discourse. In practice, most systems simply follow time order and text order (passages from the oldest text appear first, sorted in the order in which they appear in the input). To avoid misleading the reader when juxtaposed passages from different dates all say “yesterday,” some systems add explicit time stamps (Lin and Hovy 2002a). Other systems use a combination of temporal and coherence constraints to order sentences (Barzilay, Elhadad, and McKeown 2001). Recently, Otterbacher, Radev, and Luo (2002) have focused on discourse-based revisions of multidocument clusters as a means for improving summary coherence.
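The simple time-order and text-order strategy, together with explicit time stamps, can be sketched in a few lines of Python. The `Passage` structure and the bracketed date format are hypothetical choices for this illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Passage:
    text: str
    published: date   # publication date of the source document
    position: int     # index of the passage within its source document

def order_passages(passages):
    # Oldest source first; within one date, keep the original text order.
    return sorted(passages, key=lambda p: (p.published, p.position))

def render(passages):
    # Prefix each passage with its date so that relative expressions such
    # as "yesterday" are not misread across sources.
    return " ".join(
        f"[{p.published.isoformat()}] {p.text}" for p in order_passages(passages)
    )
```

This ordering needs no understanding of content, which is what makes it practical, but it cannot repair incoherence between passages drawn from different sources.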

Although multidocument summarization is new and the approaches described here are only the beginning, current research also branches out in other directions. Research is beginning on the generation of updates on new information (Allan, Gupta, and Khandelwal 2001). Researchers are currently studying the production of longer answers (i.e., multidocument summaries) from retrieved documents, focusing on such types as biographies of people, descriptions of multiple events of the same type (e.g., multiple hurricanes), opinion pieces (e.g., editorials and letters discussing a contentious topic), and causes of events. Another challenging ongoing topic is the generation of titles for either a single document or set of documents. This challenge will be explored in an evaluation planned by NIST in 2003.

2.4 Evaluation

Evaluating the quality of a summary has proven to be a difficult problem, principally because there is no obvious “ideal” summary. Even for relatively straightforward news articles, human summarizers tend to agree only approximately 60% of the time, measuring sentence content overlap. The use of multiple models for system evaluation could help alleviate this problem, but researchers also need to look at other methods that can yield more acceptable models, perhaps using a task as motivation.

Two broad classes of metrics have been developed: form metrics and content metrics. Form metrics focus on grammaticality, overall text coherence, and organization and are usually measured on a point scale (Brandow, Mitze, and Rau 1995). Content is more difficult to measure. Typically, system output is compared sentence by sentence or fragment by fragment to one or more human-made ideal abstracts, and as in information retrieval, two values are recorded: precision (the share of the system’s summary that is not extraneous) and recall (the share of the important information that the summary retains). Other commonly used measures include kappa (Carletta 1996) and relative utility (Radev, Jing, and Budzikowska 2000), both of which take into account the performance of a summarizer that randomly picks passages from the original document to produce an extract. In the Document Understanding Conference (DUC)-01 and DUC-02 summarization competitions (Harman and Marcu 2001; Hahn and Harman 2002), NIST used the Summary Evaluation Environment (SEE) interface (Lin 2001) to record values for precision and recall. These two competitions, run along the lines of TREC, have served to establish overall baselines for single-document and multidocument summarization and have provided several hundred human abstracts as training material. (Another popular source of training material is the Ziff-Davis corpus of computer product announcements.) Despite low interjudge agreement, DUC has shown that humans are better summary producers than machines and that, for the news article genre, certain algorithms do in fact do better than the simple baseline of picking the lead material.
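For sentence extracts, the content measures described above reduce to simple set arithmetic. The sketch below assumes that the system extract and the human ideal extract are both given as collections of sentence indices into the same source document; the function names are illustrative and are not taken from SEE or any DUC tool. The kappa helper applies the standard chance-corrected form, (P(A) − P(E)) / (1 − P(E)), to agreement values that the caller has already estimated.

```python
def precision_recall(system_ids, ideal_ids):
    """Precision and recall of a system extract against one ideal extract.

    Both arguments are iterables of sentence indices (an assumption of
    this sketch); duplicates are ignored.
    """
    system, ideal = set(system_ids), set(ideal_ids)
    overlap = len(system & ideal)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(ideal) if ideal else 0.0
    return precision, recall


def kappa(observed_agreement, chance_agreement):
    """Chance-corrected agreement: (P(A) - P(E)) / (1 - P(E))."""
    if chance_agreement >= 1.0:
        raise ValueError("chance agreement must be below 1")
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)
```

A system that extracts sentences 0, 2, 5, and 7 when the ideal extract is 0, 1, and 2 shares two sentences with the ideal, so half of its output is on target while one ideal sentence is missed.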

The largest task-oriented evaluation to date, the Summarization Evaluation Conference (SUMMAC) (Mani et al. 1998; Firmin and Chrzanowski 1999) included three tests: the categorization task (how well can humans categorize a summary compared to its full text?), the ad hoc task (how well can humans determine whether a full text is relevant to a query just from reading the summary?) and the question task (how well can humans answer questions about the main thrust of the source text from reading just the summary?). But the interpretation of the results is not simple; studies (Jing et al. 1998; Donaway, Drummey, and Mather 2000; Radev, Jing, and Budzikowska 2000) show how the same summaries receive different scores under different measures or when compared to different (but presumably equivalent) ideal summaries created by humans. With regard to interhuman agreement, Jing et al. find fairly high consistency in the news genre only when the summary (extract) length is fixed relatively short. Marcu (1997a) provides some evidence that other genres will deliver less consistency. With regard to the lengths of the summaries produced by humans when not constrained by a particular compression rate, both Jing and Marcu find great variation. Nonetheless, it is now generally accepted that for single news articles, systems produce generic summaries indistinguishable from those of humans.

Automated summary evaluation is a gleam in everyone’s eye. Clearly, when an ideal extract has been created by human(s), extractive summaries are easy to evaluate. Marcu (1999) and Goldstein et al. (1999) independently developed an automated method to create extracts corresponding to abstracts. But when the number of available extracts is not sufficient, it is not clear how to overcome the problems of low interhuman agreement. Simply using a variant of the Bilingual Evaluation Understudy (BLEU) scoring method (based on a linear combination of matching n-grams between the system output and the ideal summary) developed for machine translation (Papineni et al. 2001) is promising but not sufficient (Lin and Hovy 2002b).
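The idea of scoring a summary by matching n-grams can be illustrated with a small sketch. It computes a clipped n-gram precision for each n and combines the values linearly, as in the description above. It is a deliberate simplification and not the BLEU metric itself, which additionally uses a geometric mean and a brevity penalty; the function names, the default weights, and the whitespace tokenization in the usage note are assumptions of this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference,
    counting each reference n-gram at most as often as it appears there."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def ngram_overlap_score(candidate, reference, weights=(0.5, 0.5)):
    """Linear combination of clipped n-gram precisions for n = 1, 2, ...
    The weights are illustrative, not values from any published metric."""
    return sum(
        w * clipped_precision(candidate, reference, n)
        for n, w in enumerate(weights, start=1)
    )
```

Calling `ngram_overlap_score("the cat sat".split(), "the cat sat on the mat".split())` scores a candidate whose unigrams and bigrams all occur in the reference. Because the score is precision oriented, a very short candidate can score well while omitting most of the ideal summary, which is one reason such a measure alone is not sufficient.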

3. The Articles in this Issue

The articles in this issue move beyond the current state of the art in various ways. Whereas most research to date has focused on the use of sentence extraction for summarization, we are beginning to see techniques that allow a system to extract, merge, and edit phrases, as opposed to full sentences, to generate a summary. Whereas many summarization systems are designed for summarization of news, new algorithms are summarizing much longer and more complex documents, such as scientific journal articles, medical journal articles, or patents. Whereas most research to date has focused on text summarization, we are beginning to see a move toward summarization of speech, a medium that places additional demands on the summarization process. Finally, in addition to providing full summarization systems, the articles in this issue also focus on tools that can aid in the process of developing summarization systems, on computational efficiency of algorithms, and on techniques needed for preprocessing speech.

The four articles that focus on summarization of text share a common theme: Each views the summarization process as consisting of two phases. In the first, material within the original document that is important is identified and extracted. In the second, this extracted material may be modified, merged, and edited using generation techniques. Two of the articles focus on the extraction stage (Teufel and Moens 2002; Silber and McCoy 2002), whereas Jing (2002) examines tools for automatically constructing resources that can be used for the second stage.

Teufel and Moens propose significantly different techniques for sentence extraction than have been used in the past. Noting the difference in both length and structure between scientific articles and news, they claim that both the context of sentences and a more focused search for sentences are needed in order to produce a good summary that is only 2.5% of the original document. Their approach is to provide a summary that focuses on the new contribution of the paper and its relation to previous work. They rely on rhetorical relations to provide information about context and to identify sentences relating to, for example, the aim of the paper, its basis in previous work, or contrasts with other work. Their approach features the use of corpora annotated both with rhetorical relations and with relevance; it uses text categorization to extract sentences corresponding to any of seven rhetorical categories. The result is a set of sentences that situate the article with respect to its original claims and in relation to other research.

Silber and McCoy focus on computationally efficient algorithms for sentence extraction. They present a linear time algorithm to extract lexical chains from a source document (the lexical-chain approach was originally developed by Barzilay and Elhadad [1997] but used an exponential time algorithm). This approach facilitates the use of lexical chains as an intermediate representation for summarization. Barzilay and Elhadad present an evaluation of the approach for summarization with both scientific documents and university textbooks.

Jing advocates the use of a cut-and-paste approach to summarization in which phrases, rather than sentences, are extracted from the original document. She shows that such an approach is often used by human abstractors. She then presents an automated tool that is used to analyze a corpus of paired documents and abstracts written by humans, in order to identify the phrases within the documents that are used in the abstracts. She has developed an HMM solution to the matching problem. The decomposition program is a tool that can produce training and testing corpora for summarization, and its results have been used for her own summarization program.

Saggion and Lapalme (2002) describe a system, SumUM, that generates indicative-informative summaries from technical documents. To build their system, Saggion and Lapalme have studied a corpus of professionally written (short) abstracts. They have manually aligned the abstracts and the original documents. Given the structured form of technical papers, most of the information in the abstracts was also found in the author abstract (20%), in the first section of the paper (40%), or in the headlines or captions (23%). Based on their observations, the authors have developed an approach to summarization, called selective analysis, which mimics the human abstractors’ routine. The four components of selective analysis are indicative selection, informative selection, indicative generation, and informative generation.

The final article in the issue (Zechner 2002) is distinct from the other articles in that it addresses problems in summarization of speech. As in text summarization, Zechner also uses sentence extraction to determine the content of the summary. Given the informal nature of speech, however, a number of significant steps must be taken in order to identify useful segments for extraction. Zechner develops techniques for removing disfluencies from speech, for identifying units for extraction that are in some sense equivalent to sentences, and for identifying relations such as question-answer across turns in order to determine when units from two separate turns should be extracted as a whole. This preprocessing yields a transcript on which standard techniques for extraction in text (here the use of MMR [Carbonell and Goldstein 1998] to identify relevant units) can operate successfully.
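The MMR selection step mentioned above can be sketched as a greedy loop that trades relevance to a query against redundancy with units already chosen. This is an illustrative reading of maximal marginal relevance, not Zechner's implementation: the Jaccard word-overlap similarity, the default value of `lam`, and the representation of units as token lists are all assumptions made for this example.

```python
def jaccard(a, b):
    """Word-overlap similarity between two token lists (a stand-in
    for whatever similarity measure a real system would use)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(units, query, k, lam=0.7, sim=jaccard):
    """Greedily pick up to k unit indices, each time maximizing
    lam * sim(unit, query) - (1 - lam) * max similarity to a selected unit."""
    selected = []
    candidates = list(range(len(units)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = sim(units[i], query)
            redundancy = max((sim(units[i], units[j]) for j in selected), default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Setting `lam` to 1.0 reduces the procedure to plain relevance ranking; lower values penalize a unit that largely repeats one already selected, which is the behaviour that makes MMR useful for extraction.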

Though true abstractive summarization remains a researcher’s dream, the success of extractive summarizers and the rapid development of compressive and similar techniques testify to the effectiveness with which the research community can address new problems and find workable solutions to them.

References<br />

Allan, James, Rahul Gupta, and Vikas Khandelwal. 2001. Temporal summaries of news topics. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 10–18.

Aone, Chinatsu, Mary Ellen Okurowski, James Gorlinsky, and Bjornar Larsen. 1999. A trainable summarizer with knowledge acquired from robust NLP techniques. In I. Mani and M. T. Maybury, editors, Advances in Automatic Text Summarization. MIT Press, Cambridge, pages 71–80.

Barzilay, Regina and Michael Elhadad. 1997. Using lexical chains for text summarization. In Proceedings of the ACL/EACL’97 Workshop on Intelligent Scalable Text Summarization, pages 10–17, Madrid, July.

Barzilay, Regina and Michael Elhadad. 1999. Using lexical chains for text summarization. In I. Mani and M. T. Maybury, editors, Advances in Automatic Text Summarization. MIT Press, Cambridge, pages 111–121.

Barzilay, Regina, Noémie Elhadad, and Kathy McKeown. 2001. Sentence ordering in multidocument summarization. In Proceedings of the Human Language Technology Conference.

Barzilay, Regina, Kathleen McKeown, and Michael Elhadad. 1999. Information fusion in the context of multi-document summarization. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, MD, 20–26 June, pages 550–557.

Baxendale, P. B. 1958. Man-made index for technical literature—An experiment. IBM Journal of Research and Development, 2(4):354–361.

Borko, H. and C. Bernier. 1975. Abstracting Concepts and Methods. Academic Press, New York.

Brandow, Ron, Karl Mitze, and Lisa F. Rau. 1995. Automatic condensation of electronic publications by sentence selection. Information Processing and Management, 31(5):675–685.

Buckley, Chris and Claire Cardie. 1997. Using empire and smart for high-precision IR and summarization. In Proceedings of the TIPSTER Text Phase III 12-Month Workshop, San Diego, CA, October.

Carbonell, Jaime, Y. Geng, and Jade Goldstein. 1997. Automated query-relevant summarization and diversity-based reranking. In Proceedings of the IJCAI-97 Workshop on AI in Digital Libraries, pages 12–19.

Carbonell, Jaime G. and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Alistair Moffat and Justin Zobel, editors, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, pages 335–336.

Carletta, Jean. 1996. Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2):249–254.

Conroy, John and Dianne O’Leary. 2001. Text summarization via hidden Markov models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 406–407.

Cremmins, Edward T. 1996. The Art of Abstracting. Information Resources Press, Arlington, VA, second edition.

DeJong, Gerald Francis. 1978. Fast Skimming of News Stories: The FRUMP System. Ph.D. thesis, Yale University, New Haven, CT.

Donaway, R. L., K. W. Drummey, and L. A. Mather. 2000. A comparison of rankings produced by summarization evaluation measures. In Proceedings of the Workshop on Automatic Summarization, ANLP-NAACL2000, Association for Computational Linguistics, 30 April, pages 69–78.

Edmundson, H. P. 1969. New methods in automatic extracting. Journal of the Association for Computing Machinery, 16(2):264–285.

Firmin, T. and M. J. Chrzanowski. 1999. An<br />

evaluati<strong>on</strong> of au<str<strong>on</strong>g>to</str<strong>on</strong>g>matic text<br />

summarizati<strong>on</strong> systems. In I. Mani and<br />

M. T. Maybury, edi<str<strong>on</strong>g>to</str<strong>on</strong>g>rs, Advances in<br />

Au<str<strong>on</strong>g>to</str<strong>on</strong>g>matic Text Summarizati<strong>on</strong>. MIT Press,<br />

Cambridge, pages 325–336.<br />

Goldstein, Jade, Mark Kantrowitz, Vibhu O.<br />

Mittal, and Jaime G. Carb<strong>on</strong>ell. 1999.<br />

Summarizing text documents: Sentence<br />

selecti<strong>on</strong> and evaluati<strong>on</strong> metrics. In<br />

Research and Development in Informati<strong>on</strong><br />

Retrieval, pages 121–128, Berkeley, CA.<br />

Hahn, Udo and D<strong>on</strong>na Harman, edi<str<strong>on</strong>g>to</str<strong>on</strong>g>rs.<br />

2002. Proceedings of <str<strong>on</strong>g>the</str<strong>on</strong>g> Document<br />

Understanding C<strong>on</strong>ference (DUC-02).<br />

Philadelphia, July.<br />

Hahn, Udo and Ulrich Reimer. 1997. Knowledge-based text summarization: Salience and generalization operators for knowledge base abstraction. In I. Mani and M. Maybury, editors, Advances in Automatic Text Summarization. MIT Press, Cambridge, pages 215–232.

Harman, Donna and Daniel Marcu, editors. 2001. Proceedings of the Document Understanding Conference (DUC-01). New Orleans, September.

Hovy, E. and C.-Y. Lin. 1999. Automated text summarization in SUMMARIST. In I. Mani and M. T. Maybury, editors, Advances in Automatic Text Summarization. MIT Press, Cambridge, pages 81–94.

Jing, Hongyan. 2002. Using hidden Markov modeling to decompose human-written summaries. Computational Linguistics, 28(4):527–543.

Jing, Hongyan and Kathleen McKeown. 1999. The decomposition of human-written summary sentences. In M. Hearst, F. Gey, and R. Tong, editors, Proceedings of SIGIR’99: 22nd International Conference on Research and Development in Information Retrieval, University of California, Berkeley, August, pages 129–136.

Jing, Hongyan, Kathleen McKeown, Regina Barzilay, and Michael Elhadad. 1998. Summarization evaluation methods: Experiments and analysis. In Intelligent Text Summarization: Papers from the 1998 AAAI Spring Symposium, Stanford, CA, 23–25 March. Technical Report SS-98-06. AAAI Press, pages 60–68.

Knight, Kevin and Daniel Marcu. 2000. Statistics-based summarization—Step one: Sentence compression. In Proceedings of the 17th National Conference of the American Association for Artificial Intelligence (AAAI-2000), pages 703–710.

Kupiec, Julian, Jan O. Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Research and Development in Information Retrieval, pages 68–73.

Lin, C. and E. Hovy. 1997. Identifying topics by position. In Fifth Conference on Applied Natural Language Processing, Association for Computational Linguistics, 31 March–3 April, pages 283–290.

Lin, Chin-Yew. 1999. Training a selection function for extraction. In Proceedings of the Eighteenth Annual International ACM Conference on Information and Knowledge Management (CIKM), Kansas City, 6 November. ACM, pages 55–62.

Lin, Chin-Yew. 2001. Summary evaluation environment. http://www.isi.edu/cyl/SEE.

Lin, Chin-Yew and Eduard Hovy. 2002a. From single to multi-document summarization: A prototype system and its evaluation. In Proceedings of the 40th Conference of the Association of Computational Linguistics, Philadelphia, July, pages 457–464.

Lin, Chin-Yew and Eduard Hovy. 2002b. Manual and automatic evaluation of summaries. In Proceedings of the Document Understanding Conference (DUC-02) Workshop on Multi-Document Summarization Evaluation at the ACL Conference, Philadelphia, July, pages 45–51.

Luhn, H. P. 1958. The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2):159–165.

Mani, Inderjeet. 2001. Automatic Summarization. John Benjamins, Amsterdam/Philadelphia.

Mani, Inderjeet and Eric Bloedorn. 1997. Multi-document summarization by graph search and matching. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97), Providence, RI. American Association for Artificial Intelligence, pages 622–628.

Mani, Inderjeet, Barbara Gates, and Eric Bloedorn. 1999. Improving summaries by revising them. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL 99), College Park, MD, June, pages 558–565.

Mani, Inderjeet, David House, G. Klein, Lynette Hirschman, Leo Obrst, Thérèse Firmin, Michael Chrzanowski, and Beth Sundheim. 1998. The TIPSTER SUMMAC text summarization evaluation. Technical Report MTR 98W0000138, The Mitre Corporation, McLean, VA.

Mani, Inderjeet and Mark Maybury, editors. 1999. Advances in Automatic Text Summarization. MIT Press, Cambridge.

Marcu, Daniel. 1997a. From discourse structures to text summaries. In Proceedings of the ACL’97/EACL’97 Workshop on Intelligent Scalable Text Summarization, Madrid, July 11, pages 82–88.

Marcu, Daniel. 1997b. The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts. Ph.D. thesis, University of Toronto, Toronto.

Marcu, Daniel. 1999. The automatic construction of large-scale corpora for summarization research. In M. Hearst, F. Gey, and R. Tong, editors, Proceedings of SIGIR’99: 22nd International Conference on Research and Development in Information Retrieval, University of California, Berkeley, August, pages 137–144.

Marcu, Daniel and Laurie Gerber. 2001. An inquiry into the nature of multidocument abstracts, extracts, and their evaluation. In Proceedings of the NAACL-2001 Workshop on Automatic Summarization, Pittsburgh, June. NAACL, pages 1–8.

McKeown, Kathleen, Judith Klavans, Vasileios Hatzivassiloglou, Regina Barzilay, and Eleazar Eskin. 1999. Towards multidocument summarization by reformulation: Progress and prospects. In Proceedings of the 16th National Conference of the American Association for Artificial Intelligence (AAAI-1999), 18–22 July, pages 453–460.

McKeown, Kathleen R. and Dragomir R. Radev. 1995. Generating summaries of multiple news articles. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, July, pages 74–82.

Ono, K., K. Sumita, and S. Miike. 1994. Abstract generation based on rhetorical structure extraction. In Proceedings of the International Conference on Computational Linguistics, Kyoto, Japan, pages 344–348.

Otterbacher, Jahna, Dragomir R. Radev, and Airong Luo. 2002. Revisions that improve cohesion in multi-document summaries: A preliminary study. In ACL Workshop on Text Summarization, Philadelphia.

Papineni, K., S. Roukos, T. Ward, and W-J. Zhu. 2001. BLEU: A method for automatic evaluation of machine translation. Research Report RC22176, IBM.

Radev, Dragomir, Simone Teufel, Horacio Saggion, Wai Lam, John Blitzer, Arda Çelebi, Hong Qi, Elliott Drabek, and Danyu Liu. 2002. Evaluation of text summarization in a cross-lingual information retrieval framework. Technical Report, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, June.

Radev, Dragomir R., Hongyan Jing, and Malgorzata Budzikowska. 2000. Centroid-based summarization of multiple documents: Sentence extraction, utility-based evaluation, and user studies. In ANLP/NAACL Workshop on Summarization, Seattle, April.

Radev, Dragomir R. and Kathleen R. McKeown. 1998. Generating natural language summaries from multiple on-line sources. Computational Linguistics, 24(3):469–500.

Rau, Lisa and Paul Jacobs. 1991. Creating segmented databases from free text for text retrieval. In Proceedings of the 14th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, New York, pages 337–346.

Saggion, Horacio and Guy Lapalme. 2002. Generating indicative-informative summaries with SumUM. Computational Linguistics, 28(4):497–526.

Salton, G., A. Singhal, M. Mitra, and C. Buckley. 1997. Automatic text structuring and summarization. Information Processing & Management, 33(2):193–207.

Silber, H. Gregory and Kathleen McCoy. 2002. Efficiently computed lexical chains as an intermediate representation for automatic text summarization. Computational Linguistics, 28(4):487–496.

Sparck Jones, Karen. 1999. Automatic summarizing: Factors and directions. In I. Mani and M. T. Maybury, editors, Advances in Automatic Text Summarization. MIT Press, Cambridge, pages 1–13.

Strzalkowski, Tomek, Gees Stein, J. Wang, and Bowden Wise. 1999. A robust practical text summarizer. In I. Mani and M. T. Maybury, editors, Advances in Automatic Text Summarization. MIT Press, Cambridge, pages 137–154.

Teufel, Simone and Marc Moens. 2002. Summarizing scientific articles: Experiments with relevance and rhetorical status. Computational Linguistics, 28(4):409–445.

White, Michael and Claire Cardie. 2002. Selecting sentences for multidocument summaries using randomized local search. In Proceedings of the Workshop on Automatic Summarization (including DUC 2002), Philadelphia, July. Association for Computational Linguistics, New Brunswick, NJ, pages 9–18.

Witbrock, Michael and Vibhu Mittal. 1999. Ultra-summarization: A statistical approach to generating highly condensed non-extractive summaries. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, pages 315–316.

Zechner, Klaus. 2002. Automatic summarization of open-domain multiparty dialogues in diverse genres. Computational Linguistics, 28(4):447–485.
