
Proceedings of the 13th ESSLLI Student Session

4–15 August 2008, Hamburg, Germany

Kata Balogh (editor)

Copyright © to the authors


Contents

Preface

Martin Avanzini
POP* and Semantic Labelling using SAT

Timo Baumann
Simulating Spoken Dialogue With a Focus on Realistic Turn-Taking

Christopher Brumwell
Epistemic Modals in Dialogue

Bert Le Bruyn
Bare predication and kinds

James Burton
Diagrammatic Reasoning with Enhanced Static Constraints

Gemma Celestino
Fictional Contingencies

Michael Franke
Meaning & Inference in Case of Conflict

Michael Hartwig
Towards a New Characterisation of Chomsky’s Hierarchy via Acceptance Probability

Simon Hopp
Distance Effects in Sentence Processing

Pierre Lison
A Salience-driven Approach to Speech Recognition for Human-Robot Interaction

Petar Maksimović – Dragan Doder – Bojan Marinković – Aleksandar Perović
A logic with a conditional probability operator

Scott Martin
A Proof-theoretic Approach to French Pronominal Clitics

Takako Nemoto
Infinite games from an intuitionistic point of view

Ivelina Nikolova
Language Technologies for Instructional Resources in Bulgarian

Yves Peirsman
Word Space Models of Semantic Similarity and Relatedness

Maren Schierloh
Examining the Noticing Function of Output

Andreas Schnabl
Cdiprover3: a Tool for Proving Derivational Complexities of Term Rewriting Systems

Éva Szilágyi
The Rank(s) Of A Totally Lexicalist Syntax

Camilo Thorne
Expressing Conjunctive and Aggregate Queries over Ontologies with Controlled English

Christina Unger – Gianluca Giorgolo
Interrogation in Dynamic Epistemic Logic

Melanie Uth
The Semantic Change of the French -age-Derivation

Grégoire Winterstein
Adversary Implicatures

List of Authors


Preface

This year's Student Session is the thirteenth in the twenty-year history of the annual European Summer School in Logic, Language and Information (ESSLLI). The first edition was held in Prague in 1996, invented and organized by students, and ever since ESSLLI has been accompanied by a separate Student Session. The aim of the Student Session is to give students at all levels (Bachelor, Master and PhD students) an opportunity to present and discuss their work in progress and to get feedback from senior researchers.

As in previous years, the quality of the submissions was high, which made the selection procedure difficult. This year 17 papers were selected for oral presentation and 5 for poster presentation from a total of 46 submissions. All the accepted papers are included in this volume.

I would like to thank the ESSLLI organization, in particular Rineke Verbrugge and Benedikt Loewe, for their continuous support and for making the Student Session possible. I am grateful to the StuS Program Committee: to the co-chairs, Laia Mayol, Manuel Kirschner and Ji Ruan, for their efforts in coordinating the reviewing process, and to the senior area experts, Anke Lüdeling, Paul Egré, Guram Bezhanishvili and Alexander Rabinovich, for their continuous presence and helpful advice. I also want to thank the anonymous reviewers, whose detailed comments not only proved invaluable during the selection procedure but also provided useful feedback to the authors. Many thanks to Kluwer Academic Publishers, who offered, as in previous years, prizes for the “Best Student Paper in the Oral Session” and the “Best Student Paper in the Poster Session”.

We are very much looking forward to the 13th ESSLLI Student Session in Hamburg, and believe that it will again be a very inspiring meeting.

Kata Balogh
Amsterdam, May 2008



POP* AND SEMANTIC LABELING USING SAT

Martin Avanzini

University of Innsbruck

Abstract. The polynomial path order (POP* for short) is a termination method that induces polynomial bounds on the innermost runtime complexity of term rewrite systems (TRSs). Semantic labeling is a transformation technique used for proving termination. In this paper we propose an efficient implementation of POP* together with finite semantic labeling. This automation works by a reduction to the problem of Boolean satisfiability. Satisfiability of the resulting formula is checked by a state-of-the-art SAT-solver. We have implemented the technique and experimental results confirm the feasibility of our approach. By semantic labeling, we significantly increase the power of POP*.

Term rewrite systems provide a conceptually simple but powerful abstract model of computation. In rewriting, proving termination is a long-standing research field and consequently termination techniques applicable in an automated setting were introduced quite early. Earlier research concentrated mainly on direct termination techniques (TeReSe, 2003). One such technique is the use of recursive path orders (RPOs), for instance the multiset path order (MPO) (Baader and Nipkow, 1998). Recently, the emphasis has shifted toward transformation techniques like the dependency pair method (Arts and Giesl, 2000) or semantic labeling (Zantema, 1995). These methods significantly increase the possibility to automatically conclude termination.

For direct termination techniques it is often possible to infer upper bounds on the derivational complexity of a rewrite system $\mathcal{R}$ from the termination proof. For instance, Hofbauer was the first to observe that termination via MPO implies the existence of a primitive recursive bound on the derivational complexity (Hofbauer, 1992). Here derivational complexity refers to the function that relates the length of the longest derivation sequence to the size of the initial term. It is thus quite natural to extend such a termination analysis of rewrite systems to the analysis of complexity properties. For the study of lower complexity bounds we recently introduced in (Avanzini and Moser, 2008) the polynomial path order (POP* for short). This order is in essence a miniaturization of MPO, carefully crafted to induce polynomial bounds on the number of rewrite steps (cf. Theorem 4).

In this work, we show how to increase the power of POP* by semantic labeling (Zantema, 1995). The idea behind semantic labeling is to label the function symbols of a rewrite system $\mathcal{R}$ with semantic information in such a way that direct termination methods become applicable for the labeled rewrite system $\mathcal{R}_{\mathsf{lab}}$. In order to label $\mathcal{R}$, one needs to define suitable interpretation and labeling functions for all symbols appearing in $\mathcal{R}$. Naturally, these functions have to be chosen such that POP* is applicable to the labeled system. To find them automatically, we extend the propositional encoding from (Avanzini and Moser, 2008). Satisfiability of the constructed formula certifies the existence of a labeled system $\mathcal{R}_{\mathsf{lab}}$ that is compatible with POP*. Finite semantic labeling is non-termination preserving and, moreover, complexity preserving. Thus from compatibility of $\mathcal{R}_{\mathsf{lab}}$ with POP* we conclude that $\mathcal{R}$ admits a polynomial runtime complexity (cf. Lemma 6).


A translation of infinite semantic labeling in conjunction with RPOs has already been given in (Koprowski and Middeldorp, 2007). Unfortunately, this approach is inapplicable in our context since the runtime complexity of the original system cannot, in general, be related to the runtime complexity of the infinite labeled system. Furthermore, finite semantic labeling using heuristics is implemented, for instance, in the termination prover TPA (Koprowski, 2006). We consider the approach presented here favorable, as the choice of a labeling suitable for the base order can be left to a state-of-the-art SAT-solver.

1 The Polynomial Path Order<br />

We briefly recall the basic concepts of term rewriting; for details, (Baader and Nipkow, 1998) provides a good resource. Let $\mathcal{V}$ denote a countably infinite set of variables and $\mathcal{F}$ a signature. The set of terms over $\mathcal{F}$ and $\mathcal{V}$ is denoted by $\mathcal{T}(\mathcal{F}, \mathcal{V})$. We write $\trianglelefteq$ for the subterm relation; its converse is denoted by $\trianglerighteq$ and the strict part of $\trianglerighteq$ by $\rhd$.

A term rewrite system (TRS for short) $\mathcal{R}$ over $\mathcal{T}(\mathcal{F}, \mathcal{V})$ is a set of rewrite rules $l \to r$ such that $l, r \in \mathcal{T}(\mathcal{F}, \mathcal{V})$, $l \notin \mathcal{V}$ and all variables of $r$ also appear in $l$. In the following, $\mathcal{R}$ will always denote a TRS and in our context, $\mathcal{R}$ is finite. A binary relation on $\mathcal{T}(\mathcal{F}, \mathcal{V})$ is a rewrite relation if it is compatible with $\mathcal{F}$-operations and closed under substitutions. The smallest extension of $\mathcal{R}$ that is a rewrite relation is denoted by $\to_{\mathcal{R}}$. The innermost rewrite relation $\xrightarrow{i}_{\mathcal{R}}$ is a restriction of $\to_{\mathcal{R}}$, where innermost redexes have to be reduced first. The transitive and reflexive closure of a rewrite relation $\to$ is denoted by $\to^{*}$ and we write $s \to^{n} t$ for the contraction of $s$ to $t$ in $n$ steps. We say that $\mathcal{R}$ is (innermost) terminating if there exists no infinite chain of terms $t_0, t_1, \ldots$ such that $t_i \to_{\mathcal{R}} t_{i+1}$ ($t_i \xrightarrow{i}_{\mathcal{R}} t_{i+1}$) for all $i \in \mathbb{N}$.

The root symbols of left-hand sides of rewrite rules in $\mathcal{R}$ are called defined symbols and collected in $\mathcal{D}(\mathcal{R})$, while all other symbols are called constructor symbols and collected in $\mathcal{C}(\mathcal{R})$. A term $f(s_1, \ldots, s_n)$ is constructor-based with respect to $\mathcal{R}$ if $f \in \mathcal{D}(\mathcal{R})$ and $s_1, \ldots, s_n \in \mathcal{T}(\mathcal{C}(\mathcal{R}), \mathcal{V})$. We write $\mathcal{T}_{\mathsf{cb}}(\mathcal{R})$ for the set of all constructor-based terms over $\mathcal{R}$. If every left-hand side of $\mathcal{R}$ is constructor-based then $\mathcal{R}$ is called a constructor TRS. Constructor TRSs allow us to model the computation of functions in a very natural way. Consider the following TRS:

Example 1 The constructor TRS $\mathcal{R}_{\mathsf{mult}}$ is defined by

\[
\begin{array}{ll}
\mathsf{add}(0, y) \to y & \mathsf{mult}(0, y) \to 0 \\
\mathsf{add}(\mathsf{s}(x), y) \to \mathsf{s}(\mathsf{add}(x, y)) & \mathsf{mult}(\mathsf{s}(x), y) \to \mathsf{add}(y, \mathsf{mult}(x, y)).
\end{array}
\]

$\mathcal{R}_{\mathsf{mult}}$ defines the function symbols $\mathsf{add}$ and $\mathsf{mult}$, i.e. $\mathcal{D}(\mathcal{R}_{\mathsf{mult}}) = \{\mathsf{add}, \mathsf{mult}\}$. Natural numbers are represented using the constructor symbols from $\mathcal{C}(\mathcal{R}_{\mathsf{mult}}) = \{\mathsf{s}, 0\}$. Define the encoding function $\ulcorner \cdot \urcorner : \mathbb{N} \to \mathcal{T}(\mathcal{C}(\mathcal{R}_{\mathsf{mult}}), \varnothing)$ by $\ulcorner 0 \urcorner = 0$ and $\ulcorner n + 1 \urcorner = \mathsf{s}(\ulcorner n \urcorner)$. Then for all $n, m \in \mathbb{N}$, $\mathsf{mult}(\ulcorner n \urcorner, \ulcorner m \urcorner) \xrightarrow{i}{}^{*}_{\mathcal{R}_{\mathsf{mult}}} \ulcorner n * m \urcorner$. We say that $\mathcal{R}_{\mathsf{mult}}$ computes multiplication (and addition) on natural numbers. For instance, the system admits the innermost rewrite sequence $\mathsf{mult}(\mathsf{s}(0), 0) \xrightarrow{i} \mathsf{add}(0, \mathsf{mult}(0, 0)) \xrightarrow{i} \mathsf{add}(0, 0) \xrightarrow{i} 0$, computing $1 * 0$. Notice that in the second step we have to reduce the innermost redex $\mathsf{mult}(0, 0)$ first.
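The rewrite sequence of Example 1 can be replayed mechanically. The following is a minimal sketch of leftmost-innermost rewriting for this one system; the term representation (variables as strings, applications as tuples) and the helper names `match`, `substitute`, `innermost_step`, `normalize` and `numeral` are assumptions made for this illustration and do not come from the paper.

```python
# Terms: a variable is a str; an application is a tuple (symbol, arg1, ..., argn).
ZERO = ("0",)

# The four rules of Example 1.
RULES = [
    (("add", ZERO, "y"), "y"),
    (("add", ("s", "x"), "y"), ("s", ("add", "x", "y"))),
    (("mult", ZERO, "y"), ZERO),
    (("mult", ("s", "x"), "y"), ("add", "y", ("mult", "x", "y"))),
]


def match(pattern, term, subst):
    """Extend subst so that the instantiated pattern equals term, or return None."""
    if isinstance(pattern, str):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def substitute(term, subst):
    """Apply a substitution to a term."""
    if isinstance(term, str):
        return subst[term]
    return (term[0],) + tuple(substitute(a, subst) for a in term[1:])


def innermost_step(term):
    """Contract the leftmost innermost redex; return None for a normal form."""
    if isinstance(term, str):
        return None
    for i, arg in enumerate(term[1:], start=1):
        reduct = innermost_step(arg)
        if reduct is not None:          # a redex below the root has priority
            return term[:i] + (reduct,) + term[i + 1:]
    for lhs, rhs in RULES:              # all arguments are normal forms here
        subst = match(lhs, term, {})
        if subst is not None:
            return substitute(rhs, subst)
    return None


def normalize(term):
    """Return the normal form of term and the number of innermost steps taken."""
    steps = 0
    while True:
        reduct = innermost_step(term)
        if reduct is None:
            return term, steps
        term, steps = reduct, steps + 1


def numeral(n):
    """Encode the natural number n as s(s(...s(0)...))."""
    term = ZERO
    for _ in range(n):
        term = ("s", term)
    return term


normal_form, steps = normalize(("mult", numeral(1), numeral(0)))
print(normal_form, steps)
```

For the start term mult(s(0), 0) the loop performs exactly the three steps listed in the example. The step counter is the quantity that the runtime complexity defined next measures.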

In (Lescanne, 1995) it is proposed to conceive the complexity of a rewrite system $\mathcal{R}$ as the complexity of the functions computed by $\mathcal{R}$. Whereas this view falls into the realm of implicit complexity analysis, we conceive rewriting under $\mathcal{R}$ as the evaluation mechanism of the encoded function. Thus it is natural to define the runtime complexity based on the number of rewrite steps admitted by $\mathcal{R}$. Let $|s|$ denote the size of a term $s$. The (innermost) runtime complexity of a terminating rewrite system $\mathcal{R}$ is defined by

\[
\mathrm{Dl}_{\mathcal{R}}(m) = \max\{\, n \mid \exists s, t.\ s \xrightarrow{i}{}^{n} t,\ s \in \mathcal{T}_{\mathsf{cb}}(\mathcal{R}) \text{ and } |s| \leqslant m \,\}.
\]
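As a worked illustration that is not part of the original text, the number of innermost steps can be computed by hand for the start terms mult(s^n(0), s^m(0)) of Example 1. The names $A$ and $M$ are introduced only for this sketch, and the size of a term is assumed to count its symbol occurrences. Let $A(m)$ be the number of innermost steps needed to normalize $\mathsf{add}$ applied to the numeral for $m$ and any numeral, and let $M(n, m)$ be the number of innermost steps needed to normalize $\mathsf{mult}$ applied to the numerals for $n$ and $m$. Each use of the recursive $\mathsf{add}$ rule removes one $\mathsf{s}$ from the first argument, and in the recursive $\mathsf{mult}$ rule the inner $\mathsf{mult}$ call has to be normalized before the outer $\mathsf{add}$:

\[
\begin{aligned}
A(m) &= m + 1, \\
M(0, m) &= 1, \\
M(n + 1, m) &= 1 + M(n, m) + A(m), \\
M(n, m) &= 1 + n\,(m + 2).
\end{aligned}
\]

For $n = 1$ and $m = 0$ this gives $M(1, 0) = 3$, matching the three steps of the sequence in Example 1. The start term has size $n + m + 3$, so on this family the step count is quadratic in the size of the start term. This only bounds the runtime complexity of the system from below, since the definition takes the maximum over all constructor-based start terms.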

To verify whether the runtime complexity of a rewrite system $\mathcal{R}$ is polynomially bounded, we employ the polynomial path order. Similar to the recursion-theoretic characterization of the polytime functions given in (Bellantoni and Cook, 1992), POP* relies on the separation of safe and normal inputs. For this, the notion of safe mappings is introduced. A safe mapping $\mathsf{safe}$ associates with every $n$-ary function symbol $f$ the set of safe argument positions. If $f \in \mathcal{D}(\mathcal{R})$ then $\mathsf{safe}(f) \subseteq \{1, \ldots, n\}$; for $f \in \mathcal{C}(\mathcal{R})$ we fix $\mathsf{safe}(f) = \{1, \ldots, n\}$. The argument positions not included in $\mathsf{safe}(f)$ are called normal and denoted by $\mathsf{nrm}(f)$. A precedence is an irreflexive and transitive order on $\mathcal{F}$. The polynomial path order $>_{\mathsf{pop}*}$ is an extension of the auxiliary order $>_{\mathsf{pop}}$, both defined in the following definitions:

Definition 2 Let $>$ be a precedence and $\mathsf{safe}$ a safe mapping. We define the order $>_{\mathsf{pop}}$ inductively as follows: $s = f(s_1, \ldots, s_n) >_{\mathsf{pop}} t$ if one of the following alternatives holds:

1. $f \in \mathcal{C}(\mathcal{R})$ and $s_i >^{=}_{\mathsf{pop}} t$ for some $i \in \{1, \ldots, n\}$, or

2. $s_i >^{=}_{\mathsf{pop}} t$ for some $i \in \mathsf{nrm}(f)$, or

3. $t = g(t_1, \ldots, t_m)$ with $f \in \mathcal{D}(\mathcal{R})$ and $f > g$ and $s >_{\mathsf{pop}} t_i$ for all $1 \leqslant i \leqslant m$.

Definition 3 Let $>$ be a precedence and $\mathsf{safe}$ a safe mapping. We define the polynomial path order $>_{\mathsf{pop}*}$ inductively as follows: $s = f(s_1, \ldots, s_n) >_{\mathsf{pop}*} t$ if either

1. $s >_{\mathsf{pop}} t$, or

2. $s_i >^{=}_{\mathsf{pop}*} t$ for some $i \in \{1, \ldots, n\}$, or

3. $t = g(t_1, \ldots, t_m)$, with $f \in \mathcal{D}(\mathcal{R})$, $f > g$, and the following properties hold:

• $s >_{\mathsf{pop}*} t_{i_0}$ for some $i_0 \in \mathsf{safe}(g)$ and

• either $s >_{\mathsf{pop}} t_i$ or $s \rhd t_i$ and $i \in \mathsf{safe}(g)$ for all $i \neq i_0$, or

4. $t = f(t_1, \ldots, t_m)$ and for $\mathsf{nrm}(f) = \{i_1, \ldots, i_p\}$, $\mathsf{safe}(f) = \{j_1, \ldots, j_q\}$ both $[s_{i_1}, \ldots, s_{i_p}] \mathrel{(>_{\mathsf{pop}*})_{\mathsf{mul}}} [t_{i_1}, \ldots, t_{i_p}]$ and $[s_{j_1}, \ldots, s_{j_q}] \mathrel{(>^{=}_{\mathsf{pop}*})_{\mathsf{mul}}} [t_{j_1}, \ldots, t_{j_q}]$ hold.

Here $>^{=}_{\mathsf{pop}*}$ ($>^{=}_{\mathsf{pop}}$) denotes the reflexive closure of $>_{\mathsf{pop}*}$ ($>_{\mathsf{pop}}$) and $(>_{\mathsf{pop}*})_{\mathsf{mul}}$ the multiset extension of $>_{\mathsf{pop}*}$. When $\mathcal{R} \subseteq {>_{\mathsf{pop}*}}$ holds, we say that $>_{\mathsf{pop}*}$ is compatible with $\mathcal{R}$.

The main theorem from (Avanzini and Moser, 2008) states:

Theorem 4 Let $\mathcal{R}$ be a finite, constructor TRS compatible with $>_{\mathsf{pop}*}$, i.e., $\mathcal{R} \subseteq {>_{\mathsf{pop}*}}$. Then the runtime complexity of $\mathcal{R}$ is polynomial. The polynomial depends only on the cardinality of $\mathcal{F}$ and the sizes of the right-hand sides in $\mathcal{R}$.

We conclude this section by demonstrating the application of POP* on the TRS $\mathcal{R}_{\mathsf{mult}}$:

Example 5 Reconsider the rewrite system $\mathcal{R}_{\mathsf{mult}}$ from Example 1. We suppose that the second argument of addition ($\mathsf{add}$) is safe ($\mathsf{safe}(\mathsf{add}) = \{2\}$) and that all arguments of multiplication ($\mathsf{mult}$) are normal ($\mathsf{safe}(\mathsf{mult}) = \varnothing$). Furthermore let the precedence $>$ be defined as $\mathsf{mult} > \mathsf{add} > \mathsf{s}$. Then $\mathcal{R}_{\mathsf{mult}}$ is compatible with $>_{\mathsf{pop}*}$. As a consequence of Theorem 4, the number of rewrite steps starting from $\mathsf{mult}(\ulcorner n \urcorner, \ulcorner m \urcorner)$ is polynomially bounded in $n$ and $m$.

In order to verify compatibility for this particular instance of $>_{\mathsf{pop}*}$ we need to show that all the rules in $\mathcal{R}_{\mathsf{mult}}$ are strictly decreasing with respect to $>_{\mathsf{pop}*}$, that is, $l >_{\mathsf{pop}*} r$ holds for every $l \to r \in \mathcal{R}_{\mathsf{mult}}$. To exemplify this, consider the rule $\mathsf{add}(\mathsf{s}(x), y) \to \mathsf{s}(\mathsf{add}(x, y))$. We write $\langle i \rangle$ for the $i$-th case of Definition 3. From $\mathsf{s}(x) >_{\mathsf{pop}*} x$ by case $\langle 2 \rangle$ we infer $[\mathsf{s}(x)] \mathrel{(>_{\mathsf{pop}*})_{\mathsf{mul}}} [x]$. Furthermore $[y] \mathrel{(>^{=}_{\mathsf{pop}*})_{\mathsf{mul}}} [y]$ holds and thus by case $\langle 4 \rangle$ we obtain $\mathsf{add}(\mathsf{s}(x), y) >_{\mathsf{pop}*} \mathsf{add}(x, y)$. Finally, from this and $\mathsf{add} > \mathsf{s}$ we conclude by one application of case $\langle 3 \rangle$ that $\mathsf{add}(\mathsf{s}(x), y) >_{\mathsf{pop}*} \mathsf{s}(\mathsf{add}(x, y))$ holds.
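The compatibility check of Example 5 can also be carried out mechanically for all four rules. The sketch below is a direct, unoptimized reading of Definitions 2 and 3 with the safe mapping and precedence of Example 5 hard-coded. It is not the SAT-based implementation described in this paper, and the term representation and the function names (`gt_pop`, `gt_pop_star`, `mul_ext`, and so on) are assumptions made for this illustration.

```python
from collections import Counter

# Terms: a variable is a str; an application is a tuple (symbol, arg1, ..., argn).
ZERO = ("0",)
RULES = [
    (("add", ZERO, "y"), "y"),
    (("add", ("s", "x"), "y"), ("s", ("add", "x", "y"))),
    (("mult", ZERO, "y"), ZERO),
    (("mult", ("s", "x"), "y"), ("add", "y", ("mult", "x", "y"))),
]
DEFINED = {"add", "mult"}
SAFE = {"add": {2}, "mult": set()}                      # safe mapping of Example 5
PREC = {("mult", "add"), ("mult", "s"), ("add", "s")}   # mult > add > s, transitively closed


def safe(f, arity):
    """Safe positions of f; every position of a constructor is safe."""
    return SAFE[f] if f in DEFINED else set(range(1, arity + 1))


def proper_subterm(t, s):
    """True if t is a proper subterm of s."""
    return not isinstance(s, str) and any(a == t or proper_subterm(t, a) for a in s[1:])


def mul_ext(gt, ss, ts, strict):
    """Multiset extension of gt: drop common elements, then every remaining
    element of ts must be dominated by some remaining element of ss."""
    rest_s = list((Counter(ss) - Counter(ts)).elements())
    rest_t = list((Counter(ts) - Counter(ss)).elements())
    if not rest_s and not rest_t:
        return not strict
    return bool(rest_s) and all(any(gt(s, t) for s in rest_s) for t in rest_t)


def gt_pop(s, t):
    """The auxiliary order of Definition 2."""
    if isinstance(s, str):
        return False
    f, args = s[0], s[1:]
    positions = set(range(1, len(args) + 1))
    # Cases 1 and 2: all positions for constructors, normal positions otherwise.
    usable = positions if f not in DEFINED else positions - safe(f, len(args))
    if any(args[i - 1] == t or gt_pop(args[i - 1], t) for i in usable):
        return True
    # Case 3: f is defined, f > g, and s dominates every argument of t.
    if not isinstance(t, str) and f in DEFINED and (f, t[0]) in PREC:
        return all(gt_pop(s, ti) for ti in t[1:])
    return False


def gt_pop_star(s, t):
    """The polynomial path order of Definition 3."""
    if isinstance(s, str):
        return False
    if gt_pop(s, t):                                              # case 1
        return True
    f, args = s[0], s[1:]
    if any(a == t or gt_pop_star(a, t) for a in args):            # case 2
        return True
    if isinstance(t, str):
        return False
    g, targs = t[0], t[1:]
    if f in DEFINED and (f, g) in PREC:                           # case 3
        safe_g = safe(g, len(targs))
        for i0 in sorted(safe_g):
            others = [i for i in range(1, len(targs) + 1) if i != i0]
            if gt_pop_star(s, targs[i0 - 1]) and all(
                gt_pop(s, targs[i - 1])
                or (i in safe_g and proper_subterm(targs[i - 1], s))
                for i in others
            ):
                return True
    if f == g and len(args) == len(targs):                        # case 4
        safe_f = safe(f, len(args))

        def split(xs, want_safe):
            return [x for i, x in enumerate(xs, 1) if (i in safe_f) == want_safe]

        return (mul_ext(gt_pop_star, split(args, False), split(targs, False), strict=True)
                and mul_ext(gt_pop_star, split(args, True), split(targs, True), strict=False))
    return False


compatible = all(gt_pop_star(lhs, rhs) for lhs, rhs in RULES)
print(compatible)
```

The multiset comparison uses the textbook formulation (cancel common elements, then dominate the rest) rather than the multiset covers of Section 2; both describe the same relation for a strict order. Searching for a suitable safe mapping and precedence, which is fixed by hand here, is the part that the propositional encoding below delegates to a SAT-solver.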

2 A Propositional Encoding of POP* with Finite Semantic Labeling

In (Zantema, 1995) the transformation technique semantic labeling is introduced. From $\mathcal{R}$ a labeled TRS $\mathcal{R}_{\mathsf{lab}}$ is obtained by labeling the function symbols in $\mathcal{R}$ with semantic information. Semantics are given to $\mathcal{R}$ by defining a model. A model is an $\mathcal{F}$-algebra $\mathcal{A}$, i.e. a carrier $A$ equipped with operations $f_{\mathcal{A}} : A^{n} \to A$ for every $n$-ary symbol $f \in \mathcal{F}$, such that for every rule $l \to r \in \mathcal{R}$ and any assignment $\alpha : \mathcal{V} \to A$, the equality $[\alpha]_{\mathcal{A}}(l) = [\alpha]_{\mathcal{A}}(r)$ holds. Here $[\alpha]_{\mathcal{A}}(t)$ denotes the interpretation of $t$ with assignment $\alpha$, inductively defined by $[\alpha]_{\mathcal{A}}(t) = \alpha(t)$ if $t \in \mathcal{V}$ and $[\alpha]_{\mathcal{A}}(t) = f_{\mathcal{A}}([\alpha]_{\mathcal{A}}(t_1), \ldots, [\alpha]_{\mathcal{A}}(t_n))$ if $t = f(t_1, \ldots, t_n)$. The system is then labeled according to a labeling $\ell$ for $\mathcal{A}$, i.e. a set of mappings $\ell_f : A^{n} \to A$ for every $n$-ary function symbol $f \in \mathcal{F}$.¹

For every assignment $\alpha$, the mapping $\mathsf{lab}_\alpha(t)$ is defined by $\mathsf{lab}_\alpha(t) = t$ if $t \in \mathcal{V}$ and $\mathsf{lab}_\alpha(f(t_1, \ldots, t_n)) = f_a(\mathsf{lab}_\alpha(t_1), \ldots, \mathsf{lab}_\alpha(t_n))$ where $a = \ell_f([\alpha]_{\mathcal{A}}(t_1), \ldots, [\alpha]_{\mathcal{A}}(t_n))$. The labeled TRS $\mathcal{R}_{\mathsf{lab}}$ is obtained by labeling all rules for all assignments $\alpha$, that is

\[
\mathcal{R}_{\mathsf{lab}} = \{\, \mathsf{lab}_\alpha(l) \to \mathsf{lab}_\alpha(r) \mid l \to r \in \mathcal{R} \text{ and assignment } \alpha \,\}.
\]

The main theorem from (Zantema, 1995) states that $\mathcal{R}_{\mathsf{lab}}$ is terminating if and only if $\mathcal{R}$ is terminating. In the following, we restrict to algebras $\mathcal{B}$ with carrier $\mathbb{B} = \{\mathsf{true}, \mathsf{false}\}$; however, the approach is extensible to arbitrary finite carriers.
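To make the construction of the labeled system concrete, the following sketch labels the multiplication system of Example 1 over the Boolean carrier. The model (0 as false, s as constantly true, add as disjunction, mult as conjunction) and the labeling (each symbol is labeled with the value of its first argument, constants with false) are hand-picked assumptions for this illustration, not the ones found by the implementation, and all function names are hypothetical.

```python
from itertools import product

# Terms: a variable is a str; an application is a tuple (symbol, arg1, ..., argn).
ZERO = ("0",)
RULES = [
    (("add", ZERO, "y"), "y"),
    (("add", ("s", "x"), "y"), ("s", ("add", "x", "y"))),
    (("mult", ZERO, "y"), ZERO),
    (("mult", ("s", "x"), "y"), ("add", "y", ("mult", "x", "y"))),
]

# Assumed Boolean model: a term is interpreted as "denotes a nonzero number".
MODEL = {
    "0": lambda: False,
    "s": lambda x: True,
    "add": lambda x, y: x or y,
    "mult": lambda x, y: x and y,
}


def label_of(f, values):
    """Assumed labeling: the value of the first argument, False for constants."""
    return values[0] if values else False


def variables(t):
    """The set of variables occurring in t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(variables(a) for a in t[1:]))


def interpret(alpha, t):
    """Interpretation of t under the assignment alpha."""
    if isinstance(t, str):
        return alpha[t]
    return MODEL[t[0]](*(interpret(alpha, a) for a in t[1:]))


def label(alpha, t):
    """Labeled version of t under alpha; a labeled symbol is a pair (f, a)."""
    if isinstance(t, str):
        return t
    values = [interpret(alpha, a) for a in t[1:]]
    return ((t[0], label_of(t[0], values)),) + tuple(label(alpha, a) for a in t[1:])


def assignments(rule):
    """All Boolean assignments to the variables of a rule."""
    xs = sorted(variables(rule[0]) | variables(rule[1]))
    for bits in product([False, True], repeat=len(xs)):
        yield dict(zip(xs, bits))


def is_model():
    """Model condition: both sides of every rule are interpreted equally."""
    return all(interpret(a, l) == interpret(a, r)
               for l, r in RULES for a in assignments((l, r)))


def labeled_rules():
    """Label every rule under every assignment; duplicates collapse in the set."""
    return {(label(a, l), label(a, r)) for l, r in RULES for a in assignments((l, r))}


print(is_model(), len(labeled_rules()))
```

Different assignments can produce the same labeled rule, which is why the result is collected in a set. In the encoding that follows, the interpretations and labelings are not fixed in advance as they are here but are left to the SAT-solver.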

To encode a Boolean function $b : \mathbb{B}^{n} \to \mathbb{B}$, we make use of unique propositional atoms $b_w$ for every sequence of arguments $w = w_1, \ldots, w_n \in \mathbb{B}^{n}$. The atom $b_w$ will denote the result of applying $b$ to $w_1, \ldots, w_n$. Let $a_1, \ldots, a_n$ be propositional formulas. To impose restrictions on the encoded function $b$, we introduce the formula $\ulcorner b \urcorner(a_1, \ldots, a_n)$ such that for a satisfying assignment $\nu$ the equality $\nu(\ulcorner b \urcorner(a_1, \ldots, a_n)) = b_{\nu(a_1), \ldots, \nu(a_n)}$ holds. For instance, with $\ulcorner b \urcorner(a_1, a_2) \leftrightarrow r$ we assert that the encoded function $b$ satisfies $b(\nu(a_1), \nu(a_2)) = \nu(r)$.
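The text does not spell out how the formula for an encoded function applied to argument formulas is built. One possible realization, shown here as an assumption rather than as the construction used in the implementation, is a disjunction over all argument tuples w in which each disjunct pairs the atom for w with the condition that every argument evaluates to the corresponding component of w. The formula representation and the names `encode` and `evaluate` are hypothetical.

```python
from itertools import product


def encode(name, args):
    """Formula for 'function name applied to args': a disjunction over all
    argument tuples w of (atom b_w AND each argument is equivalent to w_i)."""
    disjuncts = []
    for w in product([False, True], repeat=len(args)):
        guards = [("iff", a, ("const", wi)) for a, wi in zip(args, w)]
        disjuncts.append(("and", [("atom", (name, w))] + guards))
    return ("or", disjuncts)


def evaluate(formula, nu):
    """Truth value of a formula under the valuation nu (a dict on atoms)."""
    tag = formula[0]
    if tag == "const":
        return formula[1]
    if tag == "atom":
        return nu[formula[1]]
    if tag == "iff":
        return evaluate(formula[1], nu) == evaluate(formula[2], nu)
    if tag == "and":
        return all(evaluate(f, nu) for f in formula[1])
    if tag == "or":
        return any(evaluate(f, nu) for f in formula[1])
    raise ValueError(f"unknown connective: {tag}")


# A valuation that encodes b as conjunction and sets a1 = true, a2 = false.
nu = {("b", w): w[0] and w[1] for w in product([False, True], repeat=2)}
nu.update({"a1": True, "a2": False})
phi = encode("b", [("atom", "a1"), ("atom", "a2")])
print(evaluate(phi, nu))
```

Under any valuation exactly one disjunct has all of its guards satisfied, namely the one for the tuple of argument values, so the formula takes the value of the corresponding atom. The formula grows exponentially in the arity, which is harmless for the small arities typical of rewrite systems.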

For every assignment $\alpha : \mathcal{V} \to A$ and term $t$ appearing in $\mathcal{R}$ we introduce the atoms $\mathsf{int}_{\alpha,t}$ and, for $t \notin \mathcal{V}$, $\mathsf{lab}_{\alpha,t}$. The meaning of $\mathsf{int}_{\alpha,t}$ will be the result of $[\alpha]_{\mathcal{B}}(t)$, and $\mathsf{lab}_{\alpha,t}$ will denote the label of the root symbol of $t$ under $\alpha$. In order to ensure this for $t = f(t_1, \ldots, t_n)$ and a particular assignment $\alpha$, we define

\[
\begin{aligned}
\mathrm{INT}_\alpha(t) &= \mathsf{int}_{\alpha,t} \leftrightarrow \ulcorner f_{\mathcal{B}} \urcorner(\mathsf{int}_{\alpha,t_1}, \ldots, \mathsf{int}_{\alpha,t_n}), \text{ and} \\
\mathrm{LAB}_\alpha(t) &= \mathsf{lab}_{\alpha,t} \leftrightarrow \ulcorner \ell_f \urcorner(\mathsf{int}_{\alpha,t_1}, \ldots, \mathsf{int}_{\alpha,t_n}).
\end{aligned}
\]

Furthermore for $t \in \mathcal{V}$ we set $\mathrm{INT}_\alpha(t) = \mathsf{int}_{\alpha,t} \leftrightarrow \alpha(t)$. We extend $\trianglerighteq$ to TRSs as follows: $\mathcal{R} \trianglerighteq t$ if $l \trianglerighteq t$ or $r \trianglerighteq t$ for some rule $l \to r \in \mathcal{R}$. Besides the model condition, the above constraints have to be enforced for every term appearing in $\mathcal{R}$. This is covered by

\[
\mathrm{LAB}(\mathcal{R}) = \bigwedge_{\alpha} \Bigl( \bigwedge_{\mathcal{R} \trianglerighteq t} \bigl( \mathrm{INT}_\alpha(t) \wedge \mathrm{LAB}_\alpha(t) \bigr) \wedge \bigwedge_{l \to r \in \mathcal{R}} \bigl( \mathsf{int}_{\alpha,l} \leftrightarrow \mathsf{int}_{\alpha,r} \bigr) \Bigr).
\]

¹ The definition from (Zantema, 1995) allows the labeling of a subset of $\mathcal{F}$ and leaves other symbols unchanged. In our context, this has no consequence and simplifies the translation.

Assume $\nu$ is a satisfying assignment for $\mathrm{LAB}(\mathcal{R})$ and $\mathcal{R}_{\mathsf{lab}}$ denotes the system obtained by labeling $\mathcal{R}$ according to the encoded labeling and model. In order to show compatibility of $\mathcal{R}_{\mathsf{lab}}$ with POP*, we need to find a precedence $>$ and safe mapping $\mathsf{safe}$ such that $\mathcal{R}_{\mathsf{lab}} \subseteq {>_{\mathsf{pop}*}}$ holds for the induced order $>_{\mathsf{pop}*}$. To compare the labeled versions of two concrete terms $s, t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$ under a particular assignment $\alpha$, we define

\[
\ulcorner s >_{\mathsf{pop}*} t \urcorner_\alpha = \ulcorner s >^{(1)}_{\mathsf{pop}*} t \urcorner_\alpha \vee \ulcorner s >^{(2)}_{\mathsf{pop}*} t \urcorner_\alpha \vee \ulcorner s >^{(3)}_{\mathsf{pop}*} t \urcorner_\alpha \vee \ulcorner s >^{(4)}_{\mathsf{pop}*} t \urcorner_\alpha.
\]

Here $\ulcorner s >^{(i)}_{\mathsf{pop}*} t \urcorner_\alpha$ refers to the encoding of case $\langle i \rangle$ from Definition 3. We discuss the cases $\langle 2 \rangle$ to $\langle 4 \rangle$; case $\langle 1 \rangle$, the comparison using the weaker order $>_{\mathsf{pop}}$, is obtained similarly.

Note that $s_i = t$ implies $\mathsf{lab}_\alpha(s_i) = \mathsf{lab}_\alpha(t)$. Thus case $\langle 2 \rangle$ is perfectly captured by $\ulcorner f(s_1, \ldots, s_n) >^{(2)}_{\mathsf{pop}*} t \urcorner_\alpha = \top$² if $s_i = t$ holds for some $s_i$. Otherwise, we define $\ulcorner f(s_1, \ldots, s_n) >^{(2)}_{\mathsf{pop}*} t \urcorner_\alpha = \bigvee_{i=1}^{n} \ulcorner s_i >_{\mathsf{pop}*} t \urcorner_\alpha$. For $f \in \mathcal{F}$ and a formula $a$ representing the label, the formula $\mathrm{SF}(f_a, i)$ ($\mathrm{NRM}(f_a, i)$) asserts that, depending on the valuation of $a$, the $i$-th position of $f_{\mathsf{true}}$ or $f_{\mathsf{false}}$ is safe (normal). Likewise, for $f, g \in \mathcal{F}$, the formula $\ulcorner f_a > g_b \urcorner$ is defined such that for a satisfying assignment $\nu$, $f_{\nu(a)} > g_{\nu(b)}$ is asserted.

Assume the unlabeled symbol $f$ is a defined symbol of $\mathcal{R}$. Writing $s = f(s_1, \ldots, s_n)$ and $t = g(t_1, \ldots, t_m)$, we define for $f \neq g$

\[
\ulcorner s >^{(3)}_{\mathsf{pop}*} t \urcorner_\alpha = \ulcorner f_{\mathsf{lab}_{\alpha,s}} > g_{\mathsf{lab}_{\alpha,t}} \urcorner \wedge \bigvee_{i_0=1}^{m} \Bigl( \ulcorner s >_{\mathsf{pop}*} t_{i_0} \urcorner_\alpha \wedge \mathrm{SF}(g_{\mathsf{lab}_{\alpha,t}}, i_0) \wedge \bigwedge_{i=1,\, i \neq i_0}^{m} \bigl( \ulcorner s >^{(1)}_{\mathsf{pop}*} t_i \urcorner_\alpha \vee ( \mathrm{SF}(g_{\mathsf{lab}_{\alpha,t}}, i) \wedge \ulcorner s \rhd t_i \urcorner ) \bigr) \Bigr).
\]

Here we employ that the superterm property $\rhd$ is closed under labeling. Additionally we add the rule $f_a(x_1, \ldots, x_n) \to c$ with $c$ a fresh constant to the labeled system and require $f_a > c$ in the precedence. This guarantees that $f_a$ is defined with respect to $\mathcal{R}_{\mathsf{lab}}$, as otherwise case $\langle 3 \rangle$ is not applicable. Alternatively one could encode whether $f_a$ is defined and adapt the encoding of case $\langle 3 \rangle$ accordingly, but experimental findings indicate that the described approach is favorable.

To encode multiset comparisons, we make use of multiset covers (Schneider-Kamp, Thiemann, Annov, Codish and Giesl, 2007). A multiset cover is a pair of total mappings $\gamma : \{1, \ldots, n\} \to \{1, \ldots, n\}$ and $\varepsilon : \{1, \ldots, n\} \to \mathbb{B}$, encoded using fresh atoms $\gamma_{i,j}$ and $\varepsilon_i$. The underlying idea is that for the comparison $[s_1, \ldots, s_n] \mathrel{(>^{=}_{\mathsf{pop}*})_{\mathsf{mul}}} [t_1, \ldots, t_n]$ to hold, every term $t_j$ has to be covered by some term $s_i$ (encoded as $\gamma_{i,j} = \mathsf{true}$), either by $s_i = t_j$ ($\varepsilon_i = \mathsf{true}$) or $s_i >_{\mathsf{pop}*} t_j$ ($\varepsilon_i = \mathsf{false}$). For the case $s_i = t_j$, $s_i$ must not cover any element besides $t_j$. To assert a correct encoding of $(\gamma, \varepsilon)$, we introduce the formula $\ulcorner (\gamma, \varepsilon) \urcorner$. By means of multiset covers we are able to encode case $\langle 4 \rangle$ using one multiset comparison. Writing $s = f(s_1, \ldots, s_n)$ and $t = f(t_1, \ldots, t_n)$, we define

\[
\begin{aligned}
\ulcorner s >^{(4)}_{\mathsf{pop}*} t \urcorner_\alpha ={} & (\mathsf{lab}_{\alpha,s} \leftrightarrow \mathsf{lab}_{\alpha,t}) \wedge \ulcorner (\gamma, \varepsilon) \urcorner \wedge \bigvee_{i=1}^{n} \bigl( \mathrm{NRM}(f_{\mathsf{lab}_{\alpha,s}}, i) \wedge \neg \varepsilon_i \bigr) \\
& \wedge \bigwedge_{i=1}^{n} \bigwedge_{j=1}^{n} \Bigl( \gamma_{i,j} \to \bigl( (\mathrm{SF}(f_{\mathsf{lab}_{\alpha,s}}, i) \leftrightarrow \mathrm{SF}(f_{\mathsf{lab}_{\alpha,t}}, j)) \wedge (\varepsilon_i \to \ulcorner s_i = t_j \urcorner) \wedge (\neg \varepsilon_i \to \ulcorner s_i >_{\mathsf{pop}*} t_j \urcorner_\alpha) \bigr) \Bigr),
\end{aligned}
\]

² We use $\top$ and $\bot$ to denote truth and falsity in propositional formulas.

where we restrict comparisons of arguments by their kind. Assuming $\mathrm{STRICT}(\mathcal{R})$ and $\mathrm{SM}_{\mathrm{SL}}(\mathcal{R})$ cover the restrictions on the precedence and safe mapping, satisfiability of

\[
\mathrm{POP}^{*}_{\mathrm{SL}}(\mathcal{R}) = \bigwedge_{\alpha} \bigwedge_{l \to r \in \mathcal{R}} \ulcorner l >_{\mathsf{pop}*} r \urcorner_\alpha \wedge \mathrm{SM}_{\mathrm{SL}}(\mathcal{R}) \wedge \mathrm{STRICT}(\mathcal{R}) \wedge \mathrm{LAB}(\mathcal{R})
\]

certifies the existence of a model $\mathcal{B}$ and labeling $\ell$ such that the rewrite system

\[
\mathcal{R}'_{\mathsf{lab}} = \mathcal{R}_{\mathsf{lab}} \cup \{\, f_a(x_1, \ldots, x_n) \to c \mid f \in \mathcal{D}(\mathcal{R}) \text{ and } f_a \in \mathcal{C}(\mathcal{R}_{\mathsf{lab}}) \,\}
\]

is compatible with $>_{\mathsf{pop}*}$. Since every rewrite sequence in $\mathcal{R}$ translates to a sequence in $\mathcal{R}_{\mathsf{lab}}$, by Theorem 4 it is an easy exercise to prove the following lemma:

Lemma 6 Let $\mathcal{R}$ be a finite, constructor TRS and assume $\mathrm{POP}^{*}_{\mathrm{SL}}(\mathcal{R})$ is satisfiable. Then the induced runtime complexity is polynomial.
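The multiset covers used for case 〈4〉 above can be illustrated without a SAT-solver by searching for a cover exhaustively. The sketch below is only an illustration for small multisets of equal size: the function name `multiset_cover` and the brute-force search are assumptions of this example, whereas the encoding described in this section leaves the choice of the cover to the solver.

```python
from itertools import product


def multiset_cover(ss, ts, gt, strict=False):
    """Search for a multiset cover (gamma, eps) witnessing that ss is at least
    (or, with strict=True, strictly greater than) ts in the multiset extension
    of the strict order gt. Returns the cover or None.

    gamma[j] is the index i of the element ss[i] that covers ts[j];
    eps[i] is True if ss[i] is used for an equality, False if it is used strictly.
    """
    n = len(ss)
    assert len(ts) == n, "this sketch only treats multisets of equal size"
    for gamma in product(range(n), repeat=n):
        for eps in product([False, True], repeat=n):
            covered = all(
                # An element used for equality may cover exactly one element.
                (ss[i] == ts[j] and gamma.count(i) == 1) if eps[i] else gt(ss[i], ts[j])
                for j, i in enumerate(gamma)
            )
            # Strictness: at least one element is not used for an equality.
            if covered and (not strict or not all(eps)):
                return gamma, eps
    return None


def greater(a, b):
    """The usual order on integers, standing in for the polynomial path order."""
    return a > b


print(multiset_cover([5, 0], [4, 3], greater, strict=True) is not None)
print(multiset_cover([1, 2], [2, 1], greater, strict=True) is not None)
```

The first call succeeds because 5 covers both 4 and 3 strictly. The second fails because the two multisets are equal, so only equality covers exist. The search visits n^n · 2^n candidates, which makes clear why the encoding introduces the atoms for the cover and lets the solver pick it.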

3 Experimental Results<br />

We implemented the encoding of POP* with semantic labeling (denoted by POP*_SL) in OCaml and compare it to the implementation without labeling from (Avanzini and Moser, 2008) (denoted by POP*) and an implementation of a restricted class of polynomial interpretations (denoted by SMC). To check satisfiability of the obtained formulas we employ the MiniSat SAT-solver (Eén and Sörensson, 2003).

SMC refers to a restricted class of polynomial interpretations: every constructor symbol is interpreted by a strongly linear polynomial, i.e. a polynomial of shape $P(x_1, \ldots, x_n) = \sum_{i=1}^{n} x_i + c$ with $c \in \mathbb{N}$, $c \geqslant 1$. Furthermore, each defined symbol is interpreted by a simple-mixed polynomial $P(x_1, \ldots, x_n) = \sum_{i_j \in \{0,1\}} a_{i_1 \ldots i_n} x_1^{i_1} \cdots x_n^{i_n} + \sum_{i=1}^{n} b_i x_i^{2}$ with coefficients in $\mathbb{N}$. For this class of polynomial interpretations it is trivial to check that they induce polynomial bounds on the runtime complexity. To find these interpretations automatically we employ cdiprover3 (Moser and Schnabl, 2008).


The table below presents experimental results based on two testbeds. Testbed T consists<br />

of the 957 examples from the Termination Problem Database 4.0³ (TPDB) that were<br />

automatically verified terminating in the competition of 2007⁴. Testbed C is a restriction<br />

of T where only constructor TRSs have been considered (449 in total). Experimental<br />

results, obtained on a PC with 512 MB of RAM and a 2.4 GHz Intel® Pentium™ IV<br />

processor, are collected in Table 1⁵.<br />

Table 1: Experimental results on TPDB 4.0.<br />

                          POP∗          POP∗SL        SMC<br />
                          T      C      T      C      T      C<br />
Yes                       65     41     128    74     156    83<br />
Maybe                     892    408    800    370    495    271<br />
Timeout (60 sec.)         0      0      29     5      306    95<br />
Average Time Yes (sec.)   0.037         0.130         0.183<br />

The results confirm that semantic labeling significantly increases the power of POP∗,<br />

yielding results comparable to SMC. It is noteworthy that the union of yes-instances<br />

of the three methods comprises 218 examples for testbed T and 112 for testbed C. For<br />

these 112 out of 449 constructor TRSs we are able to conclude a polynomial runtime<br />

complexity. Interestingly, POP∗SL and SMC succeed on quite different ranges of systems.<br />

There are 29 constructor TRSs that only POP∗SL can deal with, whereas 38 constructor<br />

yes-instances of SMC cannot be handled by POP∗SL. Table 1 reflects that for both suites<br />

SMC runs into a timeout for approximately every fourth system (306 of 957 in T, 95 of 449 in C). This indicates that purely<br />

semantic methods similar to SMC tend to become impractical when the size of the input system<br />

increases. Compared to this, the number of timeouts of POP∗SL is rather low, confirming<br />

the feasibility of our new approach.<br />

We perform various optimizations in our implementation: First of all, the constraint<br />

formula can be reduced during construction. In combination with this it is usually beneficial<br />

to construct the formula lazily. For example, the encoding of case (2), ⟦f(s1, . . . , sn) >⁽²⁾pop∗ si⟧α, reduces to ⊤<br />

and thus one can directly conclude ⟦f(s1, . . . , sn) >pop∗ si⟧α = ⊤ without constructing<br />

encodings for the other cases. Furthermore, s >pop∗ t is doomed to failure if t contains<br />

variables not appearing in s; in this case we replace the constraint by ⊥. SAT-solvers<br />

expect their input in CNF, and a naive conversion is worst-case exponential in size. We employ the transformation<br />

proposed in (Plaisted and Greenbaum, 1986) to obtain an equisatisfiable CNF linear in size.<br />

This approach is analogous to Tseitin’s transformation (Tseitin, 1968) but additionally<br />

takes the polarity of atoms into account, usually resulting in shorter transformations.<br />
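
The idea behind such structure-preserving translations can be sketched in a few lines of Python. The sketch below implements the plain Tseitin transformation, not the Plaisted and Greenbaum variant used in our implementation (which keeps only the implication direction required by the polarity of each subformula). The tuple representation of formulas and the names tseitin and satisfiable are assumptions made for this illustration only.<br />

```python
from itertools import count, product

def tseitin(formula):
    """Translate a formula into an equisatisfiable CNF.

    Formulas are nested tuples such as ("and", "p", ("not", "q")); atoms are
    strings. Clauses are lists of non-zero integers (negative = negated).
    Returns (clauses, atoms, num_vars).
    """
    atoms, clauses, fresh = {}, [], count(1)

    def lit(f):
        if isinstance(f, str):                 # atom: reuse its variable
            if f not in atoms:
                atoms[f] = next(fresh)
            return atoms[f]
        op, *args = f
        if op == "not":                        # negation needs no new variable
            return -lit(args[0])
        subs = [lit(a) for a in args]
        x = next(fresh)                        # x names this subformula
        if op == "and":                        # x <-> (s1 and ... and sn)
            clauses.extend([-x, s] for s in subs)
            clauses.append([x] + [-s for s in subs])
        elif op == "or":                       # x <-> (s1 or ... or sn)
            clauses.extend([x, -s] for s in subs)
            clauses.append([-x] + subs)
        else:
            raise ValueError(f"unknown connective: {op}")
        return x

    clauses.append([lit(formula)])             # assert the whole formula
    return clauses, atoms, next(fresh) - 1

def satisfiable(clauses, num_vars):
    """Brute-force check, only meant for tiny examples."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

Each connective contributes one fresh variable and a number of clauses proportional to its arity, which is where the linear size bound comes from.<br />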

4 Conclusion<br />

In this paper we have shown how to automatically verify polynomial runtime complexities<br />

of rewrite systems. For that we employ semantic labeling and the polynomial path order<br />

POP∗. Our automation works by a reduction to SAT and the use of a state-of-the-art<br />

SAT-solver. To the best of our knowledge, this is the first SAT encoding of recursive path orders<br />

with finite semantic labeling. The experimental results confirm the feasibility of our approach.<br />

Moreover, they demonstrate that by semantic labeling we significantly increase<br />

the power of POP∗.<br />

3 Available at http://www.lri.fr/~marche/tpdb.<br />

4 Cf. http://www.lri.fr/~marche/termination-competition/2007/.<br />

5 Detailed results available at http://homepage.uibk.ac.at/~csae2496/esslli08.<br />

Our research also seems comparable to (Bonfante, Marion and Péchoux, 2007), where<br />

recursive path orders together with strongly linear polynomial quasi-interpretations are<br />

employed in the complexity analysis. However, this method relies on caching techniques<br />

to achieve polytime computability. In contrast, we only demand an eager evaluation<br />

strategy.<br />

In future work we will strengthen the applicability of our methods. We are currently investigating<br />

the integration of POP∗ into the dependency pair framework for an automatic<br />

complexity analysis as proposed in (Hirokawa and Moser, 2008). As this framework allows<br />

<strong>the</strong> use <strong>of</strong> argument filterings (Kusakari, Nakamura and Toyama, 1999) and usable<br />

rules (Arts and Giesl, 2000), we expect a significant increase in <strong>the</strong> ability to automatically<br />

verify polynomial runtime complexities.<br />

Finally, we want to mention another exciting field of application. There has long been interest<br />

in the functional programming community in automatically verifying complexity properties<br />

<strong>of</strong> programs. For brevity we just mention (Rosendahl, 1989; Anderson, Khoo,<br />

Andrei and Luca, 2005; Bonfante et al., 2007). Rewriting naturally models <strong>the</strong> evaluation<br />

<strong>of</strong> functional programs, and termination behavior <strong>of</strong> functional programs via transformations<br />

to rewrite systems has been extensively studied. For instance, one recent approach is<br />

described in (Giesl, Swiderski, Schneider-Kamp and Thiemann, 2006) where Haskell programs<br />

are covered. In joint work with Hirokawa, Middeldorp and Moser (Avanzini, Hirokawa,<br />

Middeldorp and Moser, 2007) we propose a translation from (a subset of higher-order)<br />

Scheme programs to term rewrite systems. The transformation is designed to be<br />

complexity preserving and thus allows <strong>the</strong> study <strong>of</strong> <strong>the</strong> complexity <strong>of</strong> a Scheme program<br />

P by <strong>the</strong> analysis <strong>of</strong> <strong>the</strong> transformed rewrite system R. Hence from compatibility <strong>of</strong> R<br />

with POP ∗ we can directly conclude that <strong>the</strong> number <strong>of</strong> evaluation steps <strong>of</strong> <strong>the</strong> Scheme<br />

program P is polynomially bounded with respect to <strong>the</strong> input sizes. All necessary steps<br />

can be performed mechanically and thus we arrive at a completely automatic complexity<br />

analysis for Scheme, and eagerly evaluated functional programs in general.<br />

References<br />

Anderson, H., Khoo, S.-C., Andrei, S. and Luca, B. (2005). Calculating polynomial<br />

runtime properties, Proc. 3rd APLAS, pp. 230–246.<br />

Arts, T. and Giesl, J. (2000). Termination of term rewriting using dependency pairs, TCS<br />

236(1-2): 133–178.<br />

Avanzini, M., Hirokawa, N., Middeldorp, A. and Moser, G. (2007). Proving termination<br />

of Scheme programs by rewriting. Draft.⁶<br />

Avanzini, M. and Moser, G. (2008). Complexity analysis by rewriting, Proc. 9th FLOPS,<br />

Vol. 4989 of LNCS, pp. 130–146.<br />

6 Available at http://cl-informatik.uibk.ac.at/~georg/list.publications<br />

Baader, F. and Nipkow, T. (1998). Term Rewriting and All That, Cambridge University<br />

Press.<br />

Bellantoni, S. and Cook, S. A. (1992). A new recursion-<strong>the</strong>oretic characterization <strong>of</strong> <strong>the</strong><br />

polytime functions, CC 2: 97–110.<br />

Bonfante, G., Marion, J.-Y. and Péchoux, R. (2007). Quasi-interpretation synthesis by<br />

decomposition, Proc. 4th ICTAC, Vol. 4711 of LNCS, pp. 410–424.<br />

Eén, N. and Sörensson, N. (2003). An extensible SAT-solver, Proc. 6th SAT, Vol. 2919 of<br />

LNCS, pp. 502–518.<br />

Giesl, J., Swiderski, S., Schneider-Kamp, P. and Thiemann, R. (2006). Automated termination<br />

analysis for Haskell: From term rewriting to programming languages, Proc.<br />

17th RTA, Vol. 4098 of LNCS, pp. 297–312.<br />

Hirokawa, N. and Moser, G. (2008). Automated complexity analysis based on <strong>the</strong> dependency<br />

pair method, Proc. 4th IJCAR. To appear.<br />

Hofbauer, D. (1992). Termination proofs by multiset path orderings imply primitive recursive<br />

derivation lengths, TCS 105(1): 129–140.<br />

Koprowski, A. (2006). TPA: Termination proved automatically, Proc. 17th RTA, pp. 257–266.<br />

Koprowski, A. and Middeldorp, A. (2007). Predictive labeling with dependency pairs<br />

using SAT, Proc. 21st CADE, Vol. 4603 of LNCS, pp. 410–425.<br />

Kusakari, K., Nakamura, M. and Toyama, Y. (1999). Argument filtering transformation,<br />

Proc. 1st PPDP, Vol. 1702 of LNCS, pp. 47–61.<br />

Lescanne, P. (1995). Termination <strong>of</strong> rewrite systems by elementary interpretations, Formal<br />

Aspects <strong>of</strong> Computing 7(1): 77–90.<br />

Moser, G. and Schnabl, A. (2008). Proving quadratic derivational complexities using<br />

context dependent interpretations, Proc. 19th RTA. To appear.<br />

Plaisted, D. A. and Greenbaum, S. (1986). A structure-preserving clause form translation,<br />

J. Symb. Comput. 2(3): 293–304.<br />

Rosendahl, M. (1989). Automatic complexity analysis, Proc. 4th FPCA, pp. 144–156.<br />

Schneider-Kamp, P., Thiemann, R., Annov, E., Codish, M. and Giesl, J. (2007). Proving<br />

termination using recursive path orders and SAT solving, Proc. 6th FroCoS, number<br />

4720 in LNCS, pp. 267–282.<br />

TeReSe (2003). Term Rewriting Systems, Vol. 55 <strong>of</strong> CTTCS, Cambridge University Press.<br />

Tseitin, G. (1968). On <strong>the</strong> complexity <strong>of</strong> derivation in propositional calculus, SCML, Part<br />

2 pp. 115–125.<br />

Zantema, H. (1995). Termination <strong>of</strong> term rewriting by semantic labelling, FI 24(1/2): 89–<br />

105.<br />


SIMULATING SPOKEN DIALOGUE<br />

WITH A FOCUS ON REALISTIC TURN-TAKING<br />

Timo Baumann<br />

University <strong>of</strong> Potsdam<br />

Abstract. We present a system for testing turn-taking strategies in a simulation environment,<br />

in which artificial dialogue participants exchange audio streams in real time – unlike earlier<br />

turn-taking simulations, which interchanged unambiguous symbolic messages. Dialogue<br />

participants autonomously determine <strong>the</strong>ir turn-taking behaviour, based on <strong>the</strong>ir analysis <strong>of</strong><br />

the incoming audio. We use machine-learning methods to classify the continuous audio<br />

signal into symbolic turn-taking states. We experiment with various rule sets and show how<br />

simple, local management rules can create realistic behavioural patterns.<br />

1 Introduction<br />

Turn-taking management, i. e. deciding who may speak when in a dialogue, is an important<br />

subtask <strong>of</strong> interaction management. The classical model <strong>of</strong> turn-taking (Sacks,<br />

Schegl<strong>of</strong>f and Jefferson, 1974) describes turn-taking as locally managed (depending only<br />

on a local context) and predictive (upcoming turn endings are signalled in advance by <strong>the</strong><br />

interplay of syntax, semantics and prosody). Current speech dialogue systems (SDSes), on<br />

the other hand, use reactive turn-taking schemes, with the turn being taken after a silence<br />

<strong>of</strong> fixed length or <strong>of</strong> contextually determined length (Ferrer, Shriberg and Stolcke, 2002).<br />

This limits <strong>the</strong> interactivity <strong>of</strong> SDSes, as turns have to be separated by intervening silence.<br />

The prediction <strong>of</strong> turn endings (EoT prediction) has been investigated by a number<br />

<strong>of</strong> authors. Schlangen (2006) trains classifiers to predict <strong>the</strong> end <strong>of</strong> turn (EoT) but uses<br />

features that are not calculated strictly incrementally. Turn-management has also been<br />

studied before, but typically in simulation systems that interchange symbolic messages<br />

and work in a centrally managed environment (Padilha, 2006). In <strong>the</strong> present paper, we<br />

combine <strong>the</strong> efforts for EoT-prediction and turn-taking simulation. We propose an incremental<br />

classification of speech into speech states that control the system’s turn-taking. We<br />

first evaluate the classification on its own and then in combination with different turn-management<br />

strategies in a dialogue simulation environment.<br />

Dialogue simulation itself has a long-standing tradition in the development of SDSes,<br />

but <strong>the</strong> main focus seems to be on <strong>the</strong> improvement <strong>of</strong> dialogue strategies (Schatzmann,<br />

Weilhammer, Stuttle and Young, 2006) and audio is usually just used to trigger realistic<br />

ASR errors (López-Cózar, De la Torre, Segura and Rubio, 2003), which contrasts with<br />

<strong>the</strong> focus <strong>of</strong> <strong>the</strong> present paper: Our goal is to show how realistic turn-taking behaviour<br />

can be simulated using only local context for <strong>the</strong> classification <strong>of</strong> speech into classes relevant<br />

to turn-taking management combined with simple, locally managed rules. Dialogue<br />

strategies in general are not locally managed and thus learning dialogue strategies seems<br />

to require <strong>the</strong> more complex reinforcement learning instead <strong>of</strong> simple classifier training<br />

which we use.<br />

Figure 1: A human user conversing with an artificial DP in our interaction environment<br />

(structured as in section 2). A dialogue recorder wiretaps <strong>the</strong>ir conversation.<br />

We do not (and do not need to) take into account <strong>the</strong> content <strong>of</strong> <strong>the</strong> dialogues and<br />

in fact we limit our speech analysis to simple prosodic features for <strong>the</strong> EoT prediction.<br />

Thus, for this work, we abstract away from all questions <strong>of</strong> content management and let<br />

our dialogue participants speak randomly selected pre-recorded utterances – though with<br />

proper turn-taking.<br />

The remainder <strong>of</strong> <strong>the</strong> paper is structured as follows: Section 2 describes <strong>the</strong> system<br />

architecture and Section 3 <strong>the</strong> corpora we use. Section 4 evaluates <strong>the</strong> speech state classification<br />

and Section 5 demonstrates and evaluates some simple turn-management strategies.<br />

We close with conclusions and ideas for fur<strong>the</strong>r work.<br />

2 Architecture <strong>of</strong> <strong>the</strong> Interaction Environment<br />

Our architecture defines an interaction environment in which dialogue participants (DPs)<br />

communicate with each o<strong>the</strong>r. Interaction is purely non-symbolic, using asynchronous<br />

audio streams over RTP (Schulzrinne, Casner, Frederick and Jacobson, 2003). There is no<br />

common clock, or o<strong>the</strong>r synchronisation required between DPs. The architecture provides<br />

a headset tool for human DPs, and monitoring tools to listen to ongoing dialogues and to<br />

record <strong>the</strong>m to disk.<br />

Figure 1 shows two dialogue participants – one human, one artificial – conversing in<br />

<strong>the</strong> environment described above. The artificial DP on <strong>the</strong> right <strong>of</strong> figure 1 is structured<br />

as described below.<br />

Artificial DPs are realized as modular and extensible collections <strong>of</strong> event-driven s<strong>of</strong>tware<br />

agents in <strong>the</strong> open agent architecture, OAA (Martin, Cheyer and Moran, 1999).<br />

In <strong>the</strong> OAA each s<strong>of</strong>tware agent advertises its own abilities to solve problems (such as<br />

generating utterances) and may itself request o<strong>the</strong>r agents to solve sub-problems (e. g.<br />

sending data over RTP). For audio processing inside <strong>the</strong> DP we rely on <strong>the</strong> Sphinx-4<br />

framework (Walker, Lamere, Kwok, Raj, Singh, Gouvea, Wolf and Woelfel, 2004) which<br />

we extended for our audio-processing pipeline. In <strong>the</strong> current system, we do not yet use<br />

Sphinx’ abilities as a speech recognizer and most o<strong>the</strong>r modules that would be needed for<br />

a real dialogue system are missing. These are obvious enhancements for later versions.<br />


2.1 Speech Generation<br />

Speech generation consists <strong>of</strong> a syn<strong>the</strong>sizer and a dispatcher. The syn<strong>the</strong>sizer currently<br />

selects from a corpus of pre-recorded utterances and will be extended to include text-to-speech.<br />

To make turn-taking management harder and the system more realistic, a fixed<br />

delay of 100 ms between the signal to the module and the onset of the recorded utterance is<br />

introduced at this point.¹ This delay is realized by sending 100 ms of recorded silence<br />

before the utterance; utterances are also followed by 100 ms of recorded silence. (If<br />

DPs were to send digital zeros directly before and after <strong>the</strong>ir utterances, speech state<br />

classification, as described below, would become trivial.)<br />

The speech dispatcher continuously sends an RTP stream in packets <strong>of</strong> 10 ms, ei<strong>the</strong>r<br />

audio from a file or sine waves if so instructed by <strong>the</strong> syn<strong>the</strong>sizer, or silence (digital zero).<br />

It can also be ordered to interrupt <strong>the</strong> audio and to revert to silence. The dispatcher also<br />

publishes its current speech state which may be one <strong>of</strong> sil, start <strong>of</strong> turn (SoT), talk, or end<br />

<strong>of</strong> turn (EoT) to <strong>the</strong> DP it is part <strong>of</strong>.<br />

2.2 Speech Analysis<br />

Speech analysis focuses solely on local prosodic analysis for <strong>the</strong> classification <strong>of</strong> <strong>the</strong><br />

listening state (which should reflect <strong>the</strong> interlocutor’s speech state, as described above).<br />

In order to be effective, classification must happen with as short a lag as possible. While<br />

short lags would allow for reactive behaviour, we aim to predict when <strong>the</strong> interlocutor’s<br />

end <strong>of</strong> turn is approaching in order to achieve smooth turn changes and counter-balance<br />

<strong>the</strong> 100 ms lag before a response can be uttered by <strong>the</strong> speech generation.<br />

We use machine learning to classify each received frame (10 ms) <strong>of</strong> audio as silence (sil),<br />

ongoing talk (talk) or end <strong>of</strong> turn (EoT). Classification is based exclusively on signal<br />

power, pitch and derived features. Our pitch extraction is modelled after <strong>the</strong> first three<br />

steps <strong>of</strong> <strong>the</strong> YIN algorithm (de Cheveigné and Kawahara, 2002). As no smoothing or dynamic<br />

programming is applied to <strong>the</strong> pitch extraction, results are computed incrementally<br />

in real-time and become available instantaneously. The algorithm runs at several times<br />

real-time on average hardware. On <strong>the</strong> corpora described below, <strong>the</strong> gross error rate is<br />

1.6 % compared to <strong>the</strong> well known ESPS algorithm (Talkin, 1995).<br />

In order to track changes over time, we derive features by windowing over past values<br />

<strong>of</strong> pitch and power with sizes ranging from 20 to 500 ms. While <strong>the</strong> features calculated<br />

on smaller windows help to smooth and to remove outliers due to failures <strong>of</strong> <strong>the</strong> pitch<br />

extraction, <strong>the</strong> larger windows are expected to capture long-term trends. We calculate <strong>the</strong><br />

arithmetic mean and <strong>the</strong> range <strong>of</strong> <strong>the</strong> values, <strong>the</strong> mean difference between values within<br />

<strong>the</strong> window and <strong>the</strong> relative position <strong>of</strong> <strong>the</strong> minimum and maximum. We also perform<br />

a linear regression and use its slope, <strong>the</strong> MSE <strong>of</strong> <strong>the</strong> regression and <strong>the</strong> error <strong>of</strong> <strong>the</strong><br />

regression for <strong>the</strong> last value in <strong>the</strong> window.<br />
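
As an illustration, the window features listed above could be computed roughly as follows. This is a minimal sketch, not the code used in the system: the function name window_features is hypothetical, "mean difference" is read here as the mean of successive differences, and "relative position" as the index of the extremum divided by the window length minus one. Both readings are assumptions.<br />

```python
def window_features(values):
    """Features over one window of past pitch or power values (oldest first)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values per window")
    mean = sum(values) / n
    value_range = max(values) - min(values)
    # Assumed reading: mean of the differences between successive values.
    mean_diff = sum(b - a for a, b in zip(values, values[1:])) / (n - 1)
    # Assumed reading: 0.0 = oldest frame in the window, 1.0 = newest frame.
    pos_min = values.index(min(values)) / (n - 1)
    pos_max = values.index(max(values)) / (n - 1)
    # Least-squares line through the points (0, v0), (1, v1), ...
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (v - mean) for x, v in enumerate(values))
    slope = sxy / sxx
    intercept = mean - slope * x_mean
    residuals = [v - (intercept + slope * x) for x, v in enumerate(values)]
    return {
        "mean": mean,
        "range": value_range,
        "mean_diff": mean_diff,
        "pos_min": pos_min,
        "pos_max": pos_max,
        "slope": slope,
        "mse": sum(r * r for r in residuals) / n,
        "last_error": residuals[-1],   # regression error for the newest value
    }
```

With 10 ms frames, a window of 20 ms corresponds to two values and a window of 500 ms to fifty values.<br />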

2.3 Turn-Taking Management<br />

The turn-taking management agent determines whe<strong>the</strong>r to start or stop emitting utterances<br />

on <strong>the</strong> basis <strong>of</strong> <strong>the</strong> states <strong>of</strong> <strong>the</strong> generation and analysis modules. An important aspect in<br />

turn-taking management is robustness. To be robust, the turn-taking strategy must not<br />

depend on its interlocutor acting and reacting in certain ways. Naturally, “good” dialogue<br />

will only evolve from friendly dialogue partners, but the turn-management strategy must<br />

prevent deadlocks due to the interlocutor’s behaviour.<br />

1 In a dialogue system NLG and TTS would require processing time; for humans there is a delay between<br />

starting to plan an utterance and the start of the articulation (Levinson, 1983).<br />

Upon <strong>the</strong> reception <strong>of</strong> dialogue state change notifications from <strong>the</strong> analysis module, <strong>the</strong><br />

agent decides about emitting messages to <strong>the</strong> generation module, ordering it to talk or to<br />

hush, according to a defined turn-taking strategy. Messages are only emitted with certain<br />

probabilities. The probabilities to start or hush were determined empirically to lead to<br />

natural performance. If no action is taken, <strong>the</strong> agent sleeps for a short while (currently,<br />

50 ms) being awakened if ano<strong>the</strong>r message is received (for example EoT changing to<br />

sil). Thus, exact timings are non-deterministic and randomly differ between agents. The<br />

probability to start an utterance is set to 0.1, and to hush during an utterance to 0.3.<br />
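
One wake-up of such an agent can be sketched as below. Only the two probabilities are taken from the text; the rule itself (take the turn when nobody seems to be talking, back off during overlap) is a placeholder chosen for illustration, and the function name decide and the state strings are hypothetical.<br />

```python
import random

P_START = 0.1   # probability to start an utterance (from the text)
P_HUSH = 0.3    # probability to hush during an utterance (from the text)

def decide(own_state, listening_state, rng=random):
    """Return "talk", "hush" or None for one wake-up of the agent.

    own_state is the speech state published by the dispatcher,
    listening_state the state reported by the analysis module.
    The rule below is an assumed placeholder strategy.
    """
    if own_state == "sil" and listening_state in ("sil", "EoT"):
        return "talk" if rng.random() < P_START else None
    if own_state == "talk" and listening_state == "talk":
        return "hush" if rng.random() < P_HUSH else None
    return None   # no action: the agent sleeps until the next wake-up
```

Because each wake-up draws a fresh random number, two agents running the same rule will rarely act in the same 50 ms slot, which is what makes the timings differ between agents.<br />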

3 Corpora<br />

We perform our experiments with two different corpora, one <strong>of</strong> simple pseudo-speech, one<br />

<strong>of</strong> read speech. Each corpus contains material from two different speakers (one female,<br />

one male) for which we train separate speech analyzers, in order to be able to simulate<br />

dialogues with one male and one female each.<br />

For pseudo-speech our speakers repeatedly uttered <strong>the</strong> syllable /ba/ instead <strong>of</strong> <strong>the</strong> actually<br />

occurring syllables in a script of 50 utterances (questions, informative sentences,<br />

confirmations, etc). By always uttering <strong>the</strong> same syllable, we remove segment-inherent<br />

influences on power and pitch variation, while at <strong>the</strong> same time retaining sentence intonation.<br />

For read speech we relied on <strong>the</strong> two major speakers <strong>of</strong> <strong>the</strong> Kiel Corpus <strong>of</strong><br />

Read Speech, KCoRS (IPDS, 1994). That corpus contains some 600 utterances for each<br />

speaker.<br />

The two corpora differ in size and complexity. Our controlled pseudo-speech poses<br />

hardly any problem for pitch-extraction and does not contain voiceless speech, silence<br />

during <strong>the</strong> occlusion <strong>of</strong> voiceless plosives or o<strong>the</strong>r potentially “difficult” audio. The<br />

KCoRS on <strong>the</strong> o<strong>the</strong>r hand contains far more training material. Also, as <strong>the</strong> pseudo-speech<br />

does not convey any semantic meaning, subjects in a listening test for <strong>the</strong> evaluation <strong>of</strong><br />

generated turn-taking patterns would not be distracted by nonsense dialogue.<br />

The performance <strong>of</strong> a speech state classifier on both <strong>of</strong> our corpora is likely to be better<br />

than on a corpus of real dialogue speech as it is more homogeneous (especially compared to<br />

speaker-independent speech state classification). Thus, our results should be considered<br />

an upper bound on realistic results.<br />

The start and end <strong>of</strong> each utterance were hand-annotated and each 10 ms <strong>of</strong> audio was<br />

assigned to one <strong>of</strong> <strong>the</strong> listening states as described above with EoT being assigned to<br />

frames in <strong>the</strong> vicinity <strong>of</strong> ± 50 ms <strong>of</strong> <strong>the</strong> utterance end. For <strong>the</strong> turn-taking management<br />

experiments, we crop the audio files so that each utterance is preceded and followed by<br />

100 ms <strong>of</strong> silence.<br />
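
The frame labelling described above can be sketched as follows. The function name label_frames is hypothetical, and deciding each label by the centre of the 10 ms frame is an assumption; the text only fixes the frame length and the ± 50 ms EoT region.<br />

```python
FRAME_MS = 10        # frame length used throughout the paper
EOT_MARGIN_MS = 50   # EoT region: +/- 50 ms around the utterance end

def label_frames(total_ms, utt_start_ms, utt_end_ms):
    """Assign "sil", "talk" or "EoT" to every 10 ms frame of one recording."""
    labels = []
    for start in range(0, total_ms, FRAME_MS):
        centre = start + FRAME_MS / 2          # assumed decision point
        if abs(centre - utt_end_ms) <= EOT_MARGIN_MS:
            labels.append("EoT")               # EoT overrides talk and sil
        elif utt_start_ms <= centre < utt_end_ms:
            labels.append("talk")
        else:
            labels.append("sil")
    return labels
```

For a cropped file with 100 ms of silence on either side of a 100 ms utterance, label_frames(300, 100, 200) yields ten EoT frames, which shows how small the EoT class is relative to the other two.<br />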

4 Speech Analysis Evaluation<br />

We used the machine learning toolkit Weka (Witten and Frank, 2000) to train various<br />

speaker-dependent classifiers. For the evaluation, 80 % of each corpus was used as<br />

training set and 20 % as test set. Tables 1 and 2 show the results of the OneR, J48 and<br />

JRip algorithms for each corpus. OneR finds the most predictive feature to be the dynamic<br />

range of frame energy over the last 100 or 200 ms. JRip outperforms J48, but<br />

has far worse training complexity. Separation of speech and silence (which here is the<br />

recorded silence in the corpus, not digital zero) is done with high accuracy. Recognition<br />

of EoT regions is of lower quality, but still surpasses results in (Schlangen, 2006).²<br />

                         female speaker                      male speaker<br />
classifier               Acc.  Fsil  Ftalk  FEoT  FAR        Acc.  Fsil  Ftalk  FEoT  FAR<br />
OneR                     96.1  0.98  0.96   0.00  21.4       92.8  0.96  0.93   0.13  65.5<br />
J48                      94.8  0.98  0.95   0.50  68.9       96.3  0.97  0.97   0.71  64.3<br />
JRip                     95.3  0.98  0.95   0.55  68.3       96.2  0.97  0.97   0.80  59.2<br />
Stateful JRip            95.9  0.98  0.95   0.59  48.4       95.5  0.97  0.96   0.72  50.0<br />
Stateful JRip, shifted   96.2  0.98  0.96   0.59  48.4       96.4  0.97  0.97   0.80  47.5<br />

Table 1: Accuracy, per-class f-measures and false alarm rate for various speech state<br />

classifiers for the pseudo-speech corpus.<br />

                         female speaker                      male speaker<br />
classifier               Acc.  Fsil  Ftalk  FEoT  FAR        Acc.  Fsil  Ftalk  FEoT  FAR<br />
OneR                     94.5  0.97  0.96   0.03  65.4       93.7  0.92  0.96   0.10  80.7<br />
J48                      97.3  0.98  0.98   0.61  71.1       96.1  0.96  0.98   0.42  84.1<br />
JRip                     96.6  0.97  0.98   0.73  61.1       95.9  0.97  0.96   0.61  65.7<br />
Stateful JRip            96.4  0.96  0.98   0.70  31.9       94.9  0.97  0.96   0.58  50.0<br />
Stateful JRip, shifted   96.9  0.97  0.98   0.74  31.6       95.5  0.97  0.96   0.64  48.9<br />

Table 2: Accuracy, per-class f-measures and false alarm rate for various speech state<br />

classifiers for the KCoRS speakers.<br />

While <strong>the</strong> data and <strong>the</strong>ir states are sequential in nature, <strong>the</strong> classifiers as described<br />

above evaluate each frame independently. At <strong>the</strong> same time, recognizing <strong>the</strong> o<strong>the</strong>r speaker’s<br />

start or end <strong>of</strong> turn a little too late or too early hardly matters, while frequently<br />

changing <strong>the</strong> listening state may lead to bad dialogue behaviour. This is measured in <strong>the</strong><br />

false alarm rate (FAR), defined as <strong>the</strong> proportion <strong>of</strong> over-generated state changes.<br />

The analysis <strong>of</strong> classification output showed that wrong classifications would <strong>of</strong>ten<br />

last for only one frame. We implemented a stateful classifier that only changes state<br />

after two consecutive classifications <strong>of</strong> <strong>the</strong> underlying classifier. This strongly decreases<br />

FAR but introduces systematic errors in <strong>the</strong> classification (every actual state change will<br />

be registered one frame too late) and reduces precision/recall measures. When this is<br />

accounted for in <strong>the</strong> evaluation, <strong>the</strong> stateful classifier outperforms <strong>the</strong> base classifier also<br />

in <strong>the</strong>se measures.<br />

The results show that the complexity of the KCoRS is counterbalanced by its 10 times<br />

larger size. This may indicate that speech state classification for real dialogue speech<br />

would be feasible with a sufficiently large corpus and speaker-normalized prosodic features.<br />

5 Simple Strategies for Turn-Taking<br />

We outline some simple strategies for turn control. Their purpose is to exemplify how<br />

very restricted locally managed behaviour with some simple rules can already lead to<br />

acceptable turn-taking behaviour as postulated by <strong>the</strong> local management model <strong>of</strong> Sacks<br />

et al. (1974), without <strong>the</strong> need for a dialogue history, or complex temporal reasoning.<br />

2 Results cannot be easily compared, as Schlangen (2006) recognizes turn-final words using prosodic<br />

and syntactic features on a more complex corpus, reaching an f-measure <strong>of</strong> 0.36.<br />


measure strategy 1 strategy 2 strategy 3<br />

gap 14.0 % 351 ms 18.7 % 358 ms 17.4 % 362 ms<br />

speaker a 31.4 % 1259 ms 35.9 % 1009 ms 36.5 % 1079 ms<br />

speaker b 39.3 % 1415 ms 39.8 % 1165 ms 40.8 % 1225 ms<br />

clash 15.4 % 1184 ms 5.6 % 317 ms 5.3 % 278 ms<br />

Table 3: Distribution and mean duration <strong>of</strong> dialogue states for three turn-taking strategies<br />

with pseudo-speech.<br />

measure strategy 1 strategy 2 strategy 3<br />

gap 14.1 % 528 ms 20.7 % 477 ms 18.9 % 454 ms<br />

speaker a 36.2 % 1764 ms 40.5 % 1456 ms 34.7 % 1232 ms<br />

speaker b 26.2 % 1437 ms 24.8 % <strong>13</strong>07 ms 42.0 % 1540 ms<br />

clash 23.5 % 1915 ms 4.0 % 253 ms 4.4 % 243 ms<br />

Table 4: Distribution and mean duration <strong>of</strong> dialogue states for three turn-taking strategies<br />

with KCoRS speakers.<br />

5.1 Measuring Turn-Management Success<br />

The dialogue state can be described by <strong>the</strong> current speech state <strong>of</strong> each <strong>of</strong> <strong>the</strong> dialogue<br />

participants, with each speech state being ei<strong>the</strong>r talk or sil. For two-party dialogue, this<br />

results in four states: two “good” states where ei<strong>the</strong>r one <strong>of</strong> <strong>the</strong> dialogue participants is<br />

talking and two “bad” states: Clashes when both participants talk simultaneously, and<br />

gaps with nei<strong>the</strong>r <strong>of</strong> <strong>the</strong>m talking.<br />

According to Sacks et al. (1974), speakers try to optimize <strong>the</strong>ir behaviour so as to<br />

minimize the occurrence of both clashes and gaps. That is why we choose clashes and<br />

gaps as basic measures for turn-taking success. Slight gaps and clashes occur all <strong>the</strong> time,<br />

but <strong>the</strong>y are not always perceptually relevant. We thus decided to calculate <strong>the</strong> proportion<br />

<strong>of</strong> clashes and gaps over <strong>the</strong> course <strong>of</strong> <strong>the</strong> dialogue as well as <strong>the</strong>ir mean duration.<br />
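The two measures (share of dialogue time and mean duration per state) can be computed from frame-wise speech states of both participants. The following is a sketch under stated assumptions: the 10 ms frame length, the function names and the boolean encoding of talk/sil are illustrative choices, not taken from the paper.

```python
from itertools import groupby
from typing import Dict, List, Tuple

FRAME_MS = 10  # assumed frame length, for illustration only


def dialogue_state(a_talks: bool, b_talks: bool) -> str:
    """Map the two speech states onto one of the four dialogue states."""
    if a_talks and b_talks:
        return "clash"
    if a_talks:
        return "speaker a"
    if b_talks:
        return "speaker b"
    return "gap"


def state_statistics(a: List[bool], b: List[bool],
                     frame_ms: int = FRAME_MS) -> Dict[str, Tuple[float, float]]:
    """Return {state: (share of frames in percent, mean run duration in ms)}."""
    states = [dialogue_state(x, y) for x, y in zip(a, b)]
    runs = [(s, len(list(g))) for s, g in groupby(states)]  # uninterrupted stretches
    stats: Dict[str, Tuple[float, float]] = {}
    for name in ("gap", "speaker a", "speaker b", "clash"):
        lengths = [n for s, n in runs if s == name]
        total = sum(lengths)
        share = 100.0 * total / len(states) if states else 0.0
        mean_ms = frame_ms * total / len(lengths) if lengths else 0.0
        stats[name] = (share, mean_ms)
    return stats


a = [True, True, True, False, False, False, False, True]
b = [False, False, True, True, True, False, False, False]
stats = state_statistics(a, b)
```

Reporting both numbers matters because a strategy can produce many short clashes or few long ones with the same overall share.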

For evaluation purposes, we set up two artificial dialogue participants and let <strong>the</strong>m talk<br />

with each o<strong>the</strong>r for about 10 minutes for each <strong>of</strong> <strong>the</strong> following strategies. We recorded<br />

<strong>the</strong> internal states and calculated <strong>the</strong> described measures. The audio itself was recorded<br />

but not fur<strong>the</strong>r analyzed in <strong>the</strong> evaluation. The results <strong>of</strong> <strong>the</strong> strategies described below<br />

are shown in Tables 3 and 4.<br />

5.2 Strategy 1: Talk When Nobody Talks<br />

Rule: Start an utterance when nei<strong>the</strong>r you nor your interlocutor is talking. (Implicitly:<br />

Continue talking until your utterance is finished.)<br />

The performance with this strategy strongly depends on <strong>the</strong> round-trip time from one<br />

agent’s decision to take <strong>the</strong> turn until <strong>the</strong> o<strong>the</strong>r agent notices <strong>the</strong> turn being taken. The<br />

shorter <strong>the</strong> lags introduced by <strong>the</strong> talking agent’s internal communication, audio transmission,<br />

prosodic processing and classification, and <strong>the</strong> listening agent’s internal communication,<br />

<strong>the</strong> more likely it is for a dialogue participant to notice its interlocutor talking (and<br />

<strong>the</strong>n listen until he has finished) before she has started talking herself. For longer lags,<br />

<strong>the</strong> DP will decide to talk even though its interlocutor may already have started talking<br />

himself. As can be seen, this strategy leads to a large number of clashes.<br />

5.3 Strategy 2: Hush When Both Talk<br />

Rule as above, plus: Stop your utterance when both you and your interlocutor are talking.<br />


The rule proves effective in reducing simultaneous talk as clashes are reduced by 65 %<br />

(pseudo-speech) and over 80 % (KCoRS) respectively. At the same time, this strategy leads<br />

to the introduction of utterance truncations, when an utterance was stopped prematurely.<br />

(Actually, <strong>the</strong> majority <strong>of</strong> utterances (71 % for pseudo-speech) was truncated, but many <strong>of</strong><br />

<strong>the</strong>se truncations occur in <strong>the</strong> silent phases before or after <strong>the</strong> actual talk and do not have<br />

any deteriorating effect on <strong>the</strong> perceived turn-taking performance.) Truncations could be<br />

reduced with a higher probability to hush during SoT.<br />

5.4 Strategy 3: Start Talking Early<br />

The previous strategies only react after turns have started or ended. In order to initiate<br />

actions early and anticipate turn changes, this strategy exploits the EoT class of the<br />

speech analysis (which was ignored before) in <strong>the</strong> first rule: Start an utterance, when you<br />

are not talking and your interlocutor is ending <strong>the</strong>ir turn or has already finished.<br />

By starting utterance planning before <strong>the</strong> interlocutor’s preceding utterance is finished,<br />

<strong>the</strong> dialogue participant can hide some <strong>of</strong> <strong>the</strong> lag introduced by its speech generation<br />

module. The duration <strong>of</strong> both gaps and clashes is reduced compared to strategy 2, for<br />

gaps because turns will be taken over more quickly and for clashes due to <strong>the</strong> original<br />

talk-owner noticing <strong>the</strong> turn-change earlier, avoiding <strong>the</strong> start <strong>of</strong> a new utterance.<br />
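The three rule sets can be written as one small decision function. This is a hedged sketch: the function name, its parameters and the returned action strings are hypothetical, EoT frames are assumed to count as talking for the hush rule, and the hush probability mentioned in the text is left out so that the rules stay deterministic.

```python
from typing import Optional


def decide(strategy: int, self_talking: bool, other_state: str,
           has_utterance: bool) -> Optional[str]:
    """Return "start", "stop" or None for one decision step.

    other_state is the classified state of the interlocutor:
    "sil", "talk" or "EoT" (end of turn).
    """
    # Assumption: an EoT frame is still speech, so it counts as talking here.
    other_talking = other_state in ("talk", "EoT")

    # Strategies 2 and 3: hush when both talk.
    if strategy >= 2 and self_talking and other_talking:
        return "stop"

    if not self_talking and has_utterance:
        # Strategies 1 and 2: talk when nobody talks.
        if strategy in (1, 2) and other_state == "sil":
            return "start"
        # Strategy 3: also start while the interlocutor is ending the turn.
        if strategy == 3 and other_state in ("sil", "EoT"):
            return "start"
    return None
```

None of the rules looks at anything but the current speech states, which is what makes them locally managed in the sense used above.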

The durations for gaps and clashes with this strategy are similar to those reported<br />

for parts <strong>of</strong> <strong>the</strong> Verbmobil corpus by Weilhammer and Rabold (2003), with 363 ms and<br />

331 ms respectively. 3 Performance could be fur<strong>the</strong>r improved by using a lower probability<br />

to hush during EoT.<br />

6 Conclusion and Future Directions<br />

We have presented a flexible, modular architecture for dialogue strategy evaluation where<br />

arbitrary pairings <strong>of</strong> human users and artificial dialogue participants can be created. We<br />

have discussed a case-study in this environment, where pairs <strong>of</strong> artificial DPs converse in<br />

real time via audio. Each DP autonomously decides on <strong>the</strong>ir turn-taking behaviour (start<br />

or stop talking) based on a local analysis <strong>of</strong> <strong>the</strong> audio signal and using machine-learned<br />

classifiers. We tested these with corpora of simplified speech and achieved good recognition<br />

performance. Three implemented turn-management rulesets, all of them locally managed<br />

in <strong>the</strong> sense <strong>of</strong> Sacks et al. (1974), i. e. not requiring dialogue memory, were<br />

shown to create increasingly realistic behavioural patterns.<br />

We plan to use <strong>the</strong> components developed for this system in an interactive speech<br />

dialogue system. For <strong>the</strong> speech state classification, we will need normalized prosodic<br />

features that allow for speaker independent speech state classification. At <strong>the</strong> same time,<br />

ASR will make features relative to syllable information (stress patterns, speech rate, ...)<br />

accessible, as well as word hypo<strong>the</strong>ses. We may also want to look into classifier confidence<br />

scores, only emitting speech state changes if <strong>the</strong> classifier is reasonably certain.<br />

In real dialogue, <strong>the</strong> problem <strong>of</strong> hesitations arises. Our classification will have to be<br />

extended to distinguish hesitational interruptions from normal EoT. We would also like to<br />

identify positions in a turn where a back-channelling utterance might be appropriate.<br />

3 Note that their numbers are for turn changes only, while we do not distinguish between gaps at turn<br />

changes and at turn continuations.<br />



Acknowledgements<br />

I would like to thank my supervisor David Schlangen for his constant guidance and support<br />

and <strong>the</strong> anonymous reviewers for <strong>the</strong>ir insightful comments and suggestions.<br />

References<br />


de Cheveigné, A. and Kawahara, H. (2002). Yin, a fundamental frequency estimator for<br />

speech and music, The Journal <strong>of</strong> <strong>the</strong> Acoustical Society <strong>of</strong> America 111(4): 1917–<br />

1930.<br />

Ferrer, L., Shriberg, E. and Stolcke, A. (2002). Is <strong>the</strong> speaker done yet? Faster and more<br />

accurate end-<strong>of</strong>-utterance detection using prosody, <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> International<br />

Conference on Spoken Language Processing (ICSLP2002), Denver, USA.<br />

IPDS (1994). The Kiel Corpus of Read Speech, CD-ROM.<br />

Levinson, S. C. (1983). Pragmatics, Cambridge Textbooks in Linguistics, Cambridge<br />

University Press.<br />

López-Cózar, R., De la Torre, A., Segura, J. and Rubio, A. (2003). Assessment <strong>of</strong> dialogue<br />

systems by means <strong>of</strong> a new simulation technique, Speech Communication<br />

40(3): 387–407.<br />

Martin, D., Cheyer, A. and Moran, D. (1999). The Open Agent Architecture: a framework<br />

for building distributed s<strong>of</strong>tware systems, Applied Artificial Intelligence <strong>13</strong>(1/2): 91–<br />

128.<br />

URL: citeseer.ist.psu.edu/martin99open.html<br />

Padilha, E. G. (2006). Modelling Turn-taking in a Simulation <strong>of</strong> Small Group Discussion,<br />

PhD <strong>the</strong>sis, School <strong>of</strong> Informatics, University <strong>of</strong> Edinburgh, Edinburgh, UK.<br />

Sacks, H., Schegloff, E. A. and Jefferson, G. A. (1974). A simplest systematics for the<br />

organization of turn-taking for conversation, Language 50(4): 696–735.<br />

Schatzmann, J., Weilhammer, K., Stuttle, M. and Young, S. (2006). A survey <strong>of</strong> statistical<br />

user simulation techniques for reinforcement-learning <strong>of</strong> dialogue management<br />

strategies, The Knowledge Engineering Review 21(02): 97–126.<br />

Schlangen, D. (2006). From reaction to prediction: Experiments with computational<br />

models <strong>of</strong> turn-taking, Interspeech 2006, Pittsburgh, USA.<br />

URL: http://www.ling.uni-potsdam.de/ das/papers/schlangen intersp2006.pdf<br />

Schulzrinne, H., Casner, S., Frederick, R. and Jacobson, V. (2003). RTP: A Transport<br />

Protocol for Real-Time Applications, RFC 3550 (Standard).<br />

URL: http://www.ietf.org/rfc/rfc3550.txt<br />

Talkin, D. (1995). A robust algorithm for pitch tracking (rapt), in W. B. Kleijn and K. K.<br />

Paliwal (eds), Speech Coding and Syn<strong>the</strong>sis, Elsevier, chapter 14, pp. 495–518.<br />


Walker, W., Lamere, P., Kwok, P., Raj, B., Singh, R., Gouvea, E., Wolf, P. and Woelfel,<br />

J. (2004). Sphinx-4: A flexible open source framework for speech recognition,<br />

Technical Report SMLI TR2004-0811, Sun Microsystems Inc.<br />

Weilhammer, K. and Rabold, S. (2003). Durational aspects in turn taking, Proc. <strong>of</strong> <strong>the</strong><br />

ICPhS, Barcelona, Spain.<br />

URL: http://www.phonetik.uni-muenchen.de/Publications/WeilhammerRabold-03-ICPhS.pdf<br />

Witten, I. H. and Frank, E. (2000). Data Mining. Practical Machine Learning Tools and<br />

Techniques with Java Implementations., Morgan Kaufmann.<br />


EPISTEMIC MODALS IN DIALOGUE<br />

Chris Brumwell<br />

University <strong>of</strong> Amsterdam<br />

Abstract. I present an update semantics for epistemic modals in which a formula <strong>of</strong> <strong>the</strong><br />

form might φ acts on a context Γ by introducing a salient possibility constructed from φ<br />

into Γ. This <strong>the</strong>ory is meant to account for <strong>the</strong> intuitions and data that suggest that assertions<br />

<strong>of</strong> epistemic modals do not provide information to <strong>the</strong> participants in a conversation, but<br />

instead suggest certain possibilities for their consideration. Among this data is the important<br />

empirical fact that epistemic modals can answer questions. To account for this, I also define a<br />

semantics for questions and show that in this system epistemic modals can count as answers<br />

to questions.<br />

1 Introduction and Motivations<br />

In <strong>the</strong> classic picture <strong>of</strong> communication given in Stalnaker (1978), a conversation is a<br />

process <strong>of</strong> distinguishing between various possibilities, or ways <strong>the</strong> world might be. It is<br />

clear, however, that in a conversation not all possibilities are given equal attention by <strong>the</strong><br />

interlocutors. People talking about whe<strong>the</strong>r or not John murdered Jack are not trying to<br />

distinguish a possibility in which chocolate makes cats sick from a possibility in which<br />

chocolate doesn't make cats sick. In this paper, I call the possibilities the interlocutors are<br />

most interested in salient possibilities.<br />

Asking a question is <strong>the</strong> canonical way <strong>of</strong> introducing salient possibilities into a discourse:<br />

questions introduce possibilities corresponding to <strong>the</strong>ir different answers. But<br />

o<strong>the</strong>r constructions introduce salient possibilities as well. The disjunction Jones works<br />

at a bank or a hospital introduces <strong>the</strong> salient possibilities that Jones works at a bank and<br />

Jones works at a hospital. Constructions containing indefinite NPs such as somebody<br />

stole <strong>the</strong> jewels can introduce salient possibilities corresponding to various instantiations<br />

<strong>of</strong> somebody. Free choice commands such as Take any apple you like introduce salient<br />

possibilities corresponding to your various choices. Finally, a statement expressing epistemic<br />

modality such as John might be hiding upstairs introduces the salient possibility<br />

that John is hiding upstairs.<br />

Recent work by Groenendijk (Groenendijk 2007) proposes an analysis <strong>of</strong> disjunction<br />

and existential quantification that captures <strong>the</strong>ir potential to introduce salient possibilities<br />

into a dialogue. In this paper, I formalize <strong>the</strong> notion <strong>of</strong> a salient possibility and use it<br />

to define a dynamic semantics for questions and epistemic modals. In <strong>the</strong> semantics,<br />

a question introduces salient possibilities corresponding to its possible answers, and an<br />

epistemic modal <strong>of</strong> <strong>the</strong> form might φ introduces a salient possibility constructed from φ<br />

and, following Veltman (1996), tests <strong>the</strong> common ground to see whe<strong>the</strong>r it is consistent<br />

with φ.<br />

Salient possibilities are almost perfectly suited for an analysis <strong>of</strong> epistemic modals.<br />

Unlike other kinds of assertions, an assertion of an epistemic modal does not contribute<br />

information to a conversation. Instead, its function is to call attention to certain possibilities<br />

that <strong>the</strong> conversational participants should, for some reason, find interesting. Thus,<br />


to analyze epistemic modals one must develop a framework in which assertions can significantly<br />

change a context without providing information. Since this paper's framework<br />

postulates that epistemic modals affect <strong>the</strong> salient possibilities in a context ra<strong>the</strong>r than its<br />

information, <strong>the</strong> non-informative yet non-trivial effects <strong>of</strong> epistemic modals are properly<br />

represented.<br />

One advantage <strong>of</strong> this analysis is that it is able to account for <strong>the</strong> felicity <strong>of</strong> a modalized<br />

construction as an answer to a question. For example:<br />

(1) A: Where are my keys?<br />

B: They might be in <strong>the</strong> basement.<br />

(2) A: Are John and Bill coming to <strong>the</strong> party?<br />

B: They might.<br />

In dialogue (1), B doesn't answer A's question by saying where her keys are (because,<br />

if he's acting felicitously, he doesn't know where they are), but by suggesting a possibility<br />

for her to consider. Similarly in (2): B suggests that A should not overlook <strong>the</strong> possibility<br />

that Bill and John come to <strong>the</strong> party. If she really dislikes <strong>the</strong>m, <strong>the</strong> very possibility that<br />

<strong>the</strong>y attend may be reason enough for her to skip <strong>the</strong> party.<br />

Enemies <strong>of</strong> salient possibilities may think that a modal answer to a question really says<br />

nothing more than I don't know, or Any answer is consistent with my knowledge. Against<br />

this, consider <strong>the</strong> following case: suppose A is frantically looking for her husband Joe,<br />

and comes across B, who has never met Joe and has never given him one thought. If<br />

she asks him Where is Joe? and he responds I don't know, this is perfectly acceptable.<br />

However, if he responds He might be in Boston, this is completely infelicitous: if A takes<br />

him seriously, she's on her way to a wild goose chase. Intuitively, this is because she<br />

seriously takes into account his (inappropriate) suggestion to consider <strong>the</strong> possibility that<br />

Joe is in Boston.<br />

A classical partition <strong>the</strong>ory <strong>of</strong> questions has difficulty accounting for (1) and (2). This<br />

is <strong>the</strong> case because in a partition <strong>the</strong>ory an answer to a question has to give information.<br />

However, as dialogues (1) and (2) demonstrate, answers to questions do not need to be<br />

informative: it suffices that <strong>the</strong>y suggest informative answers. Below, I give a more detailed<br />

and formal discussion <strong>of</strong> <strong>the</strong> problem partition <strong>the</strong>ories <strong>of</strong> questions face from noninformative<br />

answers to questions, and discuss <strong>the</strong> similarities and differences between <strong>the</strong><br />

<strong>the</strong>ory presented in this paper and a partition <strong>the</strong>ory.<br />

This analysis also accounts for a puzzling feature <strong>of</strong> <strong>the</strong> behavior <strong>of</strong> epistemic modals<br />

under attitude reports. Statements <strong>of</strong> <strong>the</strong> form x believes that might φ mean, in part,<br />

that the attitude holder x considers φ to be a salient possibility. For example, suppose<br />

that John has never given a thought to what <strong>the</strong> wea<strong>the</strong>r is like in Amsterdam. Then (3)<br />

certainly seems wrong:<br />

(3) John believes it might be raining in Amsterdam.<br />

Using this paper's theory, one could account for (3) by analyzing a belief state as composed<br />

of both information and salient possibilities. The content of (3) then states, roughly,<br />

that it's consistent with John's beliefs that it's raining in Amsterdam and that this is a salient<br />


possibility in his belief state. Several contemporary <strong>the</strong>ories <strong>of</strong> epistemic modality do not<br />

appeal to any notion similar to that <strong>of</strong> a salient possibility, and hence have no clear way<br />

of accounting for (3) (e.g. DeRose (1991) and Egan et al. (2005); for similar reasons<br />

<strong>the</strong>se <strong>the</strong>ories also have problems accounting for <strong>the</strong> question and answer data presented<br />

above). Due to constraints on length we will not formalize this <strong>the</strong>ory <strong>of</strong> <strong>the</strong> interaction<br />

between epistemic modals and attitude reports below.<br />

As mentioned above, <strong>the</strong> analysis is carried out in a dynamic semantic framework. In<br />

dynamic semantics, <strong>the</strong> meaning <strong>of</strong> a formula is not identified with its truth conditions,<br />

but ra<strong>the</strong>r with <strong>the</strong> way it changes a context. More specifically, our <strong>the</strong>ory is a version <strong>of</strong><br />

update semantics in <strong>the</strong> style <strong>of</strong> Veltman (1996), i.e. we give a definition <strong>of</strong> an information<br />

state and <strong>the</strong> meanings <strong>of</strong> formulas are functions from information states to information<br />

states.<br />

2 Questions and Salient Possibilities<br />

In this section, we define a 1st-order language with a question operator and an epistemic<br />

possibility operator. We <strong>the</strong>n define <strong>the</strong> structures (information states) used to give a<br />

semantics for this language and define <strong>the</strong> notion <strong>of</strong> a salient possibility. Finally, we give<br />

<strong>the</strong> semantics for this language and define what it means for a formula to be an answer to<br />

a question. This definition will allow modal and non-modal formulas to answer questions.<br />

DEFINITION 1. We define <strong>the</strong> languages L1, L2, and L3 as follows:<br />

(i) If P is an n-place predicate and t1...tn are terms, <strong>the</strong>n P(t1...tn) ∈ L1<br />

(ii) If φ,ψ ∈ L1, <strong>the</strong>n φ ∧ ψ ∈ L1 and ¬ φ ∈ L1<br />

(iii) If φ ∈ L1, <strong>the</strong>n ⋄φ ∈ L2<br />

(iv) If φ,ψ ∈ L2, <strong>the</strong>n φ ∧ ψ ∈ L2 and ¬ φ ∈ L2<br />

(v) If φ ∈ L1, <strong>the</strong>n ?φ ∈ L3<br />

(vi) If φ,ψ ∈ L3, <strong>the</strong>n φ ∧ ψ ∈ L3<br />

The language L we discuss in this paper is defined L = L1 ∪ L2 ∪ L3. As a notational<br />

convention, we write atomic sentences (i.e. atomic formulas with no free variables) and<br />

Boolean combinations <strong>of</strong> atomic sentences as p, q, ¬q,p ∧ q, etc.<br />

In a standard update semantics, information states are sets <strong>of</strong> indices, where an index<br />

assigns an individual from a domain D to each constant <strong>of</strong> <strong>the</strong> language and an n-ary<br />

relation to each n-place predicate. In this paper's framework, an information state is a set<br />

of sets of indices A such that there is an I* ∈ A such that for all I_m ∈ A, I_m ⊆ I*. The intuition<br />

behind this definition is that this maximal set I* represents <strong>the</strong> common ground at a point<br />

in a conversation. Any subset <strong>of</strong> I* is a possible future state <strong>of</strong> <strong>the</strong> common ground, and<br />

hence could be a possibility that <strong>the</strong> discourse participants are interested in. However,<br />

recalling <strong>the</strong> introduction, all such subsets are not always <strong>of</strong> interest to <strong>the</strong> discourse<br />

participants. With that in mind we think of the subsets I_m of I* as salient possibilities. We<br />

formally define information states below:<br />

DEFINITION 2. Let I be <strong>the</strong> set <strong>of</strong> all indices for <strong>the</strong> language L. We define an information<br />

state to be a set Γ = {P1,...,Pn,...} such that:<br />

(i) Pi ⊆ I for all i (ii) For some i, Pi = ∅<br />

(iii) There is an i such that for all j, Pj ⊆ Pi. This maximal set Pi is called <strong>the</strong> common<br />

ground.<br />

We write CG (common ground) for <strong>the</strong> maximal set Pi defined in (iii), and write<br />

Γ = {CG,P1,...,Pn,...,∅}. In some cases, we refer to information states as contexts.<br />



Though every element <strong>of</strong> an information state is a salient possibility (except <strong>the</strong> empty<br />

set, which is present to simplify <strong>the</strong> definition <strong>of</strong> an answer to a question), <strong>the</strong> sets in an<br />

information state do not exhaust its salient possibilities. Ra<strong>the</strong>r, <strong>the</strong> salient possibilities<br />

in an information state are generated by closing it under union and intersection. Salient<br />

possibilities are defined this way because, intuitively, if P1 and P2 are salient possibilities<br />

in a context, <strong>the</strong>n if <strong>the</strong>y are not mutually exclusive it is also possible that <strong>the</strong>y both obtain.<br />

Thus, <strong>the</strong>ir intersection should count as a salient possibility as well. Similar reasoning<br />

supports considering <strong>the</strong> union <strong>of</strong> salient possibilities to be a salient possibility.<br />

DEFINITION 3. Let Γ be an information state. Then 〈Γ〉, <strong>the</strong> set <strong>of</strong> salient possibilities in<br />

Γ, is defined as the smallest set such that:<br />

(i) If P ∈ Γ, <strong>the</strong>n P ∈ 〈Γ〉 (ii) If P1, P2 ∈ 〈Γ〉, <strong>the</strong>n P1 ∪ P2 ∈ 〈Γ〉<br />

(iii) If P1, P2 ∈ 〈Γ〉, <strong>the</strong>n P1 ∩ P2 ∈ 〈Γ〉.<br />
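For a finite information state, the closure in Definition 3 can be computed by repeatedly adding unions and intersections until nothing new appears. The sketch below is only an illustration: indices are represented by arbitrary string labels, and the function name `salient_possibilities` is a choice made for the example.

```python
from itertools import combinations
from typing import FrozenSet, Set

Possibility = FrozenSet[str]  # a set of indices, here named by strings


def salient_possibilities(gamma: Set[Possibility]) -> Set[Possibility]:
    """Smallest superset of gamma closed under pairwise union and intersection."""
    closed = set(gamma)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that new sets can be added safely.
        for p1, p2 in combinations(list(closed), 2):
            for new in (p1 | p2, p1 & p2):
                if new not in closed:
                    closed.add(new)
                    changed = True
    return closed


cg = frozenset({"w1", "w2", "w3", "w4"})  # common ground
p1 = frozenset({"w1", "w2"})
p2 = frozenset({"w2", "w3"})
gamma = {cg, p1, p2, frozenset()}
closure = salient_possibilities(gamma)
```

In this toy state the closure adds exactly two possibilities to the four given ones: the union {w1, w2, w3} and the intersection {w2}.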

We need one more concept in order to define <strong>the</strong> semantics <strong>of</strong> wh-questions. On our<br />

analysis, wh-questions introduce salient possibilities corresponding to each <strong>of</strong> <strong>the</strong>ir possible<br />

answers into an information state. To represent <strong>the</strong> possible answers to a wh-question,<br />

we use <strong>the</strong> relations defined in definition 5 (Definition 4 is a standard account <strong>of</strong> satisfaction,<br />

which is necessary for articulating definition 5):<br />

DEFINITION 4. Let φ, ψ ∈ L1, let i be an index, and let g be a variable assignment function.<br />

(i) If φ = Qt1...tn, then i |= φ [g] iff 〈[t1]^{i,g},...,[tn]^{i,g}〉 ∈ i(Q)<br />

(ii) i |= φ ∧ ψ [g] iff i |= φ [g] and i |= ψ [g] (iii) i |= ¬φ [g] iff i ⊭ φ [g]<br />

DEFINITION 5. Let φ ∈ L1, and let i and j be indices. We say that i ≡ j (mod φ) if for all<br />

assignments g, i |= φ [g] iff j |= φ [g]<br />

Given a formula φ, definition 5 defines the conditions under which two indices give<br />

<strong>the</strong> same answer to <strong>the</strong> question ?φ. For a sentence φ <strong>of</strong> L1, i ≡ j (mod φ) will hold as<br />

long as i and j assign φ <strong>the</strong> same truth value. But for a formula <strong>of</strong> L1 with free variables,<br />

congruence modulo φ requires that <strong>the</strong> indices assign <strong>the</strong> same denotations (or just similar<br />

denotations if <strong>the</strong> formula contains both free variables and constants) to predicates that<br />

occur in φ. The following examples illustrate how this definition works.<br />

EXAMPLE 1.<br />


(i) Let ?φ = ?Px (Who came to <strong>the</strong> party?) i ≡ j (mod φ) if i(P) = j(P), or informally, if <strong>the</strong><br />

same people came to <strong>the</strong> party according to indices i and j.<br />

(ii) Let ?φ = ?Ibx (Who did Bill invite to <strong>the</strong> party?) i ≡ j (mod φ) if<br />

{d ∈ D| 〈d, i(b)〉 ∈ i(P)} = {d ∈ D| 〈d, j(b)〉 ∈ j(P)}.<br />

(iii) Let ?φ = ?p (Did Alice help Bill?) i ≡ j (mod φ) if i |= p iff j |= p.<br />
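The congruence relation of definition 5 can be illustrated with a small sketch. It assumes a finite domain, represents an index as a dictionary from symbol names to denotations, and represents a formula as a function from an index and an assignment to a truth value. These representation choices, and all names in the code, are illustrative assumptions rather than part of the formal system.<br />

```python
from itertools import product

# Hypothetical finite domain of individuals.
DOMAIN = ("alice", "bill", "carol")

def congruent(i, j, holds, free_vars, domain=DOMAIN):
    """i ≡ j (mod φ): i and j agree on φ under every assignment to its free variables."""
    for values in product(domain, repeat=len(free_vars)):
        g = dict(zip(free_vars, values))
        if holds(i, g) != holds(j, g):
            return False
    return True

# ?Px, "Who came to the party?"
def came(i, g):
    return g["x"] in i["P"]

# ?Ibx, "Who did Bill invite to the party?"
def bill_invited(i, g):
    return (i["b"], g["x"]) in i["I"]

# Three hypothetical indices: j differs from i only in what Alice invited,
# k differs from i only in who came.
i = {"b": "bill", "P": {"alice"}, "I": {("bill", "carol")}}
j = {"b": "bill", "P": {"alice"}, "I": {("bill", "carol"), ("alice", "alice")}}
k = {"b": "bill", "P": {"alice", "bill"}, "I": {("bill", "carol")}}

print(congruent(i, j, came, ["x"]))          # same guests at i and j
print(congruent(i, j, bill_invited, ["x"]))  # Bill's invitees coincide at i and j
print(congruent(i, k, came, ["x"]))          # different guests at i and k
```

For a sentence, `free_vars` is empty, so the loop runs once with the empty assignment and the check reduces to comparing truth values, as in case (iii) of the example.<br />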

In our update semantics, <strong>the</strong> effect <strong>of</strong> a formula on an information state will be defined<br />

in terms <strong>of</strong> <strong>the</strong> effects it has on certain elements <strong>of</strong> <strong>the</strong> information state. Thus, to state<br />

our update semantics for information states we require an update semantics for sets <strong>of</strong><br />

indices as well. The update semantics for sets of indices is fairly simple, and is roughly the<br />

same as that given in Veltman (1996).<br />

DEFINITION 6. Let φ ∈ L1 ∪ L2 be a sentence, and let P be a set <strong>of</strong> indices. We define <strong>the</strong><br />

update <strong>of</strong> P with φ, written P[φ], as follows:<br />

(i) P[p] = {i ∈ P | i |= p}<br />

(ii) P[φ ∧ ψ] = P[φ][ψ]<br />

(iii) P[¬φ] = {i ∈ P | i ∉ P[φ]}<br />

(iv) P[⋄φ] = P if P[φ] ≠ ∅<br />

(v) P[⋄φ] = ∅ if P[φ] = ∅<br />
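A minimal sketch of definition 6 for the propositional case follows. It assumes that an index can be modelled as the set of atoms true at it and that formulas are written as nested tuples; both choices, and the function names, are illustrative assumptions.<br />

```python
from itertools import combinations

def all_indices(atoms):
    """All indices over the given atoms; an index is the frozenset of atoms true at it."""
    return frozenset(
        frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)
    )

def update(P, phi):
    """P[phi] for formulas written as "p", ("not", f), ("and", f, g) or ("might", f)."""
    if isinstance(phi, str):                       # clause (i)
        return frozenset(i for i in P if phi in i)
    op = phi[0]
    if op == "and":                                # clause (ii)
        return update(update(P, phi[1]), phi[2])
    if op == "not":                                # clause (iii)
        return P - update(P, phi[1])
    if op == "might":                              # clauses (iv) and (v)
        return P if update(P, phi[1]) else frozenset()
    raise ValueError(f"unknown operator: {op!r}")

I = all_indices(("p", "q"))
not_p = update(I, ("not", "p"))
print(update(I, ("might", "p")) == I)                 # the test succeeds, nothing is eliminated
print(update(not_p, ("might", "p")) == frozenset())   # the test fails, everything is eliminated
```

As in Veltman's system, the modal acts as a test on a set of indices: it either returns the set unchanged or returns the empty set.<br />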


We now state our update semantics for information states.<br />

DEFINITION 7. Let Γ = {CG, P1,...,Pn, ∅} be an information state, and let φ ∈ L be a<br />

sentence. We define <strong>the</strong> update <strong>of</strong> Γ with φ as follows:<br />

(i) Γ[p] = { CG[p], P1[p],...,Pn[p], ∅}<br />

(ii) Γ[¬φ] = { CG[¬φ], P1[¬φ],...,Pn[¬φ], ∅}<br />

(iii) Γ[φ ∧ ψ] = Γ[φ][ψ]<br />

(iv) Γ[⋄φ] = {CG[⋄φ], P1[⋄φ],...,Pn[⋄φ], ∅} if there is a P ∈ Γ such that P[φ] = P<br />

(v) Γ[⋄φ] = {CG[⋄φ], CG[φ], P1,...,Pn, ∅} if there is no P ∈ Γ such that P[φ] = P<br />

(vi) Γ[?φ] = Γ ∪ {{i | i ≡ j (mod φ)} | j ∈ CG}<br />

Clauses (i)–(vi) apply as long as CG[φ] ≠ ∅. In the degenerate case that CG[φ] = ∅, we set<br />

Γ[φ] = {∅}, <strong>the</strong> absurd state.<br />
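The update on information states can be sketched for the propositional fragment as follows. The sketch rests on several assumptions that are not stated in the definition: indices are sets of true atoms, only polar questions are covered, the possibilities introduced by a question are restricted to the common ground, clause (v) is taken to apply exactly when clause (iv) does not and to leave P1,...,Pn unchanged, and the empty possibility is ignored when checking the side condition of clauses (iv) and (v). All names are illustrative.<br />

```python
from itertools import combinations

EMPTY = frozenset()

def all_indices(atoms):
    """All indices over the given atoms; an index is the frozenset of atoms true at it."""
    return frozenset(
        frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)
    )

def update(P, phi):
    """Definition 6: update of a set of indices."""
    if isinstance(phi, str):
        return frozenset(i for i in P if phi in i)
    op = phi[0]
    if op == "and":
        return update(update(P, phi[1]), phi[2])
    if op == "not":
        return P - update(P, phi[1])
    if op == "might":
        return P if update(P, phi[1]) else EMPTY
    raise ValueError(f"unknown operator: {op!r}")

def update_state(state, phi):
    """Definition 7 on a state given as a frozenset of possibilities (CG is the largest)."""
    cg = max(state, key=len)
    op = "atom" if isinstance(phi, str) else phi[0]
    if op == "and":                                  # clause (iii)
        return update_state(update_state(state, phi[1]), phi[2])
    if op == "ask":                                  # clause (vi), polar questions only
        yes = update(cg, phi[1])
        return state | {yes, cg - yes}
    if not update(cg, phi):                          # degenerate case: the absurd state
        return frozenset({EMPTY})
    if op == "might":
        body = phi[1]
        if not any(P and update(P, body) == P for P in state):
            return state | {update(cg, body)}        # clause (v): add CG[body]
    return frozenset(update(P, phi) for P in state)  # clauses (i), (ii), (iv)

I = all_indices(("p", "q"))
start = frozenset({I, EMPTY})
warned = update_state(update_state(start, "p"), ("might", "q"))
print(sorted(len(P) for P in warned))                # one size per possibility in the result
```

Under these assumptions the modal leaves the common ground untouched in both of its modes: it either filters the salient possibilities or adds one.<br />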

In <strong>the</strong> semantics defined above, although epistemic modals can change information<br />

states <strong>the</strong>y cannot have a non-trivial effect on <strong>the</strong> common ground. This is as it should<br />

be: only constructions that provide information should change <strong>the</strong> common ground, and<br />

epistemic modals do not play that role in a dialogue. Thus, this semantics complies with<br />

<strong>the</strong> requirement set forward in <strong>the</strong> introduction: epistemic modals change a context in a<br />

significant yet non-informative manner.<br />

More specifically, epistemic modals change a context by drawing attention to certain<br />

possibilities. However, <strong>the</strong> manner in which an epistemic modal accomplishes this depends<br />

on <strong>the</strong> possibilities that are already salient in <strong>the</strong> dialogue’s context. If <strong>the</strong> possibility<br />

an epistemic modal calls attention to is not under discussion at all, <strong>the</strong>n <strong>the</strong> epistemic<br />

modal adds this possibility to <strong>the</strong> set <strong>of</strong> salient possibilities in <strong>the</strong> context, acting in <strong>the</strong><br />

manner specified in clause (v) (see example 4 below). But if this possibility is already<br />

under consideration, an epistemic modal draws attention to it by eliminating salient possibilities<br />

that are inconsistent with it from <strong>the</strong> context. In this latter case, epistemic modals<br />

act in the manner specified in clause (iv) (see examples 2 and 3 below).<br />

An epistemic modal acts in accordance with clause (iv) when it functions as an answer<br />

to a question. Questions introduce several salient possibilities in a context, and an<br />

epistemic modal acts to draw attention to some answers ra<strong>the</strong>r than o<strong>the</strong>rs. But epistemic<br />

modals aren't always used to answer questions. For example, they can be used to provide<br />

someone with a warning:<br />

(4) A: Alice and I are going fishing in Leiden tomorrow.<br />

B: It might be illegal to fish in Leiden.<br />

A: Oh, I hadn’t thought to check that; thanks.<br />

B draws A’s attention to <strong>the</strong> possibility that fishing is illegal in Leiden, a possibility that<br />

A had overlooked but should investigate. Here, it is essential that B’s utterance contributes<br />

a new salient possibility to <strong>the</strong> context.<br />

Using this framework, we now define <strong>the</strong> conditions under which a formula φ answers<br />

a question ψ. Note that this definition admits full and partial answers.<br />

DEFINITION 8. Let φ ∈ L, and let ψ ∈ L3. We say that φ answers ψ if 〈{I,∅}[ψ][φ]〉 ⊂ 〈{I,∅}[ψ]〉.<br />

Thus, φ answers ψ if φ removes some salient possibilities that ψ introduces. This<br />

notion <strong>of</strong> answerhood should be familiar from a partition <strong>the</strong>ory <strong>of</strong> questions: in both<br />

cases, answering a question amounts to eliminating some <strong>of</strong> <strong>the</strong> possibilities it introduces.<br />

But an important, unique feature of this definition is that an answer doesn't necessarily give<br />

information: it suffices that an answer suggest certain possibilities for <strong>the</strong> questioner to<br />

consider.<br />
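The answerhood condition can be checked mechanically once the two information states are known. The sketch below assumes that 〈Γ〉 denotes the possibilities of Γ other than the common ground and ∅, and it writes the two states by hand instead of computing them from the update clauses. The function names are illustrative.<br />

```python
EMPTY = frozenset()

def salient(state):
    """Assumed reading of 〈Γ〉: the possibilities other than CG and the empty set."""
    cg = max(state, key=len)
    return state - {cg, EMPTY}

def answers(before, after):
    """Definition 8 applied to Γ[ψ] (before) and Γ[ψ][φ] (after): proper inclusion."""
    return salient(after) < salient(before)

# Two indices over one atom: p false and p true.
I = frozenset({frozenset(), frozenset({"p"})})
yes = frozenset(i for i in I if "p" in i)
no = I - yes

asked = frozenset({I, yes, no, EMPTY})      # the state {I, ∅}[?p]
narrowed = frozenset({I, yes, EMPTY})       # the same state after the possibility ¬p is removed

print(answers(asked, narrowed))             # a salient possibility was removed
print(answers(asked, asked))                # no salient possibility was removed
```

The common ground I is identical in `asked` and `narrowed`, which is the sense in which an answer need not be informative.<br />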

We close this section by working through a few examples. We use <strong>the</strong> following notational<br />

conventions: {p} = {i ∈ I | i |= p}, {¬p} = {i ∈ I | i ⊭ p} etc.<br />

Example 2: A Polar Question.<br />

Recall example (2), and let p and q be <strong>the</strong> propositions ‘Bill is coming to <strong>the</strong> party’ and ‘John<br />

is coming to <strong>the</strong> party’ respectively. Let Γ = {I, ∅}; <strong>the</strong>n ⋄p ∧ ⋄q answers ?p ∧ ?q: Γ[?p ∧<br />

?q] = {I, {p}, {¬p}, {q}, {¬q}, ∅} = Γ¹. Then: Γ¹[⋄p ∧ ⋄q] = {I, {p}, {q}, ∅} = Γ², and<br />

since 〈Γ²〉 ⊂ 〈Γ¹〉, ⋄p ∧ ⋄q answers ?p ∧ ?q.<br />

Example 3: A Wh-Question.<br />

Consider the question ‘Who likes to paint?’, and note that ‘Bill might like to paint’ felicitously<br />

answers this question. Let Px be ‘x likes to paint’, and let b be Bill. Let Γ = {I, ∅}.<br />

Then: Γ[?Px] = Γ ∪ {{i | i ≡ j (mod Px)} | j ∈ I}<br />

= Γ ∪ {{i | i(P) = D*} | D* ⊆ D} = Γ¹. Then<br />

Γ¹[⋄Pb] = Γ ∪ {{i | i(P) = D*}[⋄Pb] | D* ⊆ D}<br />

= Γ ∪ {{i | i(b) ∈ i(P) and i(P) = D*} | D* ⊆ D such that i(b) ∈ D*}<br />

Since Γ¹[⋄Pb] ⊂ Γ¹, ⋄Pb is an answer to ?Px.<br />

Examples 3 and 4 bring out an important feature <strong>of</strong> this paper’s framework: epistemic<br />

modals behave much like questions. Both questions and epistemic modals draw attention<br />

to certain possibilities without committing <strong>the</strong> speaker to a position on whe<strong>the</strong>r or not<br />

<strong>the</strong>se possibilities are actual. Epistemic modals, however, are stronger than questions:<br />

modals draw attention to fewer possibilities than questions, suggesting that <strong>the</strong> chosen<br />

possibilities are somehow more important than <strong>the</strong> ignored possibilities. The notion <strong>of</strong> a<br />

salient possibility allows us to represent this similarity between questions and epistemic<br />

modals in a fully formal way.<br />

Example 4: Raising Issues Without Questions.<br />

Recall (4), and let p and q be ‘Alice and A are going fishing in Leiden tomorrow’ and ‘It’s<br />

illegal to fish in Leiden’ respectively. Let Γ = {I, ∅}. Then<br />

Γ[p][⋄q] = {{p}, {p ∧ q}, ∅}. Here, since no possibility in Γ[p] satisfied q, <strong>the</strong> epistemic<br />

modal acted to add <strong>the</strong> possibility {p ∧ q} to <strong>the</strong> context. Thus, even though no questions<br />

have been asked in this context, B is able to bring A’s attention to some issue by using an<br />

epistemic modal.<br />

Example 5: Infelicitous Answer.<br />

Responding to a polar question ?φ with ⋄φ ∧ ⋄¬φ should not count as answering <strong>the</strong> question:<br />

ra<strong>the</strong>r, responding to a question with ‘maybe, maybe not’ is a deliberate and almost<br />

reticent refusal to answer <strong>the</strong> question. Our semantics allows us to account for this: {I,<br />

∅}[?p][⋄p ∧ ⋄¬p] = {I, {p}, {¬p}, ∅}[⋄p ∧ ⋄¬p]<br />

= {I, {p}, {¬p}, ∅}[⋄p][⋄¬p] = {I, {p}, ∅}[⋄¬p] = {I, {p}, {¬p}, ∅}. Thus,<br />

⋄p ∧ ⋄¬p does not answer ?p. Moreover, ⋄p ∧ ⋄¬p is actually equivalent to ?p in this information<br />

state.<br />

In general, ?φ and ⋄φ ∧ ⋄¬φ are equivalent in any information state that is consistent with<br />

both φ and ¬φ, so polar questions can almost be defined using epistemic modals (if we assume<br />

that polar questions presuppose that both <strong>of</strong> <strong>the</strong>ir answers are possible, polar questions<br />

can be defined in terms <strong>of</strong> <strong>the</strong> epistemic modality operator).<br />


3 Comparison With a Partition Semantics <strong>of</strong> Questions<br />

In this section, we will slightly change our semantics to yield a partition theory of questions,¹<br />

and examine the difficulties it faces. These difficulties will bring to light problems<br />

that any partition <strong>the</strong>ory <strong>of</strong> questions faces in accounting for non-informative answers to<br />

questions, and point to an important feature <strong>of</strong> <strong>the</strong> <strong>the</strong>ory above that allows it to account<br />

for non-informative answers. For ease <strong>of</strong> exposition, we only consider polar questions: in<br />

this section, suppose that we only allow atomic sentences to be well-formed elements <strong>of</strong><br />

L1.<br />

Using our terminology, in a partition <strong>the</strong>ory <strong>of</strong> questions a question divides <strong>the</strong> common<br />

ground into <strong>the</strong> salient possibilities corresponding to its different answers. Crucially,<br />

salient possibilities are not added to <strong>the</strong> context as <strong>the</strong>y were in section 2. Thus, to state<br />

a partition <strong>the</strong>ory <strong>of</strong> questions in our framework we have to alter <strong>the</strong> definition <strong>of</strong> an<br />

information state: we no longer assume an information state contains a maximal set <strong>of</strong><br />

indices, and for purposes <strong>of</strong> this section we remove clause (ii) from <strong>the</strong> definition <strong>of</strong> an<br />

information state.<br />

Since information states no longer contain a common ground, clause (v) in <strong>the</strong> update<br />

semantics for information states is difficult to translate to this new system. For purposes<br />

<strong>of</strong> this section, <strong>the</strong>n, we also remove clause (v) from this definition, and stipulate that<br />

epistemic modals always change an information state according to clause (iv).<br />

Our partition <strong>the</strong>ory <strong>of</strong> questions results from changing definition 8 and clause (vi) in<br />

definition 7 to <strong>the</strong> following.<br />

DEFINITION 9.<br />

(i) Let Γ = {P1,...,Pn} be an information state, and let ?φ ∈ L. Then we define<br />

Γ[?φ] = {P1[φ], P1[¬φ],...,Pn[φ], Pn[¬φ]}<br />

(ii) Let φ ∈ L and let ψ ∈ L3. We say that φ answers ψ if {I}[ψ][φ] ⊂ {I}[ψ].<br />

An immediate problem with this <strong>the</strong>ory is that modal formulas can eliminate blocks <strong>of</strong><br />

a partition. This is <strong>the</strong> case because after a question ?p, ⋄p will eliminate any possibility<br />

that was just updated with ¬p. While this is good in so far as under this <strong>the</strong>ory modal formulas<br />

can answer questions, it has o<strong>the</strong>r disastrous consequences. Since modal formulas<br />

can eliminate blocks <strong>of</strong> a partition, <strong>the</strong>y can provide as much information as non-modal<br />

formulas: for any information state Γ, Γ[?p][⋄p] = Γ[?p][p]. This is <strong>the</strong> case because both<br />

p and ⋄p will eliminate <strong>the</strong> possibilities from Γ that have been updated with ¬p and have<br />

no effect on <strong>the</strong> possibilities that have been updated with p. This is a bad result: Γ[?p][⋄p]<br />

[¬p] should be consistent, but Γ[?p][p] [¬p] shouldn’t be. While modals and non-modals<br />

should both count as answers to questions, <strong>the</strong>y should not answer questions in <strong>the</strong> same<br />

way.<br />
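The collapse of modal and non-modal answers under the partition variant can be reproduced with a small sketch. It assumes a single atom p, models indices as sets of true atoms, and applies clause (iv) blockwise as stipulated in this section. The function names are illustrative.<br />

```python
EMPTY = frozenset()

def update(P, phi):
    """Definition 6 restricted to atoms, negation and the modal."""
    if isinstance(phi, str):
        return frozenset(i for i in P if phi in i)
    op = phi[0]
    if op == "not":
        return P - update(P, phi[1])
    if op == "might":
        return P if update(P, phi[1]) else EMPTY
    raise ValueError(f"unknown operator: {op!r}")

def ask(state, atom):
    """Definition 9 (i): split every block by the atom and by its negation."""
    return frozenset(
        block for P in state for block in (update(P, atom), update(P, ("not", atom)))
    )

def tell(state, phi):
    """Clauses (i), (ii) and (iv) of definition 7: update every block pointwise."""
    return frozenset(update(P, phi) for P in state)

# Two indices over one atom: p false and p true.
I = frozenset({frozenset(), frozenset({"p"})})
asked = ask(frozenset({I}), "p")

after_modal = tell(asked, ("might", "p"))
after_plain = tell(asked, "p")
print(after_modal == after_plain)                              # the two answers coincide
print(tell(after_modal, ("not", "p")) == frozenset({EMPTY}))   # a later ¬p leaves only ∅
```

In this sketch, following the modal answer with ¬p leaves no non-empty block, which is the unwanted inconsistency described above.<br />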

On a more general level, <strong>the</strong> problem with <strong>the</strong> partition semantics is that any update<br />

has to provide information or add possibilities, and possibilities can only be removed by<br />

information. This leads to trouble with epistemic modals: if one lets an epistemic modal<br />

answer a question, it must provide information and hence function far too much like a<br />

non-modal. But, on the other hand, if one posits that an epistemic modal doesn't provide<br />

information, there is no way to say how it could change an information state in a way that<br />

answers a question.<br />

¹ For purposes of this paper, a partition semantics for questions is a semantics that holds: (i) a question<br />

changes a context by partitioning an information state, and (ii) to answer a question is to remove blocks<br />

from this partition. The partition semantics given in Groenendijk (1999) is similar to the one we present in<br />

this section.<br />

In <strong>the</strong> framework presented above this problem is dealt with by separating <strong>the</strong> common<br />

ground, and hence the information, from the salient possibilities. This change makes non-informative<br />

answers to questions possible: epistemic modals can eliminate possibilities<br />

without changing <strong>the</strong> information in <strong>the</strong> common ground. However, by connecting <strong>the</strong><br />

meaning <strong>of</strong> a question to its possible answers in a context, and by identifying answers to<br />

questions with <strong>the</strong> elimination <strong>of</strong> possibilities, this approach retains much <strong>of</strong> <strong>the</strong> spirit <strong>of</strong><br />

<strong>the</strong> partition <strong>the</strong>ory <strong>of</strong> questions.<br />

4 Fur<strong>the</strong>r Issues and Expansions <strong>of</strong> <strong>the</strong> System<br />

In this section, I will discuss some expansions <strong>of</strong> <strong>the</strong> system defined above and consider<br />

two objections to it.<br />

First, I will discuss <strong>the</strong> objections. Though <strong>the</strong> idea that epistemic modals can answer<br />

wh-questions or o<strong>the</strong>r complex questions by suggesting possible answers is quite natural,<br />

some readers may find <strong>the</strong> suggestion that epistemic modals answer polar questions by<br />

suggesting possible answers a bit odd. After all, someone asking a polar question clearly<br />

has both possibilities in mind, so how can simply making one <strong>of</strong> <strong>the</strong>m more salient in <strong>the</strong><br />

context count as felicitously answering her question?<br />

Dealing with this objection involves delving into <strong>the</strong> pragmatics <strong>of</strong> epistemic modals,<br />

and more specifically <strong>the</strong> pragmatic role that salient possibilities play in a context. This<br />

topic would take a great deal <strong>of</strong> space to treat, and is beyond <strong>the</strong> scope <strong>of</strong> this paper. But<br />

to respond to <strong>the</strong> objection we note that one very plausible pragmatic principle governing<br />

<strong>the</strong> use <strong>of</strong> epistemic modals is that, in general, one should only focus attention to some<br />

possibility if one has some reason to believe that it is <strong>the</strong> case. To see this, note how<br />

infelicitous dialogue (5) sounds:<br />

(5) A: Are John and Bill coming to <strong>the</strong> party?<br />

B: They might.<br />

A: Why do you say that?<br />

B: I don't know; they just might.<br />

Thus, pragmatically, answering a polar question with an epistemic modal can commit <strong>the</strong><br />

speaker to having some reason to believe that <strong>the</strong> possibility made salient by her answer<br />

actually obtains. This pragmatic dimension <strong>of</strong> epistemic modals makes it clear how a<br />

speaker can answer a polar question simply by making one <strong>of</strong> <strong>the</strong> possible answers ra<strong>the</strong>r<br />

than <strong>the</strong> o<strong>the</strong>r salient in <strong>the</strong> context.<br />

Ano<strong>the</strong>r objection to this framework questions <strong>the</strong> idea that, given <strong>the</strong> informal description<br />

<strong>of</strong> salient possibilities in <strong>the</strong> introduction, it makes sense to say that epistemic<br />

modals actually eliminate salient possibilities that questions introduce. After all, if a question<br />

is answered by an epistemic modal, its possible answers that are inconsistent with <strong>the</strong><br />

epistemic modal aren’t completely forgotten about. But in <strong>the</strong> formal system, <strong>the</strong>se possibilities<br />

have <strong>the</strong> same status as many o<strong>the</strong>r possibilities that <strong>the</strong> interlocutors haven’t<br />

given any thought to. Thus, this objection concludes, holding that epistemic modals actually<br />

eliminate salient possibilities from a context is far too strong.<br />


We take this objection seriously, and admit that <strong>the</strong> definition <strong>of</strong> salient possibilities<br />

given above is too coarse. A better definition would make salience into a scalar notion.<br />

With a scalar notion <strong>of</strong> salience, we could say that <strong>the</strong> salient possibilities eliminated by<br />

an epistemic modal acting as an answer to a question are less salient than those still in<br />

<strong>the</strong> context, but more salient than many o<strong>the</strong>r subsets <strong>of</strong> <strong>the</strong> common ground. A potential<br />

candidate for this scale is defined below:<br />

Scale.<br />

Let Γ = {CG, P1,...,Pn, ∅} be an information state, and let P ⊆ CG.<br />

(i) P is 1-salient if P ∈ 〈Γ〉 and CG − P ∉ 〈Γ〉<br />

(ii) P is 2-salient if P ∈ 〈Γ〉 and CG − P ∈ 〈Γ〉<br />

(iii) P is 3-salient if P ∉ 〈Γ〉 and CG − P ∈ 〈Γ〉<br />

(iv) P is 4-salient if P ∉ 〈Γ〉 and CG − P ∉ 〈Γ〉<br />

Here, 1-salient propositions are most salient, and 4-salient propositions are least salient.<br />

In general, after an epistemic modal answers a question it changes its possible answers<br />

from 2-salient propositions to ei<strong>the</strong>r 1-salient propositions or 3-salient propositions, thus<br />

making possible answers ei<strong>the</strong>r more or less salient and not rendering any forgotten. Thus,<br />

replacing an absolute notion <strong>of</strong> salience with a scalar one solves <strong>the</strong> problem raised by <strong>the</strong><br />

objection.<br />
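The scale can be illustrated with a short sketch. It assumes that the second condition in each clause concerns the complement of P within the common ground, that 〈Γ〉 denotes the possibilities of Γ other than the common ground and ∅, and that the two states are written by hand. The function name is illustrative.<br />

```python
EMPTY = frozenset()

def salience(state, P):
    """Level of P on the scale, from 1 (most salient) to 4 (least salient)."""
    cg = max(state, key=len)
    salient = state - {cg, EMPTY}            # assumed reading of 〈Γ〉
    p_in = P in salient
    rest_in = (cg - P) in salient            # complement of P within the common ground
    if p_in:
        return 2 if rest_in else 1
    return 3 if rest_in else 4

# Two indices over one atom: p false and p true.
I = frozenset({frozenset(), frozenset({"p"})})
yes = frozenset({frozenset({"p"})})
no = I - yes

asked = frozenset({I, yes, no, EMPTY})       # after the polar question ?p
answered = frozenset({I, yes, EMPTY})        # after the modal answer ⋄p

print(salience(asked, yes), salience(asked, no))
print(salience(answered, yes), salience(answered, no))
```

On this reading, both answers start out 2-salient, and after the modal answer one is promoted to level 1 while the other is demoted to level 3 but not forgotten.<br />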

In this paper's semantics, epistemic modals can only focus attention on possibilities that<br />

are subsets <strong>of</strong> <strong>the</strong> common ground. This is problematic because some uses <strong>of</strong> epistemic<br />

modals make possibilities that lie outside <strong>of</strong> <strong>the</strong> common ground salient in a conversation.<br />

(6) A: There aren't any deer in this part of the forest.<br />

B: (2 hours later) Look over <strong>the</strong>re! Ho<strong>of</strong>prints! There might be deer after all.<br />

These modal assertions also challenge previously accepted information without directly<br />

contradicting it. To account for this use <strong>of</strong> epistemic modals, one could posit that if ⋄φ<br />

is inconsistent with <strong>the</strong> common ground <strong>of</strong> an information state, <strong>the</strong>n ⋄φ acts on this<br />

information state by: (i) introducing a salient possibility corresponding to <strong>the</strong> revision <strong>of</strong><br />

CG with φ, (ii) transforming the information state's common ground into the union of this<br />

revision and the old common ground, and (iii) performing a similar operation on the other<br />

possibilities in the information state. Thus, though the paper's theory itself cannot account<br />

for uses <strong>of</strong> epistemic modals like (6), augmented with a <strong>the</strong>ory <strong>of</strong> belief revision it can<br />

provide an elegant analysis.<br />

Acknowledgements<br />

I would like to thank Paul Dekker.<br />

References<br />

De Rose, K. (1991). Epistemic possibilities, Philosophical Review 100: 581–605.<br />

Egan, A., Hawthorne, J. and Wea<strong>the</strong>rson, B. (2005). Epistemic modals in context, in<br />

G. Preyer and P. Peter (eds), Contextualism in Philosophy, Oxford University Press,<br />

Oxford, pp. 131–170.<br />

Groenendijk, J. (1999). The logic <strong>of</strong> interrogation, in T. Mat<strong>the</strong>ws and D. Strolovitch<br />

(eds), The <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> Ninth Conference on Semantics and Linguistic Theory,<br />

CLC Publications, Ithaca, NY, pp. 109–126.<br />

Groenendijk, J. (2007). Inquisitive semantics: Two possibilities for disjunction.<br />

Groenendijk, J. and Stokhof, M. (1997). Questions, in J. van Benthem and A. ter Meulen<br />

(eds), Handbook <strong>of</strong> Logic and Language, Elsevier.<br />

Stalnaker, R. (1978). Assertion, Syntax and Semantics 9.<br />

Veltman, F. (1996). Defaults in update semantics, Journal <strong>of</strong> Philosophical Logic 25(3).<br />


BARE PREDICATION AND KINDS ∗<br />

Bert Le Bruyn<br />

Utrecht University<br />

Abstract. This paper treats <strong>the</strong> distinction between singular nominal predication with and<br />

without indefinite article in languages like Dutch. The former variant is referred to as nonbare<br />

predication, <strong>the</strong> latter as bare predication. I make <strong>the</strong> following claims: (i) temporal<br />

analyses <strong>of</strong> <strong>the</strong> distinction between bare and non-bare predication are on <strong>the</strong> wrong track,<br />

(ii) bare predication needn’t be analyzed as a lexical phenomenon, (iii) non-bare predication<br />

should be analyzed as kind-membership predication.<br />

1 Introduction<br />

In order to understand <strong>the</strong> role played by <strong>the</strong> indefinite article in predicate position it is instructive<br />

to look at instances <strong>of</strong> singular nominal predication in which <strong>the</strong> indefinite article<br />

does not appear. These instances are subsumed under <strong>the</strong> notion <strong>of</strong> bare predication (see<br />

(Kupferman, 1991), (Broekhuis, Keizer and Den Dikken, 2003), (de Swart, Winter and<br />

Zwarts, 2005), (de Swart, Winter and Zwarts, 2007), (Matushansky and Spector, 2005),<br />

(Déprez, 2005), (Munn and Schmitt, 2005), (Roy, 2006), (Beyssade and Dobrovie-Sorin,<br />

2005)). In English bare predication is marginal but a language like Dutch seems to have<br />

a productive paradigm:<br />

(1) (a) Jan is slager. (litt. John is butcher) (b) Jan is moslim. (litt. John is muslim)<br />

(c) Jan is Belg. (litt. John is Belgian) (d) Jan is hertog. (litt. John is duke)<br />

Nouns that typically occur in bare predication are linked to pr<strong>of</strong>essions (1a), religions<br />

(1b), nationalities (1c) and titles (1d). It is important to note that this is not an idiosyncrasy<br />

<strong>of</strong> Dutch but a pervasive phenomenon in Romance and Germanic languages (examples<br />

taken from (de Swart et al., 2007)):<br />

(2) Es negrero. (Spanish, litt. Is trader in black slaves); João é médico. (Portuguese,<br />

litt. John is doctor); Gianni è dottore. (Italian, litt. John is doctor); Jean est<br />

médecin. (French, litt. John is doctor); Olivier var skuespiller. (Danish, litt. Oliver<br />

was actor); Herr Weber är katolik. (Swedish, litt. Mr Weber is catholic); Han er<br />

lærer. (Norwegian, litt. He is teacher); Er ist praktizierender Katholik. (German,<br />

litt. He is practicing catholic).<br />

∗ This paper should be read as a working paper that presents thoughts and bits <strong>of</strong> analysis that are not<br />

finished yet. I’m very grateful to audiences at ConSOLE XVI, my UiL-OTS kermit-lecture and <strong>the</strong> LSB<br />

2008 Linguists’ Day and to <strong>the</strong> reviewers <strong>of</strong> <strong>the</strong> <strong>ESSLLI</strong> student session for very useful comments and<br />

discussion. Special thanks also to Min Que, Gianluca Giorgolo, Dorota Klimek, Sander Lestrade, Joost<br />

Zwarts and Henriëtte de Swart.<br />

In this paper I will defend three claims concerning bare predication. The first is that analyses<br />

that reduce <strong>the</strong> distinction between bare and non-bare predication to a temporal one<br />

are not on <strong>the</strong> right track (see paragraph 2). The second is that a purely lexical approach to<br />

bare predication is not tenable (see paragraph 3). The third and final one is that non-bare<br />

predication should be analyzed as kind-membership predication (see paragraph 4).<br />

2 Bare predication and time<br />

When comparing sentences (3a) and (3b) most informants tend to say that <strong>the</strong> a-variant is<br />

more ’eventive’ than <strong>the</strong> b-variant (Roy, 2006):<br />

(3) (a) Paul est acteur. (French, litt. Paul is actor)<br />

(b) Paul est un acteur. (French, litt. Paul is an actor)<br />

This intuition has led linguists to explore a temporal analysis <strong>of</strong> bare predication. In its<br />

simplest form it would state that bare predication is concerned with transient properties<br />

whereas non-bare predication is concerned with permanent ones. The most convincing<br />

argument in favour <strong>of</strong> this analysis comes from ’lifetime effects’:<br />

(4) (a) Paul était médecin. (French, litt. Paul was doctor)<br />

(b) Paul était un médecin. (French, litt. Paul was a doctor)<br />

Sentence (4a) can be understood as stating that Paul used to be a doctor and that he’s<br />

retired now. Sentence (4b) can only mean that Paul is dead. Under <strong>the</strong> assumption that<br />

non-bare predication is concerned with permanent properties <strong>the</strong> interpretation <strong>of</strong> sentence<br />

(4b) follows: to cancel a permanent property one has to cancel <strong>the</strong> existence <strong>of</strong> <strong>the</strong><br />

entity <strong>the</strong> property applies to. The problem this analysis faces is that it predicts that inherently<br />

transient properties should always occur bare in predicate position. This prediction<br />

is not borne out (cf. (de Swart et al., 2007)):<br />

(5) ?? Marie est fille. (French, litt. Mary is girl)<br />

Ano<strong>the</strong>r temporal approach to bare predication is <strong>the</strong> one presented in (Roy, 2006) (variants<br />

are (Munn and Schmitt, 2005) and (Déprez, 2005)). Roy assumes all nouns come<br />

with an event argument that has to be bound. When bound by <strong>the</strong> indefinite article it is<br />

signalled that <strong>the</strong> predication holds for <strong>the</strong> maximal event around <strong>the</strong> ’time <strong>of</strong> utterance’<br />

(given by <strong>the</strong> Tense on <strong>the</strong> copula) and that this event cannot be split up into smaller intervals.<br />

When bound by Tense it is signalled that <strong>the</strong> maximal event can be split up. The<br />

facts that led to this analysis are presented in (6) and (7):<br />

(6) (a) Jean est pr<strong>of</strong>esseur le jour, danseur la nuit.<br />

(French, litt. John is teacher by day, dancer by night)<br />

(b) ?? Jean est un pr<strong>of</strong>esseur le jour, un danseur la nuit.<br />

(French, litt. John is a teacher by day, a dancer by night)<br />

(7) (a) Paul est devenu chanteur.<br />

(French, litt. Paul has become singer)<br />

(b) ?? Paul est devenu un chanteur.<br />

(French, litt. Paul has become a singer)<br />


The reason why <strong>the</strong> b-variants are out on Roy’s analysis is that adverbials like le jour<br />

... la nuit (’by day ... by night’) and verbs like devenir (’become’) split up <strong>the</strong> ’time <strong>of</strong><br />

utterance’. This is depicted for <strong>the</strong> adverbials in (8) and for <strong>the</strong> verb in (9).<br />

(8) [Figure: the adverbials le jour ... la nuit splitting up the ’time of utterance’]<br />

(9) [Figure: the verb devenir splitting up the ’time of utterance’]<br />

It is important to note that in <strong>the</strong> absence <strong>of</strong> temporal adverbials or verbs like devenir <strong>the</strong>re is<br />

no clear reason in Roy’s analysis to prefer bare over non-bare predication or vice versa.<br />

In order to account for preferences like those in (5), Roy has to assume that whenever world<br />

knowledge makes it implausible / impossible that <strong>the</strong> maximal event is split up, <strong>the</strong> indefinite<br />

article is obligatory, and that whenever world knowledge makes it plausible / possible<br />

that <strong>the</strong> maximal event is split up, <strong>the</strong> indefinite article ceases to be obligatory.<br />

The problem Roy’s analysis faces is that <strong>the</strong> incompatibility <strong>of</strong> non-bare predication with<br />

temporal adverbials or verbs like devenir is only a strong tendency that surfaces as an<br />

epiphenomenon. To show this it is necessary to anticipate <strong>the</strong> analysis presented in paragraph<br />

4. There it is claimed that non-bare predication signals kind-membership. A sentence<br />

like (10) e.g. would mean that White Fang belongs to <strong>the</strong> kind wolf.<br />

(10) White Fang is een wolf.<br />

(Dutch, lit. White Fang is a wolf)<br />

What makes kind-membership special is that in general one cannot change from one kind<br />

into ano<strong>the</strong>r. White Fang e.g. cannot turn into a sheep or a wild boar. This explains why<br />

non-bare predication in general is incompatible with temporal adverbials or verbs like devenir.<br />

There are however instances <strong>of</strong> transformations in nature and in folklore: e.g. <strong>the</strong><br />

transformation from a caterpillar into a butterfly and from a man into a werewolf. The former<br />

can be described in a sentence with <strong>the</strong> verb devenir and <strong>the</strong> latter in a sentence with<br />

temporal adverbials. Roy’s analysis predicts that in <strong>the</strong>se sentences non-bare predication<br />

is not allowed. An analysis that takes non-bare predication to signal kind-membership<br />

predicts <strong>the</strong> opposite. As shown by <strong>the</strong> acceptability <strong>of</strong> (11) and (12) it is <strong>the</strong> latter that<br />

makes <strong>the</strong> right prediction.<br />

(11) In Lady Hawke is Rutger Hauer ’s nachts een wolf en overdag een mens.<br />

(Dutch, lit. In Lady Hawke is Rutger Hauer by night a wolf and by day a man)<br />

(12) La chenille est devenue un papillon.<br />

(French, lit. The caterpillar has become a butterfly)<br />



From <strong>the</strong> preceding I conclude that <strong>the</strong> existing analyses that try to reduce <strong>the</strong> distinction<br />

between bare and non-bare predication to a temporal one are not on <strong>the</strong> right track. It was<br />

important to establish this given that most existing analyses are cast in temporal terms<br />

whereas <strong>the</strong> one I will defend in paragraph 4 is not.<br />

3 Bare predication and <strong>the</strong> lexicon<br />

In <strong>the</strong> literature on bare predication one <strong>of</strong> <strong>the</strong> following positions is <strong>of</strong>ten taken: (i)<br />

nouns that usually appear in non-bare predication are marked in <strong>the</strong> lexicon (see e.g.<br />

(Matushansky and Spector, 2005)); (ii) nouns that usually appear in bare predication are<br />

marked in <strong>the</strong> lexicon (see e.g. (de Swart et al., 2005), (de Swart et al., 2007)). In this<br />

section it will be argued that purely lexical standpoints like (i) and (ii) should be amended.<br />

In order to do so it will be shown that :<br />

(a) all nouns that usually appear in bare predication can appear in non-bare predication;<br />

(b) all nouns that usually appear in non-bare predication can appear in bare predication.<br />

It should be noted that (a) and (b) don’t constitute decisive arguments against lexical<br />

analyses. They do however make <strong>the</strong>m less appealing.<br />

3.1 Bare predication nouns<br />


As stated in paragraph 1 <strong>the</strong>re is a subclass <strong>of</strong> nouns that usually appear in bare predication.<br />

They include nouns related to pr<strong>of</strong>essions, religions, nationalities and titles. It is<br />

however well-known that <strong>the</strong>se nouns appear fairly frequently in non-bare predication too<br />

(see e.g. (de Swart et al., 2005), (de Swart et al., 2007)). When <strong>the</strong>y do, <strong>the</strong>y allow for<br />

<strong>the</strong>ir normal interpretation and an enriched one. This will be illustrated on <strong>the</strong> basis <strong>of</strong><br />

(<strong>13</strong>):<br />

(<strong>13</strong>) (a) Sil is beenhouwer. (Dutch, lit. Sil is butcher)<br />

(b) Sil is een beenhouwer. (Dutch, lit. Sil is a butcher)<br />

The a-variant is <strong>the</strong> unmarked one and simply states that Sil works as a butcher. The b-variant<br />

has <strong>the</strong> same interpretation but also allows <strong>the</strong> interpretation according to which<br />

Sil is not a butcher but has <strong>the</strong> characteristics we usually associate with butchers. A<br />

typical person <strong>the</strong> b-variant would apply to is a violent boxer. The enriched interpretation<br />

projects <strong>the</strong> (stereotypical) characteristics that are associated with a pr<strong>of</strong>ession on<br />

an individual. From a lexical standpoint one could see <strong>the</strong> enriched interpretation as an<br />

instance <strong>of</strong> coercion. Note though that if we store in our world knowledge that butcher is<br />

a pr<strong>of</strong>ession we can get <strong>the</strong> same coercion effect to arise.<br />

3.2 Non-bare predication nouns<br />

The majority <strong>of</strong> nouns in languages like Dutch usually appear in non-bare predication.<br />

To date <strong>the</strong>se nouns have been defined negatively; <strong>the</strong>y are those that are not related to<br />

pr<strong>of</strong>essions, religions, nationalities and titles.<br />

In <strong>the</strong> literature <strong>the</strong>re are two claims about nouns appearing in bare predication. The<br />

first is that <strong>the</strong>y are usually [+ human] (cf. (Matushansky and Spector, 2005) and (Roy,<br />



2006)). The second is that nouns referring to kinds (which would be a subset <strong>of</strong> non-bare<br />

predication nouns) can never appear in bare predication (cf. (Kupferman, 1991) and<br />

(Roy, 2006)). In order to argue that all non-bare predication nouns can in principle appear<br />

in bare predication <strong>the</strong> strongest claim would <strong>the</strong>refore be to say that even [-human] and<br />

[+kind] nouns can appear in bare predication. This is <strong>the</strong> claim I defend here.<br />

A noun that meets both <strong>the</strong> [-human] and <strong>the</strong> [+kind] criterion is wolf. An example <strong>of</strong><br />

wolf in non-bare predication was given in (10). Its bare variant would look as follows:<br />

(14) Ik ben wolf. (Dutch, lit. I am wolf)<br />

Even though (14) might seem ungrammatical at first sight it is acceptable in Dutch under<br />

a very specific interpretation, viz. <strong>the</strong> one in which wolf is a role in a game (e.g. <strong>the</strong><br />

werewolf game). This should not come as a surprise given that it is <strong>of</strong>ten claimed that<br />

bare predication nouns refer to roles in society:<br />

“[Bare predication nouns] usually [...] denote specific roles in society: pr<strong>of</strong>essions, religions<br />

or nationalities. O<strong>the</strong>r nominals (non-human or human) that are not related to such<br />

roles generally resist taking up a bare nominal position.” (de Swart et al., 2007)<br />

Under <strong>the</strong> assumption that any noun can be reinterpreted as referring to a role in a game<br />

<strong>the</strong>re is no reason to expect a principled limit on nouns appearing in bare predication.<br />

Note that <strong>the</strong> reinterpretation referred to can be seen as a coercion mechanism from a lexical<br />

standpoint. Once again it is not obvious though that we couldn’t get <strong>the</strong> same effect<br />

through world knowledge.<br />

3.3 Conclusion<br />

In 3.1. and 3.2. it was argued that any noun can appear in both bare predication and<br />

non-bare predication. As noted before <strong>the</strong>se facts cannot be seen as decisive arguments<br />

against a lexical approach. They do however make lexical approaches less appealing and<br />

clear <strong>the</strong> road for non-lexical analyses like <strong>the</strong> one that will be presented in paragraph 4.<br />

4 Bare predication and kinds<br />

In this paragraph I will introduce <strong>the</strong> basic ingredients for an analysis in which non-bare<br />

predication is seen as kind-membership predication. The basic claim is that a sentence<br />

involving non-bare predication should be interpreted as ’X belongs to <strong>the</strong> kind Y’. The<br />

paragraph is organized as follows. I first present my background assumptions about kinds<br />

and articles (4.1. and 4.2.). Afterwards I present a pragmatic analysis <strong>of</strong> <strong>the</strong> contrast between<br />

bare and non-bare predication (4.3). I close <strong>the</strong> paragraph defending <strong>the</strong> claim that<br />

<strong>the</strong>re is a one-to-one correspondence between non-bare predication and kind-membership<br />

predication (4.4).<br />

4.1 Background on kinds<br />


I follow Chierchia (1998) in his intuition that kinds are regularities that occur in nature.<br />

This translates into two constraints on kinds and <strong>the</strong>ir instantiations. The first (see (15))<br />

captures <strong>the</strong> intuition that for something to be regular it should be hypo<strong>the</strong>sized that <strong>the</strong>re<br />



could be more than one. Note though that for K to qualify as a kind in w0 it is not<br />

necessary for <strong>the</strong>re to be more than one or even one single instantiation <strong>of</strong> K in w0 (this<br />

makes it possible to talk about unicorns, dodos and new inventions as kinds).<br />

(15) For K to be a kind in w0 <strong>the</strong>re has to be at least one world in which K has more<br />

than one instantiation.<br />

The second constraint (see (16)) captures <strong>the</strong> intuition that <strong>the</strong> instantiations <strong>of</strong> kinds<br />

behave in a regular way, i.e. that <strong>the</strong>ir kind-membership is not accidental. Note though<br />

that it prohibits neither kinds from displaying properties that vary over time nor individuals<br />

from starting or ceasing to be instantiations <strong>of</strong> a kind (this is left to world knowledge).<br />

(16) If k is an instantiation <strong>of</strong> <strong>the</strong> kind K in w0 at tn and if k exists in a world wn<br />

accessible from w0 at tn, then k is an instantiation <strong>of</strong> <strong>the</strong> kind K in wn at tn.<br />

I will call (15) <strong>the</strong> non-uniqueness constraint and (16) <strong>the</strong> non-accidentality constraint on<br />

kinds and <strong>the</strong>ir instantiations.<br />
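
The two constraints can also be written out symbolically. The following is only one possible rendering, not notation taken from the text:<br />
the predicates Inst (x instantiates K in w at t) and E (x exists in w at t) and the accessibility relation R are names assumed here<br />
for illustration, and the time index in the first formula is likewise an assumption, since (15) itself mentions worlds only.<br />

```latex
% (15) non-uniqueness: some world (and time) hosts at least two instantiations of K
\mathrm{Kind}(K, w_0) \rightarrow \exists w\, \exists t\; \bigl|\{x : \mathrm{Inst}(x, K, w, t)\}\bigr| \geq 2

% (16) non-accidentality: kind-membership carries over to every accessible world in which k exists
\forall k\, \forall w_n \bigl[\bigl(\mathrm{Inst}(k, K, w_0, t_n) \wedge R(w_0, w_n) \wedge \mathrm{E}(k, w_n, t_n)\bigr) \rightarrow \mathrm{Inst}(k, K, w_n, t_n)\bigr]
```

On this rendering (15) is compatible with K having no instantiation at all in w0, which is what allows unicorns and dodos to count as kinds.<br />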

4.2 Background on articles<br />

I follow Partee (1987) in assuming that articles are default type-shifters from type ⟨e,t⟩ to<br />

type e or type ⟨⟨e,t⟩,t⟩. In short this means that <strong>the</strong>y are markers <strong>of</strong> argumenthood and<br />

that <strong>the</strong>y cannot be omitted in <strong>the</strong> absence <strong>of</strong> o<strong>the</strong>r determiners in argument position:<br />

(17) *I have cat.<br />

(18) *Man came to see me.<br />

I fur<strong>the</strong>rmore follow (Hawkins, 1991) and (Farkas, 2002) in assuming that <strong>the</strong> definite<br />

article is a uniqueness marker whereas <strong>the</strong> indefinite article is unmarked for uniqueness.<br />

This means that (19) signals that <strong>the</strong>re is only one teacher present in a particular setting<br />

whereas (20) is in principle neutral with respect to <strong>the</strong>re being one or more teachers.<br />

(19) I saw <strong>the</strong> teacher.<br />

(20) I saw a teacher.<br />


As noted by Hawkins and Farkas it is <strong>the</strong> case though that by choosing <strong>the</strong> indefinite<br />

instead <strong>of</strong> <strong>the</strong> definite <strong>the</strong> speaker triggers <strong>the</strong> implicature that <strong>the</strong>re is more than one<br />

teacher.<br />
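
The division of labour between the two articles can be illustrated with a toy extensional model. This is a minimal sketch in Python, assuming<br />
that predicates of type ⟨e,t⟩ are finite sets of entities; the helper names iota and a and the entity names are hypothetical and chosen<br />
only for this illustration.<br />

```python
# Toy model: a predicate of type ⟨e,t⟩ is a set of entity names.
# The names iota/a and the entities below are illustrative assumptions.

def iota(pred):
    """Shift ⟨e,t⟩ to e: return the unique member, or None when uniqueness fails."""
    return next(iter(pred)) if len(pred) == 1 else None


def a(pred):
    """Shift ⟨e,t⟩ to ⟨⟨e,t⟩,t⟩: true of any scope predicate that overlaps pred."""
    return lambda scope: bool(pred & scope)


seen_by_me = {"ann", "bob"}       # entities the speaker saw
one_teacher = {"ann"}             # setting with exactly one teacher
two_teachers = {"ann", "carl"}    # setting with more than one teacher
```

In this model the definite shift only yields a referent in the one-teacher setting, mirroring (19), whereas a(one_teacher)(seen_by_me) and<br />
a(two_teachers)(seen_by_me) both hold, mirroring the neutrality of (20). The non-uniqueness implicature itself is pragmatic and is not modelled here.<br />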

Finally, in line with Partee’s type-shifting analysis I expect indefinite articles to be omissible<br />

in predicate position. The instances <strong>of</strong> bare predication treated in this paper show that<br />

this expectation is borne out. The crucial question is why <strong>the</strong>y cannot always be omitted.<br />

The answer, I claim, does not lie in <strong>the</strong> semantics but in <strong>the</strong> pragmatics. The pragmatic<br />

analysis I defend is presented in 4.3.<br />



4.3 Non-bare predication and non-uniqueness<br />

The analysis I defend is cast in (Weak) Bi-directional Optimality Theory (cf. (Blutner,<br />

2000)) and is based on five standard assumptions. The first is that bare and non-bare<br />

predication are truth-conditionally equivalent (cf. (Partee, 1987)). The second assumption<br />

is that both bare and non-bare predication in principle trigger an implicature <strong>of</strong> non-uniqueness.<br />

This assumption builds on <strong>the</strong> insights <strong>of</strong> Hawkins and Farkas according to<br />

whom not using <strong>the</strong> definite triggers an implicature <strong>of</strong> non-uniqueness. The third assumption<br />

is that non-bare predication is syntactically more marked than bare predication (cf.<br />

(de Swart and Zwarts, To appear)). Syntactic markedness can be understood in terms <strong>of</strong><br />

projections: whereas non-bare predication involves DPs, bare predication only involves<br />

NPs (or NumPs). The fourth assumption is that conveying non-uniqueness is semantically<br />

more marked than conveying neutrality with respect to uniqueness (cf. (de Swart and<br />

Zwarts, To appear)). Semantic markedness can be understood in terms <strong>of</strong> compatibility:<br />

non-uniqueness is compatible with neutrality but neutrality is not necessarily compatible<br />

with non-uniqueness. The fifth and final assumption is that unmarked forms and meanings<br />

are preferred over marked forms and meanings (a standard assumption in <strong>the</strong> OT<br />

literature). The resulting (Weak) Bi-directional OT tableau is presented in (21).<br />

(21)<br />


What comes out <strong>of</strong> this analysis is that bare predication is neutral with respect to uniqueness<br />

whereas non-bare predication marks non-uniqueness.<br />
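
The outcome above can be reproduced mechanically. The following is a minimal sketch in Python of weak bidirectional optimization over the<br />
two forms and two meanings discussed here, assuming the markedness orderings of the third and fourth assumptions; the numeric costs, the<br />
labels and the function names are hypothetical and only encode "less marked" as "lower".<br />

```python
from itertools import product

# Markedness costs (lower = less marked); the numbers are illustrative assumptions.
FORM_COST = {"bare": 0, "non-bare": 1}
MEANING_COST = {"neutral": 0, "non-unique": 1}


def better(p, q):
    """True if pair p beats pair q while sharing its form or its meaning."""
    (f1, m1), (f2, m2) = p, q
    if f1 == f2 and m1 != m2:
        return MEANING_COST[m1] < MEANING_COST[m2]
    if m1 == m2 and f1 != f2:
        return FORM_COST[f1] < FORM_COST[f2]
    return False


def super_optimal(forms, meanings):
    """A pair is super-optimal if no super-optimal competitor with the same
    form or the same meaning is better. Any blocker has a lower total cost,
    so visiting pairs by increasing total cost decides each pair once."""
    pairs = sorted(product(forms, meanings),
                   key=lambda p: FORM_COST[p[0]] + MEANING_COST[p[1]])
    chosen = []
    for pair in pairs:
        if not any(better(winner, pair) for winner in chosen):
            chosen.append(pair)
    return chosen


RESULT = super_optimal(["bare", "non-bare"], ["neutral", "non-unique"])
```

Under these assumptions the unmarked form pairs with the unmarked meaning, which blocks the two mixed pairs, and the marked form is then<br />
free to pair with the marked meaning.<br />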

4.4 Kinds and non-bare predication<br />

In 4.1. I claimed - on <strong>the</strong> basis <strong>of</strong> common intuitions - that kinds are subject to a non-uniqueness<br />

constraint. In 4.3. I claimed - on <strong>the</strong> basis <strong>of</strong> standard assumptions - that<br />

non-bare predication marks non-uniqueness whereas bare predication is neutral with respect<br />

to uniqueness. When we combine both claims it follows that non-bare predication<br />

is best suited to signal kind-membership. As I will demonstrate in what follows this is<br />

indeed what it does in languages like Dutch. I will show this on <strong>the</strong> basis <strong>of</strong> five predictions<br />

that follow from <strong>the</strong> claim that <strong>the</strong>re is a one-to-one correspondence between non-bare<br />

predication and kind-membership predication.<br />

The first prediction is that all predication involving kind-membership has to involve <strong>the</strong><br />

indefinite article. That this is <strong>the</strong> case has been suggested by (Kupferman, 1991) and<br />

(Roy, 2006) and as far as I know this has never been challenged. Note that (14) is not<br />

a counterexample. (14) shows that bare predication may involve nouns that are usually<br />

associated with kinds but it is not an instance <strong>of</strong> kind-membership predication. Note also<br />



that kinds are not restricted to plants or animals but may involve things as diverse as bottles,<br />

chairs, etc., insofar as <strong>the</strong>y show sufficiently regular behaviour (see 4.1).<br />

The second prediction my claim about non-bare predication and kind-membership makes<br />

is that bare predication should be concerned with <strong>the</strong> predication <strong>of</strong> properties that are unlike<br />

those that link a kind to its instantiations. In view <strong>of</strong> <strong>the</strong> non-accidentality constraint<br />

on kinds and <strong>the</strong>ir instantiations (see 4.1) it is <strong>the</strong>n predicted that bare predication is concerned<br />

with accidental properties. To see that this is exactly what happens it is instructive<br />

to look at those nouns that usually appear in bare predication: nouns linked to pr<strong>of</strong>essions,<br />

religions, nationalities and titles. These “do not depend on <strong>the</strong> inherent, natural properties<br />

<strong>of</strong> a person or what <strong>the</strong> person actually does, but on <strong>the</strong> social or cultural status <strong>of</strong> that<br />

person” (de Swart et al., 2007).<br />

The third prediction is that whenever a noun that is usually associated with kinds is used<br />

in bare predication it is reinterpreted in such a way that it no longer predicates a non-accidental<br />

property. An example was given in (14): being a wolf in (14) is an accidental<br />

property that comes with <strong>the</strong> distribution <strong>of</strong> roles in a game.<br />

The fourth prediction is that whenever a noun that is usually not associated with kinds<br />

is used in non-bare predication it is reinterpreted in such a way that it starts predicating<br />

non-accidental properties. An example was given in (<strong>13</strong>b): for Sil to be a butcher is no<br />

longer seen as an accidental property but ra<strong>the</strong>r as something that is linked to his inherent<br />

properties. This explains why Sil needn’t be a butcher by pr<strong>of</strong>ession to make (<strong>13</strong>b) true.<br />

The fifth prediction is that whenever it is not clear whe<strong>the</strong>r something is an accidental<br />

property or not <strong>the</strong>re is variation in <strong>the</strong> predication that is used. One telling example is<br />

that <strong>of</strong> diseases like alcoholism. According to some alcoholism is a disease that people<br />

may or may not get, according to o<strong>the</strong>rs alcoholics are <strong>the</strong>mselves responsible and are<br />

not sick in <strong>the</strong> classical meaning <strong>of</strong> <strong>the</strong> word. Interestingly this division is reflected in <strong>the</strong><br />

use <strong>of</strong> <strong>the</strong> more clinical alcoholieker (Dutch, ’alcoholic’) and <strong>the</strong> more popular drinker<br />

(Dutch, ’drinker’). On Google I found <strong>the</strong> former 43 times in bare predication and 8 times<br />

in non-bare predication whereas <strong>the</strong> latter appeared 364 times in non-bare predication and<br />

only 6 times in bare predication. 1<br />
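
To make the size of the contrast explicit, the reported counts can be turned into proportions. The short Python sketch below uses only the four<br />
counts given above; the dictionary layout and the function name bare_share are assumptions made for illustration.<br />

```python
# Hit counts as reported in the text for the two Dutch nouns.
COUNTS = {
    "alcoholieker": {"bare": 43, "non-bare": 8},
    "drinker": {"bare": 6, "non-bare": 364},
}


def bare_share(noun):
    """Proportion of hits for the noun that occur in bare predication."""
    c = COUNTS[noun]
    return c["bare"] / (c["bare"] + c["non-bare"])


SHARES = {noun: round(bare_share(noun), 3) for noun in COUNTS}
```

On these counts roughly 84% of the hits for alcoholieker are bare (43 of 51) against under 2% for drinker (6 of 370). The figures are raw<br />
search-engine hits and should be read as indicative only.<br />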

5 Conclusion<br />


This paper started out as an investigation into <strong>the</strong> role <strong>of</strong> <strong>the</strong> indefinite article in predicate<br />

position. The analysis I defended is that through its competition with <strong>the</strong> bare form it<br />

marks non-uniqueness which in turn can be linked to kind-membership predication. This<br />

analysis is attractive in at least three respects. The first is that <strong>the</strong> indefinite article maintains<br />

its standard semantics and pragmatics and is not reduced to a vacuous item. The<br />

second is that it <strong>of</strong>fers a formalizable alternative to temporal analyses that were shown to<br />

make wrong predictions. The third is that it brings toge<strong>the</strong>r intuitions and claims from<br />

work on kinds and work on bare predication that lend <strong>the</strong>mselves to an interesting remix.<br />

1 The Google search was done on www.google.nl (restricted to Dutch pages) and concerned searches <strong>of</strong><br />

<strong>the</strong> form “is drinker” / “is alcoholieker”.<br />



References<br />


Beyssade, C. and Dobrovie-Sorin, C. (2005). Bare predicate nominals in Dutch, <strong>Proceedings</strong><br />

<strong>of</strong> SALT 15.<br />

Blutner, R. (2000). Some aspects <strong>of</strong> optimality in natural language interpretation, Journal<br />

<strong>of</strong> Semantics 17.<br />

Broekhuis, H., Keizer, E. and Den Dikken, M. (2003). Modern grammar <strong>of</strong> Dutch. Occasional<br />

papers 4, Tilburg University, Tilburg.<br />

de Swart, H., Winter, Y. and Zwarts, J. (2005). Bare predicate nominals in Dutch, in<br />

E. Maier, C. Bary and J. Huitink (eds), <strong>Proceedings</strong> <strong>of</strong> SuB9.<br />

de Swart, H., Winter, Y. and Zwarts, J. (2007). Bare nominals and reference to capacities,<br />

Natural Language and Linguistic Theory 25.<br />

de Swart, H. and Zwarts, J. (To appear). Nominals with and without an article: Distribution,<br />

interpretation and variation, in P. Hendriks, H. de Hoop, I. Krämer, H. de Swart<br />

and J. Zwarts (eds), Conflicts in Interpretation.<br />

Déprez, V. (2005). Morphological number, semantic number and bare nouns, Lingua 115.<br />

Farkas, D. (2002). Specificity distinctions, Journal <strong>of</strong> Semantics 19.<br />

Hawkins, J. (1991). On (in)definite articles: implicatures and (un)grammaticality prediction,<br />

Journal <strong>of</strong> Linguistics 27.<br />

Kupferman, L. (1991). Structure événementielle de l’alternance un / ∅ devant les noms<br />

humains attributs, Langage 102.<br />

Matushansky, O. and Spector, B. (2005). Tinker, tailor, soldier, spy, in E. Maier, C. Bary<br />

and J. Huitink (eds), <strong>Proceedings</strong> <strong>of</strong> SuB9.<br />

Munn, A. and Schmitt, C. (2005). Number and indefinites, Lingua 115.<br />

Partee, B. (1987). Noun phrase interpretation and type-shifting principles, in J. Groenendijk,<br />

D. de Jongh and M. Stokh<strong>of</strong> (eds), Studies in Discourse Representation<br />

Theory and <strong>the</strong> Theory <strong>of</strong> Generalized Quantifiers, Foris, Dordrecht.<br />

Roy, I. (2006). Non-verbal predications: a syntactic analysis <strong>of</strong> predicational copular<br />

sentences, PhD <strong>the</strong>sis, University <strong>of</strong> Sou<strong>the</strong>rn California.<br />



DIAGRAMMATIC REASONING<br />

WITH ENHANCED STATIC CONSTRAINTS<br />

James Burton<br />

University <strong>of</strong> Brighton<br />

Abstract. This paper reports on ongoing work to create a pro<strong>of</strong>-carrying Domain Specific<br />

Embedded Language (DSEL) for diagrammatic logics, using Euler diagrams as a case study.<br />

The DSEL is written in Haskell with type system extensions that allow <strong>the</strong> exploitation <strong>of</strong><br />

a combination <strong>of</strong> ideas from Constructive Type Theory. These extensions <strong>of</strong>fer an increase<br />

in expressiveness over Hindley-Milner type systems and have been used for program verification.<br />

We use <strong>the</strong>se extensions to create enhanced static constraints to enforce invariants<br />

on diagrams and transformations (inference rules). Our work is at an early stage and we<br />

describe <strong>the</strong> goals and challenges ahead. The major goal is to create a DSEL for generalized<br />

constraint diagrams, a visual logic expressive enough to be useful for modelling s<strong>of</strong>tware,<br />

and to extract <strong>the</strong> types <strong>of</strong> <strong>the</strong> resulting diagrams for use as s<strong>of</strong>tware artefacts.<br />

1 Introduction<br />


A great deal <strong>of</strong> effort is spent on attempts to increase s<strong>of</strong>tware reliability and <strong>the</strong> productivity<br />

<strong>of</strong> programmers, by both <strong>the</strong> research community and <strong>the</strong> s<strong>of</strong>tware industry. Of<br />

<strong>the</strong> techniques employed (development methodologies, systematic modelling, automated<br />

testing), formal methods have been little used outside <strong>of</strong> <strong>the</strong> most safety-critical sectors<br />

where <strong>the</strong>y are used to verify semantic properties <strong>of</strong> s<strong>of</strong>tware and to assure desired runtime<br />

conditions. We believe <strong>the</strong> benefits <strong>of</strong> <strong>the</strong>ir more widespread use could be great, but<br />

<strong>the</strong> impact <strong>of</strong> factors inhibiting adoption needs to be reduced. These factors may include<br />

<strong>the</strong> fact that existing techniques are seen as difficult to use, time-consuming and requiring<br />

specialised expertise. There is, <strong>the</strong>refore, a need for more “lightweight” formal methods<br />

which are accessible to programmers with a minimum <strong>of</strong> specialised training and which<br />

fit in seamlessly with <strong>the</strong> tools <strong>the</strong>y employ. Sheard has said that enabling programmers<br />

to make statements about semantic properties <strong>of</strong> <strong>the</strong> code <strong>the</strong>y write directly, ra<strong>the</strong>r than<br />

turning to external tools with high barriers to entry (likely to be written by, and for, ma<strong>the</strong>maticians)<br />

will make it more likely that <strong>the</strong>y do so — in short, that <strong>the</strong> semantic gap<br />

between <strong>the</strong> tools for programming and those for formal reasoning is damaging to <strong>the</strong><br />

cause <strong>of</strong> both (Sheard, 2004).<br />

At <strong>the</strong> same time as <strong>the</strong> Unified Modelling Language (UML) was adopted as a standard<br />

visual language for modelling s<strong>of</strong>tware in <strong>the</strong> 1990s, breakthroughs occurred in <strong>the</strong><br />

use <strong>of</strong> diagrams as visual logics (Shin, 1994; Hammer, 1995). Shin proved soundness and<br />

completeness results for <strong>the</strong> so-called Venn-II reasoning system, equivalent in expressive<br />

power to Monadic First Order Logic, and research began into a number <strong>of</strong> diagrammatic<br />

reasoning systems varying in notation and expressive power. The connection between<br />

<strong>the</strong> new formalised diagrams and those used in s<strong>of</strong>tware modelling was quickly made.<br />

Although <strong>the</strong> UML works well to describe <strong>the</strong> architecture <strong>of</strong> a system it is not always expressive<br />

enough to capture all invariants we might wish to enforce, a fact which led to <strong>the</strong><br />

development <strong>of</strong> <strong>the</strong> (non-graphical) Object Constraint Language (OCL). Kent proposed<br />

constraint diagrams as a purely diagrammatic alternative to <strong>the</strong> OCL, more appropriately<br />


complementing <strong>the</strong> UML’s visual nature (Kent, 1997). The constraint diagram in figure 1<br />

shows a constraint in a library management system. Amongst o<strong>the</strong>r things it states that<br />

people can only borrow books that are in <strong>the</strong> collections <strong>of</strong> libraries <strong>the</strong>y have joined.<br />

Figure 1: A constraint diagram and an Euler diagram.<br />

There are many reasons that we might want to use diagrams to represent information,<br />

including <strong>the</strong> potential <strong>of</strong> diagrams for well matchedness and free rides. A diagram is<br />

well matched to its subject if it presents <strong>the</strong> key features <strong>of</strong> that subject effectively and in<br />

a way that seems intuitively clear to <strong>the</strong> viewer (Gurr and Tourlas, 2000). A well matched<br />

diagram can make certain reasoning tasks appear to be easier when compared with a symbolic<br />

representation <strong>of</strong> <strong>the</strong> same information. Free rides occur when a diagram provides<br />

some information ‘naturally’ or ‘for free’ which would need to be explicitly stated in, or<br />

derived from, a symbolic representation (Shimojima, 2004). For example, in <strong>the</strong> Euler diagram<br />

in figure 1, <strong>the</strong> fact that <strong>the</strong> contour Spaniels is placed within GunDogs asserts directly<br />

that Spaniels ⊆ GunDogs but also allows <strong>the</strong> viewer to infer Spaniels ⊆ Dogs and<br />

Spaniels ∩ Cats = ∅. Details <strong>of</strong> well matchedness and free rides in constraint diagrams<br />

can be found in (Stapleton and Delaney, 2008). In some circumstances <strong>the</strong> expressive<br />

power <strong>of</strong> diagrams can produce ambiguity, or lead <strong>the</strong> viewer to make false inferences.<br />

However, many diagrammatic notations now have formal, unambiguous semantics, <strong>of</strong><br />

which Euler and constraint diagrams are prominent examples.<br />
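
The free rides in the Euler diagram example can be made concrete with a small sketch. It is written in Python for brevity and is not the<br />
Haskell DSEL described in this paper. Since the figure is not reproduced here, the layout is an assumption: GunDogs is taken to lie inside<br />
Dogs, and Dogs and Cats are taken to be non-overlapping contours. All names (INSIDE, DISJOINT, is_subset, are_disjoint) are hypothetical.<br />

```python
# Assumed layout of the Euler diagram: each contour maps to the contour
# that directly encloses it; DISJOINT lists pairs drawn without overlap.
INSIDE = {"Spaniels": "GunDogs", "GunDogs": "Dogs"}
DISJOINT = {frozenset(("Dogs", "Cats"))}


def enclosing(contour, inside):
    """All contours that enclose the given one (assumes no containment cycles)."""
    out = set()
    while contour in inside:
        contour = inside[contour]
        out.add(contour)
    return out


def is_subset(a, b, inside):
    """a ⊆ b can be read off whenever b is a itself or encloses a."""
    return a == b or b in enclosing(a, inside)


def are_disjoint(a, b, inside, disjoint):
    """a ∩ b = ∅ can be read off when a (or an enclosing contour of a)
    is drawn disjoint from b (or an enclosing contour of b)."""
    up_a = {a} | enclosing(a, inside)
    up_b = {b} | enclosing(b, inside)
    return any(frozenset((x, y)) in disjoint for x in up_a for y in up_b)
```

Only the placement of Spaniels inside GunDogs is stated directly; is_subset("Spaniels", "Dogs", INSIDE) and<br />
are_disjoint("Spaniels", "Cats", INSIDE, DISJOINT) follow from the layout alone, which is the sense in which the diagram provides them for free.<br />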

Our ultimate goal is to create a Domain Specific Embedded Language (DSEL) for<br />

several systems <strong>of</strong> diagrammatic reasoning, with two main aims: to explore <strong>the</strong> benefits<br />

and boundaries <strong>of</strong> <strong>the</strong> emerging style <strong>of</strong> programming that mixes formal methods with<br />

programming, and to support <strong>the</strong> work which aims to establish visual logics as a valuable<br />

tool in formal methods.<br />

The DSEL will be written in Haskell and will consist <strong>of</strong> statically verified code which<br />

will allow <strong>the</strong> user to manipulate and reason with a variety <strong>of</strong> visual logics such as Euler<br />

diagrams, spider diagrams and constraint diagrams (see Section 2). The DSEL, <strong>the</strong>refore,<br />

shares one <strong>of</strong> <strong>the</strong> primary aims <strong>of</strong> visual logics — to make formal reasoning more accessible<br />

and widely used. Design reasoning and implementation have traditionally<br />

taken place in separate phases <strong>of</strong> <strong>the</strong> s<strong>of</strong>tware process, with <strong>the</strong> onus on <strong>the</strong> programmer<br />

to bridge <strong>the</strong> gap between <strong>the</strong> two. One <strong>of</strong> <strong>the</strong> benefits <strong>of</strong> combining both activities in one<br />

phase is that constraints modelled by a programmer using <strong>the</strong> DSEL will form s<strong>of</strong>tware<br />

components in <strong>the</strong>ir own right, resulting in diagrams with <strong>the</strong> same type as functions in<br />


<strong>the</strong> modelled system. This suggests that such constraints could eventually form part <strong>of</strong><br />

working s<strong>of</strong>tware, perhaps as part <strong>of</strong> a “trusted kernel” used by o<strong>the</strong>r components, following<br />

<strong>the</strong> approach <strong>of</strong> (Kiselyov and Shan, 2007). The form and function <strong>of</strong> <strong>the</strong> DSEL will<br />

<strong>the</strong>refore be closely linked — a formally specified language to assist formal reasoning.<br />

Advances in Programming Language Theory are typically explored in research languages<br />

before percolating into more widely used languages. This is especially true <strong>of</strong><br />

modern functional languages and, in particular, Haskell. The Haskell type system, together with<br />

the extensions provided by GHC, makes it possible to explore what Sheard<br />

called (when speaking <strong>of</strong> <strong>the</strong> closely related language Ωmega) “a new point in <strong>the</strong> design<br />

space <strong>of</strong> formal reasoning systems — part programming language, part logical framework”<br />

(Sheard, 2004) and to do so directly within <strong>the</strong> environment <strong>of</strong> a practical language<br />

with efficient implementations. The language features that enable this can be used to emulate<br />

<strong>the</strong> behaviour <strong>of</strong> fully dependently typed languages such as Epigram (Altenkirch,<br />

McBride and McKinna, 2005), resulting in what have been called “pseudo-dependently<br />

typed” systems, described in Section 4. The syntactic clarity, referential transparency<br />

and similarity to ma<strong>the</strong>matical notation <strong>of</strong> functional languages are also <strong>of</strong> benefit to us.<br />

These features help us in our goal to minimise syntactic differences between <strong>the</strong> DSEL<br />

and <strong>the</strong> diagrammatic logics we implement, making it easier to demonstrate a clear mapping<br />

between <strong>the</strong> two. The point <strong>of</strong> this mapping is to demonstrate “literal preservation<br />

<strong>of</strong> syntactic relations under denotation”, as Hammer states <strong>the</strong> conditions for resemblance<br />

between a sign and that which it signifies (Hammer, 1995).<br />

In Section 2 we describe reasoning with Euler diagrams. Section 3 gives an overview<br />

<strong>of</strong> type <strong>the</strong>oretic features making <strong>the</strong>ir way into programming languages while Section<br />

4 looks ahead to <strong>the</strong> form our DSEL will take, using Euler diagrams as a case study. In<br />

Section 5 we consider <strong>the</strong> goals <strong>of</strong> <strong>the</strong> research, evaluate <strong>the</strong> strategies used to reach <strong>the</strong>m<br />

and identify some <strong>of</strong> <strong>the</strong> challenges ahead.<br />

2 Reasoning with Euler diagrams<br />

Although diagrams have <strong>of</strong>ten been used to aid understanding in ma<strong>the</strong>matical pro<strong>of</strong>s,<br />

<strong>the</strong>y have until fairly recently been treated as informal and secondary to formalized symbolic<br />

content. In <strong>the</strong> 1990s <strong>the</strong> work <strong>of</strong> Shin began to put diagrams on a different standing<br />

by proving soundness and completeness results for <strong>the</strong> Venn-II reasoning system, an extension<br />

and formalisation <strong>of</strong> earlier work by Venn and Peirce (Shin, 1994). Stapleton<br />

provides a summary <strong>of</strong> <strong>the</strong> history <strong>of</strong> diagrammatic reasoning since <strong>the</strong>n, which is now<br />

a rapidly evolving and active research area (Stapleton, 2007). What makes such logics<br />

interesting, given <strong>the</strong> existence <strong>of</strong> mature symbolic reasoning techniques, is <strong>the</strong> combination<br />

<strong>of</strong> formal reasoning with <strong>the</strong> compact and intuitive nature <strong>of</strong> diagrams referred<br />

to previously. We expect that this, and <strong>the</strong> efforts to create supporting tools, will make<br />

formal reasoning more accessible to non-logicians.<br />

An Euler diagram is a collection <strong>of</strong> closed curves called contours which represent sets,<br />

within an enclosing rectangle. Figure 2 shows an example with three contours, labelled<br />

A, B and C. Containment, intersection and disjointness are represented by <strong>the</strong> placement<br />

<strong>of</strong> contours, so <strong>the</strong> same diagram asserts C ⊆ A and B ∩ C = ∅. A zone is a set <strong>of</strong><br />

points in <strong>the</strong> diagram that can be described as being inside certain contours and outside<br />

all o<strong>the</strong>rs. The diagram in figure 2 has five zones; one inside A but outside B and C, one<br />


Figure 2: An Euler diagram.<br />

inside A and C but outside B, and so forth. The region outside <strong>of</strong> all contours is also a<br />

zone. Shading within a zone asserts <strong>the</strong> emptiness <strong>of</strong> <strong>the</strong> set represented by that zone. So,<br />

<strong>the</strong> shading <strong>of</strong> <strong>the</strong> diagram in figure 2 asserts A ∩ B = ∅ and A − C = ∅.<br />

Reasoning is carried out by <strong>the</strong> application <strong>of</strong> rules which transform one diagram into<br />

ano<strong>the</strong>r, such as Add Contour and Remove Shading; a sound and complete set is given in<br />

(Stapleton, Masth<strong>of</strong>f, Flower, Fish and Sou<strong>the</strong>rn, 2007). A pro<strong>of</strong> using Euler diagrams is<br />

formed by applying <strong>the</strong>se rules repeatedly to transform an initial diagram (<strong>the</strong> premise)<br />

into <strong>the</strong> target diagram (<strong>the</strong> conclusion); figure 3 shows a short example. The Add Shaded<br />

Zone rule is applied to transform d1 to d2. A new shaded zone can be added at any<br />

time since both a shaded zone and a missing zone assert <strong>the</strong> emptiness <strong>of</strong> <strong>the</strong> represented<br />

set; both d1 and d2 state that A and B are disjoint. The Add Contour rule is applied<br />

to transform d2 to d3. The new contour C intersects all existing zones without changing<br />

<strong>the</strong>ir shading. Since this operation introduces no new shading and <strong>the</strong> way that C is added<br />

ensures that no missing zones are created, d2 and d3 have <strong>the</strong> same meaning.<br />

Figure 3: An Euler diagram pro<strong>of</strong>.<br />

The diagrams are formalised using an abstract syntax. The abstraction <strong>of</strong> Euler diagrams<br />

that we present here is obtained from (Stapleton et al., 2007). Each zone is represented<br />

as a tuple <strong>of</strong> <strong>the</strong> set <strong>of</strong> labels <strong>of</strong> contours that <strong>the</strong> zone is inside and <strong>the</strong> set <strong>of</strong><br />

labels <strong>of</strong> contours <strong>the</strong> zone is outside. For example, in diagram d1, figure 3, <strong>the</strong> only zone<br />

inside A has <strong>the</strong> abstraction ({A}, {B}). Diagrams are represented as a tuple <strong>of</strong> <strong>the</strong> set<br />

<strong>of</strong> labels (L), <strong>the</strong> set <strong>of</strong> zones (Z) and <strong>the</strong> set <strong>of</strong> shaded zones (Z ∗ ). Thus, diagram d2 in<br />

figure 3 has abstraction:<br />

〈L = {A, B}, Z = {({A}, {B}), ({B}, {A}), ({A, B}, ∅), (∅, {A, B})}, Z ∗ = {({A, B}, ∅)}〉<br />
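The abstraction above can be transcribed at the term level as a minimal sketch. This is not the type-level encoding developed in Section 4; the names Label, Zone, Diagram, zone and d2 are assumptions made for this illustration, and labels are represented as plain strings.<br />

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Contour labels are plain strings in this sketch (an assumption).
type Label = String

-- A zone is a pair: (labels of contours it is inside, labels it is outside).
type Zone = (Set Label, Set Label)

-- A diagram is the tuple <L, Z, Z*>.
data Diagram = Diagram
  { labels :: Set Label   -- L
  , zones  :: Set Zone    -- Z
  , shaded :: Set Zone    -- Z*
  } deriving (Eq, Show)

-- Helper: build a zone from two lists of labels.
zone :: [Label] -> [Label] -> Zone
zone ins outs = (Set.fromList ins, Set.fromList outs)

-- Diagram d2 of figure 3, transcribed from its abstraction.
d2 :: Diagram
d2 = Diagram
  { labels = Set.fromList ["A", "B"]
  , zones  = Set.fromList
      [ zone ["A"] ["B"]
      , zone ["B"] ["A"]
      , zone ["A", "B"] []
      , zone [] ["A", "B"]
      ]
  , shaded = Set.fromList [zone ["A", "B"] []]
  }
```

In this representation nothing prevents an ill-formed diagram from being built; ruling such values out statically is the purpose of the type-level encoding.<br />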

There are a number <strong>of</strong> logics that extend this system <strong>of</strong> Euler diagrams, including spider<br />

diagrams (Howse, Stapleton and Taylor, 2005) and <strong>the</strong> constraint diagrams mentioned<br />

previously.<br />

3 Dependent Typing and Pro<strong>of</strong>-Carrying Code<br />

The Curry-Howard Isomorphism has a long history and arises from <strong>the</strong> observation <strong>of</strong> a<br />

correspondence between Hilbert-style deductive logic and combinatory models <strong>of</strong> computation.<br />

The work <strong>of</strong> Martin-Löf cast it as a more general principle linking logical formalisms<br />

and <strong>the</strong> type systems <strong>of</strong> programming languages (Martin-Löf, 1984). Ra<strong>the</strong>r<br />

than classifying values, types can be viewed as propositions; a value inhabiting type T<br />

corresponds to a pro<strong>of</strong> <strong>of</strong> T. Martin-Löf’s type <strong>the</strong>ory can be used as an environment for<br />

programming with dependent types (Nordstrom, Petersson and Smith, 1990). Dependent<br />

type systems are so-called because types may depend on a value, such as List a n, <strong>the</strong><br />

type <strong>of</strong> collections <strong>of</strong> elements <strong>of</strong> type a with length n. For different values <strong>of</strong> n we have<br />

different types. A sketch <strong>of</strong> <strong>the</strong> logical rules for type-safe list operations is given as type<br />

judgements below. We assume <strong>the</strong> types Nat (<strong>of</strong> Peano numbers with constructors Zero<br />

and Succ n) and List a n. Γ is a typing context and Γ ⊢ σ type means that σ is a type in<br />

Γ.<br />

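\[ \frac{}{\Gamma \vdash \mathit{Nat}\ \text{type}} \qquad \frac{}{\Gamma \vdash \mathit{Zero} : \mathit{Nat}} \qquad \frac{\Gamma \vdash n : \mathit{Nat}}{\Gamma \vdash \mathit{Succ}\ n : \mathit{Nat}} \]<br />

\[ \frac{\Gamma \vdash t\ \text{type}}{\Gamma \vdash \mathit{empty}\ t : \mathit{List}\ t\ \mathit{Zero}} \]<br />

\[ \frac{\Gamma \vdash t\ \text{type} \quad \Gamma \vdash x : t \quad \Gamma \vdash n : \mathit{Nat} \quad \Gamma \vdash l : \mathit{List}\ t\ n}{\Gamma \vdash \mathit{cons}\ x\ l : \mathit{List}\ t\ (\mathit{Succ}\ n)} \]<br />

\[ \frac{\Gamma \vdash t\ \text{type} \quad \Gamma \vdash n : \mathit{Nat} \quad \Gamma \vdash l : \mathit{List}\ t\ (\mathit{Succ}\ n)}{\Gamma \vdash \mathit{tail}\ l : \mathit{List}\ t\ n} \]<br />

\[ \frac{\Gamma \vdash t\ \text{type} \quad \Gamma, n : \mathit{Nat} \vdash l : \mathit{List}\ t\ (\mathit{Succ}\ n)}{\Gamma \vdash \mathit{head}\ l : t} \]<br />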

Dependent type <strong>the</strong>ory makes Curry-Howard (or propositions-as-types) useful in practical<br />

ways. The resulting type systems form <strong>the</strong> basis <strong>of</strong> automated <strong>the</strong>orem provers<br />

(Bertot and Casteran, 2004) and, on <strong>the</strong> o<strong>the</strong>r hand, purely functional and total programming<br />

languages (Altenkirch et al., 2005). The same insights inform more widely used<br />

languages at an accelerating rate, especially Haskell, which plays <strong>the</strong> dual rôle <strong>of</strong> research<br />

language and practical tool. The type system <strong>of</strong> Haskell with extensions is flexible<br />

enough to emulate many aspects <strong>of</strong> dependent typing and to create programs whose types<br />

act as pro<strong>of</strong> that <strong>the</strong>ir implementation conforms to <strong>the</strong>ir specification.<br />

4 Haskell and <strong>the</strong> DSEL for Euler Diagrams<br />

Programming our diagrammatic DSEL is at the prototype stage. Its foundation is a type-level<br />

Set library which encodes and ensures constraints such as set membership, disjointness<br />

and so on. Above this will sit <strong>the</strong> implementation <strong>of</strong> several diagrammatic logics.<br />

Two diagrammatic transformations corresponding to inference rules in an Euler diagram<br />

system are presented as type judgements below.<br />



In a language such as Haskell we may not mix types and terms in <strong>the</strong> way described in<br />

Section 3. The collection of techniques used to achieve something often called “pseudo-dependent<br />

typing” includes type-level representations of the indexing term supplied to the<br />

type constructor; to use <strong>the</strong> example from Section 3, since we have no type-level numbers<br />

we represent n in List a n by types formed <strong>of</strong> <strong>the</strong> empty Haskell type constructors Z and<br />

Succ n, such as Succ (Succ Z).<br />
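As an illustration, the List a n example might be written in GHC Haskell roughly as follows. This is a sketch assuming the GADTs extension; the constructor names Nil and Cons and the functions safeHead, safeTail and toList are hypothetical names chosen for this example.<br />

```haskell
{-# LANGUAGE GADTs #-}

-- Empty types standing in for natural numbers at the type level.
data Z
data Succ n

-- A list whose type records its length.
data List a n where
  Nil  :: List a Z
  Cons :: a -> List a n -> List a (Succ n)

-- head and tail only accept lists whose length has the form Succ n,
-- so applying them to Nil is rejected by the type checker.
safeHead :: List a (Succ n) -> a
safeHead (Cons x _) = x

safeTail :: List a (Succ n) -> List a n
safeTail (Cons _ xs) = xs

-- Forget the length index to obtain an ordinary list.
toList :: List a n -> [a]
toList Nil         = []
toList (Cons x xs) = x : toList xs
```

An expression such as safeHead Nil does not type check, because Z cannot be unified with Succ n. This mirrors the premise l : List t (Succ n) in the head and tail rules of Section 3.<br />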

It is important to distinguish type-level from term-level computations. At the term<br />

level of a programming language with partial functions the result of any function may be<br />

undefined (⊥), and so programs are not proofs. Type functions like union below are not<br />

functions over values, are defined extensionally and exclude the undefined. The DSEL<br />

comprises two main components: the domain-specific, dependently typed theory<br />

of diagrammatic reasoning, which provides assurances about the correct formation of<br />

diagrams and application of reasoning rules, and the interactive front end, which makes use<br />

of this type system and is subject to the usual limitations of the host language. Although<br />

we do not use a dependently typed host language, our approach is similar in spirit to that of<br />

Oury and Swierstra (2008), who use Agda to enforce sophisticated constraints statically<br />

in a series of DSELs.<br />

Since type-level values are distinct from terms, special measures are required to handle<br />

<strong>the</strong>m at runtime. We use a combination <strong>of</strong> techniques involving empty and existential<br />

types (Peyton Jones, 2008) to do this. As an example <strong>of</strong> our strategy, <strong>the</strong> types A, B and<br />

C below are empty types used to represent <strong>the</strong> labels <strong>of</strong> contours in a diagram:<br />

data A ; data B ; data C<br />

data L a where<br />

&nbsp;&nbsp;AL :: L A<br />

&nbsp;&nbsp;BL :: L B<br />

data Nil<br />

data t ⊲ ts<br />

The type L a lifts labels into a more general type, allowing us to consider labels <strong>of</strong> any<br />

type. The type constructors Nil and ⊲ are used as <strong>the</strong> building blocks <strong>of</strong> sets <strong>of</strong> labels.<br />

LBox and LSetBox use “existential boxing” to wrap type-level values <strong>of</strong> LSet t, allowing<br />

us to handle <strong>the</strong> outer type at runtime but for <strong>the</strong> “boxed” value to remain available for<br />

inspection by constraints:<br />

data LSet t where<br />

&nbsp;&nbsp;Empty :: LSet Nil<br />

&nbsp;&nbsp;Ins :: L a → LSet t → LSet (a ⊲ t)<br />

data LBox = ∀a. LBox (L a)<br />

data LSetBox = ∀t. LSetBox (LSet t)<br />

By creating a function fromChar :: Char → LBox we can box runtime values and insert<br />

<strong>the</strong>m into boxed sets with a function that calls on fromChar, insertChar :: Char →<br />

LSetBox → LSetBox. When insertChar is used to add elements to a set <strong>of</strong> type<br />

LSetBox, a correspondence is enforced between <strong>the</strong> collection <strong>of</strong> values and <strong>the</strong> type<br />

<strong>of</strong> its LSet t parameter. The value <strong>of</strong> a collection can be seen as fully determined by<br />

<strong>the</strong> type <strong>of</strong> this parameter, which is a pro<strong>of</strong> ensuring that inserted elements are members<br />

<strong>of</strong> <strong>the</strong> resulting collection. Assurances for <strong>the</strong> semantics <strong>of</strong> sets may be encoded using<br />

constraints written using Indexed Type Families (Peyton Jones, 2008).<br />
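The boxing strategy can be sketched in compilable form as follows. Only the signatures of fromChar and insertChar are given in the text, so the function bodies here are assumptions; the operator :> is an ASCII stand-in for ⊲, and the constructor CL and the helpers labelChar and toChars are hypothetical additions for this example.<br />

```haskell
{-# LANGUAGE GADTs, TypeOperators, ExistentialQuantification #-}

-- Empty types naming contours, and the building blocks of label sets.
data A
data B
data C
data Nil
data t :> ts          -- ASCII stand-in for the paper's triangle operator

data L a where
  AL :: L A
  BL :: L B
  CL :: L C           -- not shown in the paper; added here for symmetry

data LSet t where
  Empty :: LSet Nil
  Ins   :: L a -> LSet t -> LSet (a :> t)

data LBox    = forall a. LBox (L a)
data LSetBox = forall t. LSetBox (LSet t)

-- Assumed body: unknown characters are treated as an error.
fromChar :: Char -> LBox
fromChar 'A' = LBox AL
fromChar 'B' = LBox BL
fromChar 'C' = LBox CL
fromChar c   = error ("unknown label: " ++ [c])

-- Unbox the set, insert the label, and box the result again.
-- The hidden type grows from t to (a :> t). Duplicates are not checked here.
insertChar :: Char -> LSetBox -> LSetBox
insertChar c (LSetBox s) =
  case fromChar c of
    LBox l -> LSetBox (Ins l s)

-- Term-level view of a label and of a boxed set, for inspection at runtime.
labelChar :: L a -> Char
labelChar AL = 'A'
labelChar BL = 'B'
labelChar CL = 'C'

toChars :: LSetBox -> String
toChars (LSetBox s) = go s
  where
    go :: LSet t -> String
    go Empty      = []
    go (Ins l ls) = labelChar l : go ls
```

Because each Ins extends the hidden type index, the runtime contents of a boxed set and its type-level description cannot drift apart, which is the correspondence described above.<br />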

4.1 Judgement Rules<br />

We model <strong>the</strong> tuples found in <strong>the</strong> Euler diagram abstraction with <strong>the</strong> types Z l1 l2 (zones)<br />

and D l z z ∗ (diagrams). The type judgements below are a fragment <strong>of</strong> a self-contained<br />



type <strong>the</strong>ory <strong>of</strong> Euler diagrams based on <strong>the</strong> abstract syntax given in (Stapleton et al.,<br />

2007). Once complete, this type <strong>the</strong>ory will be implemented using <strong>the</strong> techniques in <strong>the</strong><br />

previous section to produce a DSEL with enhanced static constraints.<br />

Two kinds of element appear in the judgement rules: typing judgements, e.g. x : a,<br />

and type constraints, e.g. Γ ⊢ C x y type, meaning that the type C can be formed in<br />

<strong>the</strong> context Γ. Type constraints presumed to be defined in <strong>the</strong> Set library such as Disjoint<br />

appear capitalised, while functions from types to types are in lower case, e.g. union.<br />

We use <strong>the</strong> constraints Label, LabelSet, Zone and ZoneSet to restrict <strong>the</strong> input to type<br />

constructors.<br />

Supplied with disjoint sets <strong>of</strong> labels, l1 and l2 , Z constructs a zone:<br />

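\[ \frac{\Gamma \vdash \mathit{LabelSet}\ l_1\ \text{type} \quad \Gamma \vdash \mathit{LabelSet}\ l_2\ \text{type} \quad \Gamma \vdash \mathit{Disjoint}\ l_1\ l_2\ \text{type}}{\Gamma \vdash Z\ l_1\ l_2\ \text{type}} \]<br />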

The syntactic rules state that given a diagram D l z z ∗ , the zones z form a superset of<br />

the shaded zones z ∗ . Also, for each zone Z l1 l2 in z and z ∗ , l1 and l2 form a partition of<br />

l. The Invs rule applies these constraints to a diagram:<br />

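\[ \frac{\Gamma \vdash \mathit{Invs}\ l\ z\ \text{type} \quad \Gamma \vdash \mathit{Invs}\ l\ z^{*}\ \text{type} \quad \Gamma \vdash \mathit{Subset}\ z^{*}\ z\ \text{type}}{\Gamma \vdash D\ l\ z\ z^{*}\ \text{type}} \]<br />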

The Inv rule applies the relevant constraint to an individual zone, and Invs applies Inv to<br />

every zone in a set. The base case for applying Invs is:<br />

\[ \frac{\Gamma \vdash \mathit{Label}\ l\ \text{type}}{\Gamma \vdash \mathit{Invs}\ l\ \mathit{Nil}\ \text{type}} \]<br />

The inductive case for applying Invs is:<br />

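\[ \frac{\Gamma \vdash \mathit{Label}\ l\ \text{type} \quad \Gamma \vdash \mathit{ZoneSet}\ (z \triangleright zs)\ \text{type} \quad \Gamma \vdash \mathit{Inv}\ l\ z\ \text{type} \quad \Gamma \vdash \mathit{Invs}\ l\ zs\ \text{type}}{\Gamma \vdash \mathit{Invs}\ l\ (z \triangleright zs)\ \text{type}} \]<br />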

Since l1 ∩ l2 = ∅, <strong>the</strong>y partition l if l1 ∪ l2 = l:<br />

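\[ \frac{\Gamma \vdash Z\ l_1\ l_2\ \text{type} \quad \Gamma \vdash u : \mathit{union}\ l_1\ l_2 \quad \Gamma \vdash \mathit{LabelSet}\ ls\ \text{type} \quad \Gamma \vdash \mathit{Eq}\ l\ u\ \text{type}}{\Gamma \vdash \mathit{Inv}\ ls\ (Z\ l_1\ l_2)\ \text{type}} \]<br />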
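To make the invariants concrete, they can be read as ordinary runtime checks. The following is a term-level sketch only, not the type-level encoding the rules describe; the names inv, invs and wellFormed, and the use of strings for labels, are assumptions made for this illustration.<br />

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

type Label = String
type Zone  = (Set Label, Set Label)

-- Inv: the inside and outside label sets are disjoint and together cover l.
inv :: Set Label -> Zone -> Bool
inv l (l1, l2) = Set.null (Set.intersection l1 l2) && Set.union l1 l2 == l

-- Invs: Inv holds for every zone in the set (trivially for the empty set).
invs :: Set Label -> Set Zone -> Bool
invs l = all (inv l)

-- The diagram rule: both zone sets satisfy Invs and z* is a subset of z.
wellFormed :: Set Label -> Set Zone -> Set Zone -> Bool
wellFormed l z zStar = invs l z && invs l zStar && zStar `Set.isSubsetOf` z
```

In the DSEL these checks are intended to be discharged by the type checker, so that an ill-formed diagram cannot be constructed in the first place.<br />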

The quotes that begin the following subsections are from Stapleton et al. (2007), from<br />

which we take reasoning rules and translate them into typing judgements. The invariants are<br />

not tested after applying <strong>the</strong> rules since previous judgements guarantee that if a diagram<br />

can be formed, <strong>the</strong> invariants have been met.<br />

4.1.1 Remove Shaded Zone<br />

“A shaded zone can be removed but only if <strong>the</strong>re is at least one zone inside each contour<br />

in <strong>the</strong> resulting diagram and <strong>the</strong> zone outside all <strong>the</strong> contours remains”. In figure 4, <strong>the</strong><br />

Remove Shaded Zone rule can be applied to transform d1 into d2.<br />

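\[ \frac{\Gamma \vdash \mathit{Zone}\ x\ \text{type} \quad \Gamma \vdash D\ l\ z\ z^{*}\ \text{type} \quad \Gamma \vdash z' : \mathit{delete}\ x\ z \quad \Gamma \vdash z^{*\prime} : \mathit{delete}\ x\ z^{*} \quad \Gamma \vdash \mathit{Member}\ x\ z^{*}\ \text{type}}{\Gamma \vdash \mathit{transform}\ \mathit{RemoveShadedZone}\ x\ (D\ l\ z\ z^{*}) : (D\ l\ z'\ z^{*\prime})} \]<br />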
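A term-level reading of this rule, with the side conditions of the quoted rule checked at runtime, might look as follows. This is a sketch under the assumption that labels are strings; the names Diagram and removeShadedZone are chosen for this example and are not the DSEL's type-level transform.<br />

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

type Label = String
type Zone  = (Set Label, Set Label)

data Diagram = Diagram
  { labels :: Set Label
  , zones  :: Set Zone
  , shaded :: Set Zone
  } deriving (Eq, Show)

-- Remove the shaded zone x, or return Nothing when a condition fails.
removeShadedZone :: Zone -> Diagram -> Maybe Diagram
removeShadedZone x (Diagram l z zStar)
  | x `Set.notMember` zStar   = Nothing  -- premise: Member x z*
  | not outsideRemains        = Nothing  -- the zone outside all contours must remain
  | not everyContourInhabited = Nothing  -- each contour keeps at least one zone
  | otherwise                 = Just (Diagram l z' zStar')
  where
    z'     = Set.delete x z
    zStar' = Set.delete x zStar
    outsideRemains        = (Set.empty, l) `Set.member` z'
    everyContourInhabited =
      all (\c -> any (\(ins, _) -> c `Set.member` ins) z') l
```

The Maybe result makes explicit at runtime what the typing judgement is meant to guarantee statically.<br />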



4.1.2 Add Contour<br />

Figure 4: Three Euler diagrams.<br />

“A contour can be added to a diagram provided its label is not already in <strong>the</strong> diagram. Each<br />

zone is split into two zones (one inside and one outside <strong>the</strong> new contour), and shading is<br />

preserved”. In figure 4 <strong>the</strong> Add Contour rule can be applied to transform d1 into d3.<br />

Before we can add contours we need a way <strong>of</strong> replacing all zones z : Z l1 l2 in a set<br />

with two copies <strong>of</strong> itself, one with an extra label added to l1, one with that same label<br />

added to l2.<br />

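\[ \frac{\Gamma \vdash \mathit{Label}\ c\ \text{type}}{\Gamma \vdash \mathit{splitZones}\ c\ \mathit{Nil} : \mathit{Nil}} \]<br />

\[ \frac{\Gamma \vdash \mathit{ZoneSet}\ (z \triangleright zs)\ \text{type} \quad \Gamma \vdash \mathit{Label}\ c\ \text{type} \quad \Gamma \vdash z_2 : \mathit{insertLabel}\ \mathit{Excl}\ c\ z \quad \Gamma \vdash z_1 : \mathit{insertLabel}\ \mathit{Incl}\ c\ z}{\Gamma \vdash \mathit{splitZones}\ c\ (z \triangleright zs) : (z_1 \triangleright z_2 \triangleright (\mathit{splitZones}\ c\ zs))} \]<br />

\[ \frac{\Gamma \vdash Z\ l_1\ l_2\ \text{type} \quad \Gamma \vdash \mathit{Label}\ c\ \text{type} \quad \Gamma \vdash l_3 : c \triangleright l_1}{\Gamma \vdash \mathit{insertLabel}\ \mathit{Incl}\ c\ (Z\ l_1\ l_2) : (Z\ l_3\ l_2)} \]<br />

\[ \frac{\Gamma \vdash Z\ l_1\ l_2\ \text{type} \quad \Gamma \vdash \mathit{Label}\ c\ \text{type} \quad \Gamma \vdash l_3 : c \triangleright l_2}{\Gamma \vdash \mathit{insertLabel}\ \mathit{Excl}\ c\ (Z\ l_1\ l_2) : (Z\ l_1\ l_3)} \]<br />

\[ \frac{\Gamma \vdash \mathit{Label}\ c\ \text{type} \quad \Gamma \vdash D\ l\ z\ z^{*}\ \text{type} \quad \Gamma \vdash l' : c \triangleright l \quad \Gamma \vdash z' : \mathit{splitZones}\ c\ z \quad \Gamma \vdash z^{*\prime} : \mathit{splitZones}\ c\ z^{*}}{\Gamma \vdash \mathit{transform}\ \mathit{AddContour}\ c\ (D\ l\ z\ z^{*}) : (D\ l'\ z'\ z^{*\prime})} \]<br />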
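The same rules can be followed at the term level. In the sketch below, which assumes string labels and runtime checks in place of type-level constraints, the names insertIncl, insertExcl and addContour are hypothetical; splitZones mirrors the type function of the same name.<br />

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

type Label = String
type Zone  = (Set Label, Set Label)

data Diagram = Diagram
  { labels :: Set Label
  , zones  :: Set Zone
  , shaded :: Set Zone
  } deriving (Eq, Show)

-- insertLabel Incl / Excl: add c to the inside or to the outside label set.
insertIncl, insertExcl :: Label -> Zone -> Zone
insertIncl c (l1, l2) = (Set.insert c l1, l2)
insertExcl c (l1, l2) = (l1, Set.insert c l2)

-- Replace every zone by two copies, one inside and one outside contour c.
splitZones :: Label -> Set Zone -> Set Zone
splitZones c zs =
  Set.map (insertIncl c) zs `Set.union` Set.map (insertExcl c) zs

-- Add contour c, provided its label is not already in the diagram.
-- Shading is preserved because the shaded zones are split in the same way.
addContour :: Label -> Diagram -> Maybe Diagram
addContour c (Diagram l z zStar)
  | c `Set.member` l = Nothing
  | otherwise        =
      Just (Diagram (Set.insert c l) (splitZones c z) (splitZones c zStar))
```

Applied to a diagram with four zones, one of them shaded, this yields eight zones, two of them shaded, matching the quoted description of the rule.<br />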

5 Conclusions and Fur<strong>the</strong>r Work<br />

We have presented part <strong>of</strong> a DSEL for Euler diagrams that closely mirrors <strong>the</strong>ir abstract<br />

syntax and which allows us to inherit <strong>the</strong> definitions <strong>of</strong> reasoning rules in a seamless<br />

way. We have extended the approach of Section 4.1 to a complete set of reasoning rules,<br />

providing a type <strong>the</strong>oretical version <strong>of</strong> Euler diagrams. Providing a self-contained type<br />

<strong>the</strong>ory for <strong>the</strong> DSEL (beginning with <strong>the</strong> simplest case <strong>of</strong> a set <strong>of</strong> rules for reasoning with<br />

Euler diagrams and extending this to more complex cases) will make results relating to<br />

<strong>the</strong> logics (soundness, completeness, etc.) transferable, giving <strong>the</strong> DSEL <strong>the</strong> status <strong>of</strong> a<br />

reasoning tool in its own right.<br />

Our goal is to extend <strong>the</strong> current approach to more expressive notations, such as generalized<br />

constraint diagrams, which are expressive enough to be used when modelling<br />



s<strong>of</strong>tware (Stapleton and Delaney, 2008). It is ultimately expected that <strong>the</strong> DSEL will be<br />

used by higher level tools which allow <strong>the</strong> user to select from contextually legitimate diagram<br />

transformations. Diagrams created using <strong>the</strong> DSEL (with or without <strong>the</strong> support <strong>of</strong><br />

additional tools) will have a type which captures <strong>the</strong> modelled constraint. If <strong>the</strong> modelled<br />

s<strong>of</strong>tware is written in <strong>the</strong> same language as <strong>the</strong> constraint and <strong>the</strong>re is a correspondence<br />

between <strong>the</strong> datatypes used in each, we may be able to use <strong>the</strong> constraint as part <strong>of</strong> a<br />

“trusted kernel” exporting a safe subset <strong>of</strong> constructors via <strong>the</strong> module system. This scenario,<br />

in which <strong>the</strong> programmer uses tools to model constraints <strong>the</strong>n applies <strong>the</strong>m directly<br />

within <strong>the</strong> implementation phase, will provide a more unified and, ideally, a more usable<br />

programming/verification environment than exists today.<br />

Combining types with terms requires careful design. Some <strong>of</strong> <strong>the</strong> solutions, such as<br />

existential boxing, introduce levels <strong>of</strong> indirection which are unnecessary in more specialised<br />

environments and which may threaten to obscure <strong>the</strong> relationship with underlying<br />

diagrammatic logics, at least superficially. If we were to use a language such as<br />

Coq or Epigram to implement <strong>the</strong> DSEL it is possible that we could find a more natural<br />

expression of many types and constraints. We believe, however, given our central aim of<br />

accessibility, that <strong>the</strong>se risks are <strong>of</strong>fset by <strong>the</strong> benefits <strong>of</strong> using a more practical and accessible<br />

language than is available in <strong>the</strong> current generation <strong>of</strong> dependently typed systems.<br />

The limitations <strong>of</strong> <strong>the</strong>se techniques and how <strong>the</strong>y might be used to form a general strategy<br />

to combine verification and programming are some <strong>of</strong> <strong>the</strong> subjects <strong>of</strong> <strong>the</strong> research. The<br />

research will support <strong>the</strong> longer term goals <strong>of</strong> <strong>the</strong> diagrammatic reasoning community by<br />

providing an implementation <strong>of</strong> various visual logics which can be clearly linked to <strong>the</strong>ir<br />

related abstract syntax. Once extended to <strong>the</strong> case <strong>of</strong> constraint diagrams, <strong>the</strong> DSEL has<br />

<strong>the</strong> potential to shrink <strong>the</strong> toolchain used by programmers who wish to make statements<br />

about <strong>the</strong> semantic properties <strong>of</strong> <strong>the</strong> code <strong>the</strong>y write. There are a number <strong>of</strong> interesting<br />

challenges involved in reaching that point, such as <strong>the</strong> issue <strong>of</strong> extracting <strong>the</strong> type <strong>of</strong> a<br />

diagram in a usable form. The work reported in this paper is a first step towards achieving<br />

<strong>the</strong>se goals.<br />

Acknowledgements<br />

I would like to express my sincere thanks to John Howse, Gem Stapleton and Richard<br />

Bosworth for <strong>the</strong>ir support and encouragement, and to <strong>the</strong> anonymous reviewers for <strong>the</strong>ir<br />

helpful comments. The author is supported by EPSRC Grant EP/P501318/1.<br />

References<br />


Altenkirch, T., McBride, C. and McKinna, J. (2005). Why dependent types matter, Available<br />

online http://www.cs.nott.ac.uk/~txa/publ/ydtm.pdf Accessed 01/02/08.<br />

Bertot, Y. and Casteran, P. (2004). Interactive Theorem Proving and Program Development,<br />

Springer-Verlag.<br />

Gurr, C. and Tourlas, K. (2000). Towards <strong>the</strong> principled design <strong>of</strong> s<strong>of</strong>tware engineering<br />

diagrams, <strong>Proceedings</strong> <strong>of</strong> 22nd International Conference on S<strong>of</strong>tware Engineering,<br />

ACM Press, pp. 509–518.<br />


Hammer, E. (1995). Logic and Visual Information, CSLI, Stanford.<br />

Howse, J., Stapleton, G. and Taylor (2005). Spider diagrams, LMS Journal <strong>of</strong> Computation<br />

and Ma<strong>the</strong>matics 8: 145–194.<br />

Kent, S. (1997). Constraint diagrams: Visualizing invariants in object oriented modelling,<br />

<strong>Proceedings</strong> <strong>of</strong> OOPSLA97, ACM Press, pp. 327–341.<br />

Kiselyov, O. and Shan, C.-C. (2007). Lightweight static capabilities, Electronic Notes in<br />

Theoretical Computer Science 174(7): 79–104.<br />

Martin-Löf, P. (1984). Constructive ma<strong>the</strong>matics and computer programming, Royal Society<br />

<strong>of</strong> London Philosophical Transactions Series A pp. 501–518.<br />

Nordstrom, B., Petersson, K. and Smith, J. M. (1990). Programming in Martin-Löf’s Type<br />

Theory, OUP.<br />

Oury, N. and Swierstra, W. (2008). The power <strong>of</strong> pi, Submitted to ICFP 2008.<br />

Available online http://www.cs.nott.ac.uk/~wss/Publications/ThePowerOfPi.pdf Accessed<br />

01/05/08.<br />

Peyton Jones, S. (2008). GHC language features, Accessed 01/02/08<br />

http://www.haskell.org/ghc/docs/latest/html/users_guide/ghc-language-features.html.<br />

Sheard, T. (2004). Languages of the future, SIGPLAN Notices 39(12): 119–132.<br />

Shimojima, A. (2004). Inferential and expressive capacities <strong>of</strong> graphical representations:<br />

Survey and some generalizations, <strong>Proceedings</strong> <strong>of</strong> Diagrams 2004, Vol. 2980<br />

<strong>of</strong> LNAI, Springer, pp. 18–21.<br />

Shin, S. J. (1994). The Logical Status <strong>of</strong> Diagrams, CUP.<br />

Stapleton, G. (2007). Diagrammatic logics: Past, present and future, International Conference<br />

on Logic, Navya Nyaya and Applications, Jadavpur University, pp. 4–15.<br />

Stapleton, G. and Delaney, A. (2008). Evaluating and generalizing constraint diagrams,<br />

Accepted for Journal <strong>of</strong> Visual Languages and Computing. Available online from<br />

JVLC.<br />

Stapleton, G., Masth<strong>of</strong>f, J., Flower, J., Fish, A. and Sou<strong>the</strong>rn, J. (2007). Automated <strong>the</strong>orem<br />

proving in Euler diagrams systems, Journal of Automated Reasoning 39: 431–470.<br />



FICTIONAL CONTINGENCIES<br />

Gemma Celestino<br />

University <strong>of</strong> British Columbia & LOGOS Research Group<br />

Abstract. I argue that fictional contingencies, such as the one that, in Tolstoy’s Anna Karenina,<br />

Anna Karenina might not have fallen for Vronsky, pose a serious problem for a descriptivist<br />

and possible worlds view of fiction such as the one defended by David Lewis and<br />

Gregory Currie. Their view cannot account for the fact that in Tolstoy’s Anna Karenina, it<br />

is Anna Karenina herself who contingently falls for Vronsky. In Tolstoy’s Anna Karenina,<br />

Anna Karenina falls for Vronsky in <strong>the</strong> actual world but she fails to fall for him in some<br />

possible world.<br />

1 Introduction<br />


An interesting issue that arises within <strong>the</strong> topic <strong>of</strong> fiction is <strong>the</strong> issue <strong>of</strong> how to account for<br />

<strong>the</strong> intuitive contingencies <strong>of</strong> fictional characters. For at least some <strong>of</strong> <strong>the</strong> things that occur<br />

to fictional characters within a story are supposed to happen only contingently. Certain views<br />

<strong>of</strong> fiction could account for <strong>the</strong>se modal properties <strong>of</strong> fictional characters in a way<br />

that I think is mistaken, and in this paper I shall argue why.<br />

Gregory Currie recently advanced such an account in his “Characters and Contingency”<br />

(2003). But his account is one that must be attractive to any follower <strong>of</strong> <strong>the</strong><br />

Lewis-Currie descriptivist view <strong>of</strong> fictional names, or <strong>of</strong> what I take would be a natural<br />

two-dimensionalist extension <strong>of</strong> Robert Stalnaker’s position on true negative existentials<br />

and related matters. The account, in fact, only makes sense within a possible worlds<br />

framework <strong>of</strong> fiction. In short, <strong>the</strong> descriptivist view is <strong>the</strong> view that fictional names,<br />

unlike ordinary proper names, are, or are used by <strong>the</strong> author <strong>of</strong> <strong>the</strong> fiction as, non-rigid<br />

definite descriptions.<br />

First, I shall explain <strong>the</strong> problem <strong>of</strong> fictional contingencies and argue that <strong>the</strong> explanation<br />

Currie <strong>of</strong>fered does not work. I will also argue that this is a real problem for <strong>the</strong> descriptivist<br />

view <strong>of</strong> fiction. Secondly, I shall consider o<strong>the</strong>r alternatives to descriptivism<br />

within <strong>the</strong> possible worlds framework to conclude that no possible worlds view <strong>of</strong><br />

fiction looks promising. Finally, I will end with some positive suggestions that I would<br />

like to develop elsewhere soon.<br />

2 The Problem <strong>of</strong> Fictional Contingencies<br />

I shall motivate <strong>the</strong> problem I want to address in this paper by introducing <strong>the</strong> following<br />

pair <strong>of</strong> sentences:<br />

(1) Necessarily, someone who did not fall for Vronsky would not be Anna Karenina<br />

(2) Someone who necessarily fell for Vronsky would not be Anna Karenina<br />

Despite <strong>the</strong> apparent inconsistency between <strong>the</strong>se two claims, both seem intuitively<br />

true. (1) is true because anything that a fictional story tells about its characters is essential<br />

to <strong>the</strong>m. Tolstoy’s story about Anna Karenina tells us, among o<strong>the</strong>r things, that Anna<br />

Karenina falls for Vronsky. Hence, unlike what happens to non-fictional people like you<br />

and me, and due to her fictionality, it is a constitutive feature <strong>of</strong> Anna Karenina that she<br />

falls for Vronsky. Thus, it is necessary that she does. (2) is true because Tolstoy’s story<br />

is not a story in which Anna Karenina cannot but fall for Vronsky, but a story in which<br />

Anna Karenina falls for Vronsky only contingently. Thus, anyone who necessarily fell<br />

for Vronsky, who fell for Vronsky not contingently, would not be Anna Karenina.<br />

The apparent incompatibility or tension between (1) and (2) cannot be explained in<br />

terms <strong>of</strong> <strong>the</strong> distinction between truth in fiction and truth simpliciter, or any o<strong>the</strong>r similar<br />

distinction. For both seem to be true on one and <strong>the</strong> same reading. Nei<strong>the</strong>r <strong>of</strong> <strong>the</strong>m is true<br />

in <strong>the</strong> fiction. Ra<strong>the</strong>r, <strong>the</strong>y are about <strong>the</strong> fictional character Anna Karenina. They specify<br />

some <strong>of</strong> its necessary qualities.<br />

3 The Descriptivist Way Out <strong>of</strong> <strong>the</strong> Problem<br />

The view I want to show to be wrong in this paper would accept <strong>the</strong> truth <strong>of</strong> both claims and<br />

would explain it as follows: Anna Karenina possibly exists. That is to say, even if –as<br />

we all agree– Anna Karenina does not actually exist, <strong>the</strong>re is some o<strong>the</strong>r possible world<br />

where she does. For to be Anna Karenina is simply to play <strong>the</strong> Anna Karenina-role and<br />

to play <strong>the</strong> Anna Karenina-role merely amounts to satisfying <strong>the</strong> general definite description<br />

that could be extracted from <strong>the</strong> story told by Tolstoy, constructed out <strong>of</strong> everything<br />

Tolstoy says about Anna in <strong>the</strong> story he tells, which is <strong>the</strong> exact meaning <strong>of</strong> <strong>the</strong> fictional<br />

name ‘Anna Karenina’, at least as it is used by Tolstoy.<br />

On this view, what one does when telling a fiction is to tell a story, which although not<br />

actual, is possible. It is to qualitatively describe part <strong>of</strong> some possible worlds o<strong>the</strong>r than<br />

<strong>the</strong> actual. It is to explain some ways <strong>the</strong> actual world might have been but is not. Thus,<br />

<strong>the</strong> view is that Anna Karenina could have existed and fallen for Vronsky even if in fact<br />

this never occurred and never will in actuality. That Anna Karenina falls for Vronsky<br />

is as possible as my turning <strong>of</strong>f my laptop in a moment.<br />

What would explain <strong>the</strong> truth <strong>of</strong> (1), according to this view, is <strong>the</strong> fact that <strong>the</strong>re is<br />

no possible world where someone plays <strong>the</strong> role <strong>of</strong> Anna Karenina but does not fall for<br />

Vronsky. This is so precisely because part <strong>of</strong> what it means to play this role is to fall for<br />

Vronsky. Thus, it is true in every world that anyone who plays <strong>the</strong> Anna Karenina-role in<br />

that world falls for Vronsky.<br />

Never<strong>the</strong>less, (2) would be true as well because for every person who plays <strong>the</strong> Anna-role<br />

in some possible world, <strong>the</strong>re is at least one more world where that same person does<br />

not fall for Vronsky, i.e. a world where she does not play <strong>the</strong> role <strong>of</strong> Anna Karenina. (This<br />

would be so because it is impossible to necessarily fall in love.) The existence <strong>of</strong> <strong>the</strong>se<br />

o<strong>the</strong>r possible worlds is what would explain <strong>the</strong> contingency <strong>of</strong> <strong>the</strong> falling for Vronsky<br />

by Anna Karenina. In “Characters and Contingency”, Currie advances such an account <strong>of</strong><br />

<strong>the</strong> truth <strong>of</strong> (1) and (2).<br />

The reason why <strong>the</strong> explanation provided above does not work is that it does not explain<br />

what it has to explain, that is, <strong>the</strong> fact that in <strong>the</strong> fiction, Anna Karenina has <strong>the</strong><br />

property <strong>of</strong> falling for Vronsky but only contingently so. This amounts to <strong>the</strong> fact that<br />

Anna Karenina herself must have <strong>the</strong> property in every story-world –i.e. where <strong>the</strong> Anna<br />

Karenina-role is satisfied–, but at <strong>the</strong> same time she (Anna Karenina and no one else)<br />

must fail to have that property while being Anna Karenina at some world, which must<br />

be possible with respect to <strong>the</strong> story-world. But it is Anna Karenina herself who must<br />

have <strong>the</strong> contingent property at one world and lack it at ano<strong>the</strong>r. This is what contingency<br />

means. O<strong>the</strong>rwise, it is not true that Anna Karenina falls for Vronsky in a contingent way,<br />

but that someone else does. The problem is that <strong>the</strong> only way for this view to try to explain<br />

that contingency is by appealing to <strong>the</strong> possible worlds –which are not story-worlds–<br />

where <strong>the</strong> possible persons that, on this view, occupy <strong>the</strong> Anna Karenina-role, and thus<br />

are Anna, in some story-worlds, do not fall for Vronsky and, <strong>the</strong>reby, nei<strong>the</strong>r occupy <strong>the</strong><br />

Anna Karenina-role nor are Anna in <strong>the</strong>m.<br />
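
The structure of this objection can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: the three worlds, the single individual `jane`, the two-property stand-in for the description, and the assumption that every world is possible relative to every other are all hypothetical, and none of it is taken from Currie's text. It treats being Anna at a world as satisfying the description at that world, as the descriptivist view has it.

```python
# Toy model of the role-based (descriptivist) account. Every name, world and
# fact below is a hypothetical illustration, not taken from Currie or Tolstoy.
WORLDS = ["w_story", "w_other", "w_actual"]

# FACTS[world][individual] = properties that individual has at that world.
FACTS = {
    "w_story": {"jane": {"marries_karenin", "falls_for_vronsky"}},
    "w_other": {"jane": {"marries_karenin"}},
    "w_actual": {"jane": set()},
}

# Stand-in for the definite description extracted from the story.
ANNA_DESCRIPTION = {"marries_karenin", "falls_for_vronsky"}


def occupies_anna_role(individual, world):
    """On this view, being Anna at a world is satisfying the description there."""
    return ANNA_DESCRIPTION <= FACTS[world].get(individual, set())


def falls_for_vronsky(individual, world):
    return "falls_for_vronsky" in FACTS[world].get(individual, set())


def claim_1():
    """(1): at no world does an occupant of the Anna-role fail to fall for Vronsky."""
    return all(
        falls_for_vronsky(i, w)
        for w in WORLDS
        for i in FACTS[w]
        if occupies_anna_role(i, w)
    )


def claim_2():
    """(2): every occupant of the role at some world fails to fall at some world."""
    occupants = {i for w in WORLDS for i in FACTS[w] if occupies_anna_role(i, w)}
    return all(
        any(i in FACTS[w] and not falls_for_vronsky(i, w) for w in WORLDS)
        for i in occupants
    )


def anna_not_falling():
    """Pairs (individual, world) where someone is Anna yet does not fall for Vronsky."""
    return [
        (i, w)
        for w in WORLDS
        for i in FACTS[w]
        if occupies_anna_role(i, w) and not falls_for_vronsky(i, w)
    ]


print(claim_1(), claim_2(), anna_not_falling())  # prints: True True []
```

In this model both (1) and (2) come out true, yet the last list is empty, and it must be empty however the facts are chosen, because falling for Vronsky is part of the description. The only one who fails to fall for Vronsky is `jane` at `w_other`, and at that world she is not Anna.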

I see no way a possible worlds descriptivist view can handle this problem. However, I<br />

can see how one might reply. But <strong>the</strong> replies I envisage seem to be wrong as well.<br />

One might find <strong>the</strong> possible worlds explanation <strong>of</strong> fictional contingencies plausible<br />

and be easily misled into thinking that it is in fact right merely due to a natural tendency<br />

to forget what this possible worlds view tells us being Anna Karenina consists in and,<br />

as a result, come to have <strong>the</strong> following confused thought: that this person who does not<br />

fall for Vronsky in some world in which she does not occupy <strong>the</strong> Anna Karenina-role is,<br />

never<strong>the</strong>less, Anna Karenina also in such a world due to <strong>the</strong> fact that she is Anna Karenina<br />

in one <strong>of</strong> <strong>the</strong> worlds <strong>of</strong> <strong>the</strong> story, where she does occupy <strong>the</strong> Anna Karenina-role and does<br />

fall for Vronsky. But to evaluate this possible worlds view under this impression is to<br />

misunderstand what <strong>the</strong> view (at least, about being Anna Karenina) is.<br />

If that o<strong>the</strong>r person were to be Anna Karenina in any sense also in this o<strong>the</strong>r world<br />

where she does not fall for Vronsky, (1) would not be true. It would not be a necessary<br />

condition for being Anna Karenina to fall for Vronsky, for <strong>the</strong>re would be some possible<br />

worlds where Anna Karenina would not fall for him. These would precisely be <strong>the</strong> worlds<br />

where someone who occupies <strong>the</strong> Anna-role in one <strong>of</strong> <strong>the</strong> story-worlds exists and does<br />

not fall for Vronsky. As I argued above, however, <strong>the</strong>re is no such sense for <strong>the</strong> case <strong>of</strong><br />

being Anna Karenina. To think <strong>of</strong> that person, let’s say Jane, as being Anna Karenina also<br />

in that o<strong>the</strong>r world where she does not fall for Vronsky only because she does occupy <strong>the</strong><br />

Anna Karenina-role at some world is to mistake what being Anna Karenina is, on such<br />

a view, for what being Jane (or, in fact, any o<strong>the</strong>r real person) is. Currie explains this as<br />

follows: “Now consider Jane, a respectable inhabitant <strong>of</strong> <strong>the</strong> actual world. In <strong>the</strong> actual<br />

world she does not fall for Vronsky; in fact she never meets him. But, given what I have<br />

said just now, it may well be <strong>the</strong> case that Jane in some o<strong>the</strong>r world does fall for Vronsky;<br />

in that o<strong>the</strong>r world, Jane occupies <strong>the</strong> Anna-role. Does that make Jane, in this world,<br />

Anna Karenina? No. Being Anna is, according to me, something that happens to you in<br />

some worlds and not in o<strong>the</strong>rs. It happens to you in worlds where you occupy <strong>the</strong> Anna<br />

role. In any world in which Jane occupies that role she is Anna. But that does not make<br />

her Anna in this world. Being Anna is not at all like being Jane. The person who is Jane in<br />

one world is Jane in all worlds. Being Jane is a matter <strong>of</strong> being a certain individual; being<br />

Anna, on <strong>the</strong> o<strong>the</strong>r hand, is a matter <strong>of</strong> occupying a certain role. Moving up a semantic<br />

step we can say that “Jane” is a proper name <strong>of</strong> an individual, whereas “Anna”, where it is<br />

<strong>the</strong> proper name <strong>of</strong> anything, is <strong>the</strong> proper name <strong>of</strong> a function from worlds to individuals.<br />

Of course when Tolstoy says that Anna did this or that, we are not from <strong>the</strong> point <strong>of</strong> view<br />

<strong>of</strong> our imaginative engagement with <strong>the</strong> work, to understand this as meaning that a role<br />

did this or that. This is because it is part <strong>of</strong> <strong>the</strong> fiction that “Anna” is <strong>the</strong> name <strong>of</strong> a person.<br />

But “Anna”, as used by Tolstoy, is not in fact <strong>the</strong> name <strong>of</strong> a person, nor does it purport to<br />

be. Names are expressions used in order to pick out individuals, and Tolstoy does not use<br />

“Anna” in order to do this, nor does he expect us to believe that he is. “Anna”, as used by<br />

Tolstoy, is not a name.” (Currie 2003, p. 141)<br />

On <strong>the</strong> o<strong>the</strong>r hand, one might also contemplate <strong>the</strong> possibility <strong>of</strong> fictional characters<br />

enjoying a certain autonomy with respect to <strong>the</strong>ir stories, in such a way that one could<br />

say that Tolstoy’s Anna Karenina could have had a different end, for instance. The idea<br />

is that <strong>the</strong> characters would be well defined from <strong>the</strong> very beginning <strong>of</strong> <strong>the</strong> fiction, which<br />

would open possibilities for <strong>the</strong>ir fate o<strong>the</strong>r than <strong>the</strong> ones that <strong>the</strong> author chose. Considering<br />

this, one might think that <strong>the</strong> contingency <strong>of</strong> <strong>the</strong> properties <strong>of</strong> <strong>the</strong> characters could be<br />

reduced to <strong>the</strong> contingency <strong>of</strong> <strong>the</strong> writing process itself. Anna Karenina, for instance,<br />

might not have fallen for Vronsky precisely because Tolstoy might not have written that<br />

she did. However, this possibility would not save Currie’s explanation <strong>of</strong> <strong>the</strong> fictional<br />

contingency, or descriptivism about fictional names, since it is a wholly different explanation<br />

that is not compatible with <strong>the</strong>m. One can also see that it would not work by considering<br />

<strong>the</strong> fact that one can write a fiction where characters have certain properties necessarily,<br />

and that, notwithstanding this, <strong>the</strong> contingency <strong>of</strong> <strong>the</strong> writing process remains: <strong>the</strong> author could<br />

have written a different story, or this story a bit differently.<br />

The conclusions I think we should draw from all <strong>of</strong> this go far<strong>the</strong>r than <strong>the</strong> mere conclusion<br />

that <strong>the</strong> explanation <strong>of</strong> fictional contingencies I criticized is wrong and should be<br />

rejected. This problem that fictional contingencies pose and <strong>the</strong> incorrectness <strong>of</strong> this explanation<br />

indicate a deeper or more fundamental problem. It really shows why at least any<br />

descriptivist view that tries to explain fiction in terms <strong>of</strong> possible worlds –which seems to<br />

be <strong>the</strong>ir only way– is mistaken, and maybe it even shows that fiction cannot be accounted<br />

for in possible worlds terms at all; at least, for <strong>the</strong> case <strong>of</strong> fictions told by <strong>the</strong> use <strong>of</strong><br />

singular terms such as proper names. In short, <strong>the</strong> problem is that this possible worlds<br />

descriptivist view cannot explain <strong>the</strong> truth <strong>of</strong> pairs like (1) and (2). For, in particular, it<br />

cannot explain <strong>the</strong> possession <strong>of</strong> any fictional contingency by any fictional character.<br />

4 O<strong>the</strong>r Possible Descriptivist Ways Out<br />

If <strong>the</strong> possible worlds view has it that being Anna Karenina amounts to satisfying <strong>the</strong> non-rigid<br />

definite description, which has as a part <strong>the</strong> description <strong>of</strong> this woman as falling<br />

for Vronsky, it will not succeed in explaining that Anna Karenina falls for Vronsky only<br />

contingently, for <strong>the</strong> simple reason that any woman who would be Anna Karenina at<br />

all would be so only in some worlds and precisely in those worlds where she falls for<br />

Vronsky. One might think, even against what Currie seems to insist, that <strong>the</strong>re are two<br />

ways <strong>of</strong> being Anna Karenina, though: one <strong>of</strong> <strong>the</strong>m, <strong>the</strong> one we already contemplated<br />

and <strong>the</strong> one that Currie tells us; <strong>the</strong> o<strong>the</strong>r, <strong>the</strong> one that <strong>the</strong> possible worlds view would<br />

like to have, while keeping <strong>the</strong> previous one, which is to be someone who at some story-world<br />

satisfies <strong>the</strong> description that ‘Anna Karenina’ is or conveys, even if she does not do<br />

so at some o<strong>the</strong>r possible worlds. In this sense anyone who met <strong>the</strong> description at some<br />

possible world, would be also Anna Karenina at all <strong>the</strong> o<strong>the</strong>r worlds where she existed<br />

even if she did not meet <strong>the</strong> description in <strong>the</strong>m. This last sense does not seem to be<br />

compatible with <strong>the</strong> view that claims that ‘Anna Karenina’ is used as a non-rigid definite<br />

description, and that when it is not, when it is used literally, it does not refer at all. But let’s<br />

assume for a moment, for <strong>the</strong> sake <strong>of</strong> <strong>the</strong> argument, that it is.<br />

This way <strong>the</strong>re would be two ways <strong>of</strong> understanding <strong>the</strong> relevant pair <strong>of</strong> claims.<br />

According to <strong>the</strong> interpretation corresponding to <strong>the</strong> first sense <strong>of</strong> being Anna Karenina,<br />

(1) would be true but (2) false. And according to <strong>the</strong> interpretation corresponding to <strong>the</strong><br />

second sense, while (2) would be true, (1) would be false. On nei<strong>the</strong>r <strong>of</strong> <strong>the</strong>se two interpretations<br />

does one get that both claims are true. Intuitively at least, however, <strong>the</strong>y seem to<br />

be true under one and <strong>the</strong> same interpretation. Both claims are about <strong>the</strong> features that<br />

characterize a fictional character, Anna Karenina. One <strong>of</strong> <strong>the</strong>se features is to be someone<br />

who falls for Vronsky; ano<strong>the</strong>r, to be someone who falls for Vronsky in a contingent way.<br />

One might think, though, that <strong>the</strong> intuitive truth <strong>of</strong> <strong>the</strong>se two claims may be very well<br />

accounted for by considering a different interpretation <strong>of</strong> <strong>the</strong>m in each case. However,<br />

<strong>the</strong>re is no independent reason to interpret <strong>the</strong>m this differently. This does not seem to be<br />

why we think <strong>the</strong>y are both true. This way out <strong>of</strong> <strong>the</strong> problem fictional contingencies pose<br />

to this view would be completely ad hoc.<br />

In any case, <strong>the</strong>re is no way on such a view to obtain what <strong>the</strong> view really needs. That<br />

is, that Anna Karenina, one and <strong>the</strong> same thing, has <strong>the</strong> property <strong>of</strong> falling for Vronsky at one world<br />

but lacks it at ano<strong>the</strong>r possible world. For it is a condition on being Anna Karenina that<br />

she does so contingently. This is what having a contingent property amounts to. Note that<br />

<strong>the</strong> independent reason to argue for <strong>the</strong> legitimacy <strong>of</strong> using two different interpretations<br />

cannot be that ‘Anna Karenina’ can be used both as a non-rigid definite description and as<br />

a rigid proper name and that while it is used as a non-rigid definite description in <strong>the</strong> case<br />

<strong>of</strong> (1), it is used as a rigid proper name in <strong>the</strong> case <strong>of</strong> (2). For, according to <strong>the</strong> possible<br />

worlds view, only within <strong>the</strong> fiction, ‘Anna Karenina’ is or comes to be used as an ordinary<br />

rigid proper name. We cannot use <strong>the</strong> proper names that are used in <strong>the</strong>se o<strong>the</strong>r possible<br />

worlds. For <strong>the</strong>se proper names are only possible, not actual. Note too that an appeal to <strong>the</strong><br />

ambiguity in scope due to <strong>the</strong> interaction between modalities and definite descriptions in<br />

(1) and (2) does not work ei<strong>the</strong>r. For <strong>the</strong> problem is that we are dealing with fiction and<br />

fictional names and hence, <strong>the</strong>re are no individuals that could stand in <strong>the</strong> place <strong>of</strong> <strong>the</strong>se<br />

fictional characters o<strong>the</strong>r than <strong>the</strong> ones that satisfy <strong>the</strong> definite descriptions in question in<br />

each <strong>of</strong> <strong>the</strong> possible worlds. Thus, we can explain <strong>the</strong> consistency <strong>of</strong> <strong>the</strong> following pair<br />

<strong>of</strong> sentences:<br />

(3) Necessarily, <strong>the</strong> Queen <strong>of</strong> England is queen<br />

(4) The Queen <strong>of</strong> England may not have been queen<br />

by noticing <strong>the</strong> distinction in scope <strong>of</strong> <strong>the</strong> occurrences <strong>of</strong> <strong>the</strong> definite description ‘<strong>the</strong><br />

Queen <strong>of</strong> England’ in (3) and (4), and explain that (4) can be true compatibly with <strong>the</strong><br />

truth <strong>of</strong> (3) because <strong>the</strong>re is an individual –i.e. <strong>the</strong> Queen <strong>of</strong> England– who can exist in<br />

ano<strong>the</strong>r possible world and not be <strong>the</strong> Queen <strong>of</strong> England in it. As I said, unlike in <strong>the</strong><br />

case <strong>of</strong> fiction, this is possible precisely because <strong>the</strong>re is in fact an individual who is <strong>the</strong><br />

Queen <strong>of</strong> England in <strong>the</strong> actual world, whereas <strong>the</strong>re is no such individual for <strong>the</strong> definite<br />

description that <strong>the</strong> fictional name ‘Anna Karenina’ allegedly abbreviates.<br />
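
The scope distinction appealed to for (3) and (4) can be written out as follows. This is only a sketch in one common notation, not a formalism used in the paper: Q(x) abbreviates "x is Queen of England", and the bracketed description marks the point at which the description is evaluated. Both the predicate letter and the bracket notation are assumptions introduced here for illustration.

```latex
% Q(x): x is Queen of England.
% [\iota x\, Q(x)] marks the scope of the definite description.
\begin{align*}
(3')\quad & \Box\,[\iota x\, Q(x)]\; Q(x)
  && \text{narrow scope: at each world, whoever is the Queen there is queen there} \\
(4')\quad & [\iota x\, Q(x)]\; \Diamond\,\neg Q(x)
  && \text{wide scope: the actual Queen fails to be queen at some world}
\end{align*}
```

Reading (4') requires an individual who satisfies the description in the actual world and whose fate at other worlds can then be tracked. For the description allegedly abbreviated by a fictional name there is no such actual individual, which is why the same manoeuvre is unavailable for (1) and (2).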

5 Non-Descriptivist Possible Worlds Views <strong>of</strong> Fiction<br />

One might think that perhaps <strong>the</strong>re are o<strong>the</strong>r possible worlds views <strong>of</strong> fiction that are<br />

not descriptivist that could handle this problem <strong>of</strong> <strong>the</strong> fictional contingencies <strong>of</strong> fictional<br />

characters. I shall very briefly argue that <strong>the</strong> only available ones are not very attractive.<br />

Descriptivism seems to be <strong>the</strong> most plausible possible worlds view <strong>of</strong> fiction.<br />

I see two options: one might defend Meinongianism and say that fictional characters<br />

actually exist in some special mysterious way and that fictional names are like ordinary<br />

proper names that rigidly refer to <strong>the</strong>m. Or one might defend <strong>the</strong> view that fictional characters<br />

are abstract objects, which actually exist and to which <strong>the</strong> fictional names rigidly<br />

refer. Within this last option I see two fur<strong>the</strong>r options: one might say that <strong>the</strong>se abstract<br />

objects are only contingently so, so that in o<strong>the</strong>r worlds <strong>the</strong>se same objects exist but are<br />

concrete instead <strong>of</strong> abstract in <strong>the</strong>se worlds. The existence <strong>of</strong> such contingently nonconcrete<br />

objects is defended by Bernard Linsky and Edward N. Zalta, not with respect to fictional<br />

characters but with respect to merely possible objects –i.e. possibilia. Or one might hold<br />

that <strong>the</strong>se abstract objects, like any o<strong>the</strong>r abstract objects, are necessarily abstract, in<br />

which case <strong>the</strong>y can only do what <strong>the</strong>ir fictions say <strong>the</strong>y do in worlds that are impossible,<br />

for <strong>the</strong>re are things that only concrete objects can do. Thus, if <strong>the</strong>se abstract objects are<br />

to do <strong>the</strong>m, it can only occur in impossible worlds ra<strong>the</strong>r than possible ones. This is <strong>the</strong><br />

Millian view defended by Nathan Salmon.<br />

On <strong>the</strong> one hand, <strong>the</strong> first option, Meinongianism, is wholly mysterious and hence not<br />

plausible at all. On <strong>the</strong> o<strong>the</strong>r hand, <strong>the</strong> only option left which explains fictions in terms <strong>of</strong><br />

possibilities is <strong>the</strong> option that sees fictional characters as contingently nonconcrete objects<br />

and, hence, consists in <strong>the</strong> very implausible claim that some actual abstract objects can be<br />

concrete and some actual concrete objects can be abstract. In view <strong>of</strong> <strong>the</strong> alternatives to<br />

descriptivism about fiction, I think we can conclude that fiction should not be dealt with<br />

in terms <strong>of</strong> possible worlds.<br />

6 Some Positive Suggestions<br />

I think this problem is easily solved once we simply abandon <strong>the</strong> idea <strong>of</strong> explaining fiction<br />

in terms <strong>of</strong> possible worlds. I would like to defend that story-worlds are not possible<br />

worlds even if <strong>the</strong>y are ontologically <strong>the</strong> same kind <strong>of</strong> thing: that is, sets <strong>of</strong> sentences<br />

or propositions. The difference between story worlds and possible worlds would just be<br />

that only <strong>the</strong> latter represent possibilities with respect to <strong>the</strong> actual world. The fictional<br />

contingencies <strong>of</strong> fictional characters should be explained by appealing to those worlds<br />

which would be possible but only with respect to <strong>the</strong> world <strong>of</strong> <strong>the</strong> story and not with<br />

respect to our actual world.<br />

This way out <strong>of</strong> <strong>the</strong> problem would be possible because fictional names, on <strong>the</strong> o<strong>the</strong>r<br />

hand, are not abbreviated non-rigid definite descriptions, but merely empty rigid proper<br />

names, that is, proper names that do not have a referent. The meaning <strong>of</strong> fictional names<br />

should be derived, in my view, from <strong>the</strong> fact that part <strong>of</strong> <strong>the</strong> meaning <strong>of</strong> any proper name<br />

is <strong>the</strong> meaning <strong>of</strong> a rigid definite description associated with it. Any proper name N,<br />

when used, in addition to rigidly referring to its bearer, semantically expresses some definite<br />

description like ‘<strong>the</strong> bearer <strong>of</strong> N’ or ‘<strong>the</strong> individual called N’, where <strong>the</strong> token <strong>of</strong> <strong>the</strong> name<br />

N that occurs within that description is used <strong>the</strong> same way as <strong>the</strong> name N. This view<br />

about proper names in general is a view that I learnt from Manuel Garcia-Carpintero’s<br />

work. Note that this view does not say that proper names are synonymous with definite<br />

descriptions, which Saul Kripke showed to be incorrect, and that it is compatible with saying that<br />

Anna Karenina nei<strong>the</strong>r actually nor possibly exists.<br />

Finally, I also think that in addition to <strong>the</strong> fictional operator ‘in <strong>the</strong> fiction f’, <strong>the</strong>re is<br />

ano<strong>the</strong>r fictional operator that we use, whe<strong>the</strong>r explicitly or implicitly, in our fictional<br />

discourse. When we use fictional names to talk about <strong>the</strong>m as fictional characters instead<br />

<strong>of</strong> as <strong>the</strong> individuals that <strong>the</strong>se fictional characters represent in <strong>the</strong> fictions, we ei<strong>the</strong>r say<br />

‘<strong>the</strong> fictional character N’ or we just utter <strong>the</strong> name N. It is my view that even in <strong>the</strong> latter<br />

case, <strong>the</strong> expression ‘<strong>the</strong> fictional character’ is <strong>the</strong>re, though only in an implicit way. It<br />

is <strong>the</strong> interaction between this expression and fictional names that makes our fictional<br />

discourse when talking about fictional characters meaningful. How this interaction works<br />

is something we have yet to discover. I do not know.<br />

7 Conclusion<br />

I have argued that <strong>the</strong>re is a problem with <strong>the</strong> fictional contingent properties <strong>of</strong> fictional<br />

characters that descriptivism about fiction cannot solve. I have also argued that o<strong>the</strong>r<br />

alternative views on fiction that explain it in terms <strong>of</strong> possible worlds do not seem at all<br />

plausible. Finally, I have provided some positive suggestions to develop in order to explain<br />

fiction and <strong>the</strong> problem posed by fictional contingencies. These are suggestions that<br />

I plan to develop soon.<br />

Acknowledgements<br />

I am grateful for <strong>the</strong> extremely useful comments on earlier drafts <strong>of</strong> this work that I<br />

have received from Manuel Garcia-Carpintero, Dominic McIver Lopes, Genoveva Marti,<br />

Francis Jeffry Pelletier, Pablo Rychter and Ori Simchen as well as <strong>the</strong> extremely useful<br />

patience and interest that Stefano Predelli showed in discussing it with me. I am also<br />

thankful to <strong>the</strong> anonymous referees for <strong>the</strong>ir interesting points that I have tried to include<br />

in this final version <strong>the</strong> best I could.<br />

<strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> <strong>13</strong> th <strong>ESSLLI</strong> <strong>Student</strong> <strong>Session</strong><br />

MEANING & INFERENCE IN CASE OF CONFLICT<br />

Michael Franke<br />

Universiteit van Amsterdam<br />

Abstract. This paper applies a model of boundedly rational “level-k thinking” (cf. Stahl<br />

and Wilson, 1995; Crawford, 2003; Camerer, Ho and Chong, 2004) to a classical concern <strong>of</strong><br />

game <strong>the</strong>ory: when is information credible and what shall I do with it if it is not? The<br />

model presented here extends and generalizes recent work in game-<strong>the</strong>oretic pragmatics<br />

(Stalnaker, 2006; Jäger, 2007; Benz and van Rooij, 2007). Pragmatic inference is modeled<br />

as a sequence <strong>of</strong> iterated best responses, defined here in terms <strong>of</strong> <strong>the</strong> interlocutors’ epistemic<br />

states. Credibility considerations are a special case <strong>of</strong> a more general pragmatic inference<br />

procedure at each iteration step. The resulting analysis <strong>of</strong> message credibility improves on<br />

previous game-<strong>the</strong>oretic analyses, is more general and places credibility in <strong>the</strong> linguistic<br />

context where it, arguably, belongs.<br />

1 Semantic Meaning and Credible Information in Signaling Games<br />

Perhaps the simplest game-theoretic model of language use is a signaling game with<br />

meaningful signals. A sender S observes <strong>the</strong> state <strong>of</strong> <strong>the</strong> world t ∈ T in private and<br />

chooses a message m from a set <strong>of</strong> alternatives M all <strong>of</strong> which are assumed to be meaningful<br />

in <strong>the</strong> (unique and commonly known) language shared by S and a receiver R. In<br />

turn, R observes <strong>the</strong> sent message and chooses an action a from a given set A. In general,<br />

<strong>the</strong> pay<strong>of</strong>fs for both S and R depend on <strong>the</strong> state t, <strong>the</strong> sent message m and <strong>the</strong> action a<br />

chosen by <strong>the</strong> receiver. Formally, a SIGNALING GAME WITH MEANINGFUL SIGNALS is<br />

a tuple $\langle \{S, R\}, T, \Pr, M, [\cdot], A, U_S, U_R \rangle$ where $\Pr \in \Delta(T)$ is a probability distribution<br />

over $T$; $[\cdot] : M \to \mathcal{P}(T)$ is a semantic denotation function and $U_S, U_R : M \times A \times T \to \mathbb{R}$<br />

are utility functions for both sender and receiver.¹ We can conceive of such signaling<br />

games as abstract ma<strong>the</strong>matical models <strong>of</strong> a conversational context whose most important<br />

features <strong>the</strong>y represent: <strong>the</strong> interlocutors’ beliefs, behavioral possibilities and preferences.<br />

If a signaling game is a context model, <strong>the</strong> game’s solution concept is what yields a<br />

prediction <strong>of</strong> <strong>the</strong> behavior <strong>of</strong> agents in <strong>the</strong> modelled conversational situation. The following<br />

easy example <strong>of</strong> a scalar implicature, e.g., <strong>the</strong> inference that not all students came<br />

when hearing <strong>the</strong> sentence “Some <strong>of</strong> <strong>the</strong> students came”, makes this distinction clear. A<br />

simple context model for this case is the signaling game G1:² there are two states $t_{\exists\neg\forall}$ and<br />

$t_{\forall}$, two messages $m_{\text{some}}$ and $m_{\text{all}}$ with semantic meaning as indicated and two receiver<br />

interpretation actions $a_{\exists\neg\forall}$ and $a_{\forall}$ which correspond one-to-one with the states; sender and<br />

receiver payoffs are aligned: an implementation of the standard assumption that conversation<br />

and implicature calculation revolve around the cooperative principle (Grice, 1989). A<br />

solution concept, whatever it may be, should then ideally predict that $S^{t_{\exists\neg\forall}}$ ($S^{t_{\forall}}$) chooses<br />

$m_{\text{some}}$ ($m_{\text{all}}$) and the receiver responds with action $a_{\exists\neg\forall}$ ($a_{\forall}$).³<br />
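The tuple defining a signaling game can be written down directly as a small data structure. The following is a minimal Python sketch and not part of the original model: the class name `SignalingGame`, its field names and the state, message and action labels are hypothetical. It assumes that “some” is true in both states of G1 and “all” only in the all-state, and that states are equiprobable.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class SignalingGame:
    """Hypothetical container for <{S,R}, T, Pr, M, [.], A, U_S, U_R>."""
    states: list       # T
    prior: dict        # Pr: t -> probability
    messages: list     # M
    meaning: dict      # [.]: m -> set of states in which m is true
    actions: list      # A
    u_sender: dict     # U_S: (m, a, t) -> payoff
    u_receiver: dict   # U_R: (m, a, t) -> payoff

    def well_formed(self):
        """Check assumptions (i)-(iv) stated in footnote 1."""
        nonempty = bool(self.states) and bool(self.messages) and bool(self.actions)
        full_support = all(self.prior[t] > 0 for t in self.states)
        expressible = all(any(t in self.meaning[m] for m in self.messages)
                          for t in self.states)
        no_contradiction = all(len(self.meaning[m]) > 0 for m in self.messages)
        return nonempty and full_support and expressible and no_contradiction


def make_g1():
    """Build G1 under the assumptions named in the lead-in."""
    states = ["t_some_not_all", "t_all"]
    messages = ["m_some", "m_all"]
    actions = ["a_some_not_all", "a_all"]
    # Assumed semantics: "some" is true in both states, "all" only in t_all.
    meaning = {"m_some": {"t_some_not_all", "t_all"}, "m_all": {"t_all"}}
    match = {"t_some_not_all": "a_some_not_all", "t_all": "a_all"}
    # Aligned payoffs: both players get 1 iff the action matches the state.
    payoff = {(m, a, t): int(match[t] == a)
              for m in messages for a in actions for t in states}
    prior = {t: Fraction(1, len(states)) for t in states}
    return SignalingGame(states, prior, messages, meaning, actions,
                         payoff, dict(payoff))


G1 = make_g1()
```

Storing the utilities as dictionaries keyed by (message, action, state) mirrors the signature of the utility functions; a game in which talk is cheap is then simply one whose entries do not vary with the message.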

1 I will assume throughout that (i) all sets T , M and A are non-empty and finite, that (ii) Pr(t) > 0 for<br />

all t ∈ T , that (iii) for each state t <strong>the</strong>re is at least one message m which is true in that state and that (iv) no<br />

message is contradictory, i.e., <strong>the</strong>re is no m for which [m] = ∅.<br />

2 Unless indicated, I assume that states are equiprobable in example games.<br />

3 For $t \in T$, I write $S^t$ as an abbreviation for “a sender of type $t$”.<br />

Payoffs are listed as sender, receiver; √ marks a message as true in a state, − as false.<br />

          a_∃¬∀   a_∀     m_some   m_all<br />

t_∃¬∀     1,1     0,0     √        −<br />

t_∀       0,0     1,1     √        √<br />

G1: “Scalar Implicatures”<br />

          a_mate  a_ignore  m_high  m_low<br />

t_high    1,1     0,0       √       −<br />

t_low     1,0     0,1       −       √<br />

G2: “Partial Conflict”<br />

It is obvious that in order to arrive at this prediction, a special role has to be assigned to<br />

<strong>the</strong> conventional, semantic meaning <strong>of</strong> <strong>the</strong> messages involved. For instance, in <strong>the</strong> above<br />

example anti-semantic play, as we could call it, that simply reverses <strong>the</strong> use <strong>of</strong> messages,<br />

should be excluded. Most game-<strong>the</strong>oretic models <strong>of</strong> language use hard-wire semantic<br />

meaning into <strong>the</strong> game play, ei<strong>the</strong>r as a restriction on available moves <strong>of</strong> sender and receiver,<br />

or into <strong>the</strong> pay<strong>of</strong>fs, but in both cases effectively enforcing truthfulness and trust.<br />

This is fine as long as conversation is mainly cooperative and preferences aligned. But<br />

let’s face it: <strong>the</strong> central Gricean assumption <strong>of</strong> cooperation is an optimistic idealization<br />

after all; conflict, lies and deceit are as ubiquitous as air. But <strong>the</strong>n, hard-wiring <strong>of</strong> truthfulness<br />

and trust limits <strong>the</strong> applicability <strong>of</strong> our models as it excludes <strong>the</strong> possibility that<br />

senders may wish to mislead <strong>the</strong>ir audience. We should aim for more general models and,<br />

ideally, let the agents, not the modeller, decide when to be truthful and what to trust.<br />

Opposed to hard-wiring truthfulness and trust, <strong>the</strong> most liberal case at <strong>the</strong> o<strong>the</strong>r end<br />

<strong>of</strong> <strong>the</strong> spectrum is to model communication, not considering reputation or fur<strong>the</strong>r psychological<br />

constraints at all, as cheap talk. Here messages do not impose restrictions on<br />

the game play and are entirely payoff irrelevant: $U_{S,R}(m, a, t) = U_{S,R}(m', a, t)$ for all<br />

$m, m' \in M$, $a \in A$ and $t \in T$. However, if talk is cheap, yet exogenously meaningful, the<br />

question arises how to integrate semantic meaning into <strong>the</strong> game. Standard solution concepts,<br />

such as sequential equilibrium or rationalizability, are too weak to predict anything<br />

reasonable in this case: <strong>the</strong>y allow for nearly all anti-semantic play and also for babbling,<br />

where signals are sent, as it were, arbitrarily and <strong>the</strong>refore ignored by <strong>the</strong> receiver.<br />

In response to this problem, game <strong>the</strong>orists have proposed various refinements <strong>of</strong> <strong>the</strong><br />

standard solution concepts based on the notion of credibility.⁴ The idea is that semantic<br />

meaning should be respected (in <strong>the</strong> solution concept) wherever this is reasonable in view<br />

<strong>of</strong> <strong>the</strong> possibly diverging preferences <strong>of</strong> interlocutors. As an easy example, look at game<br />

G2 where S is <strong>of</strong> ei<strong>the</strong>r a high quality or a low quality type, and where R would like<br />

to pair with $S^{t_{\text{high}}}$ only, while S wants to pair with R irrespective of her type. Interests<br />

are in partial conflict here and, intuitively, a costless, non-committing message $m_{\text{high}}$<br />

is not credible, because $S^{t_{\text{low}}}$ would have every reason to send it untruthfully. Therefore,<br />

intuitively, R should ignore whatever S says in this game. In general, if nothing prevents<br />

S from babbling, lying or deceiving, she might as well do so; whenever she even has an<br />

incentive to, she certainly will. For <strong>the</strong> receiver <strong>the</strong> central question becomes: when is a<br />

signal credible and what should I do if it is not?<br />

This paper <strong>of</strong>fers a fresh look at this classical problem <strong>of</strong> game <strong>the</strong>ory. The novelty<br />

is, so to speak, a “linguistic turn”: I suggest that credibility considerations are pragmatic<br />

inferences, in some sense very much alike —and in ano<strong>the</strong>r sense very much unlike—<br />

conversational implicatures. I argue that this linguistic approach to credibility <strong>of</strong> information<br />

improves on <strong>the</strong> classical game-<strong>the</strong>oretic analyses by Farrell (1993) and Rabin<br />

4 The standards in <strong>the</strong> debate about credibility were set by Farrell (1993) for equilibrium and by Rabin<br />

(1990) for rationalizability. I will mainly focus on <strong>the</strong>se two classical papers here for reasons <strong>of</strong> space.<br />

(1990). In order to implement conventional meaning <strong>of</strong> signals in a cheap talk model, <strong>the</strong><br />

present paper takes an epistemic approach to <strong>the</strong> solution <strong>of</strong> games: <strong>the</strong> model presented<br />

in this paper spells out <strong>the</strong> reasoning <strong>of</strong> interlocutors in terms <strong>of</strong> <strong>the</strong>ir beliefs about <strong>the</strong><br />

behavior <strong>of</strong> <strong>the</strong>ir opponents as a sequence <strong>of</strong> iterated best responses (IBR) which takes<br />

semantic meaning as a starting point. For clarity: <strong>the</strong> IBR model places no restriction<br />

whatsoever on <strong>the</strong> use <strong>of</strong> signals; conventional meaning is implemented merely as a focal<br />

element in <strong>the</strong> deliberation <strong>of</strong> agents. This way, <strong>the</strong> IBR model extends recent work<br />

in game-<strong>the</strong>oretic pragmatics (Jäger, 2007; Benz and van Rooij, 2007), to which it adds<br />

generality by taking diverging preferences into account and by implementing <strong>the</strong> basic assumptions<br />

<strong>of</strong> “level-k models” <strong>of</strong> reasoning in games (cf. Stahl and Wilson, 1995; Crawford,<br />

2003; Camerer et al., 2004). In particular, agents in <strong>the</strong> model are assumed to be<br />

boundedly rational in <strong>the</strong> sense that each agent computes only finitely many steps <strong>of</strong> <strong>the</strong><br />

best response sequence. Section 2 scrutinizes <strong>the</strong> notion <strong>of</strong> credibility, section 3 spells out<br />

<strong>the</strong> formal model and section 4 discusses its properties and predictions.<br />

2 Credibility and Pragmatic Inference<br />

The classical idea <strong>of</strong> message credibility is due to Farrell (1993). Farrell seeks an equilibrium<br />

refinement that pays due respect to <strong>the</strong> semantic meaning <strong>of</strong> messages. His notion<br />

<strong>of</strong> credibility is <strong>the</strong>refore tied to a given reference equilibrium as a status quo. According<br />

to Farrell, <strong>the</strong>n, a message m is FARRELL-CREDIBLE with respect to a given equilibrium<br />

if all t ∈ [m] prefer <strong>the</strong> receiver to interpret m literally, i.e., to play a best response to <strong>the</strong><br />

belief $\Pr(\cdot \mid [m])$ that m is true, over the equilibrium play, while no type $t \notin [m]$ does.<br />

A number <strong>of</strong> objections can be raised against Farrell-credibility. First <strong>of</strong> all, <strong>the</strong> definition<br />

requires all types in [m] to prefer a literal interpretation <strong>of</strong> m over <strong>the</strong> reference<br />

equilibrium. This makes sense under Farrell’s Rich Language Assumption (RLA) that<br />

for every X ⊆ T <strong>the</strong>re is a message m with [m] = X. This assumption is prevalent in<br />

game-<strong>the</strong>oretic discussions <strong>of</strong> credibility, but restricts applicability. I will show in section<br />

4 that this assumption seriously restricts Rabin’s (1990) account. But for now, suffice<br />

it to say that, in particular, <strong>the</strong> RLA excludes models like G1, used to study pragmatic<br />

inference in <strong>the</strong> light <strong>of</strong> (partial) inexpressibility. I will drop <strong>the</strong> RLA here to aim for<br />

more generality and compatibility with linguistic pragmatics.⁵ Doing so implies amending<br />

Farrell-credibility to require only that some types in [m] prefer a literal interpretation<br />

<strong>of</strong> m over <strong>the</strong> reference equilibrium.<br />

Still, <strong>the</strong>re are fur<strong>the</strong>r problems. Mat<strong>the</strong>ws, Okuno-Fujiwara and Postlewaite (1991)<br />

criticize Farrell-credibility as being too strong. Their argument builds on example G3.<br />

Compared to the babbling equilibrium, in which R performs $a_3$, messages $m_1$ and $m_2$ are<br />

intuitively credible: both $S^{t_1}$ and $S^{t_2}$ have good reason to send $m_1$ and $m_2$ respectively.<br />

Communication seems possible and utterly plausible. However, neither message is<br />

Farrell-credible, because for $i, j \in \{1, 2\}$ and $i \neq j$ not only $S^{t_j}$, but also $S^{t_i}$ prefers R to<br />

play a best response to a literal interpretation of $m_j$, which would trigger action $a_j$, over<br />

5 A reviewer points out that <strong>the</strong> RLA has a correspondent in <strong>the</strong> linguistic world in Katz’s (1981) “principle<br />

of effability”. The reviewer supports dropping the RLA, because otherwise pragmatic inference is<br />

limited to context and effort considerations. It is also very common (and, to my mind, reasonable) to restrict<br />

attention to certain alternative expressions only, namely those that are salient (in context) after observing a<br />

message. Of course, game <strong>the</strong>ory is silent as to where <strong>the</strong> alternatives come from, since this is a question<br />

for <strong>the</strong> linguist, perhaps even <strong>the</strong> syntactician (cf. Katzir, 2007).<br />

          a_1    a_2    a_3    m_1   m_2<br />

t_1       4,3    3,0    1,2    √     −<br />

t_2       3,0    4,3    1,2    −     √<br />

G3: “Best Message Counts”<br />

          a_1    a_2    a_3    a_4    m_12   m_23   m_13<br />

t_1       4,5    5,4    0,0    1,4    √      −      √<br />

t_2       0,0    4,5    5,4    1,4    √      √      −<br />

t_3       5,4    0,0    4,5    1,4    −      √      √<br />

G4: “Further Iteration”<br />

the no-communication outcome $a_3$. The problem with Farrell’s notion is obviously that<br />

just doing better than equilibrium is not enough reason to send a message, when sending<br />

another message is even better for the sender. When evaluating the credibility of a<br />

message m, we have to take into account alternative forms that $t \notin [m]$ might want to<br />

send.<br />
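The argument about G3 can be checked mechanically. The sketch below is an illustration under stated assumptions, not Farrell's own formulation: it reads the payoffs off the G3 table, assumes equiprobable states, assumes that each message is true in exactly the state it is named after, and represents the reference equilibrium by a single pooled receiver action. All function names are hypothetical.

```python
from fractions import Fraction

# G3, cheap talk: payoffs (sender, receiver) depend only on state and action.
PAYOFF = {
    "t1": {"a1": (4, 3), "a2": (3, 0), "a3": (1, 2)},
    "t2": {"a1": (3, 0), "a2": (4, 3), "a3": (1, 2)},
}
MEANING = {"m1": {"t1"}, "m2": {"t2"}}   # assumed truth conditions
PRIOR = {"t1": Fraction(1, 2), "t2": Fraction(1, 2)}
ACTIONS = ["a1", "a2", "a3"]


def receiver_best_action(belief):
    """Action maximizing R's expected payoff under a belief over states.
    Ties would be broken by list order; none arise in the cases used here."""
    return max(ACTIONS,
               key=lambda a: sum(p * PAYOFF[t][a][1] for t, p in belief.items()))


def literal_belief(m):
    """Pr(. | [m]): the prior conditioned on m being true."""
    total = sum(PRIOR[t] for t in MEANING[m])
    return {t: PRIOR[t] / total for t in MEANING[m]}


def farrell_credible(m, equilibrium_action):
    """All types in [m], and no type outside [m], strictly prefer the
    literal interpretation of m over the equilibrium action."""
    literal_action = receiver_best_action(literal_belief(m))
    prefers = {t: PAYOFF[t][literal_action][0] > PAYOFF[t][equilibrium_action][0]
               for t in PRIOR}
    inside = all(prefers[t] for t in MEANING[m])
    outside = any(prefers[t] for t in PRIOR if t not in MEANING[m])
    return inside and not outside


# Without communication R best-responds to the prior (the babbling outcome).
babbling_action = receiver_best_action(PRIOR)
```

Here `babbling_action` comes out as `a3`, and `farrell_credible` rejects both messages because the type outside the denotation also gains from a literal reading, which is exactly the objection raised by Matthews et al.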

Compare this with the scalar implicature in G1. Message $m_{\text{some}}$ is interpreted as communicating<br />

that the true state of affairs is $t_{\exists\neg\forall}$, because in $t_{\forall}$ the sender would have used<br />

$m_{\text{all}}$. In other words, the receiver discards a state $t \in [m]$ as a possible sender of m<br />

because that type has a better message to send. Of course, such pragmatic enrichment<br />

does not make a message intuitively incredible, as it is still used in line with its semantic<br />

meaning. Intuitively speaking, in G1 S even wants R to draw this pragmatic inference.<br />

This is, <strong>of</strong> course, different in G2. In general, if S wants to mislead, she intuitively<br />

wants <strong>the</strong> receiver to adopt a certain belief, but she does not want <strong>the</strong> receiver to realize<br />

that this belief might be false: we could say, somewhat loosely, that S wants her purported<br />

communicative intention to be recognized (and acted upon), but she does not want<br />

her deceptive intention to be recognized. Never<strong>the</strong>less, if <strong>the</strong> receiver does manage to<br />

recognize a deceptive intention, this too may lead to some kind <strong>of</strong> pragmatic inference,<br />

albeit one that <strong>the</strong> sender did not intend <strong>the</strong> receiver to draw. While <strong>the</strong> implicature in G1<br />

rules out a semantically feasible possibility, credibility considerations, in a sense, do <strong>the</strong><br />

exact opposite: message $m_{\text{high}}$ is pragmatically weakened in G2 by ruling in state $t_{\text{low}}$.<br />

Despite <strong>the</strong> differences, <strong>the</strong>re is a common core to both implicature and credibility<br />

inference. In both cases, the receiver seems to reason: which types of senders would<br />

send this message given that I believe it literally? Indeed, exactly this kind <strong>of</strong> reasoning<br />

underlies Benz and van Rooij’s (2007) model <strong>of</strong> implicature calculation for <strong>the</strong> purely<br />

cooperative case. The driving observation <strong>of</strong> this paper is that <strong>the</strong> same reasoning might<br />

not only rule out states $t \in [m]$ to yield implicatures but may also rule in states $t \notin [m]$.<br />

When the latter is the case, m seems intuitively incredible. Still, the reasoning pattern<br />

by which implicatures and credibility-based inferences are computed is the same. On a<br />

superficial reading, this view on message credibility can be found in Stalnaker (2006):⁶<br />

call a message m BVRS-CREDIBLE (Benz, van Rooij, Stalnaker) iff for some types<br />

$t \in [m]$, but for no type $t \notin [m]$, $S^t$’s expected utility of sending m given that R interprets<br />

literally is at least as great as $S^t$’s expected utility of sending any alternative message $m'$.<br />

The notion <strong>of</strong> BvRS-credibility matches our intuitions in all <strong>the</strong> cases discussed so far,<br />

but it is, in a sense, self-refuting, as G4 from Mat<strong>the</strong>ws et al. (1991) shows. In this game,<br />

all the available messages $m_{12}$, $m_{23}$ and $m_{13}$ are BvRS-credible, because if R interprets<br />

6 It is unfortunately not entirely clear to me what exactly Stalnaker’s proposal amounts to, as insightful<br />

as it might be, because <strong>the</strong> account is not fully spelled out formally. The basic idea seems to be that<br />

(something like) <strong>the</strong> notion <strong>of</strong> BvRS-credibility, as it is called here, should be integrated as a constraint on<br />

receiver beliefs —believe a message iff it is BvRS-credible— into an epistemic model <strong>of</strong> <strong>the</strong> game toge<strong>the</strong>r<br />

with some appropriate assumption <strong>of</strong> (common) belief in rationality. The class <strong>of</strong> game models that satisfies<br />

rationality and credibility constraints would <strong>the</strong>n ultimately define how signals are used and interpreted.<br />

literally $S^{t_1}$ will use message $m_{12}$, $S^{t_2}$ will use message $m_{23}$ and $S^{t_3}$ will use message $m_{13}$.<br />

No message is used untruthfully by any type. However, if R realizes that exactly $S^{t_1}$ uses<br />

message $m_{12}$, he would rather not play $a_2$, but $a_1$. But if the sender realizes that message<br />

$m_{12}$ triggers the receiver to play $a_1$, suddenly $S^{t_3}$ wants to send $m_{12}$ untruthfully. This<br />

example shows that BvRS-credibility is a reliable start, but stops short. If messages<br />

are deemed credible and <strong>the</strong>refore believed, this may create an incentive to mislead. What<br />

seems needed to rectify <strong>the</strong> formal analysis <strong>of</strong> message credibility is a fully spelled-out<br />

model <strong>of</strong> iterated best responses that starts in <strong>the</strong> Benz-van-Rooij-Stalnaker way and <strong>the</strong>n<br />

carries on iterating. Here is such a model.<br />
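Before turning to the formal definitions, the reasoning about G4 can be traced step by step. The following Python sketch is only an illustration of the informal argument above, not the IBR model itself: payoffs are read off the G4 table, states are assumed equiprobable, each message is assumed true in exactly the two states its index names, and all identifiers are hypothetical.

```python
from fractions import Fraction

STATES = ["t1", "t2", "t3"]
ACTIONS = ["a1", "a2", "a3", "a4"]
# G4 payoffs as (sender, receiver) per state and action; talk is cheap.
PAYOFF = {
    "t1": {"a1": (4, 5), "a2": (5, 4), "a3": (0, 0), "a4": (1, 4)},
    "t2": {"a1": (0, 0), "a2": (4, 5), "a3": (5, 4), "a4": (1, 4)},
    "t3": {"a1": (5, 4), "a2": (0, 0), "a3": (4, 5), "a4": (1, 4)},
}
MEANING = {"m12": {"t1", "t2"}, "m23": {"t2", "t3"}, "m13": {"t1", "t3"}}


def best_action(senders):
    """R's best action if the message is believed to come from `senders`,
    taken to be equiprobable (uniform prior assumed)."""
    p = Fraction(1, len(senders))
    return max(ACTIONS, key=lambda a: sum(p * PAYOFF[t][a][1] for t in senders))


def best_messages(t, response):
    """Messages maximizing the payoff of type t given R's map m -> action."""
    top = max(PAYOFF[t][response[m]][0] for m in MEANING)
    return {m for m in MEANING if PAYOFF[t][response[m]][0] == top}


def senders_of(response):
    """For each message, the set of types for which it is a best message."""
    return {m: {t for t in STATES if m in best_messages(t, response)}
            for m in MEANING}


# Step 1: R interprets literally; every type then picks a best message.
literal = {m: best_action(MEANING[m]) for m in MEANING}
users_1 = senders_of(literal)
# BvRS-credible: sent by some type in [m] and by no type outside [m].
bvrs_credible = {m: bool(users_1[m]) and users_1[m] <= MEANING[m]
                 for m in MEANING}
# Step 2: R best-responds to who actually sends what; senders react again.
refined = {m: best_action(users_1[m]) for m in MEANING}
users_2 = senders_of(refined)
```

Under these assumptions every message is BvRS-credible at the first step, yet after R refines his response `users_2["m12"]` contains only `t3`, a type outside the denotation of the message. This is the incentive to mislead that the iterated model is meant to capture.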

3 The IBR Model and its Assumptions<br />

3.1 Assumptions: Focal Meaning and Bounded Rationality<br />

The IBR model presented in this paper rests on three assumptions with which it also sets<br />

itself apart from previous best-response models in formal pragmatics (Jäger, 2007; Benz<br />

and van Rooij, 2007; Jäger, 2008). The first assumption is <strong>the</strong> Focal Meaning Assumption:<br />

semantic meaning is focal in <strong>the</strong> sense that <strong>the</strong> sequence <strong>of</strong> best responses starts with a<br />

purely semantic truth-only sender strategy. Semantic meaning is also assumed focal in<br />

<strong>the</strong> sense that throughout <strong>the</strong> IBR sequence R believes messages to be truthful unless<br />

S has a positive incentive to be untruthful. This is the second, so-called Truth Ceteris<br />

Paribus Assumption (TCP). These two (epistemic) assumptions assign semantic meaning<br />

its proper place in this model <strong>of</strong> cheap-talk communication.<br />

The third assumption is <strong>the</strong> Bounded Rationality Assumption: I assume that players<br />

in <strong>the</strong> game have limited resources which allow <strong>the</strong>m to reason only up to some finite<br />

iteration depth k. At the same time I take agents to be overconfident: each agent believes<br />

that she is smarter than her opponent. Camerer et al. (2004) make an empirical case for<br />

these assumptions about the psychology of reasoners.⁷ However, for simplicity, I do not<br />

implement Camerer et al.’s (2004) Cognitive Hierarchy Model in full. Camerer et al.<br />

assume that each agent who is able to reason up to strategic depth k has a proper belief<br />

about <strong>the</strong> population distribution <strong>of</strong> players who reason up to depth l < k, but I will<br />

assume here, just to keep things simple, that each player believes that she is exactly one<br />

step ahead <strong>of</strong> her opponent (cf. Crawford, 2003; Crawford, 2007). (I will discuss this<br />

simplifying assumption critically in section 4.)<br />

3.2 Beliefs & Best Responses<br />

Given a signaling game, a SENDER SIGNALING-STRATEGY is a function $\sigma \in \mathcal{S} = (\Delta(M))^T$<br />

and a RECEIVER RESPONSE-STRATEGY is a function $\rho \in \mathcal{R} = (\Delta(A))^M$.<br />

In order to define which strategies are best responses to a given belief, we need to define<br />

the game-relevant beliefs of both S and R. Since the only uncertainty of S concerns what<br />

R will do, the set of relevant SENDER BELIEFS $\Pi_S$ is just the set of receiver response-strategies:<br />

$\Pi_S = \mathcal{R}$. On the receiver’s side, we may say, with some redundancy, that there<br />

7 A good, intuitively accessible example of why this should be so is a so-called beauty contest game (cf. Ho,<br />

Camerer and Weigelt, 1998). Each player from a group of size n > 2 chooses a number from 0 to 100. The<br />

player closest to 2/3 of the average wins. When this game is played with a group of subjects who have never<br />

played the game before, the usual group average lies somewhere between 20 and 30. This is quite far from<br />

the group average 0 which we would expect from common (true) belief in rationality. Everybody seems to<br />

believe that they are just a bit smarter than everybody else, without noticing their own limitations.<br />

are three components in any game-relevant belief (cf. Battigalli, 2006): firstly, R has a<br />

prior belief Pr(·) about <strong>the</strong> true state <strong>of</strong> <strong>the</strong> world; secondly, he has a belief about <strong>the</strong><br />

sender’s signaling strategy; and thirdly, he has a posterior belief about <strong>the</strong> true state after<br />

hearing a message. Posteriors should be derived by Bayesian update from <strong>the</strong> former two<br />

components, but also specify R’s beliefs after unexpected surprise messages. Taken together,<br />

the set of relevant RECEIVER BELIEFS $\Pi_R$ is the set of all triples $\langle \pi^1_R, \pi^2_R, \pi^3_R \rangle$ for<br />

which $\pi^1_R = \Pr$, $\pi^2_R \in \mathcal{S} = (\Delta(M))^T$ and $\pi^3_R \in (\Delta(T))^M$ such that for any $t \in T$ and<br />

$m \in M$ if $\pi^2_R(t, m) \neq 0$, then:<br />

\[ \pi^3_R(m, t) = \frac{\pi^1_R(t) \times \pi^2_R(t, m)}{\sum_{t' \in T} \pi^1_R(t') \times \pi^2_R(t', m)}. \]<br />
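The Bayesian constraint on the third belief component is straightforward to compute. The snippet below is a minimal sketch with hypothetical names; it returns `None` for surprise messages, since the update formula places no constraint on them. The sender strategy used for illustration (a truth-only sender in G1 who randomizes uniformly over true messages) is an assumed input.

```python
from fractions import Fraction


def posterior(prior, sigma, m):
    """Posterior over states after message m: Bayesian update of the prior
    on the believed sender strategy, where sigma[t][m] is the probability
    that type t sends m. Returns None for a surprise message, i.e. one
    that sigma never sends; the update leaves that case open."""
    norm = sum(prior[t] * sigma[t].get(m, 0) for t in prior)
    if norm == 0:
        return None
    return {t: prior[t] * sigma[t].get(m, 0) / norm for t in prior}


# Illustration with G1: equiprobable states and a truth-only sender who
# mixes uniformly over the messages that are true in her state.
prior = {"t_some_not_all": Fraction(1, 2), "t_all": Fraction(1, 2)}
sigma = {
    "t_some_not_all": {"m_some": Fraction(1)},
    "t_all": {"m_some": Fraction(1, 2), "m_all": Fraction(1, 2)},
}
post_some = posterior(prior, sigma, "m_some")
post_all = posterior(prior, sigma, "m_all")
```

With these inputs, hearing `m_some` shifts the receiver's belief towards the some-but-not-all state (2/3 against 1/3), while `m_all` leaves only the all-state.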

Given a sender belief $\rho \in \Pi_S$, say that $\sigma$ is a BEST RESPONSE SIGNALING STRATEGY<br />

to belief $\rho$ iff for all $t \in T$ and $m \in M$ we have:<br />

\[ \sigma(t, m) \neq 0 \;\rightarrow\; m \in \arg\max_{m' \in M} \sum_{a \in A} \rho(m', a) \times U_S(m', a, t) \]<br />

The set of all such best responses to belief $\rho$ is denoted by $\mathcal{S}(\rho)$. Given a receiver belief<br />

$\pi_R \in \Pi_R$ say that $\rho$ is a BEST RESPONSE STRATEGY to belief $\pi_R$ iff for all $m \in M$ and<br />

$a \in A$ we have:<br />

\[ \rho(m, a) \neq 0 \;\rightarrow\; a \in \arg\max_{a' \in A} \sum_{t \in T} \pi^3_R(m, t) \times U_R(m, a', t) \]<br />

The set of all such best responses to belief $\pi_R$ is denoted by $\mathcal{R}(\pi_R)$. Also, if $\Pi'_R \subseteq \Pi_R$ is<br />

a set of receiver beliefs, let $\mathcal{R}(\Pi'_R) = \bigcup_{\pi_R \in \Pi'_R} \mathcal{R}(\pi_R)$.<br />
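Both best-response conditions say that positive probability may only be placed on maximizers of expected utility. A minimal sketch, with hypothetical function names, computes those maximizer sets; any strategy that mixes only over the returned sets is a best response in the sense defined above. The illustration uses G2 with payoffs read off its table.

```python
def sender_best_messages(t, rho, messages, actions, u_sender):
    """Messages maximizing type t's expected utility, where rho[m][a] is
    the believed probability that R answers message m with action a."""
    def expected(m):
        return sum(rho[m].get(a, 0) * u_sender(m, a, t) for a in actions)
    top = max(expected(m) for m in messages)
    return {m for m in messages if expected(m) == top}


def receiver_best_actions(m, post, actions, u_receiver):
    """Actions maximizing R's expected utility under the posterior `post`
    (a dict from states to probabilities) held after hearing m."""
    def expected(a):
        return sum(p * u_receiver(m, a, t) for t, p in post.items())
    top = max(expected(a) for a in actions)
    return {a for a in actions if expected(a) == top}


# G2 ("Partial Conflict"); talk is cheap, so payoffs ignore the message.
U = {("t_high", "a_mate"): (1, 1), ("t_high", "a_ignore"): (0, 0),
     ("t_low", "a_mate"): (1, 0), ("t_low", "a_ignore"): (0, 1)}
ACTIONS = ["a_mate", "a_ignore"]
MESSAGES = ["m_high", "m_low"]


def u_s(m, a, t):
    return U[(t, a)][0]


def u_r(m, a, t):
    return U[(t, a)][1]


# R believes m_high literally: all posterior weight is on t_high.
literal_reply = receiver_best_actions("m_high", {"t_high": 1}, ACTIONS, u_r)
# S believes that R interprets both messages literally.
rho = {"m_high": {"a_mate": 1}, "m_low": {"a_ignore": 1}}
low_type_choice = sender_best_messages("t_low", rho, MESSAGES, ACTIONS, u_s)
```

The low-quality type's unique best message against a literal interpreter is `m_high`, which is why that message is intuitively not credible in G2.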

3.3 Strategic Types and <strong>the</strong> IBR sequence<br />

In line with <strong>the</strong> Bounded Rationality Assumption <strong>of</strong> Section 3.1, I assume that senders<br />

and receivers are <strong>of</strong> different strategic types. Strategic types correspond to <strong>the</strong> level k <strong>of</strong><br />

strategic depth a player in the game performs (while believing she thereby outperforms her<br />

opponent by exactly one step of reasoning). I will give an inductive definition of strategic<br />

types in terms of players’ beliefs, starting with a fixed strategy $\sigma^*_0$ of $S_0$.⁸ Then, for any<br />

$k \geq 0$, $R_k$ is characterized by a belief set $\pi^*_{R_k} \subseteq \Pi_R$ that S is a level-k sender and $S_{k+1}$ is<br />

characterized by a belief $\pi^*_{S_{k+1}} \in \Pi_S$ that R is a level-k receiver.<br />

I assume that $S_0$ plays according to the signaling strategy $\sigma^*_0$ which simply sends any<br />

true message with equal probability in all states. There need not be any belief to which<br />

this is a best response, as level-0 senders are (possibly irrational) dummies to implement<br />

the Focal Meaning Assumption. $R_0$ then believes that he is facing $S_0$. With unique $\sigma^*_0$,<br />

which sends all messages in M with positive probability (M is finite and contains no<br />

contradictions), $R_0$ is characterized entirely by the unique belief $\pi^*_{R_0}$ that S plays $\sigma^*_0$.<br />
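The level-0 sender strategy can be sketched in a few lines. The function name `sigma_0` and the dictionary representation are illustrative choices, and the G1 truth conditions used in the example ("some" true in both states, "all" only in the all-state) are the assumed ones.

```python
from fractions import Fraction


def sigma_0(states, meaning):
    """Level-0 sender: in each state, a uniform mix over the true messages.
    Assumes every state has at least one true message (footnote 1, iii)."""
    strategy = {}
    for t in states:
        true_msgs = [m for m in meaning if t in meaning[m]]
        strategy[t] = {m: Fraction(1, len(true_msgs)) for m in true_msgs}
    return strategy


# Illustration with the states and messages of G1.
states = ["t_some_not_all", "t_all"]
meaning = {"m_some": {"t_some_not_all", "t_all"}, "m_all": {"t_all"}}
s0 = sigma_0(states, meaning)
# Because no message is contradictory, every message is sent somewhere.
all_used = all(any(m in s0[t] for t in states) for m in meaning)
```

Since every message is used with positive probability, the level-0 receiver never faces a surprise message, so his posterior is fixed entirely by Bayesian update.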

In general, $R_k$ believes that he is facing a level-k sender. For $k > 0$, $S_k$ is characterized<br />

by a belief $\pi^*_{S_k} \in \Pi_S$. $R_k$ consequently believes that $S_k$ plays a best response $\sigma_k \in$<br />

$\mathcal{S}(\pi^*_{S_k})$ to this belief. We can leave this unrestricted and assume that $R_k$ considers any<br />

$\sigma_k \in \mathcal{S}(\pi^*_{S_k})$ possible. But it will transpire that for an intuitively appealing analysis of<br />

8 I will write $S_k$ and $R_k$ to refer to a sender or receiver of strategic type k. Likewise, $S^t_k$ refers to a sender<br />

of strategic type k and knowledge type t.<br />

message credibility we need to assume that Rk takes Sk to be truthful all else being equal<br />

(see also discussion in section 4). We implement <strong>the</strong> TCP Assumption <strong>of</strong> Section 3.1 as<br />

a restriction S∗(π∗_{Sk}) ⊆ S(π∗_{Sk}) on signaling strategies held possible by R. Of course,<br />

even when restricted, there need not be a unique signaling strategy here. As a general<br />

tie-break rule, assume the “principle of insufficient reason” that all σk ∈ S∗(π∗_{Sk}) are<br />

equiprobable to Rk. That means that Rk effectively believes that his opponent is playing<br />

response strategy<br />

\[
\sigma^*_k(t, m) = \frac{\sum_{\sigma \in S^*(\pi^*_{S_k})} \sigma(t, m)}{|S^*(\pi^*_{S_k})|}\,.
\]
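The tie-break rule amounts to averaging a finite set of strategies with equal weight. The following is a minimal sketch of that averaging only; the two pure strategies, the type names and the message names are hypothetical placeholders, not taken from any game in the paper.<br />

```python
from fractions import Fraction

def uniform_mixture(strategies, types, messages):
    """Average a non-empty list of strategies sigma[t][m] with equal weight."""
    n = len(strategies)
    return {
        t: {m: sum(Fraction(s[t][m]) for s in strategies) / n for m in messages}
        for t in types
    }

# Two hypothetical pure signaling strategies over types {t1, t2} and messages {m1, m2}.
s_a = {"t1": {"m1": 1, "m2": 0}, "t2": {"m1": 0, "m2": 1}}
s_b = {"t1": {"m1": 1, "m2": 0}, "t2": {"m1": 1, "m2": 0}}

# The receiver's effective belief: both strategies are held equally likely.
mixed = uniform_mixture([s_a, s_b], ["t1", "t2"], ["m1", "m2"])
print(mixed["t2"])
```

Each row of the mixture still sums to one, so the result is again a (behavioral) signaling strategy.<br />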

This fixes Rk’s beliefs about the behavior of his opponent, but it need not fix Rk’s belief<br />

π^3_R about surprise messages. Since this matter is intricate and moreover Rk’s counterfactual<br />

beliefs do not play a crucial role in any examples discussed in this paper, I will not<br />

pursue this issue any further here (but see also footnote 10 below). In general, let us<br />

say that Rk is characterized by any belief whose second component is σ∗_k and whose third<br />

component satisfies some (coherent, but possibly vacuous) assumption about the interpretation<br />

of surprise messages. Let π∗_{Rk} ⊆ ΠR be the set of all such beliefs. Rk is then fully<br />

characterized by π∗_{Rk}.<br />

In turn, Sk+1 believes that her opponent is a level-k receiver who plays a best response<br />

ρk ∈ R(π∗_{Rk}). With the above tie-break rule Sk+1 is fully characterized by the belief<br />

\[
\rho^*_k(m, a) = \frac{\sum_{\rho \in R(\pi^*_{R_k})} \rho(m, a)}{|R(\pi^*_{R_k})|}\,.
\]

3.4 Credibility and Inference<br />

Define that a signal m is k-OPTIMAL in t iff σ∗_{k+1}(t, m) ≠ 0. The set of k-optimal messages<br />

in t consists of all messages that Rk+1 believes S^t_{k+1} might send (thus taking the TCP<br />

Assumption into account). 9 Similarly, distill from R’s beliefs his<br />

INTERPRETATION-STRATEGY δ : M → P(T) as given by belief πR: δ_{πR}(m) = {t ∈ T | π^3_R(m, t) ≠ 0}.<br />

This simply is the support of the posterior beliefs of R after receiving message m. Let’s<br />

write δk for the interpretation strategy of a level-k receiver.<br />

For any k > 0, since Sk believes that she faces Rk−1 with interpretation strategy δk−1, wanting<br />

to send message m would intuitively count as an attempt to mislead if sent by S^t_k just in<br />

case t ∉ δk−1(m). Such an attempt would moreover be untruthful if t ∉ [m]. While<br />

Rk−1 would be deceived, Rk would see through the attempted deception. From the point<br />

of view of Rk, who adheres to the TCP Assumption, a message m is incredible if it is<br />

(k−1)-optimal in some t ∉ [m]. But then Rk will include t in his interpretation of<br />

m: recognizing a deceptive intention leads to pragmatic inference. In general, we should<br />

consider a message m credible unless some type t ∉ [m] would want to use m somewhere<br />

along the IBR sequence; precisely, m is CREDIBLE iff δk(m) ⊆ [m] for all k ≥ 0. 10<br />
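The two definitions above reduce to a support computation and a subset test. The sketch below illustrates them under stated assumptions: it only checks the finitely many levels that are passed in (the definition quantifies over all k ≥ 0), the m_some values mirror the interpretations the text reports for G1 in Section 4, and the m_high posterior is a hypothetical example, not a value from the paper.<br />

```python
def interpretation(posterior, message):
    """delta(m): the types with non-zero posterior probability after `message`."""
    return {t for t, p in posterior[message].items() if p != 0}

def is_credible(message, deltas, meaning):
    """True iff delta_k(m) is a subset of [m] for every supplied level k."""
    return all(delta[message] <= meaning[message] for delta in deltas)

# Semantic meaning [m] and the first two interpretation levels reported for G1.
meaning = {"m_some": {"t_some_not_all", "t_all"}}
delta_0 = {"m_some": {"t_some_not_all", "t_all"}}
delta_1 = {"m_some": {"t_some_not_all"}}
credible_some = is_credible("m_some", [delta_0, delta_1], meaning)

# Hypothetical posterior in which a type outside [m] keeps positive probability.
posterior = {"m_high": {"t_high": 0.5, "t_low": 0.5}}
delta_bad = {"m_high": interpretation(posterior, "m_high")}
credible_high = is_credible("m_high", [delta_bad], {"m_high": {"t_high"}})

print(credible_some, credible_high)
```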

9 Without <strong>the</strong> TCP Assumption, 0-optimality would be equivalent to <strong>the</strong> notion <strong>of</strong> an optimal assertion<br />

in Benz and van Rooij (2007).<br />

10 It may seem that messages which would not be sent by any type (after <strong>the</strong> first round or later) come out<br />

credible under this definition, which would not be a good prediction. (Thanks to Daniel Rothschild (p.c.) for<br />

pointing this out to me.) However, this is not quite right: we get into this predicament only for some versions<br />

<strong>of</strong> <strong>the</strong> IBR sequence, not for o<strong>the</strong>rs. It all depends on how <strong>the</strong> receiver forms his counterfactual beliefs. If,<br />

for instance, we assume that R rationalizes observed behavior even if it surprises him, we can keep <strong>the</strong><br />



4 Discussion<br />

| | a1 | a2 | m12 | m3 |<br />

| t1 | 1,1 | 0,0 | √ | − |<br />

| t2 | 0,0 | 1,1 | √ | − |<br />

| t3 | 0,0 | 1,1 | − | √ |<br />

G5: “White Lie”<br />

| | Pr(t) | a1 | a2 | a3 | m12 | m23 |<br />

| t1 | 1/8 | 1,1 | 0,0 | 0,0 | √ | − |<br />

| t2 | 3/4 | 0,0 | 1,1 | 0,0 | √ | √ |<br />

| t3 | 1/8 | 0,0 | 0,0 | 1,1 | − | √ |<br />

G6: “Some Game without a Name”<br />

The IBR model makes intuitively correct predictions about message credibility for <strong>the</strong><br />

games considered so far. In G1, R0 responds to msome with <strong>the</strong> appropriate action a∃¬∀,<br />

but still interprets δ0(msome) = {t∃¬∀, t∀}. In turn, R1 interprets as δ1(msome) = {t∃¬∀}; he<br />

has pragmatically enriched <strong>the</strong> semantic meaning by taking <strong>the</strong> sender’s pay<strong>of</strong>f structure<br />

and available messages into account. After one round a fixed-point is reached, with fully<br />

revealing credible signaling in accordance with intuition. In G2, IBR predicts that both<br />

S^{thigh}_1 and S^{tlow}_1 will use mhigh, which is therefore not credible. In G3, fully revealing<br />

communication is also predicted, and for G4 IBR predicts that all messages are credible for R0<br />

and R1, but not for R2, hence incredible as such. In general, the IBR model predicts that<br />

communication in games <strong>of</strong> pure coordination is always credible:<br />

Proposition 4.1. Take a signaling game with T = A and US,R(·, t, t ′ ) = c > 0 if t = t ′<br />

and 0 o<strong>the</strong>rwise. Then δk(m) ⊆ [m] for all k and m.<br />

Proof. Clearly, δ0(m) ⊆ [m] for arbitrary m. So assume that δk(m) ⊆ [m]. In this case<br />

S^t_{k+1} will use m only if t ∈ δk(m). But then t ∈ [m] and therefore δk+1(m) ⊆ [m].<br />

However, <strong>the</strong> IBR model does not guarantee generally that communication is credible<br />

even when preferences are perfectly aligned, i.e., US = UR. This may seem surprising at<br />

first, but is due naturally to <strong>the</strong> possibility <strong>of</strong>, what we could call, white lies: untruthful<br />

signaling that is beneficial for <strong>the</strong> receiver. These may occur if <strong>the</strong> set <strong>of</strong> available signals<br />

is not expressive enough. As an easy example, consider G5, where S^{t2} will use m3<br />

untruthfully to induce action a2, which, however, is best for both receiver and sender.<br />

To understand <strong>the</strong> central role <strong>of</strong> <strong>the</strong> TCP assumption in <strong>the</strong> present proposal, consider<br />

<strong>the</strong> game G6. In G6, R0 has <strong>the</strong> following posterior beliefs: after hearing message m12 he<br />

rules out t3 and believes that t2 is three times as likely as t1; similarly, after hearing message<br />

m23 he rules out t1 and believes that t2 is three times as likely as t3. Consequently,<br />

R0 responds to both signals with a2. Now, S^{t1}_1, for instance, does not care which message<br />

she chooses, as far as her expected utilities are concerned. But R1 nevertheless<br />

assumes that S^{t1}_1 speaks truthfully. It’s thanks to the TCP Assumption that IBR predicts<br />

messages to be credible in this game.<br />
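R0's reasoning in G6 can be retraced numerically. The sketch below assumes the priors and truth conditions of the G6 table, a level-0 sender who sends each true message with equal probability, and a payoff of 1 exactly when action a_i meets state t_i; all function and variable names are illustrative.<br />

```python
from fractions import Fraction

prior = {"t1": Fraction(1, 8), "t2": Fraction(3, 4), "t3": Fraction(1, 8)}
meaning = {"m12": {"t1", "t2"}, "m23": {"t2", "t3"}}
matching_action = {"t1": "a1", "t2": "a2", "t3": "a3"}

def sigma0(t):
    """Level-0 sender: every message that is true in t is sent with equal probability."""
    true_msgs = [m for m in meaning if t in meaning[m]]
    return {m: Fraction(1, len(true_msgs)) if m in true_msgs else Fraction(0)
            for m in meaning}

def posterior(m):
    """R0's posterior over types after message m, by Bayes' rule against sigma0."""
    joint = {t: prior[t] * sigma0(t)[m] for t in prior}
    total = sum(joint.values())
    return {t: p / total for t, p in joint.items()}

def best_response(m):
    """The expected utility of a_i equals the posterior of t_i, so pick the likeliest type."""
    post = posterior(m)
    return matching_action[max(post, key=post.get)]

for m in meaning:
    print(m, posterior(m), best_response(m))
```

The factor of three arises because t2 splits its probability mass over two true messages, whereas t1 and t3 each have only one.<br />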

10 (continued) definition unchanged: if no type whatsoever has an outstanding reason to send m, the receiver’s posterior<br />

beliefs after m will support any type. So, unless m is tautologous, it is incredible. Still, Rothschild’s<br />

criticism is appropriate: the definition of message credibility offered here is, in a sense, incomplete as long<br />

as we do not properly define the receiver’s counterfactual beliefs; something left for another occasion.<br />

G6 also shows a difference between the IBR model and Rabin’s (1990) model of credible<br />

communication, which superficially look very similar. Rabin’s model consists of two<br />

components: the first component is a definition of message credibility which is almost a<br />

two-step iteration of best responses starting from the semantic meaning; the second component<br />

is iterated strict dominance around a fixed core set of Rabin-credible messages<br />

being sent truthfully and believed. In particular, Rabin requires for m to be credible that<br />

m induces, when taken literally, exactly <strong>the</strong> set <strong>of</strong> all sender-best actions (from <strong>the</strong> set <strong>of</strong><br />

actions that are inducible by some receiver belief) <strong>of</strong> all t ∈ [m]. This is defensible under<br />

<strong>the</strong> Rich Language Assumption, but both messages in G6 fail this requirement. Consequently,<br />

with no credible message to restrict iterated strict dominance, Rabin’s model<br />

predicts a total anything-goes for game G6. This shows <strong>the</strong> limited applicability <strong>of</strong> approaches<br />

to message credibility that are inseparable from <strong>the</strong> Rich Language Assumption.<br />

The present notion <strong>of</strong> message credibility and <strong>the</strong> IBR model are not restricted in this<br />

sense and fare well with (partial) inexpressibility and <strong>the</strong> resulting inferences.<br />

To wrap up: as a solution concept, <strong>the</strong> epistemic IBR model <strong>of</strong>fers, basically, a set <strong>of</strong><br />

beliefs, viz., beliefs obtained under certain assumptions about <strong>the</strong> psychology <strong>of</strong> agents<br />

from a sequence <strong>of</strong> iterated best responses. I do not claim that this model is a reasonable<br />

model for human reasoning in general. Certainly, <strong>the</strong> simplifying assumption that<br />

players believe that <strong>the</strong>y are facing a level-k opponent, and not possibly a level-l < k opponent,<br />

becomes less plausible as k grows, and especially so for agents that have, in<br />

a manner of speaking, already reasoned themselves through a cycle multiple times. (It is<br />

easily verified that for finite M and T the IBR sequence always enters a cycle after some<br />

k ∈ N.) 11 Still, I wish to defend that <strong>the</strong> IBR model does capture (our intuitions about)<br />

certain aspects <strong>of</strong> (idealized) linguistic behavior, namely pragmatic inference in cooperative<br />

and non-cooperative situations. Whe<strong>the</strong>r it is a plausible model <strong>of</strong> belief formation<br />

and reasoning in <strong>the</strong> envisaged linguistic situations is ultimately an empirical question.<br />

In conclusion, <strong>the</strong> IBR model <strong>of</strong>fers a novel perspective on message credibility and<br />

the pragmatic inferences based on this notion. The model generalizes existing game-theoretical<br />

models <strong>of</strong> pragmatic inference by taking conflicting interests into account. It<br />

also generalizes game-<strong>the</strong>oretic accounts <strong>of</strong> credibility by giving up <strong>the</strong> Rich Language<br />

Assumption. The explicitly epistemic perspective on agents’ deliberation assigns a natural<br />

place to semantic meaning in cheap-talk signaling games as a focal starting point. It also<br />

highlights <strong>the</strong> unity in pragmatic inference: in this model both credibility-based inferences<br />

and implicatures are different outcomes <strong>of</strong> <strong>the</strong> same reasoning process.<br />

Acknowledgements<br />

I’d like to thank Tikitu de Jager, Robert van Rooij, Daniel Rothschild, Marc Staudacher<br />

and three anonymous referees for insightful comments, help and discussion. I moreover<br />

benefited greatly from discussing with Gerhard Jäger an early version <strong>of</strong> his paper (Jäger,<br />

2008), which also defines and applies a general iterated best response model different<br />

from what I did here. Also, I am thankful to Sven Lauer for waking my interest by first<br />

explaining to me with enormous patience some puzzles about credibility that I did not<br />

fully understand at <strong>the</strong> time (see Lauer, 2007). Errors are my own.<br />

11 It is tempting to assume that “looping reasoners” may have an Aha-Erlebnis and to extend <strong>the</strong> IBR<br />

sequence by transfinite induction assuming, for instance, that level-ω players best respond to <strong>the</strong> belief<br />

that <strong>the</strong> IBR sequence is circling. I do not know whe<strong>the</strong>r this is necessary and/or desirable for linguistic<br />

applications. We should keep in mind though that in some cases human reasoners may not get to <strong>the</strong> ideal<br />

level <strong>of</strong> reasoning in this model and in o<strong>the</strong>rs <strong>the</strong>y might even go beyond it.<br />



References<br />


Battigalli, P. (2006). Rationalization in signaling games: Theory and applications, International<br />

Game Theory Review 8(1): 67–93.<br />

Benz, A. and van Rooij, R. (2007). Optimal assertions and what <strong>the</strong>y implicate, Topoi<br />

26: 63–78.<br />

Camerer, C. F., Ho, T.-H. and Chong, J.-K. (2004). A cognitive hierarchy model <strong>of</strong> games,<br />

The Quarterly Journal <strong>of</strong> Economics 119(3): 861–898.<br />

Crawford, V. P. (2003). Lying for strategic advantage: Rational and boundedly rational<br />

misrepresentation <strong>of</strong> intentions, American Economic Review 93(1): <strong>13</strong>3–149.<br />

Crawford, V. P. (2007). Let’s talk it over: Coordination via preplay communication with<br />

level-k thinking. Unpublished Manuscript.<br />

Farrell, J. (1993). Meaning and credibility in cheap-talk games, Games and Economic<br />

Behavior 5: 514–531.<br />

Grice, P. H. (1989). Studies in <strong>the</strong> Ways <strong>of</strong> Words, Harvard University Press.<br />

Ho, T.-H., Camerer, C. and Weigelt, K. (1998). Iterated dominance and iterated best<br />

response in experimental “p-beauty contests”, The American Economic Review<br />

88(4): 947–969.<br />

Jäger, G. (2007). Game dynamics connects semantics and pragmatics, in A.-V. Pietarinen<br />

(ed.), Game Theory and Linguistic Meaning, Elsevier, pp. 89–102.<br />

Jäger, G. (2008). Game <strong>the</strong>ory in semantics and pragmatics. Manuscript, University <strong>of</strong><br />

Bielefeld.<br />

Katz, J. J. (1981). Language and O<strong>the</strong>r Abstract Objects, Basil Blackwell.<br />

Katzir, R. (2007). Structurally-defined alternatives. To appear in Linguistics and Philosophy.<br />

Lauer, S. (2007). Some kinds <strong>of</strong> deception do not occur: Credibility and <strong>the</strong> maxim <strong>of</strong><br />

sincerity. Unpublished Manuscript. Amsterdam, Stanford.<br />

Mat<strong>the</strong>ws, S. A., Okuno-Fujiwara, M. and Postlewaite, A. (1991). Refining cheap talk<br />

equilibria, Journal <strong>of</strong> Economic Theory 55: 247–273.<br />

Rabin, M. (1990). Communication between rational agents, Journal <strong>of</strong> Economic Theory<br />

51: 144–170.<br />

Stahl, D. O. and Wilson, P. W. (1995). On players’ models <strong>of</strong> o<strong>the</strong>r players: Theory and<br />

experimental evidence, Games and Economic Behavior 10: 218–254.<br />

Stalnaker, R. (2006). Saying and meaning, cheap talk and credibility, in A. Benz, G. Jäger<br />

and R. van Rooij (eds), Game Theory and Pragmatics, Palgrave MacMillan, pp. 83–<br />

100.<br />



TOWARDS A NEW CHARACTERISATION<br />

OF CHOMSKY'S HIERARCHY VIA ACCEPTANCE PROBABILITY<br />

Michael Hartwig<br />

Multimedia University, Cyberjaya, Malaysia<br />

Abstract. Researchers have recently studied the acceptance probability of P and<br />

NP languages, hoping to find new ways of differentiating the two classes. This paper<br />

outlines the authors’ findings on the acceptance probability of regular and<br />

context-free languages, which we describe using the notion of a difference-shrinking<br />

chain. We present a first proof technique, the inflating lemma, which builds on these results and is able<br />

to separate higher languages from regular languages up to star height 1, as well as<br />

some incentives to apply those techniques to higher classes.<br />

1 Introduction<br />


“The major quest for the complexity theory community is finding methods that may<br />

separate classes.” (Buhrmann & Torenvliet 2005) Although impressive progress has<br />

recently been made in complexity theory, the need for new,<br />

creative approaches that may result in methods for separating classes<br />

has not diminished, as the long-outstanding P vs. NP problem nicely exemplifies.<br />

One recent approach is the study of properties of the acceptance<br />

probability function of such languages, that is, the study of the shape of the graph of the<br />

function which takes a natural number n as its argument and returns the ratio between<br />

the number of accepted words of length n in the given language and the number of all possible words<br />

of the same length. This study has led to many discoveries, such as the so-called phase<br />

transition in the acceptance probability graph of NP-complete problems (Clote &<br />

Kranakis 2002, Dubois et al. 2000). There has been hope that if we were able to<br />

describe this phase transition with more and more precision (Achlioptas et al.<br />

2001, Kirousis et al. 1998) we would then also be able to separate P from NP.<br />

Unfortunately, this has not yet happened.<br />

Like other researchers, we have therefore first turned our attention to smaller classes<br />

such as regular and context-free languages. Given such a language L, we define the<br />

density function dL(n) = |L ∩ Σ^n|, counting the number of words of length n in L. The<br />

study of the density of regular languages has a longer history (Schützenberger 1962,<br />

Eilenberg 1974, Rozenberg et al. 1997, Bodirsky et al. 2004). Languages with a density<br />

function that can be bounded from above by a polynomial (i.e. there exists a polynomial<br />

p(x) such that dL(n) ≤ p(n)) are called sparse. If, on the other hand, there exists a real<br />

number h > 1 such that dL(n) ≥ h^n for infinitely many n ≥ 0, then L is called dense<br />

(Demaine et al. 2003, Krieger 2007). Notice that the language a*b* is a sparse language, while<br />

the language that includes all words over a binary alphabet that start with the letter a<br />

(i.e. a(a+b)*) is dense. As described in (Szilard et al. 1992, Rozenberg et al. 1997), a<br />

regular language is sparse “if and only if it can be represented as a finite union of<br />

regular expressions of the form xy1*z1...ym*zm, where x, y1, z1, ..., ym, zm are all strings in<br />

Σ*”. Such regular languages are also called SLRE and are equivalent to bounded regular<br />




languages (Habermehl et al. 2000). Nevertheless, it is not difficult to see that the<br />

majority of all regular languages are dense. Flajolet (1987) demonstrated that a regular<br />

language is either sparse or dense, which was recently generalized to context-free<br />

languages (Ilie 2000, Incitti 2000). While it is interesting in its own right to study such<br />

properties, Demaine et al. (2003) showed that only sparse regular languages have<br />

the power to restrict NP-complete problems such that they are polynomially solvable, in<br />

other words, that the intersection of such a regular language with an NP-complete<br />

problem results in a language in P. Eisman et al. (2005) proposed another<br />

application by stating that the density function could be used in some application areas<br />

such as streaming algorithms, where “rapid computation must be performed (often in a<br />

single pass)”.<br />
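The sparse/dense contrast between a*b* and a(a+b)* can be observed by brute-force enumeration for small n. This is a minimal sketch, assuming Python's `re` syntax, in which the formal expression a(a+b)* is written `a[ab]*`; the helper name `density` is illustrative, and the enumeration is exponential in n, so it only serves for small lengths.<br />

```python
import re
from itertools import product

def density(pattern, n, alphabet="ab"):
    """d_L(n): number of words of length n over `alphabet` that match `pattern`."""
    regex = re.compile(pattern)
    return sum(1 for w in product(alphabet, repeat=n)
               if regex.fullmatch("".join(w)))

# a*b* has n + 1 words of each length n, which is polynomially bounded (sparse).
sparse = [density(r"a*b*", n) for n in range(6)]

# a(a+b)* has 2**(n-1) words of each length n >= 1, which grows exponentially (dense).
dense = [density(r"a[ab]*", n) for n in range(6)]

print(sparse)
print(dense)
```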

Still, we feel that it is often more interesting to study the acceptance probability<br />

Acc(L, n) = |L ∩ Σ^n| / |Σ^n| of a given language rather than its density, that is, the ratio<br />

between the number of accepted words and all possible words of a given length. As<br />

mentioned above, a(a+b)* has exponential density, but its acceptance<br />

probability is stable, as Acc(a(a+b)*, n) = 0.5 for all n ≥ 1, which seems to describe the quantity of accepted<br />

words more appropriately. Secondly, such a different view allows us to combine both<br />

sparse and dense languages and study common properties. In (Hartwig et al. 2006a,<br />

Hartwig et al. 2006b) we showed that the acceptance probability graph is indeed<br />

expressive enough to separate complexity classes, making it an acceptable candidate in<br />

the above-mentioned quest. Our objective in using such properties to separate these<br />

classes is to familiarize ourselves with their properties, techniques and applications, and<br />

to get a better understanding of possible uses of acceptance probability<br />

graphs in higher classes. In (Hartwig et al. 2006a) we described the acceptance<br />

probability of very low regular languages, and in (Hartwig et al. 2006b) we presented a<br />

proof technique (the inflating lemma) that is powerful enough to separate many higher<br />

languages from regular languages up to star height 1 and can be compared with the<br />

well-known pumping lemma (Sipser 1997) 1 .<br />

Inflating Lemma If L ∈ REG(1) and L has increasing acceptance probability, then<br />

there exist a length n0 and a natural number k ≥ 1 such that for all w ∈ L with |w| ≥ n0:<br />

w = pr ∈ L → p(Σ^k)*r ⊆ L.<br />

An example application would be <strong>the</strong> following pro<strong>of</strong>.<br />

Example (MAJORITY does not belong to REG(1)) L = {w | w ∈ Σ* and w has at least<br />

as many a’s as b’s} ∉ REG(1).<br />

Proof. Acc(L, n) is constantly increasing; hence the inflating lemma can be applied. But<br />

none of the accepted words can be inflated: we can take any word and any position and<br />

insert (or: inflate with) as many b’s as needed until the word has more b’s than a’s.<br />

□<br />
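The inflation step of the proof can be made concrete. The sketch below illustrates only that step, not the lemma itself: for a word w = pr in MAJORITY and any block length k, it builds a member of p(Σ^k)*r that falls outside MAJORITY by inserting blocks of b's. The function names and the sample word are illustrative assumptions.<br />

```python
def in_majority(w):
    """MAJORITY: the word has at least as many a's as b's."""
    return w.count("a") >= w.count("b")

def inflation_counterexample(w, split, k):
    """Return a word in p (Sigma^k)* r, for w = p r, that is not in MAJORITY."""
    p, r = w[:split], w[split:]
    # k * blocks > len(w) >= number of a's, so the b's end up in the majority.
    blocks = len(w) // k + 1
    return p + "b" * (k * blocks) + r

w = "aaab"
for split in range(len(w) + 1):
    for k in (1, 2, 3):
        inflated = inflation_counterexample(w, split, k)
        print(split, k, inflated, in_majority(inflated))
```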

1 Although <strong>the</strong> inflating lemma seems to have only limited applicability <strong>the</strong> following work suggests<br />

that every regular language has ei<strong>the</strong>r increasing, stable or decreasing chains. Fur<strong>the</strong>rmore, if L is regular<br />

and <strong>of</strong> decreasing acceptance probability, <strong>the</strong>n <strong>the</strong> lemma could be applied to <strong>the</strong> complement <strong>of</strong> L.<br />



This paper continues this work by providing an overview of the status of our<br />

work on the acceptance probability of regular and context-free languages over binary<br />

alphabets, claiming that both classes have acceptance probability graphs that can be split<br />

into increasing, decreasing or stable chains with a decreasing (or shrinking)<br />

difference. We think that the minimal number of such chains should be studied in<br />

more detail and related to the size of any program or machine accepting<br />

the language. Knowing that NP-complete problems exhibit phase transitions in their<br />

acceptance probability graphs, switching from difference-shrinking to<br />

difference-increasing sections and vice versa, we believe that techniques making use of those<br />

properties may contribute to the separation of higher classes, too.<br />

2 Preliminaries<br />


We use the following definitions: The alphabet for all strings is Σ = {a, b}. The length<br />

of a string w is given by |w|; all sets L1, L2, ... are considered subsets of Σ*. A regular<br />

expression e over Σ is built from all symbols in Σ, the symbol λ, the binary operators +<br />

and ·, and the unary operator *. The language specified by a regular expression is denoted by<br />

L(e) and is referred to as a regular language (Kleene 1956, Kulloch et al. 1943). We call<br />

a regular expression unambiguous (or non-overlapping) if and only if its<br />

corresponding NFA is unambiguous. “An NFA is called unambiguous if for each word<br />

w there is at most one path from the initial state to a final state that spells out w.”<br />

(Bruggemann-Klein et al. 2007, Moreira et al. 2005) It is important to know that all<br />

regular languages are unambiguous (Giammarresi et al. 2001) and can hence be<br />

described by an unambiguous regular expression. sh(e) denotes the star height of a<br />

regular expression e, and REG(1) denotes the class of all regular languages having a star height of 1<br />

or less.<br />

As mentioned in <strong>the</strong> introduction, <strong>the</strong> density <strong>of</strong> a language counts <strong>the</strong> number <strong>of</strong><br />

accepted words per given length and is defined as<br />

dL(n) = |L ∩ Σ^n|,<br />

while <strong>the</strong> acceptance probability <strong>of</strong> a language is defined as <strong>the</strong> ratio between <strong>the</strong><br />

number <strong>of</strong> accepted words dL(n) and <strong>the</strong> number <strong>of</strong> all words <strong>of</strong> a given length,<br />

Acc(L, n) = |L ∩ Σ^n| / |Σ^n|.<br />
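For a regular language, Acc(L, n) can be computed without enumerating all 2^n words by counting, for each state of a complete DFA, how many words of the current length lead to it. The following is a minimal sketch under that assumption; the DFA for a(a+b)* and its state names are a hypothetical construction for illustration.<br />

```python
from fractions import Fraction

def acceptance_probability(dfa, n):
    """Acc(L, n) for a complete DFA over {a, b}, by counting words per state."""
    start, accepting, delta = dfa
    counts = {start: 1}  # number of words of the current length reaching each state
    for _ in range(n):
        nxt = {}
        for state, c in counts.items():
            for symbol in "ab":
                target = delta[state][symbol]
                nxt[target] = nxt.get(target, 0) + c
        counts = nxt
    accepted = sum(c for state, c in counts.items() if state in accepting)
    return Fraction(accepted, 2 ** n)

# Hypothetical DFA for a(a+b)*: q0 is the start state, q1 accepts, qd is a dead state.
dfa_a_prefix = ("q0", {"q1"}, {
    "q0": {"a": "q1", "b": "qd"},
    "q1": {"a": "q1", "b": "q1"},
    "qd": {"a": "qd", "b": "qd"},
})

print([acceptance_probability(dfa_a_prefix, n) for n in range(6)])
```

The counting loop runs in time linear in n for a fixed automaton, which makes longer chains of the acceptance probability graph cheap to inspect.<br />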

3 Regular acceptance probability<br />

3.1 Low regular languages<br />

Describing <strong>the</strong> acceptance probability <strong>of</strong> a finite language is straightforward.<br />

Lemma (Finite Languages) For any finite language L: Acc(L, n) = O(0).<br />

Pro<strong>of</strong>. If L is finite <strong>the</strong>n <strong>the</strong>re exists a length after which no word is accepted by <strong>the</strong><br />

language. The acceptance probability reaches 0.<br />



□<br />

Regular languages which can be described by a regular expression having star height<br />

0 or at most one expression using <strong>the</strong> star operator and being <strong>of</strong> <strong>the</strong> form (a+b)* have<br />

constant acceptance probability.<br />

Lemma (Simple Regular Languages) If L = w1(a+b)*w2 with w1, w2 words, then there exists<br />

a constant c such that:<br />

Acc(L, n) = O(c).<br />

Proof. The shortest accepted word of the language L is of length |w| = |w1| + |w2|. As<br />

there is only one such shortest word, Acc(L, |w|) = 1/2^|w| = c. For any length n greater<br />

than |w| we can say that dL(n) = 2 · dL(n-1). Hence the acceptance ratio<br />

remains stable.<br />

□<br />
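The stability claim can be spelled out as a short calculation. This is a sketch: the closed form for dL(n) is obtained by unrolling the recurrence dL(n) = 2 · dL(n-1) from dL(|w|) = 1 and is not stated explicitly in the proof.<br />

```latex
\[
d_L(n) = 2^{\,n-|w|}
\quad\Longrightarrow\quad
\mathrm{Acc}(L, n) = \frac{d_L(n)}{|\Sigma^n|}
                   = \frac{2^{\,n-|w|}}{2^{\,n}}
                   = 2^{-|w|} = c
\qquad \text{for all } n \ge |w| .
\]
```

For instance, for L = ab(a+b)* we have |w| = 2 and therefore c = 1/4.<br />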

It is then not difficult to see that any union of simple regular languages (in<br />

the above sense) will again only yield a language with constant acceptance<br />

probability.<br />

Figure 1. Acceptance probability graphs of low regular languages.<br />

Left: L1 = {a, aba} (finite); right: L2 = ab(a+b)*.<br />

3.2 Regular languages having one star<br />

Languages built upon regular expressions using <strong>the</strong> star operator at most once include<br />

also languages with a decreasing acceptance probability, if <strong>the</strong> expression under <strong>the</strong> star<br />

is not entirely composed <strong>of</strong> (a+b)* expressions. The length <strong>of</strong> <strong>the</strong> expression under <strong>the</strong><br />

star defines <strong>the</strong> step width d decomposing <strong>the</strong> acceptance probability graph into d<br />

chains. We will have d-1 chains with <strong>the</strong> acceptance probability O(0) and one chain<br />

being ei<strong>the</strong>r stable or decreasing.<br />

Figure 2. Acceptance probability graph of L3 = b(ba)*. L3 has a step width of 2 with one chain being<br />

stable (dL3(0) = dL3(2) = ... = 0), while the remaining elements belong to a chain with its peaks<br />

constantly decreasing by ¾.<br />


Lemma (Regular Languages with One Star) For any regular language L = w1w2*w3 with w1,<br />

w3 words and sh(w2) = 0 <strong>the</strong>re exists a minimal length n0 such that for all n > n0:<br />

Acc(L, n) ≤ Acc(L, n-|w2|).<br />

Proof. The length d = |w2| is the step width of this language, i.e. the distance between the<br />

peaks of the acceptance probability graph. Every accepted word of length n arises from an<br />

accepted word of length n-|w2| by one more application of the word under<br />

the star. Hence Acc(L, n) = c · Acc(L, n-|w2|). The constant c is easily determined from w2, and the<br />

fact that the chains are either decreasing or stable is obvious and also follows directly from the<br />

inflating lemma.<br />

□<br />
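For the special case in which w2 is a single word (an assumption of this sketch; the lemma only requires sh(w2) = 0), the chains can be computed directly.<br />
The example uses L3 = b(ba)* over the binary alphabet; the function names are illustrative and not taken from the paper.<br />

```python
from fractions import Fraction


def d_one_star(n, len_w1, len_w2, len_w3):
    """Number of words of length n in w1 w2* w3 for fixed words w1, w2, w3.

    There is at most one such word: it exists exactly when the remaining
    length is a non-negative multiple of |w2|.
    """
    rest = n - len_w1 - len_w3
    return 1 if rest >= 0 and rest % len_w2 == 0 else 0


def acc_one_star(n, len_w1, len_w2, len_w3, alphabet_size=2):
    """Acceptance probability d(n) / alphabet_size^n."""
    return Fraction(d_one_star(n, len_w1, len_w2, len_w3), alphabet_size ** n)


# L3 = b(ba)*: |w1| = 1, |w2| = 2, |w3| = 0
values = [acc_one_star(n, 1, 2, 0) for n in range(9)]

# Split the graph into d = |w2| chains, one per residue class of n modulo d
step = 2
chains = [values[start::step] for start in range(step)]
for start, chain in enumerate(chains):
    print(start, chain)
```

Under these assumptions the factor between consecutive peaks is c = 1/2^|w2|, which is 1/4 for L3.<br />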

3.3 Regular languages up to star height 1<br />

Regular languages up to star height 1 already provide a wide range of different<br />

acceptance probability graphs.<br />

Lemma (Regular Languages up to Star Height 1) If L ∈ REG(1) then there exist<br />

constants s, u0, ..., um and v1, ..., vm such that:<br />

dL(s) = u0,<br />

dL(s+1) = u1,<br />

... ,<br />

dL(s+m) = um<br />

dL(n) = u1dL(n-v1) + u2dL(n-v2) + .. + umdL(n-vm)<br />

Pro<strong>of</strong>. (Sketch) See (Hartwig 2008) for <strong>the</strong> complete pro<strong>of</strong>. If L ∈ REG(1) <strong>the</strong>n L has an<br />

unambiguous regular expression <strong>of</strong> <strong>the</strong> following form:<br />

L = L1 + L2 + ... + Lk<br />

where Li = Ri0Ri1...Rit with sh(Rij) ≤ 1<br />

Calculating the number of accepted words for each Li is done successively, starting from<br />

the left. The number of accepted words of length n for Ri0 can be determined from the<br />

lengths of all expressions under the star. For example, let<br />

L4 = b (aa + bbb)* b (ab + bba)* b<br />

we would have R4,0 = (aa + bbb)* and R4,1 = (ab + bba)*, which would give us for R4,0:<br />

dR4,0(3) = 1 // as |b| + |b| + |b| = 3<br />

dR4,0(n) = dR4,0(n-|aa|) + dR4,0(n-|bbb|)<br />

= dR4,0(n-2) + dR4,0(n-3)<br />

This process continues until the last expression within Li is reached, successively<br />

adding all the accepted words of the formerly considered components.<br />



dR4,1(n) = dR4,1(n-|ab|) + dR4,1(n-|bba|) + No_acc_words_for_R4,0<br />

= dR4,1(n-2) + dR4,1(n-3) + dR4,0(n)<br />

And this would give us in our (simple) case,<br />

dL4(n) = dR4,1(n)<br />

The above result (here depending on R4,0 and R4,1) can then be converted into a recursive<br />

formula that refers only to itself and obeys the requirements of the lemma. In the example case,<br />

dL4(n) = 0 for n < 3, dL4(3) = 1,<br />

dL4(n) = 2dL4(n-2) + 2dL4(n-3) - dL4(n-4) - 2dL4(n-5) - dL4(n-6) for n > 3.<br />
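The step from the component-wise counts to the closed recursion can be checked numerically. The sketch below counts, for L4, the ways of choosing blocks under the two stars (block lengths 2 and 3 in both cases) and compares the result with the recursive formula.<br />
It counts factorizations of the expression, which coincide with word counts only if the expression is unambiguous; the sketch does not check this, and all function names are hypothetical.<br />

```python
def star_counts(block_lengths, max_n):
    """counts[n] = number of block sequences of total length n under one star."""
    counts = [0] * (max_n + 1)
    counts[0] = 1  # the empty sequence
    for n in range(1, max_n + 1):
        counts[n] = sum(counts[n - k] for k in block_lengths if n - k >= 0)
    return counts


def d_l4_by_convolution(max_n):
    """Combine both stars of L4 = b (aa+bbb)* b (ab+bba)* b; the three b's use 3 letters."""
    star = star_counts([2, 3], max_n)
    d = [0] * (max_n + 1)
    for n in range(3, max_n + 1):
        m = n - 3  # letters left for the two starred parts
        d[n] = sum(star[i] * star[m - i] for i in range(m + 1))
    return d


def d_l4_by_recurrence(max_n):
    """The closed recursion from the text, with d(n) = 0 for n < 3 and d(3) = 1."""
    coefficients = {2: 2, 3: 2, 4: -1, 5: -2, 6: -1}
    d = [0] * (max_n + 1)
    for n in range(max_n + 1):
        if n == 3:
            d[n] = 1
        elif n > 3:
            d[n] = sum(c * d[n - k] for k, c in coefficients.items() if n - k >= 0)
    return d


print(d_l4_by_convolution(12))
print(d_l4_by_recurrence(12))
```

Dividing each entry d[n] by 2^n gives the corresponding acceptance probability.<br />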


Figure 3. Acceptance probability graphs of higher regular languages up to star height 1. Left<br />

L4 = b (aa + bbb)* b (ab + bba)* b from the above example,<br />

right L5 = a (a+b)* + (b + ba)* with a union operator also outside the star.<br />

To describe the acceptance probability graphs of such regular and higher languages we<br />

introduce the notion of a difference shrinking chain.<br />

Definition (Difference Shrinking Chain) We say that a language L has a difference<br />

shrinking chain if there exist a step width d and a length n0 such that for all i ≥ 0:<br />

|Acc(L, n0+(i+2)d) - Acc(L, n0+(i+1)d)| ≤ |Acc(L, n0+(i+1)d) - Acc(L, n0+i·d)|<br />
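The inequality of the definition can be tested on a finite prefix of an acceptance probability sequence. The following is a minimal sketch with a hypothetical helper name; a finite check can only refute the property, never prove it for all i.<br />

```python
from fractions import Fraction


def is_difference_shrinking_chain(acc_values, n0, d):
    """Check |Acc(n0+(i+2)d) - Acc(n0+(i+1)d)| <= |Acc(n0+(i+1)d) - Acc(n0+i*d)|
    for every i that the finite list acc_values covers."""
    chain = acc_values[n0::d]
    diffs = [abs(later - earlier) for earlier, later in zip(chain, chain[1:])]
    return all(nxt <= cur for cur, nxt in zip(diffs, diffs[1:]))


# Acc(L3, n) for L3 = b(ba)*: 1/2^n for odd n, 0 for even n
acc_l3 = [Fraction(1, 2 ** n) if n % 2 == 1 else Fraction(0) for n in range(12)]
print(is_difference_shrinking_chain(acc_l3, 1, 2))  # chain of the peaks
print(is_difference_shrinking_chain(acc_l3, 0, 2))  # chain of the zeros

# A made-up sequence whose differences grow, as a counterexample
growing = [Fraction(0), Fraction(1, 10), Fraction(1, 2)]
print(is_difference_shrinking_chain(growing, 0, 1))
```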

Figure 4. An example language with only difference shrinking chains. A chain is called<br />

difference shrinking if for such a chain and any length n the speed of the increase (or<br />

decrease) slows constantly, i.e. Δ2 ≤ Δ1.<br />

We call a language difference shrinking if there exists a step width d ≥ 1<br />

that decomposes the acceptance probability graph into d difference shrinking chains. We<br />

call a language a regular increasing language if it can be decomposed into at<br />

least one increasing and zero or more stable chains. (Regular decreasing languages are<br />

defined in a similar way.) A language is furthermore called strongly increasing if<br />

a single increasing chain completely describes the graph. Similar concepts<br />

apply to strongly decreasing languages; both kinds are also said to have<br />

monotone acceptance probability.<br />

Lemma (Star Height 1 Languages are Difference Shrinking) If L ∈ REG(1), <strong>the</strong>n L<br />

is difference shrinking.<br />

Pro<strong>of</strong>. See (Hartwig 2008).<br />

The proof includes an algorithm that computes, for any given regular language,<br />

a step width that might not be minimal but that decomposes the<br />

language's acceptance probability graph into such difference shrinking chains. It is not<br />

difficult to see that most regular languages up to star height 1 also have only<br />

monotone chains, and we claim that this is also true for the remaining languages.<br />

4 The acceptance probability <strong>of</strong> context free languages<br />

Calculating the number of accepted words of a regular language with a star height of 2<br />

or higher seems to require a different approach. Let L6 = (w1*w2*)*; we could then<br />

compute the number of accepted words of length n as follows: dL6(n) = dL6(1) · dL6(n-1) +<br />

dL6(2) · dL6(n-2) + ... A word of length n is a composition of an accepted word of<br />

length c ≤ n from w1 and an accepted word of length n-c from w2. Surprisingly, the<br />

same approach also works for calculating the acceptance probability of a<br />

context-free language, as the following examples suggest.<br />

Example. Let G1 be <strong>the</strong> following grammar:<br />

S => SaN | a<br />

N => bN | bb<br />

We could compute the number of accepted words that are derived from each of the given<br />

nonterminals. The rule S => SaN specifies that a terminal word can be constructed from any<br />

smaller words from S and N as long as the sum of their lengths equals n-1. (n-1 because the<br />

letter a takes up one position.) This would bring us to the following:<br />

dS(1) = 1<br />

dN(2) = 1<br />

dS(n) = Σ_{i=0}^{n-1} dS(i) · dN(n-1-i)<br />

dN(n) = dN(n-1)<br />

Having S as the start symbol, we can calculate the number of accepted words for the given<br />

grammar with dG1(n) = dS(n). As the language is also regular (a(abbb + )*), the number of<br />

accepted words could also be calculated with dS(1) = 1, dS(n) = dS(n-1) + dS(n-3),<br />

following the ideas of the previous sections.<br />
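The counts for G1 can be reproduced with a short sketch that evaluates the sum over S and N and compares it with a brute-force enumeration. The regular expression a(ab{2,})* is our own reading of the grammar and an assumption of this sketch, as are the function names.<br />

```python
import re
from itertools import product


def d_n(n):
    """N => bN | bb derives exactly one word, b^n, for every n >= 2."""
    return 1 if n >= 2 else 0


def d_s_table(max_n):
    """d_S(0..max_n) via d_S(n) = sum_{i=0}^{n-1} d_S(i) * d_N(n-1-i), d_S(1) = 1."""
    d = [0] * (max_n + 1)
    for n in range(1, max_n + 1):
        if n == 1:
            d[n] = 1  # the rule S => a
        else:
            d[n] = sum(d[i] * d_n(n - 1 - i) for i in range(n))
    return d


def brute_force(max_n):
    """Count the words of each length that match a(ab{2,})* by enumeration."""
    regex = re.compile(r"a(ab{2,})*")
    return [
        sum(1 for w in product("ab", repeat=n) if regex.fullmatch("".join(w)))
        for n in range(max_n + 1)
    ]


table = d_s_table(12)
print(table)
print(brute_force(12))
```

In this sketch the linear recursion dS(n) = dS(n-1) + dS(n-3) reproduces the table from n = 3 on, provided dS(0) = dS(2) = 0 are used as additional starting values.<br />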

Example. Let G2 be <strong>the</strong> following grammar:<br />



S => aSb | ab<br />

Although this is a truly context-free language, calculating its density<br />

remains quite simple, which suggests that the acceptance probability of all context-free<br />

languages can be described completely in a form similar to the one presented for<br />

star-height 1 languages.<br />

dS(2) = 1<br />

dS(n) = dS(n-2)<br />

Based on the above examples and on the Chomsky-Schützenberger Theorem, which states that<br />

for every context-free language accepted by a PDA M = (Q, Σ, Γ, δ, q0, Z0, F) there are a regular<br />

language R, the Dyck set D2 and two homomorphisms g, h such that L(M) = h(g^-1(D2)<br />

∩ R), we claim that context-free languages are equally difference shrinking and<br />

monotone.<br />

While we can foresee challenges in using our results for higher<br />

classes to construct new proof techniques, the long-standing P vs NP<br />

problem should provide enough incentive to make an attempt. The phase transition<br />

that NP-complete problems exhibit is only possible because the language's<br />

acceptance probability switches from sections that are difference shrinking to sections that are difference<br />

increasing, as shown in the example below.<br />

Figure 6. Example languages from NP complete having acceptance probability graphs with<br />

sections <strong>of</strong> increasing difference (some <strong>of</strong> <strong>the</strong>m indicated).<br />

We think that finding <strong>the</strong> minimal step width for a given language would help in <strong>the</strong><br />

search for new pro<strong>of</strong> techniques. As mentioned earlier, <strong>the</strong> minimal step width should<br />

indicate more properties related to <strong>the</strong> complexity <strong>of</strong> accepting <strong>the</strong> language.<br />

5 Conclusions<br />

We have given a first overview of a new attempt at characterizing classes of<br />

the Chomsky Hierarchy using properties derived from the languages' acceptance<br />

probability graphs. Regular languages up to star height 1 have graphs that<br />

can be split into difference shrinking chains. Current research suggests that this also holds<br />

for context-free languages. Knowing that NP-complete languages usually have<br />

graphs that undergo a phase transition between difference shrinking and difference<br />

increasing sections, we recommend further work. In particular, the problem of finding<br />

the minimal step width seems to be crucial for the construction of new proof<br />

techniques.<br />

Class | Acceptance Probability | Properties<br />

finite | Acc(L, n) = 0 | convergent to 0<br />

simple regular | Acc(L, n) = 2 dL(n-1) / 2^n | convergent to a constant (stable)<br />

one star | Acc(L, n) = c dL(n-d) / 2^n | as above & at most one decreasing chain<br />

star height 1 | Acc(L, n) = [ u1dL(n-v1) + u2dL(n-v2) + ... + umdL(n-vm) ] / 2^n | monotone ², difference shrinking chains<br />

regular, context-free | Acc(L, n) = Σ_{i=0}^{n-d} (dS(i) · dN(n-d-i) ...) / 2^n | monotone, difference shrinking chains ³<br />

context sensitive | Acc(L, n) = ? | ? as above & difference increasing chains, non monotonic chains<br />

Table 1. Acceptance probability <strong>of</strong> different classes from <strong>the</strong> Chomsky Hierarchy (state <strong>of</strong><br />

<strong>the</strong> art, <strong>the</strong> class <strong>of</strong> context free languages is currently looked at).<br />

Acknowledgments<br />

We'd like to thank <strong>the</strong> anonymous referees for <strong>the</strong>ir comments.<br />

References<br />

H. Buhrmann & L. Torenvliet (2005). 'A Post's Program for Complexity Theory', BEATCS 85<br />

(pp. 41-51)<br />

P. Clote & E. Kranakis (2002). 'Boolean Functions and Computation Models', Springer<br />

M. Hartwig et al. (2006a). 'In Search <strong>of</strong> a New Pro<strong>of</strong> Technique', M2USIC06<br />

M. Hartwig et al. (2006b). 'Proving Non Regularity using Acceptance Probability Techniques',<br />

CSCM2006<br />

A. Bruggemann-Klein & R. Mesing (2007). 'Regular Expressions into Finite Automata',<br />

http://webcourse.cs.technion.ac.il/236826/Spring2005/ho/WCFiles/RegularExpressions into Finite<br />

Automata.doc<br />

D. Giammarresi, R. Montalbano, D. Wood (2001). 'Block-Deterministic Languages',<br />

ICTCS01<br />

M. Sipser (1997). 'Introduction to the Theory of Computation', PWS Publishing Company (pp.<br />

63ff.)<br />

Table 1, note 2: Claimed for some languages.<br />

Table 1, note 3: Claimed.<br />

O. Dubois et al. (2000). 'Typical Random 3-SAT Formulae and <strong>the</strong> Satisfiability Threshold',<br />

SODA '00 (pp. 126-127)<br />

D. Achlioptas et al. (2001). 'The Phase Transition in 1-in-k SAT and NAE 3-SAT', SODA '01<br />

(pp. 721-722)<br />

L. Kirousis et al. (1998). 'Approximating <strong>the</strong> unsatisfiability threshold <strong>of</strong> random formulas',<br />

Random Structures and Algorithms 12(3) (pp. 253-269)<br />

D. Achlioptas et al. (2001). 'A Sharp Threshold in Proof Complexity Yields a<br />

Lower Bound for Satisfiability Search', Journal of Comp. & Sys. Sci. 68 (2)<br />

M. Hartwig (2008), 'Regular Languages up to Star Height 1 have Difference Shrinking<br />

Acceptance Probability', TMFCS-08<br />

M. Bodirsky et al. (2004), 'Efficiently computing <strong>the</strong> density <strong>of</strong> regular languages', <strong>Proceedings</strong><br />

<strong>of</strong> Latin American INformatics (LATIN'04), pages 262-270, Buenos Aires<br />

M.P. Schützenberger (1962), 'Finite counting automata', Information and Control 5(2), 91-107<br />

S. Eilenberg (1974), 'Automata, Languages, and Machines', Academic Press, Inc., Orlando,<br />

Florida, USA<br />

A. Szilard et al.(1992), 'Characterizing Regular Languages with Polynomial Densities', Lecture<br />

Notes in Computer Science, Volume 629, Springer, 494-503<br />

G. Rozenberg et al. (1997), 'Handbook <strong>of</strong> Formal Languages', Chapter 2: Regular Languages,<br />

Springer<br />

E. D. Demaine et al. (2003), 'On Universally Easy Classes for NP-complete Problems',<br />

Theoretical Computer Science, Vol. 304, pages 471-476<br />

D. Krieger et al. (2007), 'Finding <strong>the</strong> Growth Rate <strong>of</strong> a Regular Language in Polynomial Time',<br />

CoRR abs/0711.4990<br />

P. Habermehl et al. (2000), 'A Note on SLRE', http://citeseer.ist.psu.edu/375870.html<br />

P. Flajolet (1987), 'Analytic Models and Ambiguity <strong>of</strong> Context-Free Languages', TCS, 49:283-<br />

309<br />

L. Ilie et al. (2000), 'A Characterization <strong>of</strong> Polyslender Context-Free Languages', Theoret.<br />

Informatics Appl., 34(1):77-86<br />

R. Incitti (2000), 'The Growth Function <strong>of</strong> Context-Free Languages', Theoretical Computer<br />

Science, 255:601-605<br />

G. Eisman et al. (2005), 'Approximate Recognition <strong>of</strong> Non-regular Languages by Finite<br />

Automata', <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> Twenty-Eighth Australasian Computer Science Conference<br />

(ACSC2005), Newcastle, Australia<br />

S. Kleene (1956), 'Representation <strong>of</strong> events in nerve nets and finite automata', Automata<br />

Studies, Princeton University Press, Princeton, USA, 3-42<br />

W. S. McCulloch et al. (1943), 'A logical calculus of the ideas immanent in nervous activity',<br />

Bull. Math. Biophys, 5:115-133<br />

N. Moreira et al. (2005), 'On <strong>the</strong> Density <strong>of</strong> Languages Representing Finite Set Partitions',<br />

Journal <strong>of</strong> Integer Sequences, Vol. 8<br />


DISTANCE EFFECTS IN SENTENCE PROCESSING<br />

Simon Hopp<br />

University <strong>of</strong> Konstanz<br />

Abstract. This paper reports results from two experiments investigating distance<br />

effects in sentence processing. It is well known that the processing difficulty of a<br />

dependency relation increases with the distance between the two items concerned.<br />

The paper addresses the question of what exactly determines ‘distance’: time or the<br />

amount of linguistic material between the first and the second item. Experiment 1<br />

disentangles <strong>the</strong>se factors and suggests that linguistic material is <strong>the</strong> source <strong>of</strong><br />

difficulty. Experiment 2 investigates <strong>the</strong> role <strong>of</strong> <strong>the</strong> characteristics <strong>of</strong> that<br />

intervening material. The logic <strong>of</strong> this experiment is based on Gibson’s (2000)<br />

claim that <strong>the</strong> ease <strong>of</strong> integrating a word into <strong>the</strong> CPPM decreases with <strong>the</strong> number<br />

<strong>of</strong> newly introduced discourse referents. In particular, experiment 2 asks whe<strong>the</strong>r<br />

adverbials which do not introduce new discourse referents have <strong>the</strong> same effect.<br />

The results indicate that while intervening discourse referents elicit <strong>the</strong> expected<br />

effect, adverbials do not show any effect at all.<br />

1 Working Memory and Sentence Processing<br />

In cognitive science <strong>the</strong>re is a broad agreement that a certain kind <strong>of</strong> store is necessary<br />

for all kinds <strong>of</strong> complex cognitive tasks such as mental arithmetic or language<br />

processing. The following example (cf. Gibson 2000) illustrates <strong>the</strong> need for a short<br />

term store in sentence processing.<br />

(1) The reporter [that <strong>the</strong> senator attacked] admitted <strong>the</strong> error.<br />

In (1) <strong>the</strong> short term store (or working memory) has to keep <strong>the</strong> determiner phrase (DP)<br />

<strong>the</strong> reporter active over <strong>the</strong> period <strong>of</strong> time in which <strong>the</strong> relative clause is processed, to<br />

ensure that <strong>the</strong> human sentence parser is able to link <strong>the</strong> DP to <strong>the</strong> verb admitted and<br />

<strong>the</strong>n to check <strong>the</strong> grammatical features correctly. Since sentences can contain several<br />

dependencies between items and <strong>the</strong>se items can be separated by fur<strong>the</strong>r items, storing<br />

linguistic information over a short time is a basic requirement for sentence processing.<br />

As has long been noticed in linguistic <strong>the</strong>ory, sentences like (1) <strong>of</strong>ten lead to processing<br />

difficulties (e.g. Just & Carpenter 1992). One <strong>of</strong> <strong>the</strong> reasons for this fact is <strong>the</strong> distance<br />

between <strong>the</strong> linguistic items dependent on each o<strong>the</strong>r. It seems that integrating a word w<br />

into <strong>the</strong> CPPM (Current Partial Phrase Marker) is <strong>of</strong>ten adversely affected by <strong>the</strong><br />

distance between w and information within <strong>the</strong> CPPM necessary for integrating w.<br />

However, it is still unclear why prior pieces in <strong>the</strong> CPPM are difficult to retrieve at later<br />

points. There are two prominent mechanisms that are said to be responsible for<br />

forgetting over a short term: the amount of time that passes between two items and the<br />

linguistic material that has to be processed between two items. According to time-based<br />

decay, earlier information might already have faded away at the point when it is needed<br />

again. In current models of working memory involvement in sentence processing, time-based<br />

decay either plays a decisive role (e.g., Lewis & Vasishth 2005) or is taken as one<br />

possible candidate for contributing to the cost of integrating a word into the sentence<br />

(Levy et al. 2007). The alternatives to theories of time-based decay are event-based<br />

models (cf. Lewandowsky et al. 2004). Those models admit that forgetting in working<br />

memory is observed over time, but <strong>the</strong>y predict that time is not <strong>the</strong> crucial factor for this<br />

phenomenon. Some event-based models argue that it is rather interference of linguistic<br />

material that leads to processing difficulties (e.g. Nairne 1990). Items that have already<br />

been processed may be forgotten by the time they are needed again, because new<br />

incoming material interferes. Clarifying the role of time-based decay versus<br />

interference-based forgetting is complicated because the amount of linguistic<br />

material and the amount of time are normally confounded.<br />

2 Case Checking as a Test Case<br />

In this paper, I present two experiments that were run to investigate <strong>the</strong> nature <strong>of</strong><br />

forgetting in working memory. 1 The focus was on <strong>the</strong> process <strong>of</strong> linking and checking<br />

in German verb-final clauses adhering to the scheme in (2). When integrating the verb<br />

in clause-final position, the case of the NP must be retained until the end of the sentence in<br />

order to check it against the case feature of the verb.<br />

(2) .. dass NP[case: X] … {distance} … verb[case: Y]<br />

An example <strong>of</strong> a verb-final clause in German, as it was used in <strong>the</strong> following<br />

experiments, is given in (3).<br />

(3) Ich glaube, dass die <strong>Student</strong>in das wichtige Buch gelesen hat.<br />

I think that <strong>the</strong> student(fem) <strong>the</strong> important book read has<br />

‘I think, that <strong>the</strong> student has read <strong>the</strong> important book.’<br />

The auxiliary in clause-final position, hat, requires nominative case on the NP die<br />

Studentin. The memory trace of the case feature of this NP has to be maintained over a<br />

certain distance until the auxiliary hat is reached. The human sentence parser is then<br />

able to link the two dependent items - NP and verb - and to check the case features of<br />

both items. If, however, the distance between the verb and the related NP is too long,<br />

then working memory is unable to keep the memory trace until the end of the sentence.<br />

In this case processing difficulties arise which can be measured experimentally.<br />

As mentioned above, amount <strong>of</strong> linguistic material and amount <strong>of</strong> time are<br />

normally confounded. In <strong>the</strong> first experiment <strong>the</strong> two factors were disentangled to<br />

investigate <strong>the</strong>ir respective impact on <strong>the</strong> human sentence parser independently. This<br />

builds on related work by Lewandowsky et al. (2004) and Saito & Miyake (2004).<br />

The second experiment focused on sentence complexity according to the<br />

Dependency Locality Theory (DLT) (Gibson 2000). The DLT assumes that the cost of<br />

integrating a word w increases with the number of new discourse referents intervening<br />

between w and the information needed to integrate w. For case-checking in German this<br />

prediction has not been tested so far.<br />

1 The experiments were part of a bigger project on sentence processing together with Markus Bader.<br />

3 Experiment 1: Time-Based Decay versus Interference<br />

As shown in (3), the issue of forgetting in working memory was addressed by<br />

investigating the process of case-checking during the parsing of German verb-final<br />

clauses. When the verb is integrated in clause-final position, the case of the NP must be<br />

retrieved in order to check it against the case feature of the verb. If the intervening<br />

distance is too long, essential information about case features will be lost at the later point<br />

when it is needed again. To be able to investigate the nature of ‘distance’, the crucial<br />

factors have to be disentangled. This is achieved by manipulating the factors<br />

independently. First of all, a procedure was chosen that allowed the stimuli to be presented<br />

experimenter-paced in a non-cumulative word-by-word fashion (for details see section<br />

Procedure). Two different presentation rates, one slow and one fast,<br />

were preset. Second, the intervening material between the related items<br />

was manipulated. Sentences as in (3) were created in a long and in a short version.<br />

Additional adverbials (e.g. ‘für die letzte Prüfung im Mai’) were inserted for the long<br />

versions, as can be seen in (4):<br />

(4) Ich glaube, dass die Studentin (für die letzte Prüfung im Mai)<br />

I think that the student(fem) (for the last exam in May)<br />

das wichtige Buch gelesen hat.<br />

the important book read has<br />

‘I think that the student has read the important book (for the last exam in May).’<br />

A cross-combination <strong>of</strong> <strong>the</strong> two independently manipulated factors led to four different<br />

conditions that were presented (see Figure 1). Sentence (a) is a short sentence presented<br />

in <strong>the</strong> fast presentation rate (short-fast). Sentence (b) contains additional material and is<br />

also presented in <strong>the</strong> fast pace (long-fast). Sentences (c) and (d) are both presented in<br />

<strong>the</strong> slow pace. Note that (c) does not contain any additional material (short-slow),<br />

whereas (d) contains an additional adverbial (long-slow). Note especially that<br />

conditions (b) and (c) differ in <strong>the</strong> amount <strong>of</strong> intervening material, but - due to <strong>the</strong><br />

different presentation rates - <strong>the</strong>y are matched in <strong>the</strong> amount <strong>of</strong> time.<br />

a (short-fast): NP1 das wichtige Buch V AUX<br />

b (long-fast): NP1 für die letzte Prüfung im Mai das wichtige Buch V AUX<br />

c (short-slow): NP1 das wichtige Buch V AUX<br />

d (long-slow): NP1 für die letzte Prüfung im Mai das wichtige Buch V AUX<br />

Figure 1: Presentation Time of all 4 Sentence Types of Experiment 1<br />

This design allows analyzing the impact of both factors independently. As this experiment<br />

partly builds on work of Lewandowsky et al. (2004) the terminology for the crucial factors will<br />

be adopted and labeled Time (amount of time) and Event (intervening material).<br />

Participants and Material<br />

16 students <strong>of</strong> <strong>the</strong> University <strong>of</strong> Konstanz participated for course credit or payment. All<br />

participants were native speakers <strong>of</strong> German and naive with respect to <strong>the</strong> purpose <strong>of</strong> <strong>the</strong><br />

experiment.<br />

128 sentences were created, each in 16 versions according to the factors Voice (active<br />

versus passive), Status (grammatical versus ungrammatical), Time (fast versus slow) and Event<br />

(long versus short). Table 1 shows a sample stimulus item of Experiment 1.<br />
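The 16 versions follow from fully crossing the four two-level factors. A minimal sketch of this crossing is given below; the dictionary layout and the function name are illustrative choices and not part of the experimental software.<br />

```python
from itertools import product

# The four two-level factors named in the text
FACTORS = {
    "Voice": ("active", "passive"),
    "Status": ("grammatical", "ungrammatical"),
    "Time": ("fast", "slow"),
    "Event": ("long", "short"),
}


def conditions(factors=FACTORS):
    """Return every combination of factor levels as a dictionary."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


all_conditions = conditions()
print(len(all_conditions))  # 2 * 2 * 2 * 2 combinations
for condition in all_conditions[:4]:
    print(condition)
```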



Table 1. Sample Stimuli Item <strong>of</strong> Experiment 1<br />

Intervening material for all „(adverbial)“: ([…] für die letzte Prüfung im August […])<br />

([…] for the last exam in August […])<br />

(Active/ Grammatical)<br />

Der Dozent hofft, dass die Studentin (adverbial) das wichtige Buch gelesen hat<br />

the lecturer hopes that the(nom) student(fem) (adverbial) the important book read has<br />

'The lecturer hopes that the student has read the important book (for the last exam in August).'<br />

(Passive/ Grammatical)<br />

Der Dozent hofft, dass der Studentin (adverbial) das wichtige Buch besorgt wurde<br />

the lecturer hopes that the(dat) student(fem) (adverbial) the important book obtained was<br />

'The lecturer hopes that the important book (for the last exam in August) was obtained for the student.'<br />

(Active/ Ungrammatical)<br />

Der Dozent hofft, dass der Studentin (adverbial) das wichtige Buch gelesen hat<br />

the lecturer hopes that the(dat) student(fem) (adverbial) the important book read has<br />

'The lecturer hopes that the student has read the important book (for the last exam in August).'<br />

(Passive/ Ungrammatical)<br />

Der Dozent hofft, dass die Studentin (adverbial) das wichtige Buch besorgt wurde.<br />

the lecturer hopes that the(nom) student(fem) (adverbial) the important book obtained was.<br />

'The lecturer hopes that the important book (for the last exam in August) was obtained for the student.'<br />

The length <strong>of</strong> intervening material and presentation rate were manipulated<br />

independently. The factor Event (intervening material) was varied by adding adverbials<br />

<strong>of</strong> six words for <strong>the</strong> long version (cf. Table 1). The factor Time (presentation rate) was<br />

either slow (369ms/word + 44ms/character) or fast (188ms/word + 25ms/character).<br />
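The presentation rate combines a fixed time per word with an increment per character, so the display time of a stretch of words can be computed directly. The sketch below uses the two parameter pairs quoted in the text (188 + 25 and 369 + 44); the function names are hypothetical, and counting characters as the plain length of each word, without punctuation, is an assumption.<br />

```python
def word_duration_ms(word, base_ms, per_char_ms):
    """Display time for one word: a fixed base plus a per-character increment."""
    return base_ms + per_char_ms * len(word)


def span_duration_ms(words, base_ms, per_char_ms):
    """Total display time for a sequence of words shown one at a time."""
    return sum(word_duration_ms(w, base_ms, per_char_ms) for w in words)


# Three words (15 characters) taken from the sample item.
short_span = ["das", "wichtige", "Buch"]
print(span_duration_ms(short_span, 188, 25))  # 939
print(span_duration_ms(short_span, 369, 44))  # 1767
```

The pair with the smaller values yields the shorter display time for the same words, which is what makes it the faster of the two rates.<br />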

Procedure<br />

In both experiments <strong>the</strong> speeded grammaticality judgment method was used. In this<br />

procedure sentences are presented in a word-by-word fashion. Each trial begins with <strong>the</strong><br />

presentation <strong>of</strong> <strong>the</strong> words "Bitte Leertaste drücken" ("Please Press Spacebar") to start<br />

<strong>the</strong> sentence. After pressing <strong>the</strong> spacebar, a fixation point appears in <strong>the</strong> center <strong>of</strong> <strong>the</strong><br />

screen for 1050ms. Thereafter <strong>the</strong> sentence is shown word by word in <strong>the</strong> center <strong>of</strong> <strong>the</strong><br />

screen. Immediately after <strong>the</strong> last word <strong>the</strong> participants are asked to judge <strong>the</strong><br />

grammaticality <strong>of</strong> <strong>the</strong> sentence as fast as possible by pressing one <strong>of</strong> two response<br />

buttons. Type <strong>of</strong> response and response time are recorded automatically. If a subject<br />

does not give a response within 2000ms after <strong>the</strong> last word appeared, <strong>the</strong> words "zu<br />

langsam" ("too slow") are shown and <strong>the</strong> trial is finished. In both experiments each<br />

subject received at least 10 practice items before <strong>the</strong> experimental sessions started.<br />

In experiment 1, all sentences were presented in two separate blocks at two<br />

different paces (one slow and one fast, according to the manipulation of the factor Time).<br />

Every participant had to complete the experiment at both paces within one<br />

experimental session. Each block contained half <strong>of</strong> <strong>the</strong> entire set <strong>of</strong> sentences. Therefore<br />

each participant saw half <strong>of</strong> <strong>the</strong> sentences in <strong>the</strong> slow condition and <strong>the</strong> o<strong>the</strong>r half in <strong>the</strong><br />

fast condition. The order <strong>of</strong> <strong>the</strong> two blocks alternated between participants. The<br />

sentences were presented with filler sentences. The proportion <strong>of</strong> experimental<br />

sentences to filler sentences was 1:1. Filler sentences covered a range <strong>of</strong> various<br />

constructions and were half grammatical and half ungrammatical. Most <strong>of</strong> <strong>the</strong> fillers<br />

served as experimental items in two o<strong>the</strong>r experiments.<br />

Results<br />

The percentages <strong>of</strong> correct judgments in Experiment 1 are shown in Figure 2<br />

(grammatical conditions) and Figure 3 (ungrammatical conditions). Statistical analyses<br />

were conducted with subject as <strong>the</strong> random factor (F1) and with sentences as <strong>the</strong><br />

random factor (F2). The following main effects occurred: First, a significant effect <strong>of</strong><br />

<strong>the</strong> factor Event is obtained (F1(1,15)=22.30, p


4 Experiment 2: The Role <strong>of</strong> Complexity in Sentence Parsing<br />

Experiment 2 investigated the role of sentence complexity according to Gibson’s<br />

Dependency Locality Theory (DLT; Gibson 2000) in the context of verb-final clauses in German.<br />

The DLT is a resource-driven model <strong>of</strong> language processing. The model assumes two<br />

major kinds <strong>of</strong> resource use. First, integrating a new word w into <strong>the</strong> current structure<br />

causes some cost (integration cost). Second, keeping <strong>the</strong> structure in memory also<br />

causes a certain kind <strong>of</strong> cost (storage cost). A central idea <strong>of</strong> <strong>the</strong> DLT is locality. Gibson<br />

assumes that <strong>the</strong> cost <strong>of</strong> integrating a new element into <strong>the</strong> current structure depends on<br />

<strong>the</strong> distance between <strong>the</strong> new element and <strong>the</strong> related element already processed. The<br />

assumption is that the distance is defined by the number of discourse referents that are<br />

newly introduced between <strong>the</strong> items concerned.<br />
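The distance notion just described can be made concrete by counting the new discourse referents that intervene between two positions. This is a deliberately simplified sketch and not Gibson's full cost metric: the function name, the token lists and the referent flags are illustrative assumptions, and verbs are left unflagged here, although the DLT also treats events as discourse referents.<br />

```python
def intervening_new_referents(tokens, head, dependent):
    """Count tokens strictly between two positions that introduce a new referent.

    tokens: list of (word, introduces_new_referent) pairs.
    head, dependent: indices of the two elements to be integrated.
    """
    lo, hi = sorted((head, dependent))
    return sum(1 for _, is_new in tokens[lo + 1:hi] if is_new)


# Hypothetical flagging: only nouns are marked as introducing a referent.
adverbial_clause = [
    ("Professorin", True), ("die", False), ("immer", False), ("wieder", False),
    ("sehr", False), ("gut", False), ("erklärt", False), ("eine", False),
]
referent_clause = [
    ("Professorin", True), ("die", False), ("dem", False), ("Studenten", True),
    ("das", False), ("Skript", True), ("ausleiht", False), ("eine", False),
]

cost_adverbial = intervening_new_referents(adverbial_clause, 0, 7)
cost_referent = intervening_new_referents(referent_clause, 0, 7)
```

Under this flagging, the clause that contains only adverbial material contributes no distance, while the clause with two new noun phrases contributes two units, even though both span the same number of positions.<br />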

If this is so, an interesting question is whe<strong>the</strong>r material not introducing a new<br />

discourse referent also affects <strong>the</strong> ease <strong>of</strong> integrating w into <strong>the</strong> CPPM. This was tested<br />

in experiment 2 by means of adverbial material. The crucial factors of experiment 2<br />

<strong>the</strong>refore are: Adverbial and Discourse Referents (DR).<br />

Participants and Material<br />

16 students <strong>of</strong> <strong>the</strong> University <strong>of</strong> Konstanz participated for course credit or payment. All<br />

participants were native speakers <strong>of</strong> German and naive with respect to <strong>the</strong> purpose <strong>of</strong><br />

<strong>the</strong> experiment.<br />

We created 128 sentences, each in 16 versions according to <strong>the</strong> factors Voice<br />

(active versus passive), Status (grammatical versus ungrammatical), Adverbial (NoAdv<br />

versus Adv) and Discourse Referents (0 DR versus 2 DR).<br />

Table 2 shows a sample stimulus item of Experiment 2.<br />

Table 2. Sample Stimulus Item of Experiment 2<br />

Ich vermute, dass […]<br />

I guess that […]<br />

(NoAdv. / 0 DR)<br />

[…] meine Professorin, die sehr gut erklärt, eine freie Stelle ausgeschrieben hat.<br />

[…] my professor(fem) who very good explains a vacant position offered has<br />

‘I guess that my professor, who explains very well, has offered a vacant position.’<br />

(Adv. / 0 DR)<br />

[…] meine Professorin, die immer wieder sehr gut erklärt, eine freie Stelle ausgeschrieben hat.<br />

[…] my professor(fem) who again and again very good explains a vacant position offered has<br />

‘I guess that my professor, who explains very well repeatedly, has offered a vacant position.’<br />

(NoAdv. / 2 DR)<br />

[…] meine Professorin, die dem Studenten das Skript ausleiht, eine freie Stelle ausgeschrieben hat.<br />

[…] my professor(fem) who the student(dat) the script lends a vacant position offered has<br />

‘I guess that my professor, who lends the script to the student, has offered a vacant position.’<br />

(Adv. / 2 DR)<br />

[…] meine Professorin, die dem Studenten doch noch das Skript ausleiht, eine freie Stelle ausgeschrieben hat.<br />

[…] my professor(fem) who the student(dat) eventually the script lends a vacant position offered has<br />

‘I guess that my professor, who eventually lends the script to the student, has offered a vacant position.’<br />

The complexity <strong>of</strong> relative clauses was manipulated in a two-factorial way. First,<br />

<strong>the</strong> relative clause contains ei<strong>the</strong>r 0 or 2 new NP-related discourse referents. The event<br />

referent introduced by <strong>the</strong> verb is ignored as it is introduced in all four relative clause<br />

types. Second, <strong>the</strong> relative clause does or does not contain an additional adverbial <strong>of</strong><br />

two words. Both factors were crossed. The resulting conditions are shown below in<br />

Figure 4. Relative-clause complexity increases from (a) to (d). Fur<strong>the</strong>rmore, (b) and (c)<br />

are matched according to <strong>the</strong> number <strong>of</strong> words <strong>the</strong>y contain, but <strong>the</strong>y differ in <strong>the</strong>ir<br />

internal structure. As one can see below, (b) contains additional adverbials <strong>of</strong> two words<br />

(“immer wieder”), but does not include any newly introduced discourse referents.<br />

Sentence type (c), on <strong>the</strong> o<strong>the</strong>r hand, only introduces two new discourse referents<br />

(“<strong>Student</strong>en” and “Skript”).<br />

Procedure<br />

In this experiment the same procedure as in experiment 1, the speeded grammaticality judgment task,<br />

was used. Presentation time was not manipulated in experiment 2.<br />

The experiment was conducted in one block. A presentation rate of<br />

252ms per word plus an additional 28ms per letter was used.<br />

(a) NP1 die sehr gut erklärt NP2 V<br />

(b) NP1 die immer wieder sehr gut erklärt NP2 V<br />

(c) NP1 die dem Studenten das neue Skript ausleiht NP2 V<br />

(d) NP1 die dem Studenten doch noch das neue Skript ausleiht NP2 V<br />

Figure 4: Length of Relative Clauses (According to the Number of Words)<br />

Results<br />

The percentages <strong>of</strong> correct judgments in Experiment 2 are provided in Figure 5 (for<br />

grammatical conditions) and Figure 6 (for ungrammatical conditions). Statistical<br />

analyses revealed main effects for <strong>the</strong> factors Status (F1(1,15)= 26.57, p < .001;<br />

F2(1,15)= 213.43, p


[Figure 5: bar chart; y-axis: Percentage Correct (%); legend: Active, Passive. Bar labels per condition: noNP/noAdv 92, 89; 2NPs/noAdv 86, 83; noNP/Adverbial 91, 87; 2NPs/Adverbial 79, 78]<br />

Figure 5. Percentages of correct judgments for Grammatical Sentences<br />

[Figure 6: bar chart; y-axis: Percentage Correct (%); legend: Active, Passive. Bar labels per condition: noNP/noAdv 61, 60; 2NPs/noAdv 56, 51; noNP/Adv 68, 58; 2NPs/Adv 56, 51]<br />

Figure 6. Percentages of correct judgments for Ungrammatical Sentences<br />

5 General Discussion<br />

In Experiment 1, <strong>the</strong> factors Time and Event were disentangled to investigate <strong>the</strong> nature <strong>of</strong><br />

distance in sentence processing. The experiment had a clear-cut outcome for both factors.<br />

First, the factor Event clearly affects sentence processing. This can be seen especially in<br />

ungrammatical passive sentences. In that condition, the percentage of correct judgments<br />

is about 14% lower for long than for short sentences. As earlier<br />

experimental work has shown, ungrammatical passive sentences are always judged less<br />

reliably (cf. Bader & Bayer 2006). More material to process increases processing difficulty<br />

immensely, which results in a higher error rate for long sentences than for short<br />

sentences. Second, the factor Time does not seem to affect sentence processing as predicted<br />

by time-based models. For short sentences, the slow presentation rate resulted in better<br />

performance than the fast presentation rate. This goes against the predictions: long rather<br />

than short time intervals should affect sentence processing adversely (note that the fast<br />

presentation rate was not too fast, as can be seen from the high percentages of correct judgments,<br />

up to 92%). For long sentences the presentation rate had no effect at all. The results<br />

suggest that time-based decay does not contribute to <strong>the</strong> difficulty <strong>of</strong> integrating a new word<br />

into <strong>the</strong> CPPM.<br />

Experiment 2 has two major results. First, confirming prior results, <strong>the</strong> number <strong>of</strong> new<br />

discourse referents had a major effect. Sentences containing two new discourse referents in<br />

<strong>the</strong> relative clause received significantly more judgment errors. Second, an intervening<br />

adverbial had no effect at all. This can be seen clearly in the sentences that were equal in<br />

length in terms of the number of words they contain, but that differed in the<br />

intervening material. Sentences that contained new discourse referents but no additional<br />

adverbial received substantially more judgment errors than sentences containing the same<br />

number of words but only additional adverbials. The results suggest that the<br />

pure linear distance between w and information necessary to integrate w cannot be <strong>the</strong><br />

source <strong>of</strong> <strong>the</strong> observed difficulty. In particular, finding no differences between (a) versus (b)<br />

and (c) versus (d), but a substantial difference between (b) and (c) (cf. Figure 4) argues<br />

against <strong>the</strong>ories assuming that time or pure length - not introducing a new discourse referent<br />

- leads to forgetting in working memory. The results <strong>the</strong>refore support <strong>the</strong> Dependency<br />

Locality Theory <strong>of</strong> Gibson (2000).<br />

References<br />

M. Bader & J. Bayer (2006). Case and Linking in Language Comprehension. Evidence<br />

from German, Springer, Dordrecht.<br />

E. Gibson (2000). ‘The dependency locality <strong>the</strong>ory: A distance-based <strong>the</strong>ory <strong>of</strong><br />

linguistic complexity’. In A. Marantz et al. (eds.), Image, Language, Brain. MIT Press.<br />

S. Hopp & M. Bader (in prep.). ‘Forgetting in Working Memory: Interference versus<br />

Decay? Evidence from German Sentence Processing’.<br />

M. A. Just & P. A. Carpenter (1992). ‘A Capacity Theory <strong>of</strong> Comprehension: Individual<br />

Differences in Working Memory’. Psychological Review, vol. 99, no.1.<br />

R. L. Lewis & S. Vasishth. (2005). ‘An activation-based model <strong>of</strong> sentence processing<br />

as skilled memory retrieval’. Cognitive Science 29.<br />

R. Levy et al. (2007). ‘The syntactic complexity <strong>of</strong> Russian relative clauses’. Paper<br />

presented at <strong>the</strong> Annual Conference on Human Sentence Processing – CUNY 2007,<br />

San Diego, CA.<br />

S. Lewandowsky et al. (2004). ‘Time does not cause forgetting in short-term serial<br />

recall’. Psychonomic Bulletin & Review 11.<br />

J. S. Nairne (1990). ‘A feature model of immediate memory’. Memory & Cognition 18.<br />

S. Saito & A. Miyake (2004). ‘On the nature of forgetting and the processing-storage<br />

relationship in reading span performance’. Journal of Memory and Language 50.<br />

A SALIENCE-DRIVEN APPROACH TO<br />

SPEECH RECOGNITION FOR HUMAN-ROBOT INTERACTION<br />

Pierre Lison<br />

German Research Center for Artificial Intelligence<br />

Abstract. We present an implemented model for speech recognition in natural environments<br />

which relies on contextual information about salient entities to prime utterance recognition.<br />

The hypo<strong>the</strong>sis underlying our approach is that, in situated human-robot interaction, speech<br />

recognition performance can be significantly enhanced by exploiting knowledge about <strong>the</strong><br />

immediate physical environment and <strong>the</strong> dialogue history. To this end, visual salience (objects<br />

perceived in <strong>the</strong> physical scene) and linguistic salience (previously referred-to objects<br />

within <strong>the</strong> current dialogue) are integrated into a single cross-modal salience model. The<br />

model is dynamically updated as <strong>the</strong> environment evolves, and is used to establish expectations<br />

about uttered words which are most likely to be heard given <strong>the</strong> context. The update is<br />

realised by continuously adapting the word-class probabilities specified in the statistical language<br />

model. The present article discusses <strong>the</strong> motivations behind our approach, describes<br />

our implementation as part <strong>of</strong> a distributed, cognitive architecture for mobile robots, and<br />

reports <strong>the</strong> evaluation results on a test suite.<br />

1 Introduction<br />

Recent years have seen increasing interest in service robots endowed with communicative<br />

capabilities. In many cases, <strong>the</strong>se robots must operate in open-ended environments<br />

and interact with humans using natural language to perform a variety <strong>of</strong> service-oriented<br />

tasks. Developing cognitive systems for such robots remains a formidable challenge.<br />

S<strong>of</strong>tware architectures for cognitive robots are typically composed <strong>of</strong> several cooperating<br />

subsystems, such as communication, computer vision, navigation and manipulation<br />

skills, and various deliberative processes such as symbolic planners (Langley, Laird and<br />

Rogers, 2005).<br />

These subsystems are highly interdependent. It is not enough to equip <strong>the</strong> robot with<br />

basic functionalities for dialogue comprehension and production to make it interact naturally<br />

in situated dialogues. We also need to find meaningful ways to relate language,<br />

action and situated reality, and enable <strong>the</strong> robot to use its perceptual experience to continuously<br />

learn and adapt itself to <strong>the</strong> environment.<br />

The first step in comprehending spoken dialogue is automatic speech recognition (ASR).<br />

For robots operating in real-world noisy environments, and dealing with utterances pertaining<br />

to complex, open-ended domains, this step is particularly error-prone. In spite <strong>of</strong><br />

continuous technological advances, <strong>the</strong> performance <strong>of</strong> ASR remains for most tasks at<br />

least an order <strong>of</strong> magnitude worse than that <strong>of</strong> human listeners (Moore, 2007).<br />

One strategy for addressing this issue is to use context information to guide <strong>the</strong> speech<br />

recognition by percolating contextual constraints to <strong>the</strong> statistical language model (Gruenstein,<br />

Wang and Seneff, 2005). In this paper, we follow this approach by defining a context-sensitive<br />

language model which exploits information about salient objects in the visual<br />

scene and linguistic expressions in <strong>the</strong> dialogue history to prime recognition. To this end,<br />

a salience model integrating both visual and linguistic salience is used to dynamically<br />

compute lexical activations, which are incorporated into <strong>the</strong> language model at runtime.<br />

Our approach departs from previous work on context-sensitive speech recognition by<br />

modeling salience as inherently cross-modal, instead <strong>of</strong> relying on just one particular<br />

modality such as gesture (Chai and Qu, 2005), eye gaze (Qu and Chai, 2007) or dialogue<br />

state (Gruenstein et al., 2005). The FUSE system described in (Roy and Mukherjee, 2005)<br />

is a closely related approach, but limited to <strong>the</strong> processing <strong>of</strong> object descriptions, whereas<br />

our system was designed from <strong>the</strong> start to handle generic situated dialogues (cf. §3.3).<br />

The structure <strong>of</strong> <strong>the</strong> paper is as follows: in <strong>the</strong> next section we briefly introduce <strong>the</strong><br />

s<strong>of</strong>tware architecture in which our system has been developed. We <strong>the</strong>n describe <strong>the</strong><br />

salience model, and explain how it is utilised within <strong>the</strong> language model used for ASR.<br />

We finally present <strong>the</strong> evaluation <strong>of</strong> our approach, followed by conclusions.<br />

Figure 1: Robotic platform (left) and example <strong>of</strong> a real visual scene (right)<br />

2 Architecture<br />

Our approach has been implemented as part <strong>of</strong> a distributed cognitive architecture (Hawes,<br />

Sloman, Wyatt, Zillich, Jacobsson, Kruijff, Brenner, Berginc and Skocaj, n.d.). Each subsystem<br />

consists <strong>of</strong> a number <strong>of</strong> processes, and a working memory. The processes can<br />

access sensors, effectors, and <strong>the</strong> working memory to share information within <strong>the</strong> subsystem.<br />

Figure 2 illustrates the subarchitecture for spoken dialogue comprehension. Numbers 1-11 in the<br />

figure indicate the usual sequential order of the processes.<br />

The speech recognition utilises Nuance Recognizer v8.5 toge<strong>the</strong>r with a statistical language<br />

model (§ 3.4). For <strong>the</strong> online update <strong>of</strong> word class probabilities according to <strong>the</strong><br />

salience model, we use <strong>the</strong> “just-in-time grammar” functionality provided by Nuance.<br />

Syntactic parsing is based on an incremental chart parser<sup>1</sup> for Combinatory Categorial<br />

Grammar (Steedman and Baldridge, 2003), and yields a set of interpretations – that is,<br />

logical forms expressed as ontologically rich, relational structures (Baldridge and Kruijff,<br />

2001). Figure 3 gives an example of such a logical form.<br />

<sup>1</sup> Built on top of the OpenCCG NLP library: http://openccg.sf.net<br />

Figure 2: Schematic view of the architecture for spoken dialogue comprehension<br />

These interpretations are <strong>the</strong>n packed into a single representation (Oepen and Carroll,<br />

2000; Kruijff, Lison, Benjamin, Jacobsson and Hawes, in submission), a technique which<br />

enables us to efficiently handle syntactic ambiguity.<br />

Once <strong>the</strong> packed logical form is built, it is retrieved by <strong>the</strong> dialogue recognition module,<br />

which performs dialogue-level analysis tasks such as discourse reference resolution<br />

and dialogue move interpretation, and consequently updates <strong>the</strong> dialogue structure.<br />

@w1:cognition(want ∧ ind ∧ pres ∧<br />

(i1 : person ∧ I ∧ sg) ∧<br />

(t1 : action-motion ∧ take ∧ y1 : person ∧<br />

(m1 : thing ∧ mug ∧ unique ∧ sg ∧ specific singular)) ∧<br />

(y1 : person ∧ you ∧ sg))<br />

Figure 3: Logical form generated for <strong>the</strong> utterance ‘I want you to take <strong>the</strong> mug’<br />

Linguistic interpretations must finally be associated with extra-linguistic knowledge<br />

about <strong>the</strong> environment – dialogue comprehension hence needs to connect with o<strong>the</strong>r subarchitectures<br />

like vision, spatial reasoning or planning. We realise this information binding<br />

between different modalities via a specific module, called <strong>the</strong> “binder”, which is responsible<br />

for the ontology-based mediation across modalities (Jacobsson, Hawes, Kruijff<br />

and Wyatt, 2008).<br />

3 Approach<br />

3.1 Motivation<br />

As psycholinguistic studies have shown, humans do not process linguistic utterances in<br />

isolation from o<strong>the</strong>r modalities. Eye-tracking experiments notably highlighted that, during<br />

utterance comprehension, humans combine, in a closely time-locked fashion, linguistic<br />

information with scene understanding and world knowledge (Altmann and Kamide,<br />

2004; Knoeferle and Crocker, 2006).<br />

These observations – along with many o<strong>the</strong>rs – <strong>the</strong>refore provide solid evidence for <strong>the</strong><br />

embodied and situated nature <strong>of</strong> language and cognition (Lak<strong>of</strong>f, 1987; Barsalou, 1999).<br />

Humans thus systematically exploit dialogue and situated context to guide attention<br />

and help disambiguate and refine linguistic input by filtering out unlikely interpretations.<br />

Our approach is essentially an attempt to reproduce this mechanism in a robotic system.<br />

3.2 Salience modeling<br />

In our implementation, we define salience using two main sources <strong>of</strong> information:<br />

1. <strong>the</strong> salience <strong>of</strong> objects in <strong>the</strong> perceived visual scene;<br />

2. <strong>the</strong> linguistic salience or “recency” <strong>of</strong> linguistic expressions in <strong>the</strong> dialogue history.<br />

In <strong>the</strong> future, o<strong>the</strong>r sources could be added, for instance <strong>the</strong> possible presence <strong>of</strong> gestures<br />

(Chai and Qu, 2005), eye gaze tracking (Qu and Chai, 2007), entities in large-scale<br />

space (Zender and Kruijff, 2007), or <strong>the</strong> integration <strong>of</strong> a task model – as salience generally<br />

depends on intentionality (Landragin, 2006).<br />

3.2.1 Visual salience<br />

Via <strong>the</strong> “binder”, we can access <strong>the</strong> set <strong>of</strong> objects currently perceived in <strong>the</strong> visual scene.<br />

Each object is associated with a concept name (e.g. printer) and a number <strong>of</strong> features,<br />

for instance spatial coordinates or qualitative properties like colour, shape or size.<br />

Several features can be used to compute <strong>the</strong> salience <strong>of</strong> an object. The ones currently<br />

used in our implementation are (1) <strong>the</strong> object size and (2) its distance relative to <strong>the</strong> robot<br />

(i.e. spatial proximity). Other features could also prove to be helpful, like the reachability<br />

<strong>of</strong> <strong>the</strong> object, or its distance from <strong>the</strong> point <strong>of</strong> visual focus – similarly to <strong>the</strong> spread <strong>of</strong><br />

visual acuity across <strong>the</strong> human retina. To derive <strong>the</strong> visual salience value for each object,<br />

we assign a numeric value for <strong>the</strong> two variables, and <strong>the</strong>n perform a weighted addition.<br />

The associated weights are determined via regression tests.<br />

At <strong>the</strong> end <strong>of</strong> <strong>the</strong> processing, we end up with a set Ev <strong>of</strong> visual objects, each <strong>of</strong> which<br />

is associated with a numeric salience value s(ek), with 1 ≤ k ≤ |Ev|.<br />
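The weighted addition described above can be sketched as follows. This is a minimal illustration, not the system's implementation: the class `VisualObject`, the function `visual_salience`, the weight values and the `max_distance` scaling are all assumptions introduced here (the paper determines the weights via regression tests and does not state how distance is scaled).

```python
from dataclasses import dataclass


@dataclass
class VisualObject:
    concept: str     # concept name, e.g. "printer"
    size: float      # object size, assumed pre-scaled to [0, 1]
    distance: float  # distance to the robot, in metres (assumed unit)


def visual_salience(obj, w_size=0.6, w_proximity=0.4, max_distance=5.0):
    """Weighted addition of the two features named in the text.

    The weights and max_distance are placeholder values chosen for
    illustration only.
    """
    # Turn distance into a proximity score in [0, 1]: closer means higher.
    proximity = max(0.0, 1.0 - obj.distance / max_distance)
    return w_size * obj.size + w_proximity * proximity


scene = [VisualObject("box", 0.8, 1.0), VisualObject("ball", 0.2, 4.0)]
salience = {obj.concept: visual_salience(obj) for obj in scene}
```

With these placeholder weights, the large nearby box ends up more salient than the small distant ball.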

3.2.2 Linguistic salience<br />


There is a vast amount of literature on the topic of linguistic salience. Roughly speaking, linguistic salience can be characterised either in terms of hierarchical recency, according to a tree-like model of discourse structure, or in terms of linear recency of mention (Kelleher, 2005). Our implementation can theoretically handle both types of linguistic salience, but, at the time of writing, only the linear recency is calculated.<br />

To compute the linguistic salience, we extract a set $E_l$ of potential referents from the discourse structure, and to each referent $e_k$ we assign a salience value $s(e_k)$ equal to the distance (measured on a logarithmic scale) between its last mention and the current position in the discourse structure.<br />

3.2.3 Cross-modal salience model<br />

Once the visual and linguistic salience are computed, we can proceed to their integration into a cross-modal statistical model. We define the set $E$ as the union of the visual and linguistic entities, $E = E_v \cup E_l$, and devise a probability distribution $P(E)$ on this set:<br />

$$P(e_k) = \frac{\delta_v \, I_{E_v}(e_k) \, s_v(e_k) + \delta_l \, I_{E_l}(e_k) \, s_l(e_k)}{|E|} \tag{1}$$<br />

where $I_A(x)$ is the indicator function of set $A$, $s_v$ and $s_l$ denote the visual and linguistic salience values of Sections 3.2.1 and 3.2.2, and $\delta_v$, $\delta_l$ are factors controlling the relative importance of each type of salience. They are determined empirically, subject to the following constraint, which normalises the distribution:<br />

$$\delta_v \sum_{e_k \in E_v} s_v(e_k) + \delta_l \sum_{e_k \in E_l} s_l(e_k) = |E| \tag{2}$$<br />

The statistical model $P(E)$ thus simply reflects the salience of each visual or linguistic entity: the more salient, the higher the probability.<br />
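As an illustration of Equations (1) and (2), the sketch below builds the distribution from two salience dictionaries. The function name `cross_modal_distribution` and the `ratio` parameter (the quotient δv / δl, treated here as the one free, empirically chosen quantity) are assumptions made for this example; the paper only states that the factors are set empirically under constraint (2).

```python
def cross_modal_distribution(s_v, s_l, ratio=1.0):
    """Return P(e_k) for every entity in E = E_v ∪ E_l.

    s_v, s_l: dicts mapping entity name -> visual / linguistic salience.
    ratio:    assumed free parameter delta_v / delta_l.
    """
    entities = set(s_v) | set(s_l)
    n = len(entities)
    # Solve constraint (2) for delta_l, given delta_v = ratio * delta_l.
    total = ratio * sum(s_v.values()) + sum(s_l.values())
    if n == 0 or total <= 0:
        raise ValueError("need at least one entity with positive salience")
    delta_l = n / total
    delta_v = ratio * delta_l
    # dict.get(e, 0.0) plays the role of the indicator functions in Eq. (1).
    return {
        e: (delta_v * s_v.get(e, 0.0) + delta_l * s_l.get(e, 0.0)) / n
        for e in entities
    }


dist = cross_modal_distribution(
    {"ball": 2.0, "box": 1.0},    # visual salience values (made up)
    {"box": 1.0, "laptop": 1.0},  # linguistic salience values (made up)
)
```

Because the factors are derived from constraint (2), the resulting values sum to 1 regardless of the scale of the input salience values.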



3.3 Lexical activation<br />

In order for the salience model to be of any use for speech recognition, a connection between the salient entities and their associated words in the ASR vocabulary needs to be established. To this end, we define a lexical activation network, which lists, for each possible salient entity, the set of words activated by it. The network specifies the words which are likely to be heard when the given entity is present in the environment or in the dialogue history. It can therefore include words related to the object denomination, subparts, common properties or affordances. For example, the salient entity laptop will activate words like ‘laptop’, ‘notebook’, ‘screen’, ‘opened’, ‘ibm’, ‘switch on/off’, ‘close’, etc. The list is structured according to word classes, and a weight can be set on each word to modulate the lexical activation: supposing a laptop is present, the word ‘laptop’ should receive a higher activation than, say, the word ‘close’, which is less situation-specific.<br />
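One possible way to represent such a network is a nested mapping from entity to word class to weighted words. The sketch below is hypothetical: the names `ACTIVATION_NETWORK` and `activated_words`, the class labels and all weight values are invented for illustration and are not taken from the system.

```python
# entity -> word class -> {word: activation weight}; all weights are made up.
ACTIVATION_NETWORK = {
    "laptop": {
        "noun": {"laptop": 1.0, "notebook": 0.8, "screen": 0.6, "ibm": 0.4},
        "verb": {"switch": 0.5, "close": 0.2},
        "adjective": {"opened": 0.4},
    },
}


def activated_words(entity, network=ACTIVATION_NETWORK):
    """Flatten the per-class lists into one {word: weight} mapping."""
    classes = network.get(entity, {})
    return {
        word: weight
        for words in classes.values()
        for word, weight in words.items()
    }


laptop_words = activated_words("laptop")
```

Here ‘laptop’ carries a higher weight than the less situation-specific ‘close’, and an entity without an entry simply activates nothing.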

The use of lexical activation networks is a key difference between our model and that of Roy and Mukherjee (2005), which relies on a measure of “descriptive fitness” to modify the word probabilities. One advantage of our approach is the possibility to go beyond object descriptions and activate word types denoting subparts, properties or affordances of objects².<br />

If the probability of specific words is increased, we need to re-normalise the probability distribution. One solution would be to decrease the probability of all non-activated words accordingly. This solution, however, suffers from a significant drawback: our vocabulary contains many context-independent words like ‘thing’, or ‘place’, whose probability should remain constant. To address this issue, we mark an explicit distinction in our vocabulary between context-dependent and context-independent words.<br />

In the current implementation, the lexical activation network is constructed semi-manually, using a simple lexicon extraction algorithm. We start with the list of possible salient entities, which is given by<br />

1. the set of physical objects the vision subsystem can recognise;<br />

2. the set of nouns specified in the CCG lexicon with ‘object’ as ontological type.<br />

For each entity, we then extract its associated lexicon by matching domain-specific syntactic patterns against a corpus of dialogue transcripts.<br />
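The extraction step can be pictured with a small pattern-matching sketch. The paper does not list its syntactic patterns, so the three regular expressions, the stop-word list and the function name `extract_lexicon` below are assumptions chosen only to illustrate the idea; the candidates they return would still need manual review, in line with the semi-manual construction described above.

```python
import re
from collections import Counter

# Function words to discard from the candidates (illustrative list).
STOPWORDS = {"of", "the", "a", "is", "in", "on", "to"}


def extract_lexicon(entity_noun, transcripts):
    """Count candidate words that co-occur with entity_noun in simple patterns."""
    noun = re.escape(entity_noun)
    patterns = [
        rf"\bthe (\w+) of the {noun}\b",  # subparts: "the screen of the laptop"
        rf"\bthe (\w+) {noun}\b",         # properties: "the black laptop"
        rf"\b(\w+) the {noun}\b",         # affordances: "close the laptop"
    ]
    counts = Counter()
    for utterance in transcripts:
        text = utterance.lower()
        for pattern in patterns:
            for word in re.findall(pattern, text):
                if word not in STOPWORDS:
                    counts[word] += 1
    return counts


transcripts = [
    "Close the laptop",
    "the screen of the laptop is dark",
    "take the black laptop",
]
candidates = extract_lexicon("laptop", transcripts)
```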

3.4 Language modeling<br />

We now detail the language model used for speech recognition: a class-based trigram model enriched with contextual information provided by the salience model.<br />

3.4.1 Corpus generation<br />


We need a corpus to train any statistical language model. Unfortunately, no corpus of situated dialogue adapted to our task domain was available. Collecting in-domain data via Wizard of Oz (WOz) experiments is a very costly and time-consuming process, so we decided to follow the approach advocated by Weilhammer, Stuttle and Young (2006) instead and generate a class-based corpus from a task grammar we had at our disposal.<br />

Practically, we first collected a small set of WOz experiments, totalling about 800 utterances. This set is of course too small to be directly used as a corpus for language model training, but sufficient to get an intuitive idea of the kind of utterances we had to deal with.<br />

² In the context of a laptop object, ‘screen’ and ‘switch on/off’ would for instance be activated.<br />

Based on this set, we designed a domain-specific context-free grammar able to cover most of the utterances. Weights were then automatically assigned to each grammar rule by parsing our initial corpus, hence leading to a small stochastic context-free grammar.<br />

As a last step, this grammar is randomly traversed a large number of times, which gives us the generated corpus.<br />
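The random traversal of a stochastic context-free grammar can be sketched as below. The toy grammar, its rule weights and the function names `generate` and `generate_corpus` are invented for this example and do not reproduce the task grammar used in the paper.

```python
import random

# Non-terminal -> list of (rule weight, expansion); a toy grammar for illustration.
GRAMMAR = {
    "S": [(0.6, ["COMMAND"]), (0.4, ["QUESTION"])],
    "COMMAND": [(1.0, ["take", "NP"])],
    "QUESTION": [(1.0, ["where", "is", "NP"])],
    "NP": [(0.7, ["the", "N"]), (0.3, ["the", "ADJ", "N"])],
    "ADJ": [(0.5, ["red"]), (0.5, ["big"])],
    "N": [(0.5, ["ball"]), (0.5, ["box"])],
}


def generate(symbol, rng):
    """Expand a symbol top-down, sampling each rule according to its weight."""
    if symbol not in GRAMMAR:  # terminal symbol: emit the word itself
        return [symbol]
    weights, expansions = zip(*GRAMMAR[symbol])
    chosen = rng.choices(expansions, weights=weights, k=1)[0]
    words = []
    for child in chosen:
        words.extend(generate(child, rng))
    return words


def generate_corpus(n_utterances, seed=0):
    """Traverse the grammar n_utterances times from the start symbol."""
    rng = random.Random(seed)
    return [" ".join(generate("S", rng)) for _ in range(n_utterances)]


corpus = generate_corpus(50, seed=1)
```

Fixing the seed makes the generated corpus reproducible, which is convenient when comparing language models trained on it.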

3.4.2 Salience-driven, class-based language models<br />

The objective of the speech recognizer is to find the word sequence $W^*$ which has the highest probability given the observed speech signal $O$ and a set $E$ of salient objects:<br />

$$W^* = \arg\max_{W} \; \underbrace{P(O \mid W)}_{\text{acoustic model}} \times \underbrace{P(W \mid E)}_{\text{salience-driven language model}} \tag{3}$$<br />

For a trigram language model, the probability of the word sequence $P(w_1^n \mid E)$ is:<br />

$$P(w_1^n \mid E) \simeq \prod_{i=1}^{n} P(w_i \mid w_{i-1} w_{i-2}; E) \tag{4}$$<br />

Our language model is class-based, so it can be further decomposed into word-class and class-transition probabilities. The class-transition probabilities reflect the language syntax; we assume they are independent of salient objects. The word-class probabilities, however, do depend on context: for a given class (e.g. noun), the probability of hearing the word ‘laptop’ will be higher if a laptop is present in the environment. Hence:<br />

$$P(w_i \mid w_{i-1} w_{i-2}; E) = \underbrace{P(w_i \mid c_i; E)}_{\text{word-class probability}} \times \underbrace{P(c_i \mid c_{i-1}, c_{i-2})}_{\text{class transition probability}} \tag{5}$$<br />

We now define the word-class probabilities $P(w_i \mid c_i; E)$:<br />

$$P(w_i \mid c_i; E) = \sum_{e_k \in E} P(w_i \mid c_i; e_k) \times P(e_k) \tag{6}$$<br />

To compute $P(w_i \mid c_i; e_k)$, we use the lexical activation network specified for $e_k$:<br />

$$P(w_i \mid c_i; e_k) = \begin{cases} P(w_i \mid c_i) + \alpha_1 & \text{if } w_i \in \text{activatedWords}(e_k) \\ P(w_i \mid c_i) - \alpha_2 & \text{if } w_i \notin \text{activatedWords}(e_k) \wedge w_i \in \text{contextDependentWords} \\ P(w_i \mid c_i) & \text{otherwise} \end{cases} \tag{7}$$<br />

The optimum value of $\alpha_1$ is determined using regression tests. $\alpha_2$ is computed relative to $\alpha_1$ in order to keep the sum of all probabilities equal to 1:<br />

$$\alpha_2 = \frac{|\text{activatedWords}|}{|\text{contextDependentWords}| - |\text{activatedWords}|} \times \alpha_1$$<br />

These word-class probabilities are dynamically updated as the environment and the dialogue evolve, and are incorporated into the language model at runtime.<br />
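Equations (6) and (7) and the formula for α2 can be put together in a short sketch. It is written under stated assumptions rather than taken from the implementation: all activated words are assumed to be context-dependent members of the class at hand, the adjusted values are assumed to stay within [0, 1] (not checked here), and the function names and toy numbers are invented for illustration.

```python
def alpha2(alpha1, n_activated, n_context_dependent):
    """Mass removed from each non-activated context-dependent word."""
    if n_context_dependent <= n_activated:
        raise ValueError("no non-activated context-dependent word to take mass from")
    return n_activated / (n_context_dependent - n_activated) * alpha1


def word_given_class_and_entity(p_base, activated, context_dependent, alpha1):
    """Eq. (7): shift probability mass towards the words activated by one entity."""
    a2 = alpha2(alpha1, len(activated), len(context_dependent))
    adjusted = {}
    for word, p in p_base.items():
        if word in activated:
            adjusted[word] = p + alpha1
        elif word in context_dependent:
            adjusted[word] = p - a2
        else:  # context-independent words keep their probability
            adjusted[word] = p
    return adjusted


def word_given_class(p_base, entity_probs, activation, context_dependent, alpha1):
    """Eq. (6): mix the per-entity distributions, weighted by salience P(e_k)."""
    mixed = dict.fromkeys(p_base, 0.0)
    for entity, p_entity in entity_probs.items():
        per_entity = word_given_class_and_entity(
            p_base, activation.get(entity, set()), context_dependent, alpha1
        )
        for word, p in per_entity.items():
            mixed[word] += p * p_entity
    return mixed


# Toy word-class distribution P(w | c) for the class "noun" (made-up values).
p_noun = {"laptop": 0.2, "ball": 0.2, "box": 0.2, "thing": 0.4}
result = word_given_class(
    p_noun,
    entity_probs={"laptop": 0.75, "ball": 0.25},
    activation={"laptop": {"laptop"}, "ball": {"ball"}},
    context_dependent={"laptop", "ball", "box"},
    alpha1=0.1,
)
```

In this toy case the context-independent word ‘thing’ keeps its probability, the words for the two salient entities gain mass in proportion to P(e_k), and the total still sums to 1.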



4 Evaluation<br />

4.1 Evaluation procedure<br />

We evaluated our approach using a test suite of 250 spoken utterances recorded during Wizard of Oz experiments. The participants were asked to interact with the robot while looking at a specific visual scene. We designed 10 different visual scenes by systematic variation of the nature, number and spatial configuration of the objects presented. Figure 4 gives an example of a visual scene.<br />

The interactions could include descriptions, questions and commands. No particular tasks were assigned to the participants. The only constraint we imposed was that all interactions with the robot had to be related to the shared visual scene.<br />

Figure 4: Sample visual scene including three objects: a box, a ball, and a chocolate bar.<br />

4.2 Results<br />

Table 1 summarises our experimental results. Due to space constraints, we focus our analysis on the word error rate (WER) of our model compared to the baseline, that is, a class-based trigram model not based on salience.<br />

| Vocabulary size | WER, classical LM | WER, salience-driven LM |
|---|---|---|
| ≈ 200 words | 25.04 % (NBest 3: 20.72 %) | 24.22 % (NBest 3: 19.97 %) |
| ≈ 400 words | 26.68 % (NBest 3: 21.98 %) | 23.85 % (NBest 3: 19.97 %) |
| ≈ 600 words | 28.61 % (NBest 3: 24.59 %) | 23.99 % (NBest 3: 20.27 %) |

Table 1: Comparative results of recognition performance<br />

4.3 Analysis<br />

As the results show, the use of a salience model can enhance the recognition performance in situated interactions: with a vocabulary of about 600 words, the WER is reduced by 16.1 % in relative terms compared to the baseline (from 28.61 % to 23.99 %). According to the Sign test, the differences for the last two tests (400 and 600 words) are statistically significant. As we could expect, the salience-driven approach is especially helpful when operating with a larger vocabulary, where the expectations provided by the salience model can really make a difference in the word recognition.<br />
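The relative reduction quoted above follows directly from the figures in Table 1. The snippet below recomputes it for all three vocabulary sizes; the names `WER` and `relative_reduction` are introduced here for illustration.

```python
# Vocabulary size -> (baseline WER, salience-driven WER), in percent, from Table 1.
WER = {200: (25.04, 24.22), 400: (26.68, 23.85), 600: (28.61, 23.99)}


def relative_reduction(baseline, system):
    """Relative WER reduction in percent: (baseline - system) / baseline."""
    return 100.0 * (baseline - system) / baseline


reductions = {
    size: round(relative_reduction(base, sal), 1)
    for size, (base, sal) in WER.items()
}
```

The values grow with vocabulary size (roughly 3.3 %, 10.6 % and 16.1 %), which matches the observation that the approach helps most with larger vocabularies.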

The word error rate nevertheless remains quite high, for several reasons. The major issue is that the words causing most recognition problems are, at least in our test suite, function words like prepositions, discourse markers, connectives, auxiliaries, etc., and not content words. Unfortunately, the use of function words is usually not context-dependent, and hence not influenced by salience. We estimated that 89 % of the recognition errors were due to function words. Moreover, our chosen test suite consists of “free speech” interactions, which often include lexical items or grammatical constructs outside the range of our language model.<br />

5 Conclusion<br />

We have presented an implemented model for speech recognition based on the concept of salience. This salience is defined via visual and linguistic cues, and is used to compute degrees of lexical activation, which are in turn applied to dynamically adapt the ASR language model to the robot’s environment and dialogue state.<br />

As future work we will examine the potential extension of our approach in three directions. First, we are investigating how to use the situated context to perform some priming of function words like prepositions or discourse markers. Second, we wish to take other information sources into account, particularly the integration of a task model, relying on data made available by the symbolic planner. And finally, we want to go beyond speech recognition, and investigate the relevance of such a salience model for the development of a robust understanding system for situated dialogue.<br />

Acknowledgements<br />

My thanks go to G.-J. Kruijff, H. Zender, M. Wilson and N. Yampolska for their insightful comments. The research reported in this article was supported by the EU FP6 IST Cognitive Systems Integrated project Cognitive Systems for Cognitive Assistants “CoSy” FP6-004250-IP.<br />

References<br />


Altmann, G. T. and Kamide, Y. (2004). Now you see it, now you don’t: Mediating the mapping between language and the visual world, Psychology Press, New York, pp. 347–386.<br />

Baldridge, J. and Kruijff, G.-J. M. (2001). Coupling CCG and hybrid logic dependency semantics, ACL ’02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL, Morristown, NJ, USA, pp. 319–326.<br />

Barsalou, L. W. (1999). Perceptual symbol systems, Behavioral & Brain Sciences 22(4).<br />

Chai, J. Y. and Qu, S. (2005). A salience driven approach to robust input interpretation in multimodal conversational systems, Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing 2005, Association for Computational Linguistics, Vancouver, Canada, pp. 217–224.<br />

Gruenstein, A., Wang, C. and Seneff, S. (2005). Context-sensitive statistical language modeling, Proceedings of INTERSPEECH 2005, pp. 17–20.<br />


Hawes, N., Sloman, A., Wyatt, J., Zillich, M., Jacobsson, H., Kruijff, G.-J. M., Brenner, M., Berginc, G. and Skocaj, D. (n.d.). Towards an integrated robot with multiple cognitive functions, AAAI, AAAI Press, pp. 1548–1553.<br />

Jacobsson, H., Hawes, N., Kruijff, G.-J. and Wyatt, J. (2008). Crossmodal content binding in information-processing architectures, Proceedings of the 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI), Amsterdam, The Netherlands.<br />

Kelleher, J. (2005). Integrating visual and linguistic salience for reference resolution, in N. Creaney (ed.), Proceedings of the 16th Irish conference on Artificial Intelligence and Cognitive Science (AICS-05), Portstewart, Northern Ireland.<br />

Knoeferle, P. and Crocker, M. (2006). The coordinated interplay of scene, utterance, and world knowledge: evidence from eye tracking, Cognitive Science 30(3): 481–529.<br />

Kruijff, G.-J. M., Lison, P., Benjamin, T., Jacobsson, H. and Hawes, N. (in submission). Incremental, multi-level processing for comprehending situated dialogue in human-robot interaction, Connection Science.<br />

Lakoff, G. (1987). Women, fire and dangerous things: what categories reveal about the mind, University of Chicago Press, Chicago.<br />

Landragin, F. (2006). Visual perception, language and gesture: A model for their understanding in multimodal dialogue systems, Signal Processing 86(12): 3578–3595.<br />

Langley, P., Laird, J. E. and Rogers, S. (2005). Cognitive architectures: Research issues and challenges, Technical report, Institute for the Study of Learning and Expertise, Palo Alto.<br />

Moore, R. K. (2007). Spoken language processing: piecing together the puzzle, Speech Communication: Special Issue on Bridging the Gap Between Human and Automatic Speech Processing 49: 418–435.<br />

Oepen, S. and Carroll, J. (2000). Ambiguity packing in constraint-based parsing - practical results, Proceedings of the 1st Conference of the North American Chapter of the Association for Computational Linguistics, Seattle, WA, pp. 162–169.<br />

Qu, S. and Chai, J. (2007). An exploration of eye gaze in spoken language processing for multimodal conversational interfaces, Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, pp. 284–291.<br />

Roy, D. and Mukherjee, N. (2005). Towards situated speech understanding: visual context priming of language models, Computer Speech & Language (2): 227–248.<br />

Steedman, M. and Baldridge, J. (2003). Combinatory categorial grammar. MS Draft 4.<br />

Weilhammer, K., Stuttle, M. N. and Young, S. (2006). Bootstrapping language models for dialogue systems, Proceedings of INTERSPEECH 2006, Pittsburgh, PA.<br />

Zender, H. and Kruijff, G.-J. M. (2007). Towards generating referring expressions in a mobile robot scenario, Language and Robots: Proceedings of the Symposium, Aveiro, Portugal, pp. 101–106.<br />



A LOGIC WITH A CONDITIONAL PROBABILITY OPERATOR<br />

Petar Maksimović, Dragan Doder, Bojan Marinković and Aleksandar Perović<br />

Mathematical Institute of the Serbian Academy of Sciences and Arts, Belgrade, Serbia<br />

Abstract. This paper presents a sound and strongly complete axiomatization of the reasoning about linear combinations of conditional probabilities, including comparative statements. The developed logic is decidable, with a PSPACE containment for the decision procedure.<br />

1 Introduction<br />


The present paper constitutes an effort to proceed along the lines of the research presented in (Fagin, Halpern and Megiddo, 1990; Lukasiewicz, 2002; Ognjanović and Rašković, 1996; Ognjanović and Rašković, 1999; Ognjanović and Rašković, 2000; Ognjanović, Marković and Rašković, 2005; Ognjanović, Perović and Rašković, 2008; Rašković, Ognjanović and Marković, 2004), on the formal development of probabilistic logics, where probability statements are expressed by probabilistic operators expressing bounds on the probability of a propositional formula.<br />

The main technical novelty of this paper is a sound and strongly complete axiomatization of the reasoning about linear combinations of conditional probabilities, which also allows for qualitative statements. For instance, we formally write the statement “the conditional probability of α given β is at least the sum of the conditional probability of α given γ and twice the conditional probability of γ given α” as<br />

$$CP(\alpha, \beta) \geq CP(\alpha, \gamma) + 2 \cdot CP(\gamma, \alpha).$$<br />

It should be noted that all of the probabilities we use are Kolmogorov-style. We also prove that the developed logic is decidable.<br />

As is well known, the conditional probability of α given β has meaning only if $P(\beta) > 0$, and is, by definition, calculated by<br />

$$P(\alpha \mid \beta) = \frac{P(\alpha \wedge \beta)}{P(\beta)}.$$<br />

To avoid technical difficulties, we will adopt the convention that $0^{-1} = 1$. Namely, it is more convenient to assume that ${}^{-1}$ is a total operation, as is usual practice in quantifier elimination for the theory of real closed fields. In this way, we make sure that conditional events are always defined.<br />

The rest of the paper is organized as follows. In Section 2 the syntax of the logic is given and the class of measurable probabilistic models is described. Section 3 contains the corresponding axiomatization and introduces the notion of deduction. A proof of the completeness theorem is presented in Section 4, whereas the decidability of the logic is analyzed in Section 5. Concluding remarks are in Section 6.<br />



2 Syntax and semantics<br />

Let $Var = \{p_n \mid n < \omega\}$ be the set of propositional variables. The corresponding set of all propositional formulas over $Var$ will be denoted by $For_C$, where $C$ stands for classical, and is defined in the usual way. Propositional formulas will be denoted by α, β and γ, possibly with indices.<br />

Definition 1 The set $Term$ of all probabilistic terms is recursively defined as follows:<br />

• $Term(0) = \{s \mid s \in \mathbb{Q}\} \cup \{CP(\alpha, \beta) \mid \alpha, \beta \in For_C\}$.<br />

• $Term(n + 1) = Term(n) \cup \{(f + g), (s \cdot g), (-f) \mid f, g \in Term(n), s \in \mathbb{Q}\}$.<br />

• $Term = \bigcup_{n=0}^{\infty} Term(n)$. □<br />

Probabilistic terms will be denoted by $f$, $g$ and $h$, possibly with indices. To simplify notation, we introduce the following convention: $f + g$ is $(f + g)$, $f + g + h$ is $((f + g) + h)$. For $n > 3$, $\sum_{i=1}^{n} f_i$ is $((\cdots((f_1 + f_2) + f_3) + \cdots) + f_n)$. Similarly, $-f$ is $(-f)$ and $f - g$ is $(f + (-g))$.<br />

If α and β are propositional formulas, then the probabilistic term $CP(\alpha, \beta)$ reads “the conditional probability of α given β”. To simplify notation, we will write $P(\alpha)$ instead of $CP(\alpha, \top)$, where ⊤ is an arbitrary tautology instance.<br />

Definition 2 A basic probabilistic formula is any formula of the form $f \geq 0$. Furthermore, we define the following abbreviations:<br />

• $f \leq 0$ is $-f \geq 0$;<br />
• $f > 0$ is $\neg(f \leq 0)$;<br />
• $f < 0$ is $\neg(f \geq 0)$;<br />
• $f = 0$ is $f \leq 0 \wedge f \geq 0$;<br />
• $f \neq 0$ is $\neg(f = 0)$;<br />
• $f \geq g$ is $f - g \geq 0$.<br />

We define $f \leq g$, $f > g$, $f < g$, $f = g$ and $f \neq g$ in a similar way. □<br />

We define the notion of a probabilistic formula as a Boolean combination of basic probabilistic formulas. As in the propositional case, ¬ and ∧ are the primitive connectives, while all of the other connectives are introduced in the usual way. Probabilistic formulas will be denoted by φ, ψ and θ, possibly with indices. The set of all probabilistic formulas will be denoted by $For_P$.<br />

By “formula” we mean either a classical formula or a probabilistic formula. We do not allow for the mixing of those types of formulas, nor for the nesting of the probability operator $P$. Formulas will be denoted by Φ, Ψ and Θ, possibly with indices. The set of all formulas will be denoted by $For$.<br />

We define the notion of a model as a special kind of Kripke model. Namely, a model $M$ is any tuple $\langle W, H, \mu, v \rangle$ such that:<br />

• $W$ is a nonempty set. As usual, its elements will be called worlds.<br />

• $H$ is an algebra of sets over $W$.<br />

• $\mu : H \longrightarrow [0, 1]$ is a finitely additive probability measure.<br />

• $v : For_C \times W \longrightarrow \{0, 1\}$ is a truth assignment¹ compatible with ¬ and ∧. That is, $v(\neg\alpha, w) = 1 - v(\alpha, w)$ and $v(\alpha \wedge \beta, w) = v(\alpha, w) \cdot v(\beta, w)$.<br />

For a given model $M$, let $[\alpha]_M$ be the set of all $w \in W$ such that $v(\alpha, w) = 1$. If the context is clear, we will write $[\alpha]$ instead of $[\alpha]_M$. We say that $M$ is measurable if $[\alpha] \in H$ for all $\alpha \in For_C$.<br />

Definition 3 Let $M = \langle W, H, \mu, v \rangle$ be any measurable model. We define the satisfiability relation $\models$ recursively as follows:<br />

• $M \models \alpha$ if $v(\alpha, w) = 1$ for all $w \in W$.<br />

• $M \models f \geq 0$ if $f^M \geq 0$, where $f^M$ is recursively defined in the following way:<br />

– $s^M = s$.<br />
– $CP(\alpha, \beta)^M = \mu([\alpha \wedge \beta]) \cdot \mu([\beta])^{-1}$.<br />
– $(f + g)^M = f^M + g^M$.<br />
– $(s \cdot g)^M = s \cdot g^M$.<br />
– $(-f)^M = -(f^M)$.<br />

• $M \models \neg\phi$ if $M \not\models \phi$.<br />

• $M \models \phi \wedge \psi$ if $M \models \phi$ and $M \models \psi$. □<br />

A formula Φ is satisfiable if there is a measurable model $M$ such that $M \models \Phi$; Φ is valid if it is satisfied in every measurable model. We say that the set $T$ of formulas is satisfiable if there is a measurable model $M$ such that $M \models \Phi$ for all $\Phi \in T$.<br />

Notice that the last two clauses of Definition 3 provide validity of each tautology instance.<br />
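For a finite model, Definition 3 can be turned into a small evaluator. The sketch below is only an illustration under simplifying assumptions that are not part of the paper: worlds are identified with the set of variables true in them, $H$ is taken to be the full power set, formulas and terms are encoded as nested tuples, and all function names are invented here. Exact rational arithmetic is used, and `inverse` implements the convention $0^{-1} = 1$.

```python
from fractions import Fraction


def holds(alpha, true_vars):
    """v(alpha, w) for a world w given by the set of variables true in it."""
    tag = alpha[0]
    if tag == "var":
        return alpha[1] in true_vars
    if tag == "not":
        return not holds(alpha[1], true_vars)
    if tag == "and":
        return holds(alpha[1], true_vars) and holds(alpha[2], true_vars)
    raise ValueError(f"unknown propositional connective: {tag}")


def mu(alpha, model):
    """mu([alpha]): total weight of the worlds in which alpha is true."""
    return sum((w for true_vars, w in model if holds(alpha, true_vars)), Fraction(0))


def inverse(x):
    """Total inverse, with the convention 0^{-1} = 1."""
    return Fraction(1) if x == 0 else 1 / x


def value(term, model):
    """f^M, following the clauses of Definition 3."""
    tag = term[0]
    if tag == "const":
        return Fraction(term[1])
    if tag == "cp":
        alpha, beta = term[1], term[2]
        return mu(("and", alpha, beta), model) * inverse(mu(beta, model))
    if tag == "add":
        return value(term[1], model) + value(term[2], model)
    if tag == "scale":
        return Fraction(term[1]) * value(term[2], model)
    if tag == "neg":
        return -value(term[1], model)
    raise ValueError(f"unknown term constructor: {tag}")


def satisfies(model, phi):
    """M |= phi for probabilistic formulas built from f >= 0, not, and."""
    tag = phi[0]
    if tag == "geq0":
        return value(phi[1], model) >= 0
    if tag == "not":
        return not satisfies(model, phi[1])
    if tag == "and":
        return satisfies(model, phi[1]) and satisfies(model, phi[2])
    raise ValueError(f"unknown probabilistic connective: {tag}")


# A three-world model: (variables true in the world, weight of the world).
p, q = ("var", "p"), ("var", "q")
model = [
    (frozenset({"p", "q"}), Fraction(1, 4)),
    (frozenset({"p"}), Fraction(1, 4)),
    (frozenset({"q"}), Fraction(1, 2)),
]
cp_p_given_q = value(("cp", p, q), model)
```

In this model $CP(p, q)$ evaluates to $(1/4) \cdot (3/4)^{-1} = 1/3$, and conditioning on a contradiction yields $0 \cdot 1 = 0$ thanks to the convention on the inverse.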

3 Axiomatization<br />


In this section we will introduce the axioms and inference rules and prove that the proposed axiomatization is sound and strongly complete with respect to the class of all measurable models. The set of axioms of our axiomatic system, which we denote $AX_{LPCP}$, is divided into three groups: axioms for propositional reasoning, axioms for probabilistic reasoning and arithmetical axioms.<br />

Axioms for propositional reasoning<br />

A1. $\tau(\Phi_1, \ldots, \Phi_n)$, where $\tau(p_1, \ldots, p_n) \in For_C$ is any tautology and the $\Phi_i$ are either all propositional or all probabilistic.<br />

Axioms for probabilistic reasoning<br />

A2. $P(\alpha) \geq 0$;<br />
A3. $P(\top) = 1$;<br />
A4. $P(\bot) = 0$;<br />
A5. $P(\alpha \leftrightarrow \beta) = 1 \rightarrow P(\alpha) = P(\beta)$;<br />
A6. $P(\alpha \vee \beta) = P(\alpha) + P(\beta) - P(\alpha \wedge \beta)$;<br />
A7. $(P(\alpha \wedge \beta) = r \wedge P(\beta) = s) \rightarrow CP(\alpha, \beta) = r \cdot s^{-1}$.<br />

¹ Here 1 stands for “true”, while 0 stands for “false”.<br />



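Arithmetical axioms<br />

A8. $r \geq s$, whenever $r \geq s$ holds in $\mathbb{Q}$;<br />
A9. $s \cdot r = sr$, where $sr$ is the product computed in $\mathbb{Q}$;<br />
A10. $s + r = s + r$, where the right-hand side is the sum computed in $\mathbb{Q}$;<br />
A11. $f + g = g + f$;<br />
A12. $(f + g) + h = f + (g + h)$;<br />
A13. $f + 0 = f$;<br />
A14. $f - f = 0$;<br />
A15. $(r \cdot f) + (s \cdot f) = (r + s) \cdot f$;<br />
A16. $s \cdot (f + g) = (s \cdot f) + (s \cdot g)$;<br />
A17. $r \cdot (s \cdot f) = (r \cdot s) \cdot f$;<br />
A18. $1 \cdot f = f$;<br />
A19. $f \geq g \vee g \geq f$;<br />
A20. $(f \geq g \wedge g \geq h) \rightarrow f \geq h$;<br />
A21. $f \geq g \rightarrow f + h \geq g + h$;<br />
A22. $(f \geq g \wedge s > 0) \rightarrow s \cdot f \geq s \cdot g$.<br />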

Inference rules<br />

R1. From Φ and Φ → Ψ infer Ψ.<br />

R2. From α infer $P(\alpha) = 1$.<br />

R3. From the set of premises $\{\phi \rightarrow f \geq -n^{-1} \mid n = 1, 2, 3, \ldots\}$ infer $\phi \rightarrow f \geq 0$.<br />

Let us briefly comment on the axioms and inference rules. The axioms A1-A7 provide<br />
the required properties of probability, while the axioms A8-A22 provide the properties<br />
required for computation. Among the inference rules, R1 is modus ponens, R2 resembles<br />
necessitation, and R3 excludes non-Archimedean probabilities.<br />
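As an informal sanity check on the probabilistic axioms, the following Python sketch evaluates instances of A2-A6 and computes CP as in A7 in a small finite model. The model (four worlds over two letters p and q), the weights, and all function names are assumptions made only for illustration; the sketch also assumes P(β) > 0 whenever CP(α, β) is computed.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: the worlds are the four truth assignments to p and q.
LETTERS = ("p", "q")
WORLDS = [dict(zip(LETTERS, bits)) for bits in product([True, False], repeat=2)]
# Assumed weights for the worlds (p,q), (p,not q), (not p,q), (not p,not q).
MU = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]


def P(formula):
    """Measure of [formula], the set of worlds in which the formula holds."""
    return sum((m for w, m in zip(WORLDS, MU) if formula(w)), Fraction(0))


def CP(alpha, beta):
    """r * s^-1 with r = P(alpha and beta) and s = P(beta), as in A7."""
    s = P(beta)
    if s == 0:
        raise ZeroDivisionError("this sketch only handles P(beta) > 0")
    return P(lambda w: alpha(w) and beta(w)) / s


def check_axioms(alpha, beta):
    """Truth values of the instances of A2-A6 for the given alpha and beta."""
    both = lambda w: alpha(w) and beta(w)
    either = lambda w: alpha(w) or beta(w)
    iff = lambda w: alpha(w) == beta(w)
    return {
        "A2": P(alpha) >= 0,
        "A3": P(lambda w: True) == 1,
        "A4": P(lambda w: False) == 0,
        "A5": P(iff) != 1 or P(alpha) == P(beta),
        "A6": P(either) == P(alpha) + P(beta) - P(both),
    }


p = lambda w: w["p"]
q = lambda w: w["q"]
results = check_axioms(p, q)
cp_value = CP(p, q)
```

With these weights every entry of `results` is true, and `cp_value` is the exact rational 4/5, since P(p ∧ q) = 1/2 and P(q) = 5/8. Exact fractions are used so that the equalities in A3-A6 are not blurred by floating-point rounding.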

Definition 4 A formula Φ is a theorem (⊢ Φ) if there is an at most countable sequence<br />
of formulas Φ₀, Φ₁, . . . , Φ, such that every Φᵢ is either an axiom or is derived from the<br />
preceding formulas of the sequence by an inference rule. In this paper we will also use<br />
the notion of deducibility. A formula Φ is deducible from a set T of sentences (T ⊢ Φ) if<br />
there is an at most countable sequence of formulas Φ₀, Φ₁, . . . , Φ, such that every Φᵢ is<br />
an axiom or a formula from the set T, or is derived from the preceding formulas by an<br />
inference rule. Equivalently, a formula Φ is a theorem (⊢ Φ) if it is deducible from the empty set. A set<br />
of sentences T is consistent if there is at least one formula from ForC and at least one<br />
formula from ForP that are not deducible from T. Otherwise, T is inconsistent. A set T<br />
is deductively closed if for every Φ ∈ For, if T ⊢ Φ, then Φ ∈ T.<br />
□<br />

Observe that the length of the inference may be any successor ordinal less than the<br />
first uncountable ordinal ω₁. Using a straightforward induction on the length of the inference,<br />
one can easily show that the above axiomatization is sound with respect to the class<br />
of all measurable models.<br />

4 Completeness<br />


Theorem 1 (Deduction theorem) Suppose that T is an arbitrary set of formulas and that<br />
Φ, Ψ ∈ For. Then, T ⊢ Φ → Ψ iff T ∪ {Φ} ⊢ Ψ.<br />


Proof: If T ⊢ Φ → Ψ, then clearly T ∪ {Φ} ⊢ Φ → Ψ, so, by modus ponens (R1),<br />
T ∪ {Φ} ⊢ Ψ. Conversely, let T ∪ {Φ} ⊢ Ψ. As in the classical case, we use<br />
induction on the length of the inference to prove that T ⊢ Φ → Ψ. The proof differs from the<br />
classical one only in the cases where we apply the infinitary inference rule R3.<br />
Suppose that Ψ is the formula φ → f ≥ 0 and that T ⊢ Φ → (φ → f ≥ −n⁻¹) for all<br />
n. Since the formula (p₀ → (p₁ → p₂)) ↔ ((p₀ ∧ p₁) → p₂) is a tautology, we obtain<br />
T ⊢ (Φ ∧ φ) → f ≥ −n⁻¹ for all n (A1). Now, by R3, T ⊢ (Φ ∧ φ) → f ≥ 0. Hence, by<br />
the same tautology, T ⊢ Φ → Ψ.<br />
□<br />

The next technical lemma will be used in the construction of a maximally consistent<br />
extension of a consistent set of formulas.<br />

Lemma 2 Suppose that T is a consistent set of formulas. If T ∪ {φ → f ≥ 0} is inconsistent,<br />
then there is a positive integer n such that T ∪ {φ → f < −n⁻¹} is consistent.<br />

Proof: The proof is by reductio ad absurdum. Thus, let us suppose<br />
that T ∪ {φ → f < −n⁻¹} is inconsistent for all n. By the Deduction theorem, we can<br />
conclude that<br />
T ⊢ φ → f ≥ −n⁻¹<br />
for all n. By R3, T ⊢ φ → f ≥ 0, so T is inconsistent; a contradiction. □<br />

Definition 5 Suppose that T is a consistent set of formulas and that ForP = {φᵢ | i =<br />
0, 1, 2, 3, . . .}. We define a completion T∗ of T recursively as follows:<br />

1. T₀ = T ∪ {α ∈ ForC | T ⊢ α} ∪ {P(α) = 1 | T ⊢ α}.<br />
2. If Tᵢ ∪ {φᵢ} is consistent, then Tᵢ₊₁ = Tᵢ ∪ {φᵢ}.<br />
3. If Tᵢ ∪ {φᵢ} is inconsistent, then:<br />
(a) If φᵢ has the form ψ → f ≥ 0, then Tᵢ₊₁ = Tᵢ ∪ {ψ → f < −n⁻¹}, where n<br />
is a positive integer such that Tᵢ₊₁ is consistent. The existence of such an n is<br />
provided by Lemma 2.<br />
(b) Otherwise, Tᵢ₊₁ = Tᵢ. □<br />

Obviously, each Tᵢ is consistent. In the next theorem we will prove that T∗, understood as the<br />
union of all the Tᵢ, is deductively closed, consistent and maximal with respect to ForP.<br />

Theorem 3 Suppose that T is a consistent set of formulas and that T∗ is constructed as<br />
above. Then:<br />

1. T∗ is deductively closed, that is, T∗ ⊢ Φ implies Φ ∈ T∗.<br />
2. There is φ ∈ ForP such that φ ∉ T∗.<br />
3. For each φ ∈ ForP, either φ ∈ T∗, or ¬φ ∈ T∗.<br />


Proof: We will prove only the first clause, since the remaining clauses can be proved<br />
in the same way as in the classical case. It is sufficient to prove the<br />
following four claims:<br />

(i) Each instance of any axiom is in T∗.<br />
(ii) If Φ ∈ T∗ and Φ → Ψ ∈ T∗, then Ψ ∈ T∗.<br />
(iii) If α ∈ T∗, then P(α) = 1 ∈ T∗.<br />
(iv) If {φ → f ≥ −n⁻¹ | n = 1, 2, 3, . . .} is a subset of T∗, then φ → f ≥ 0 ∈ T∗.<br />

(i): If Φ ∈ ForC, then Φ ∈ T₀. Otherwise, there is a nonnegative integer i such that<br />
Φ = φᵢ. Since ⊢ φᵢ, Tᵢ ⊢ φᵢ as well, so φᵢ ∈ Tᵢ₊₁.<br />

(ii): If Φ, Φ → Ψ ∈ ForC, then Ψ ∈ T₀. Otherwise, let Φ = φᵢ, Ψ = φⱼ, and Φ →<br />
Ψ = φₖ. Then, Ψ is a deductive consequence of each Tₗ, where l ≥ max(i, k) + 1.<br />
Let ¬Ψ = φₘ. If φₘ ∈ Tₘ₊₁, then ¬Ψ is a deductive consequence of each Tₙ, where<br />
n ≥ m + 1. So, for every n ≥ max(i, k, m) + 1, Tₙ ⊢ Ψ ∧ ¬Ψ, a contradiction.<br />
Thus, ¬Ψ ∉ T∗. On the other hand, if also Ψ ∉ T∗, we have that Tₙ ∪ {Ψ} ⊢ ⊥ and<br />
Tₙ ∪ {¬Ψ} ⊢ ⊥ for n ≥ max(j, m) + 1, which contradicts the consistency of Tₙ.<br />
Thus, Ψ ∈ T∗.<br />

(iii): If α ∈ T∗, then α ∈ T₀, so P(α) = 1 ∈ T₀.<br />

(iv): Suppose that {φ → P(α) ≥ −n⁻¹ | n = 1, 2, 3, . . .} is a subset of T∗. We want<br />
to prove that φ → P(α) ≥ 0 ∈ T∗. The proof is by reductio ad absurdum. So,<br />
let φ → P(α) ≥ 0 = φᵢ and let us suppose that Tᵢ ∪ {φᵢ} is inconsistent. By 3.(a) of<br />
Definition 5, there is a positive integer n such that<br />
Tᵢ₊₁ = Tᵢ ∪ {φ → P(α) < −n⁻¹}<br />
and Tᵢ₊₁ is consistent. Then, for all sufficiently large k, Tₖ ⊢ φ → P(α) < −n⁻¹<br />
and Tₖ ⊢ φ → P(α) ≥ −n⁻¹, so Tₖ ⊢ φ → ψ for all ψ ∈ ForP. In particular,<br />
Tₖ ⊢ φ → P(α) ≥ 0, i.e., Tₖ ⊢ φᵢ for all sufficiently large k. But φᵢ ∉ T∗, so φᵢ is<br />
inconsistent with all Tₖ, k ≥ i. It follows that each Tₖ is inconsistent for sufficiently large<br />
k, a contradiction.<br />
Thus, Tᵢ ∪ {φᵢ} is consistent, so φ → P(α) ≥ 0 ∈ Tᵢ₊₁.<br />
□<br />

For the given completion T∗, we define a canonical model M∗ as follows:<br />

• W is the set of all functions w : ForC −→ {0, 1} with the following properties:<br />
– w is compatible with ¬ and ∧.<br />
– w(α) = 1 for each α ∈ T∗.<br />
• v : ForC × W −→ {0, 1} is defined by v(α, w) = 1 iff w(α) = 1.<br />
• H = {[α] | α ∈ ForC}.<br />
• µ : H −→ [0, 1] is defined by µ([α]) = sup{s ∈ [0, 1] ∩ Q | T∗ ⊢ P(α) ≥ s}.<br />



Lemma 4 M∗ is a measurable model.<br />

Proof: We need to prove that H is an algebra of sets and that µ is a finitely additive<br />
probability measure. It is easy to see that H is an algebra of sets, since [α] ∩ [β] = [α ∧ β],<br />
[α] ∪ [β] = [α ∨ β] and W \ [α] = [¬α]. Concerning µ, it is sufficient to prove that A3, A4<br />
and A6 are satisfied in M∗. Here we will only give a sketch of the proof for A6, which<br />
provides finite additivity of µ.<br />

Let µ([α]) = a, µ([β]) = b and µ([α ∧ β]) = c. We claim that<br />
µ([α ∨ β]) = a + b − c.<br />
This is an immediate consequence of the following facts:<br />

• µ([γ]) = sup{s ∈ Q | T∗ ⊢ P(γ) ≥ s}, γ ∈ ForC.<br />
• The real function F(x, y, z) = x + y − z is continuous.<br />
• For each r, s ∈ Q, T∗ ⊢ r ≥ s iff r ≥ s.<br />
• Q³ is dense in R³.<br />

Namely, for each positive ε, there are positive δ₁, δ₂, δ₃ such that for all ⟨r₁, r₂, r₃⟩ ∈<br />
((a − δ₁, a] × (b − δ₂, b] × (c − δ₃, c]) ∩ Q³,<br />
r₁ + r₂ − r₃ ∈ (a + b − c − ε, a + b − c + ε).<br />
In particular, for each s′, s′′ ∈ Q such that<br />
a + b − c − ε < s′ ≤ r₁ + r₂ − r₃ ≤ s′′ < a + b − c + ε,<br />
using the axioms about rational numbers, we have that<br />
T∗ ⊢ s′ ≤ r₁ + r₂ − r₃ ≤ s′′,<br />
i.e., µ([α ∨ β]) = µ([α]) + µ([β]) − µ([α ∧ β]). □<br />

Theorem 5 (Strong completeness theorem) Every consistent set of formulas has a measurable<br />
model.<br />

Proof: Let T be a consistent set of formulas. We can extend it to a maximally consistent<br />
set T∗, and define a canonical model M∗, as above. By induction on the complexity<br />
of the formulas we can prove that M∗ ⊨ Φ iff Φ ∈ T∗.<br />

To begin the induction, let Φ = α ∈ ForC. If α ∈ T∗, i.e., T∗ ⊢ α, then by the definition<br />
of M∗, M∗ ⊨ α. Conversely, if M∗ ⊨ α, by the completeness of classical propositional<br />
logic, T∗ ⊢ α, and α ∈ T∗.<br />

Let us suppose that f ≥ 0 ∈ T∗. Then, using the axioms for ordered commutative<br />
rings, we can prove that<br />

T∗ ⊢ f = s + Σ_{i=1}^{m} sᵢ · CP(αᵢ, βᵢ) and T∗ ⊢ s + Σ_{i=1}^{m} sᵢ · CP(αᵢ, βᵢ) ≥ 0,<br />


for some s, sᵢ ∈ Q and some αᵢ, βᵢ ∈ ForC such that T∗ ⊢ P(βᵢ) > 0. Let aᵢ = µ([αᵢ])<br />
and bᵢ = µ([βᵢ]). It remains to prove that<br />

s + Σ_{i=1}^{m} sᵢ · aᵢ · bᵢ⁻¹ ≥ 0. (1)<br />

As in the proof of Lemma 4, we can show that (1) is an immediate consequence<br />
of the following facts:<br />

• µ([γ]) = sup{s ∈ Q | T∗ ⊢ P(γ) ≥ s}, γ ∈ ForC.<br />
• The real function F(x₁, . . . , xₘ, y₁, . . . , yₘ) = s + Σ_{i=1}^{m} sᵢ · xᵢ · yᵢ⁻¹ is continuous.<br />
• For each r, s ∈ Q, T∗ ⊢ r ≥ s iff r ≥ s.<br />
• Q^k is dense in R^k.<br />

For the other direction, let M∗ ⊨ f ≥ 0. If f ≥ 0 ∉ T∗, by the construction of T∗,<br />
there is a positive integer n such that f < −n⁻¹ ∈ T∗. Reasoning as above, we have that<br />
M∗ ⊨ f < 0, which is a contradiction. So, f ≥ 0 ∈ T∗.<br />

Let Φ = ¬φ ∈ ForP. Then M∗ ⊨ ¬φ iff M∗ ⊭ φ iff φ ∉ T∗ iff (by Theorem 3)<br />
¬φ ∈ T∗.<br />

Finally, let Φ = φ ∧ ψ ∈ ForP. M∗ ⊨ φ ∧ ψ iff M∗ ⊨ φ and M∗ ⊨ ψ iff φ, ψ ∈ T∗<br />
iff (by Theorem 3) φ ∧ ψ ∈ T∗. □<br />

5 Decidability<br />


Theorem 6 Satisfiability of probabilistic formulas is decidable.<br />

Proof: Up to equivalence, each probabilistic formula is a finite disjunction of finite<br />
conjunctions of literals, where a literal is either a basic probabilistic formula or the negation<br />
of a basic probabilistic formula. Thus, it is sufficient to show the decidability of the<br />
satisfiability problem for formulas of the form<br />

⋀_i fᵢ ≥ 0 ∧ ⋀_j gⱼ < 0. (2)<br />

Suppose that p₁, . . . , pₙ are all of the propositional letters appearing in (2). Let A₁, . . . , A_{2ⁿ}<br />
be all of the formulas of the form<br />
±p₁ ∧ · · · ∧ ±pₙ,<br />
where +p = p and −p = ¬p. Clearly, the Aᵢ are pairwise disjoint and form a partition of ⊤.<br />
Furthermore, for each α appearing in (2) there is a unique set I_α ⊆ {1, . . . , 2ⁿ} such that<br />

α ↔ ⋁_{i∈I_α} Aᵢ<br />


is a tautology. Now we can equivalently rewrite (2) as<br />

⋀_i Σ_{i′} s_{ii′} CP(⋁_{k∈I_{α_{ii′}}} A_k, ⋁_{l∈I_{β_{ii′}}} A_l) ≥ 0 ∧ ⋀_j Σ_{j′} s_{jj′} CP(⋁_{k∈I_{α_{jj′}}} A_k, ⋁_{l∈I_{β_{jj′}}} A_l) < 0.<br />

Let σ_i(x_1, . . . , x_{2ⁿ}), δ_j(x_1, . . . , x_{2ⁿ}) be the formulas<br />

Σ_{i′} s_{ii′} · (Σ_{k∈I_{α_{ii′}}} x_k) · (Σ_{l∈I_{β_{ii′}}} x_l)⁻¹ ≥ 0<br />

and<br />

Σ_{j′} s_{jj′} · (Σ_{k∈I_{α_{jj′}}} x_k) · (Σ_{l∈I_{β_{jj′}}} x_l)⁻¹ < 0.<br />

Then, it is easy to see that (2) is satisfiable iff the sentence<br />

∃x_1 . . . ∃x_{2ⁿ} (⋀_i σ_i(x̄) ∧ ⋀_j δ_j(x̄))<br />

is satisfied in the ordered field of reals. Since the latter question is decidable, we have our<br />
claim. □<br />
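The first step of the proof, enumerating the atoms A₁, . . . , A_{2ⁿ} and finding the index set I_α of a propositional formula α, can be sketched in Python as follows. The encoding of formulas as Boolean functions, the ordering of the atoms, and the function names are assumptions of this sketch rather than anything fixed by the paper; it does not attempt the final decision step over the reals.

```python
from itertools import product


def atoms(letters):
    """The 2**n conjunctions ±p1 ∧ ... ∧ ±pn, each encoded as a dict letter -> bool."""
    return [dict(zip(letters, signs))
            for signs in product([True, False], repeat=len(letters))]


def index_set(alpha, letters):
    """I_alpha: the 1-based indices i such that alpha holds under atom A_i."""
    return {i for i, a in enumerate(atoms(letters), start=1) if alpha(a)}


def equivalent_to_disjunction(alpha, letters):
    """Check that alpha <-> OR_{i in I_alpha} A_i holds under every assignment."""
    all_atoms = atoms(letters)
    chosen = index_set(alpha, letters)
    for w in all_atoms:  # every assignment to the letters is itself one atom
        disjunction = any(w == all_atoms[i - 1] for i in chosen)
        if alpha(w) != disjunction:
            return False
    return True


letters = ("p", "q")
implies = lambda w: (not w["p"]) or w["q"]  # the formula p -> q
i_alpha = index_set(implies, letters)
```

With the atoms ordered as p ∧ q, p ∧ ¬q, ¬p ∧ q, ¬p ∧ ¬q, the formula p → q gets the index set {1, 3, 4}. The number of atoms grows as 2ⁿ, so this enumeration is only practical for a handful of letters.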

It should be noted that this logic can be embedded into the logic described in (Fagin<br />
et al., 1990), which has a PSPACE containment for the decision procedure. Also, the<br />
rewriting of formulas from our logic into that logic can be accomplished in linear time:<br />
CP(α, β) is equivalent to<br />
w(α ∧ β) / w(β),<br />
which is representable in (Fagin et al., 1990).<br />
Thus, we conclude that our logic is also decidable in PSPACE.<br />

6 Conclusion<br />


In this paper we introduced a sound and strongly complete axiomatic system for the probabilistic<br />
logic with the conditional probability operator CP, which allows for linear combinations<br />
and comparative statements. As was noted in (van der Hoek, 1997), it is not<br />
possible to give a finitary strongly complete axiomatization for such a system. In our case<br />
strong completeness was made possible by adding an infinitary rule of inference.<br />

The obtained formalism is quite expressive and allows for the representation of uncertain<br />
knowledge, where uncertainty is modeled by probability formulas. For instance, a<br />
conditional statement of the form “the sum of the probabilities of α given β and γ given δ is<br />
at least 0.95” can be written as<br />
CP(α, β) + CP(γ, δ) ≥ 0.95.<br />

A similar approach can be applied to de Finetti style conditional probabilities. Future<br />
research will also consider the possibility of dealing with probabilistic first-order formulas.<br />
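To illustrate how a statement such as CP(α, β) + CP(γ, δ) ≥ 0.95 is evaluated, the Python sketch below rewrites it in the form s + Σ sᵢ · CP(αᵢ, βᵢ) ≥ 0 used in the completeness proof and checks it in a small finite model. The model, its weights, and the helper names are hypothetical, and the instance takes γ = β and δ = α purely for brevity.

```python
from fractions import Fraction

# Hypothetical finite model: each world is (set of true letters, weight).
MODEL = [
    (frozenset({"a", "b"}), Fraction(1, 2)),
    (frozenset({"a"}), Fraction(1, 4)),
    (frozenset({"b"}), Fraction(1, 8)),
    (frozenset(), Fraction(1, 8)),
]


def P(formula):
    """Total weight of the worlds in which the formula holds."""
    return sum((m for w, m in MODEL if formula(w)), Fraction(0))


def CP(alpha, beta):
    """P(alpha and beta) / P(beta); this sketch assumes P(beta) > 0."""
    return P(lambda w: alpha(w) and beta(w)) / P(beta)


def holds(s, terms):
    """Truth of s + sum of s_i * CP(alpha_i, beta_i) >= 0 in MODEL."""
    return s + sum((c * CP(a, b) for c, a, b in terms), Fraction(0)) >= 0


alpha = lambda w: "a" in w
beta = lambda w: "b" in w

# CP(alpha, beta) + CP(beta, alpha) >= 0.95, written as -19/20 + ... >= 0.
at_least_095 = holds(Fraction(-19, 20), [(1, alpha, beta), (1, beta, alpha)])
# The same sum compared against the larger threshold 3/2.
at_least_150 = holds(Fraction(-3, 2), [(1, alpha, beta), (1, beta, alpha)])
```

In this model CP(α, β) = 4/5 and CP(β, α) = 2/3, so the sum is 22/15: the first statement holds and the second does not.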



References<br />


Fagin, R., Halpern, J. Y. and Megiddo, N. (1990). A logic for reasoning about probabilities, Information<br />
and Computation 87(1/2): 78–128.<br />

Lukasiewicz, T. (2002). Probabilistic default reasoning with conditional constraints, Annals<br />
of Mathematics and Artificial Intelligence 34: 35–88.<br />

Ognjanović, Z., Marković, Z. and Rašković, M. (2005). Completeness theorem for a<br />
logic with imprecise and conditional probabilities, Publications de l’Institut Mathématique,<br />
Nouvelle Série 78(92): 35–49.<br />

Ognjanović, Z., Perović, A. and Rašković, M. (2008). Logics with the qualitative probability<br />
operator, Logic Journal of IGPL 16(2): 105–120.<br />

Ognjanović, Z. and Rašković, M. (1996). A logic with higher order probabilities, Publications<br />
de l’Institut Mathématique, Nouvelle Série 60(74): 1–4.<br />

Ognjanović, Z. and Rašković, M. (1999). Some probability logics with new types of<br />
probability operators, Journal of Logic and Computation 9(2): 181–195.<br />

Ognjanović, Z. and Rašković, M. (2000). Some first-order probability logics, Theoretical<br />
Computer Science 247(1-2): 191–212.<br />

Rašković, M., Ognjanović, Z. and Marković, Z. (2004). A logic with conditional probabilities,<br />
in J. Leite and J. Alferes (eds), 9th European Conference JELIA’04 Logics in<br />
Artificial Intelligence, Vol. 3229, Springer-Verlag, pp. 226–238.<br />

van der Hoek, W. (1997). Some considerations on the logic PFD: a logic combining<br />
modality and probability, Journal of Applied Non-Classical Logics 7(3): 287–307.<br />



A PROOF-THEORETIC APPROACH TO FRENCH PRONOMINAL CLITICS ◦<br />

Scott Martin<br />

The Ohio State University<br />

Abstract. This paper sketches an account of the behavior of French pronominal clitics in<br />
CVG, a proof-theoretic categorial grammar formalism. The approach shown here differs<br />
from most categorial analyses of French clitics in that it treats clitics as noun phrases rather<br />
than as functions that operate on under-saturated verb phrases. Basic French cliticization,<br />
clitics in infinitival constructions, and both auxiliary and non-auxiliary clitic climbing are<br />
analyzed.<br />

1 Introduction<br />


Cliticization in French is a set of phenomena in which pronominal complements to a<br />
verbal host are systematically realized as affixes. Linguistic generalizations about these<br />
phenomena have been structured using several different frameworks, with Sag & Miller’s<br />
(1997) HPSG treatment of French clitics as morphological affixes being the most comprehensive<br />
and successful. Categorial accounts of cliticization phenomena, among them<br />
Kraak (1998) for French and Morrill & Gavarro (1992) for Catalan, have largely analyzed<br />
clitics as functors over under-saturated verb phrases. Stabler (2001) and Amblard (2006)<br />
are two recent approaches to French clitics in the Minimalist Grammar formalism, both<br />
of which treat them as syntactic elements with certain feature sets.<br />

In this paper, I give a preliminary account of some of the phenomena involving French<br />
clitics using Convergent Grammar (CVG), a categorial grammar framework that uses natural<br />
deduction with hypothetical proof. 1 This treatment is limited to a subset of what<br />
Bonami & Boye (2005) call French Pronominal Clitics (FPCs), specifically, those FPCs<br />
that appear as verbal complements. From Kraak (1998) I borrow the idea of a specialized<br />
combinatory mode for FPC attachment to a verbal host (analogous to her •ca) that is<br />
“stronger” than normal Complement Merge and reflects the status of clitic attachment as<br />
a process more morphological than syntactic. In contrast to Kraak’s and much other work<br />
on FPCs in categorial frameworks, however, the account sketched here partly follows the<br />
work of Stabler and Amblard in analyzing FPCs not as functors over verb phrases but<br />
as sets of morphological features that also represent a syntactic and semantic argument,<br />
much like ordinary NPs.<br />

Drawing on Sag & Miller’s work on French clitics as inspiration, the analysis reflected<br />
here relies mainly on properly-structured lexical axioms to describe the behavior of FPCs.<br />
Basic instances of cliticization are considered as well as more complicated situations,<br />
such as argument composition and the interaction of FPCs with infinitivals. However,<br />
this paper does not take a firm stance on the question of whether cliticization phenomena<br />

◦ For many helpful comments and suggestions on this and earlier drafts of this paper, I am grateful to<br />
Yusuke Kubota, Carl Pollard, Chris Worth, and three anonymous ESSLLI reviewers.<br />

1 Pollard (2007) provides an introduction to CVG.<br />


should be considered syntactic or morphological, since CVG’s tectogrammatical terms<br />

represent syntactic dependency relations and do not necessarily correspond exactly to<br />

surface word order or prosodic form.<br />

2 Pronominal Complement Clitics in French<br />

French verbs take canonical complements in a manner that resembles complement selection<br />
for their English analogs: the verbal head combines with its complement(s) to the<br />
right and with its subject to the left to form a finite or infinitive clause. When certain<br />
complements are pronominalized, however, they can optionally appear to the immediate<br />
left of the verb in a variant form as proclitics. The following data, replicated in part from<br />
(1) in Sag & Miller (1997), show the verb voir ‘to see’ with its complement realized both<br />
canonically and as a proclitic: 2<br />

(1) a. Marie voit Jean. ‘Marie sees Jean.’<br />

b. Marie voit lui. ‘Marie sees him.’ [boldface = prosodic stress]<br />

c. Marie le voit.<br />

Marie ACC.3S sees<br />

‘Marie sees him.’<br />

The cliticized configuration is given in (1c), with the complement in its clitic form (le)<br />
instead of the canonical one (here Jean, or lui with appropriate stress).<br />

Among the other distinctive characteristics of complement FPCs noted by Kraak (1998),<br />
the ones that bear most on the account given here are that:<br />

• as verbal complements, they do not co-occur with their non-pronominal or noncliticized<br />
versions (exemplified in (1)).<br />

• they do not serve as the complement to bare past participles. This fact gives rise to<br />
an instance of the phenomenon known as “clitic climbing”:<br />

(2) a. *Marie a le vu. ‘Marie saw him.’<br />

b. Marie l’a vu.<br />

Marie ACC.3S has seen<br />

‘Marie saw him.’<br />

Here, (2a) is unacceptable because although the clitic le is the accusative complement<br />
of vu, it must be realized on the tense auxiliary form a as in (2b). However,<br />
causatives and certain verbs of perception exhibit different behavior. For these<br />
verbs, it is possible for some of their arguments to be realized as clitics on the<br />
upstairs verb and some on the downstairs one:<br />

(3) Jean le fera la réparer.<br />

Jean ACC.3S make.FUT ACC.3FS repair<br />

‘Jean will make him repair it.’<br />

(From Abeillé, Godard and Miller (1995, example (2a)).)<br />

2 I adopt Bonami & Boye’s (2005) scheme here for annotating morphological features.<br />


• no syntactic material except another clitic can intervene between an FPC and its<br />
host verb. This fact distinguishes cliticized complements from their canonical counterparts,<br />
in which certain adverbials can occur between a verb and its complements:<br />

(4) a. Marie l’a souvent dit à lui.<br />

Marie ACC.3S has <strong>of</strong>ten said to him<br />

‘Marie has <strong>of</strong>ten said it to him.’<br />

b. Marie l’a dit souvent à lui.<br />
Marie ACC.3S has said often to him<br />
‘Marie has often said it to him.’<br />

c. Marie le lui a souvent dit.<br />
Marie ACC.3S DAT.3S has often said<br />
‘Marie has often said it to him.’<br />

d. *Marie le lui souvent a dit.<br />

Marie ACC.3S DAT.3S <strong>of</strong>ten has said<br />

‘Marie has <strong>of</strong>ten said it to him.’<br />

e. *Marie le souvent lui a dit.<br />

Marie ACC.3S <strong>of</strong>ten DAT.3S has said<br />

‘Marie has <strong>of</strong>ten said it to him.’<br />

(Example (4d) is from Kraak (1998, (7d)).) Here, (4d) and (4e) show the disallowed<br />
intervention of the adverbial souvent ‘often’ between an FPC and its host verb,<br />
while (4b) demonstrates the allowable intervention of souvent in the canonical form.<br />

• they are normally realized on the verb they complement, illustrated here with an<br />
embedded infinitival:<br />

(5) a. *Marie le veut voir. ‘Marie wants to see him.’<br />

b. Marie veut le voir.<br />

Marie wants ACC.3S to see<br />

‘Marie wants to see him.’<br />

The cliticized accusative le here is the complement of the infinitive voir, and does<br />
not attach to the upstairs verb veut.<br />

These are the most basic facts about cliticization of declarative verbal complements in<br />
French. FPCs also occur in passive constructions and in constructions like those in (6):<br />

(6) a. i. Pierre reste fidèle à Jean.<br />
‘Pierre remains faithful to Jean.’<br />

ii. Pierre lui reste fidèle.<br />
Pierre DAT.3S remains faithful<br />
‘Pierre remains faithful to him.’<br />

b. i. Marie connaît la fin de l’histoire.<br />
‘Marie knows the end of the story.’<br />

ii. Marie en connaît la fin.<br />
Marie GEN.3S knows the end<br />
‘Marie knows the end of it.’<br />

(Both are from Sag & Miller (1997, example 3).) Constructions involving FPCs like those<br />
in (6) are similar to the clitic climbing that occurs with auxiliaries like avoir (as shown in<br />
(2)).<br />

In §3, I sketch an analysis of the basic facts about cliticization in some of the situations<br />
described above.<br />

3 Accounting for <strong>the</strong> Data<br />


Sag & Miller (1997) give extensive argumentation for considering clitics as morphological<br />
rather than syntactic in nature. Their account constrains the inflectional paradigm<br />
of French verbs, treating clitics as pronominal affixes that reduce the valence requirements<br />
of a given verb. In examining French clitics from a deductive perspective, Kraak<br />
(1998) instead describes cliticization as occurring on a “sliding scale” between morphology<br />
(affix-host attachment) and syntax (complement selection). The view presented here<br />
is more in line with Kraak’s in that it uses CVG tectogrammatical proof terms to describe<br />
the combinatoric potential of functions and arguments.<br />

However, this account diverges from Kraak’s and most other categorial grammar treatments<br />
in that it construes FPCs as regular pronominal NPs, instead of formulating them<br />
as functors over under-saturated verb phrases. This approach allows the semantics to be<br />
nearly identical between canonical and cliticized forms by specifying a separate mode of<br />
complement selection specifically for clitics.<br />

3.1 FPCs as a Local Dependency<br />

Because cliticization differs from the canonical form of complement selection (⊸C) in<br />
various ways, a separate implication mode, called ⊸PC (for proclitic), is used. As a local<br />
implication mode, it has modus ponens (elimination) but not hypothetical proof (introduction),<br />
which is used in CVG for non-local extractions. The elimination (or “merge”)<br />
rule for ⊸PC is as follows: 3<br />

Proclitic Merge<br />

If Γ ⊢ a, x : A, C ⊣ ∆<br />

and Γ ′ ⊢ f, v : A ⊸PC B, C ⊃ D ⊣ ∆ ′<br />

<strong>the</strong>n Γ, Γ ′ ⊢ ( PC a f), v(x) : B, D ⊣ ∆, ∆ ′<br />

This rule formalizes <strong>the</strong> affixation <strong>of</strong> clitics to a verbal host, taking into account both<br />

<strong>the</strong> syntactic and semantic pro<strong>of</strong> terms. This new ⊸PC implication mode allows lexical<br />

axioms to specify <strong>the</strong> cliticized complement mode <strong>of</strong> combination as opposed to<br />

<strong>the</strong> canonical one, and is central to <strong>the</strong> account <strong>of</strong> clitic behavior sketched here. As a<br />

mnemonic meant to reflect French word order in derivational history, function application<br />

for ⊸PC writes an FPC to <strong>the</strong> left <strong>of</strong> its host. This rule also states that hypo<strong>the</strong>ses present<br />

3 A CVG sign is a triple made up <strong>of</strong> <strong>the</strong> prosodic/phonological form, syntactic tectogrammatical term,<br />

and semantic content. For brevity, I omit <strong>the</strong> prosodic element and only include <strong>the</strong> syntactic tecto-term and<br />

semantic denotation.<br />


in both <strong>the</strong> syntactic context (to <strong>the</strong> left <strong>of</strong> ⊢) and <strong>the</strong> semantic co-context (to <strong>the</strong> right<br />

<strong>of</strong> ⊣) <strong>of</strong> both premises are propagated into <strong>the</strong> conclusion. This ensures that <strong>the</strong> application<br />

<strong>of</strong> this rule does not have any effect on any non-local extractions (filler-gap path<br />

information), stored quantifiers, or anaphoric pronouns.<br />
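The Proclitic Merge rule can be sketched in Python as a function on sequents. This is only an illustration under simplifying assumptions, not the CVG formalism itself: the class names `Imp` and `Sequent`, the exact-match test on types, and the hypothesis labels `"t"` and `"q"` are all hypothetical, and semantic types are left out. The sketch shows the two points made above: the clitic is written to the left of its host in the tecto-term, and the context and co-context of both premises are carried into the conclusion unchanged.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Imp:
    """Implicative tecto type `arg ⊸mode res`; mode is 'C', 'SU' or 'PC'."""
    arg: Any
    mode: str
    res: Any


@dataclass(frozen=True)
class Sequent:
    """Γ ⊢ tecto, sem : tecto_type ⊣ Δ (semantic types omitted in this sketch)."""
    context: Tuple[str, ...]    # Γ, the syntactic hypotheses
    tecto: Any                  # tectogrammatical proof term
    sem: Any                    # semantic term (a string or a Python callable)
    tecto_type: Any
    cocontext: Tuple[str, ...]  # Δ, the semantic co-context


def proclitic_merge(arg: Sequent, fun: Sequent) -> Sequent:
    """Eliminate a ⊸PC implication; there is no matching introduction rule."""
    t = fun.tecto_type
    if not (isinstance(t, Imp) and t.mode == "PC"):
        raise TypeError("functor is not a ⊸PC implication")
    if arg.tecto_type != t.arg:  # assumption: exact match, no subtyping
        raise TypeError("argument type does not match")
    return Sequent(
        context=arg.context + fun.context,        # Γ, Γ′
        tecto=("PC", arg.tecto, fun.tecto),       # clitic written to the left
        sem=fun.sem(arg.sem),                     # v(x)
        tecto_type=t.res,
        cocontext=arg.cocontext + fun.cocontext,  # ∆, ∆′
    )


# Hypothetical premises; "t" and "q" stand in for arbitrary stored hypotheses.
le = Sequent((), "le", "b", "Acc∩Pcl", ())
voit2 = Sequent(
    ("t",), "voit2", lambda y: lambda x: f"see'({x}, {y})",
    Imp("Acc∩Pcl", "PC", Imp("Nom", "SU", "Fin")), ("q",),
)
merged = proclitic_merge(le, voit2)
```

Because the function only concatenates the two contexts and the two co-contexts, nothing stored there is consumed or inspected, which is the sense in which the rule is local.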

With this new implication mode and merge rule, an account <strong>of</strong> FPC behavior as demonstrated<br />

in §2 is possible that requires no o<strong>the</strong>r machinery than <strong>the</strong> CVG merge rules described<br />

in Pollard (2007). All that remains is to correctly specify <strong>the</strong> necessary lexical<br />

axioms. First are <strong>the</strong> canonical forms <strong>of</strong> <strong>the</strong> verbs and complements: 4<br />

⊢ Marie, marie ′ : Nom, Ind<br />

⊢ Jean, jean ′ : Acc, Ind<br />

⊢ lui1 , a : Acc, Ind<br />

⊢ voit1 , λyλxsee ′ (x, y) : (Acc \ Pcl) ⊸C (Nom ⊸SU Fin), Ind ⊃ (Ind ⊃ Prop)<br />

The new type Pcl is assigned to proclitics in order to differentiate <strong>the</strong>m from <strong>the</strong>ir canonical<br />

counterparts. Here, voit selects a complement <strong>of</strong> type Acc \ Pcl to indicate that it<br />

does not combine with proclitics in canonical complement position: <strong>the</strong> set complement<br />

specifies all inhabitants <strong>of</strong> type Acc except those that inhabit Pcl. Next, <strong>the</strong> lexicon is<br />

extended to reflect <strong>the</strong> syntactic/morphological features <strong>of</strong> le and <strong>the</strong> cliticization mode<br />

<strong>of</strong> complement selection for voir:<br />

⊢ le, b : Acc ∩ 3Sg ∩ Pcl, Ind<br />

⊢ voit2 , λyλxsee ′ (x, y) : (Acc ∩ Pcl) ⊸PC (Nom ⊸SU Fin),<br />

Ind ⊃ (Ind ⊃ Prop)<br />

These axioms allow <strong>the</strong> following pro<strong>of</strong> terms for <strong>the</strong> data in (1): 5<br />

(7) a. ⊢ ( SU Marie (voit1 Jean C )), see ′ (marie ′ , jean ′ ) : Fin, Prop<br />

b. ⊢ ( SU Marie (voit1 lui1 C )), see ′ (marie ′ , a) : Fin, Prop<br />

c. ⊢ ( SU Marie ( PC le voit2 )), see ′ (marie ′ , b) : Fin, Prop<br />

Aside from <strong>the</strong> different implication mode, <strong>the</strong> only difference between <strong>the</strong> canonical<br />

form <strong>of</strong> voit (voit1 ) and <strong>the</strong> cliticized variant (voit2 ) is that <strong>the</strong> argument to voit2 must<br />

be <strong>of</strong> <strong>the</strong> intersective type Acc ∩ Pcl. The type 3Sg represents <strong>the</strong> argument’s agreement<br />

features. So stated, this selectional restriction ensures that voit2 can only combine in<br />

cliticized mode with accusative complements that are also proclitics, as desired. It is<br />

important to note that not only are <strong>the</strong> semantics <strong>of</strong> both variants <strong>of</strong> voit identical, but<br />

both cliticized and canonical complements are <strong>of</strong> <strong>the</strong> same semantic type (Ind) as well.<br />
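The selectional restrictions Acc \ Pcl and Acc ∩ Pcl can be illustrated with a small Python sketch. It assumes, purely for illustration, that the tecto type of a lexical item is modelled as the set of basic types it inhabits; the names `Selector`, `LEXICON` and `licensed` are hypothetical and not part of CVG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selector:
    """A complement requirement: basic types that must and must not be present."""
    required: frozenset
    forbidden: frozenset = frozenset()

    def accepts(self, features: frozenset) -> bool:
        return self.required <= features and not (self.forbidden & features)


ACC_NOT_PCL = Selector(frozenset({"Acc"}), frozenset({"Pcl"}))  # Acc \ Pcl, as on voit1
ACC_AND_PCL = Selector(frozenset({"Acc", "Pcl"}))               # Acc ∩ Pcl, as on voit2

# Basic types inhabited by each complement, following the lexical axioms above.
LEXICON = {
    "Jean": frozenset({"Acc"}),
    "lui1": frozenset({"Acc"}),
    "le": frozenset({"Acc", "3Sg", "Pcl"}),
}


def licensed(selector: Selector, complement: str) -> bool:
    """True if the complement satisfies the verb's selectional restriction."""
    return selector.accepts(LEXICON[complement])
```

Under this modelling, `licensed(ACC_NOT_PCL, "le")` is false and `licensed(ACC_AND_PCL, "le")` is true, mirroring the fact that le combines only with voit2, while Jean and lui1 combine only with voit1.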

4 The basic tectogrammatical types used here are Nom for nominative NPs, Acc for accusative NPs, and<br />

Fin for finite clauses. The hyperintensional types Ind, <strong>the</strong> type <strong>of</strong> individual concepts; and Prop, <strong>the</strong> type<br />

<strong>of</strong> propositions, are <strong>the</strong> basic semantic types. In addition to <strong>the</strong> new combinatory mode ⊸PC, implicative<br />

tectogrammatical types are constructed using ⊸SU and ⊸C, which invoke Subject Merge and Complement<br />

Merge, respectively.<br />

5 For clarity, <strong>the</strong> pro<strong>of</strong> terms given in this account show <strong>the</strong> semantics but not <strong>the</strong> co-context as quantification,<br />

wh-phrases, and anaphoric binding are not discussed here.<br />


3.2 “Clitic Climbing” and Tense Auxiliaries<br />

The axioms for tense auxiliaries are structured so that <strong>the</strong>y take <strong>the</strong> complements <strong>of</strong> <strong>the</strong>ir<br />

verbal complement. Past-participial verbs in turn need to be specified in such a way that<br />

<strong>the</strong> proclitic merge rule does not apply to <strong>the</strong>m. This approach is reminiscent <strong>of</strong> <strong>the</strong><br />

argument composition approach employed by Sag & Miller (1997) and Abeillé, Godard<br />

& Sag (1998). The axioms necessary to describe the “climbing” behavior in (2) are the<br />

following:<br />

⊢ aA, λv.v<br />

: ((A \ Pcl) ⊸C (Nom ⊸SU Psp)) ⊸C ((A ∩ Pcl) ⊸PC (Nom ⊸SU Fin)),<br />

(Ind ⊃ (Ind ⊃ Prop)) ⊃ (Ind ⊃ (Ind ⊃ Prop))<br />

⊢ vu, λyλxsee ′ (x, y) : (Acc \ Pcl) ⊸C (Nom ⊸SU Psp), Ind ⊃ (Ind ⊃ Prop)<br />

The tense auxiliary form a (from avoir) is schematically defined to combine with a verb<br />

in past participial form missing its complement, <strong>of</strong> polymorphic type A, to yield a finite<br />

sentence missing both that same A complement and a nominative subject. In this way, <strong>the</strong><br />

A-type complement is “passed along” from <strong>the</strong> past participle to <strong>the</strong> tense auxiliary, whose<br />

semantics are just to apply <strong>the</strong> identity function to <strong>the</strong> meaning <strong>of</strong> its past-participial<br />

complement.<br />

A pro<strong>of</strong> term that correctly predicts <strong>the</strong> allowed form <strong>of</strong> (2b) is <strong>the</strong>n possible: 6<br />

(8) ⊢ ( SU Marie ( PC le (aAcc vu C ))), see ′ (marie ′ , b) : Fin, Prop<br />

No pro<strong>of</strong> is available for <strong>the</strong> disallowed form in (2a) because <strong>the</strong> lexical axiom vu only<br />

uses the ⊸C mode of implication, and as a result proclitics cannot directly combine with<br />

it.<br />
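The way the schematic auxiliary type passes the complement along can be sketched at the level of types alone. The encoding of types as nested tuples and the helper names below are assumptions made for this illustration; only the shape of the types is taken from the axioms above.

```python
def imp(arg, mode, res):
    """Implicative type `arg ⊸mode res`, encoded as a tuple."""
    return ("imp", arg, mode, res)


def minus_pcl(a):
    return ("minus", a, "Pcl")  # A \ Pcl


def and_pcl(a):
    return ("and", a, "Pcl")    # A ∩ Pcl


def aux_a_type(a):
    """Instance of the schematic type of the auxiliary a at complement type A."""
    participle = imp(minus_pcl(a), "C", imp("Nom", "SU", "Psp"))
    finite = imp(and_pcl(a), "PC", imp("Nom", "SU", "Fin"))
    return imp(participle, "C", finite)


def complement_merge(fun_type, arg_type):
    """Type-level Complement Merge: eliminate a ⊸C implication."""
    _, arg, mode, res = fun_type
    if mode != "C" or arg != arg_type:
        raise TypeError("types do not combine in complement mode")
    return res


VU = imp(minus_pcl("Acc"), "C", imp("Nom", "SU", "Psp"))

# a at A = Acc takes vu and returns a functor that wants a proclitic in ⊸PC mode.
a_vu = complement_merge(aux_a_type("Acc"), VU)
```

The result `a_vu` is (Acc ∩ Pcl) ⊸PC (Nom ⊸SU Fin), so the clitic attaches to the auxiliary complex, while `VU` itself only offers the ⊸C mode, which is why the clitic cannot attach to the participle directly.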

3.3 FPCs in Infinitival Constructions<br />

Ensuring that cliticized complements of infinitival complements stay on the infinitive-form<br />

verb, as depicted in (5), can also be accomplished with well-formulated lexical<br />

axioms. This is simply a matter of making sure that infinitive-form verbs can<br />

take proclitic complements and that the verbs that select infinitivals cannot:<br />

⊢ voir1 , λyλxsee ′ (x, y)<br />

: (Acc ∩ Pcl) ⊸PC (Nom ⊸SU Inf), Ind ⊃ (Ind ⊃ Prop)<br />

⊢ veut, λPλxwant ′ (x, P (x))<br />

: (Nom ⊸SU Inf) ⊸C (Nom ⊸SU Fin), (Ind ⊃ Prop) ⊃ (Ind ⊃ Prop)<br />

The semantic representation <strong>of</strong> veut given here is <strong>the</strong> “equi” version <strong>of</strong> <strong>the</strong> denotation<br />

λP∈Propλx∈Indwant ′ (x, P ) that might be used where veut takes a sentential complement,<br />

as in Marie veut qu’elle gagne ‘Marie wants that she wins’.<br />

With <strong>the</strong> lexicon so extended, a pro<strong>of</strong> term for (5b) can be derived:<br />

(9) ⊢ ( SU Marie (veut ( PC le voir) C )), want ′ (marie ′ , see ′ (marie ′ , b)) : Fin, Prop<br />

A derivation for (5a) is not possible because veut does not employ <strong>the</strong> ⊸PC mode <strong>of</strong><br />

combination required for FPCs.<br />
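The difference between the “equi” denotation of veut and the sentential-complement denotation can be made concrete with Python closures that build semantic terms as strings. This is a sketch for illustration only: the string encoding is an assumption, and `win'(c)` is a made-up proposition standing in for the embedded clause.

```python
# Semantic terms are built as strings; lambdas mirror the curried λ-terms above.
voir = lambda y: lambda x: f"see'({x}, {y})"              # λyλx.see′(x, y)
veut_equi = lambda P: lambda x: f"want'({x}, {P(x)})"     # λPλx.want′(x, P(x))
veut_sentential = lambda p: lambda x: f"want'({x}, {p})"  # λPλx.want′(x, P)

# (PC le voir) supplies the clitic's meaning b as the object of voir.
le_voir = voir("b")

# Semantic term of (9): the subject is passed down to the infinitival.
term_9 = veut_equi(le_voir)("marie'")

# Sentential variant with a hypothetical embedded proposition.
term_sentential = veut_sentential("win'(c)")("marie'")
```

In the equi version the matrix subject is reused as the subject of the infinitival, which is what makes marie′ appear twice in the term for (9).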

6 Note that <strong>the</strong> tectogrammatical pro<strong>of</strong> term in (8) does not describe <strong>the</strong> phonological elision between le<br />

and a that occurs in French.<br />


3.4 FPCs and Non-auxiliary Composition<br />

Extending CVG to account for FPCs that combine with argument composition verbs o<strong>the</strong>r<br />

than auxiliaries, whose behavior is exemplified in (6), requires defining special lexical<br />

axioms for those verbs. Similar to <strong>the</strong> data examined so far, “non-local pronominal affixation”<br />

(in <strong>the</strong> terminology <strong>of</strong> Sag & Miller (1997)) is very short distance in nature, and as<br />

such employs <strong>the</strong> local implication ⊸PC that was introduced to handle procliticization.<br />

It is not necessary to invoke CVG’s hypo<strong>the</strong>tical pro<strong>of</strong> machinery for handling extraction<br />

phenomena to explain <strong>the</strong> data in (6).<br />

Here, a strategy is adopted <strong>of</strong> composing a predicative adjectival (for example, fidèle)<br />

or transitive verb (like connaît) with a version <strong>of</strong> its complement that is itself expecting<br />

a complement. The necessary extensions to <strong>the</strong> lexicon for <strong>the</strong> data in (6a) are <strong>the</strong><br />

following: 7<br />

⊢ Pierre, pierre ′ : Nom, Ind<br />

⊢ lui2 , d : Dat ∩ 3Sg ∩ Pcl, Ind<br />

⊢ fidèle, λyλxfaithful ′ (x, y) : (Dat \ Pcl) ⊸C (Nom ⊸SU Adj),<br />

Ind ⊃ (Ind ⊃ Prop)<br />

⊢ reste, λPλyλxremain ′ (P (x, y))<br />

: ((Dat \ Pcl) ⊸C (Nom ⊸SU Adj)) ⊸C ((Dat ∩ Pcl) ⊸PC (Nom ⊸SU Fin)),<br />

(Ind ⊃ (Ind ⊃ Prop)) ⊃ (Ind ⊃ (Ind ⊃ Prop))<br />

These axioms describe fidèle as an adjective missing a dative complement to form an<br />

adjectival small clause and <strong>the</strong> form <strong>of</strong> rester that takes an adjectival complement that is<br />

itself missing its complement. These extensions permit a pro<strong>of</strong> term for (6a-ii):<br />

(10) ⊢ ( SU Pierre ( PC lui2 (reste fidèle C ))), remain ′ (faithful ′ (pierre ′ , d)) : Fin, Prop<br />

(A full derivation <strong>of</strong> (10) is given in Figure 1 in <strong>the</strong> appendix.) With a few fur<strong>the</strong>r extensions<br />

to <strong>the</strong> lexicon, (6b) can also be accounted for:<br />

⊢ connaît, λf λyλxknow ′ (x, f(y))<br />

: ((De \ Pcl) ⊸C Acc) ⊸C ((De ∩ Pcl) ⊸PC (Nom ⊸SU Fin)),<br />

(Ind ⊃ Ind) ⊃ (Ind ⊃ Prop)<br />

⊢ fin, end ′ : N, Ind<br />

⊢ la, λf λxf(x) : N ⊸SP ((De \ Pcl) ⊸C Acc), Ind ⊃ (Ind ⊃ Ind)<br />

⊢ en, e : De ∩ Pcl, Ind<br />

Here, connaît is formulated as just an ordinary transitive verb except that it selects an<br />

accusative complement that is itself missing its De complement. The definite article la<br />

is treated as a function from common nouns (type N) to possessive NPs (functions from<br />

canonical de-phrases to accusatives), using <strong>the</strong> specifier combinatory mode ⊸SP. The<br />

clitic en is represented as an axiom whose type is <strong>the</strong> intersection <strong>of</strong> De and Pcl. These<br />

axioms allow a pro<strong>of</strong> term like <strong>the</strong> one in (10) for (6b-ii):<br />

7 This account assumes <strong>the</strong> analysis <strong>of</strong> predicatives given by Pollard (2006) pp. 52–65, for example, for<br />

adjectival small clauses <strong>of</strong> <strong>the</strong> type Nom ⊸SU Adj.<br />


(11) ⊢ ( SU Marie ( PC en (connaît (la fin SP ) C ))), know ′ (marie ′ , end ′ (e)) : Fin, Prop<br />
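The semantic terms in (10) and (11) can be checked mechanically by encoding the lexical meanings as curried Python functions that build strings. This is only a sketch: reading P(x, y) as P applied first to y and then to x, and treating end′ as a one-place function symbol, are interpretive assumptions made here so that the terms compose.

```python
# Lexical meanings from the axioms above, with terms encoded as strings.
fidele = lambda y: lambda x: f"faithful'({x}, {y})"          # λyλx.faithful′(x, y)
reste = lambda P: lambda y: lambda x: f"remain'({P(y)(x)})"  # λPλyλx.remain′(P(x, y))

fin = lambda x: f"end'({x})"                                 # end′, read as a function
la = lambda f: lambda x: f(x)                                # λfλx.f(x)
connait = lambda f: lambda y: lambda x: f"know'({x}, {f(y)})"  # λfλyλx.know′(x, f(y))

# (10): reste takes fidèle, then the clitic meaning d, then the subject.
term_10 = reste(fidele)("d")("pierre'")

# (11): connaît takes (la fin), then the clitic meaning e, then the subject.
term_11 = connait(la(fin))("e")("marie'")
```

In both cases the clitic's meaning is supplied after the composed predicate has been built, which parallels the order of merges in the tecto-terms.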

The lexical axioms introduced here predict that FPCs in non-auxiliary composition<br />

contexts behave in a way largely parallel with that <strong>of</strong> FPCs that combine with auxiliary<br />

verbs. The main difference between FPCs with auxiliaries and with non-auxiliaries is that<br />

<strong>the</strong> complement types for non-auxiliaries must be more constrained than <strong>the</strong> free-ranging<br />

polymorphic complement allowed by auxiliaries. Since this approach does not appeal<br />

to CVG’s unbounded dependency machinery, instead relying on axioms that specify <strong>the</strong><br />

⊸PC local dependency, these instances of cliticization are guaranteed to remain short-distance.<br />

If FPCs in non-auxiliary composition contexts were construed as non-local<br />

extractions, it would be difficult to rule out constructions like (12), for example, which do<br />

not occur in French: 8<br />

(12) *Marie lui_i reste certaine que Céline a donné le livre _i.<br />

4 Conclusions and Future Work<br />

This paper sketches a pro<strong>of</strong>-<strong>the</strong>oretic account <strong>of</strong> <strong>the</strong> behavior <strong>of</strong> FPCs as complements.<br />

For local cliticization, a new valence implication mode ⊸PC is introduced to differentiate<br />

procliticization from <strong>the</strong> canonical form <strong>of</strong> verbal complement selection. Combined with<br />

properly-formulated lexical axioms, this new mode can account for some <strong>of</strong> <strong>the</strong> behavior<br />

<strong>of</strong> FPCs, including <strong>the</strong> basic instances <strong>of</strong> cliticization, FPCs in infinitival constructions,<br />

and two forms <strong>of</strong> “clitic climbing” via an argument composition analysis.<br />

The analysis given here departs from traditional categorial analyses <strong>of</strong> cliticization<br />

by construing FPCs as special instances <strong>of</strong> NPs. An advantage <strong>of</strong> this approach is that<br />

a cliticized complement has the same semantics and nearly the same tectogrammatical<br />

form as its canonical counterpart. This fact, in combination with the new ⊸PC mode<br />

<strong>of</strong> implication for FPC affixation, allows lexical axioms to more strictly constrain <strong>the</strong><br />

behavior <strong>of</strong> FPCs in comparison to o<strong>the</strong>r types <strong>of</strong> verbal complements. This ability may<br />

be central to correctly predicting, for example, <strong>the</strong> distribution <strong>of</strong> souvent as shown in (4).<br />

This approach suffers, however, from <strong>the</strong> proliferation <strong>of</strong> lexical axioms that must occur<br />

since all verbs that take complements need at least two distinct representations in <strong>the</strong><br />

lexicon. Such a requirement would have especially adverse implications for computational<br />

applications like parsing. Since very <strong>of</strong>ten, as with voit1 and voit2 , <strong>the</strong> canonical<br />

form <strong>of</strong> a verb closely resembles its cliticized variant, it is clear that a lexical rule associating<br />

<strong>the</strong>se forms is crucial to <strong>the</strong> success <strong>of</strong> this type <strong>of</strong> approach. The instances <strong>of</strong><br />

auxiliary and non-auxiliary composition presented here are also largely similar between<br />

cliticized and non-cliticized versions. A general account <strong>of</strong> FPCs in French along <strong>the</strong><br />

lines <strong>of</strong> <strong>the</strong> analyses presented here must include a mapping between <strong>the</strong>se similar forms<br />

that captures <strong>the</strong>ir common linguistic and information-structural characteristics.<br />

Future work on FPCs will aim to develop a correspondence between canonical and<br />

cliticized verb forms that predicts FPC behavior in a general way. This work will need to<br />

account for multiple clitic constructions, <strong>the</strong> rigid (and sometimes idiosyncratic) ordering<br />

<strong>of</strong> FPC clusters, agreement between FPCs and past participles, FPCs in passive, causative,<br />

and perceptual-verb constructions, and <strong>the</strong> enclitic attachment to imperative-form verbs<br />

in French.<br />

8 This example is due to Carl Pollard (personal communication <strong>of</strong> March 18, 2008).<br />



References<br />

Abeillé, A., Godard, D. and Miller, P. (1995). Causatifs et Verbes de Perception en<br />

Français, Actes du Deuxième Colloque Langues et Grammaire, Paris VIII, Saint<br />

Denis.<br />

Abeillé, A., Godard, D. and Sag, I. A. (1998). Two Kinds <strong>of</strong> Composition in French<br />

Complex Predicates, Syntax and Semantics: Complex Predicates in Nonderivational<br />

Syntax 30: 1–41.<br />

Amblard, M. (2006). Treating clitics with minimalist grammars, in S. Wintner (ed.),<br />

<strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> Eleventh Conference on Formal Grammar, CSLI Publications,<br />

pp. 9–20.<br />

Bonami, O. and Boyé, G. (2005). French pronominal clitics and <strong>the</strong> design <strong>of</strong> Paradigm<br />

Function Morphology, in G. Booij, L. Ducceschi, B. Fradin, E. Guevara, A. Ralli<br />

and S. Scalise (eds), <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> Fifth Mediterranean Morphology Meeting,<br />

pp. 291–322.<br />

Kraak, E. (1998). A Deductive Account <strong>of</strong> French Object Clitics, Syntax and Semantics:<br />

Complex Predicates in Nonderivational Syntax 30: 271–312.<br />

Morrill, G. and Gavarro, A. (1992). Catalan Clitics, in A. Lecomte (ed.), Word Order in<br />

Categorial Grammar, Editions Adosa, Clermont-Ferrand, pp. 211–232.<br />

Pollard, C. (2006). Higher Order Grammar: A Tutorial. Unpublished ms., available at<br />

http://www.ling.osu.edu/∼hana/hog/pollard2006-synners.pdf.<br />

Pollard, C. (2007). Nonlocal dependencies via variable contexts, in R. Muskens (ed.),<br />

Proceedings of the Workshop on New Directions in Type-Theoretic Grammar, ESSLLI 2007,<br />

Dublin.<br />

Sag, I. A. and Miller, P. H. (1997). French Clitic Movement without Clitics or Movement,<br />

Natural Language and Linguistic Theory 15(3): 573–639.<br />

Stabler, E. P. (2001). Recognizing Head Movement, LACL ’01: <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> 4th International<br />

Conference on Logical Aspects <strong>of</strong> Computational Linguistics, Springer-<br />

Verlag, London, UK, pp. 245–260.<br />

Appendix A: Full Derivation<br />


Tecto-terms:<br />

⊢ reste : ((Dat \ Pcl) ⊸C (Nom ⊸SU Adj)) ⊸C ((Dat ∩ Pcl) ⊸PC (Nom ⊸SU Fin))<br />

⊢ fidèle : (Dat \ Pcl) ⊸C (Nom ⊸SU Adj)<br />

⊢ (reste fidèle C ) : (Dat ∩ Pcl) ⊸PC (Nom ⊸SU Fin) [Complement Merge]<br />

⊢ lui2 : Dat ∩ 3Sg ∩ Pcl<br />

⊢ ( PC lui2 (reste fidèle C )) : Nom ⊸SU Fin [Proclitic Merge]<br />

⊢ Pierre : Nom<br />

⊢ ( SU Pierre ( PC lui2 (reste fidèle C ))) : Fin [Subject Merge]<br />

Semantic terms:<br />

⊢ λPλyλx.remain′(P(x, y)) : (Ind ⊃ (Ind ⊃ Prop)) ⊃ (Ind ⊃ (Ind ⊃ Prop))<br />

⊢ λyλx.faithful′(x, y) : Ind ⊃ (Ind ⊃ Prop)<br />

⊢ λyλx.remain′(faithful′(x, y)) : Ind ⊃ (Ind ⊃ Prop)<br />

⊢ d : Ind<br />

⊢ λx.remain′(faithful′(x, d)) : Ind ⊃ Prop<br />

⊢ pierre′ : Ind<br />

⊢ remain′(faithful′(pierre′, d)) : Prop<br />

Figure 1: Full derivation of (10), with tecto-terms (above) and semantic terms (below) given separately for space considerations.


INFINITE GAMES<br />

FROM AN INTUITIONISTIC POINT OF VIEW<br />

Takako Nemoto<br />

Tohoku University<br />

Abstract. In this paper, we consider determinacy in Brouwerian intuitionistic mathematics.<br />

We give some examples of games for which the character of this mathematical setting (the<br />

lack of the law of excluded middle and the adoption of the continuity principle) makes the<br />

behavior of determinacy drastically different from that in the classical setting.<br />

1 Introduction<br />


Games on N^N have been of great interest in mathematical logic for a long time. On the one<br />

hand, determinacy of games has been used as a strong tool to investigate Baire space N^N<br />

or Cantor space {0, 1}^N. On the other hand, determinacy statements are known to be<br />

quite sensitive to the mathematical setting: for example, with the axiom of choice,<br />

full determinacy is inconsistent, while determinacy of analytic games is beyond ZFC.<br />

The ultimate purpose of the author is to know how Baire space and Cantor space vary<br />

in settings other than the usual ones. As a first step toward this, she has been<br />

investigating the promising tool, determinacy, in these settings. Among them are subsystems<br />

of second order arithmetic, which are much weaker than ZFC (cf. (Nemoto, Ould MedSalem<br />

and Tanaka, 2007), (Nemoto, 2008)).<br />

This paper treats another setting, Brouwerian intuitionistic mathematics. It denies the<br />

law of excluded middle (LEM) and adopts the continuity principle, asserting that all<br />

functions from N^N to N^N or to N are continuous (for details, see Section 2). We give<br />

some examples of games which show that the continuity principle and the lack of LEM<br />

make the behavior of determinacy drastically different from that in the classical setting.<br />

To explicate the role of classical principles in determinacy, we also treat predeterminacy,<br />

a formalization of determinacy in intuitionistic mathematics, in classical<br />

mathematics.<br />

2 Axioms <strong>of</strong> <strong>the</strong> intuitionistic ma<strong>the</strong>matics<br />

In this section, we clarify <strong>the</strong> ma<strong>the</strong>matical setting <strong>of</strong> this paper.<br />

The logical constants have <strong>the</strong>ir constructive meanings and <strong>the</strong> rules <strong>of</strong> <strong>the</strong> intuitionistic<br />

logic are employed. In particular, a disjunctive statement A∨B means <strong>the</strong>re exists a pro<strong>of</strong><br />

<strong>of</strong> A or one <strong>of</strong> B, and an existential statement ∃x ∈ V [A(x)] means <strong>the</strong>re exist an element<br />

a of V and a proof of A(a). A statement A is decidable if A ∨ ¬A holds. A set X ⊆ V<br />

is decidable if <strong>the</strong> statement a ∈ X is decidable for each a ∈ V .<br />

An infinite sequence α <strong>of</strong> natural numbers α(0), α(1), α(2), ... may be determined by<br />

some finitely described algorithm, i.e., <strong>the</strong> n-th element α(n) <strong>of</strong> α is <strong>the</strong> result <strong>of</strong> <strong>the</strong><br />

algorithm for input n. Sometimes, however, such an infinite sequence may be constructed<br />

step by step by choosing its elements one by one. In this case, <strong>the</strong> construction <strong>of</strong> <strong>the</strong><br />


sequence is never finished: At any point in time, only finitely many elements have been<br />

chosen, and so we can only know a finite part <strong>of</strong> <strong>the</strong> sequence.<br />

The latter construction is not permitted in constructive mathematics, and this is the<br />

point that separates intuitionistic mathematics from constructive mathematics.<br />

Note that every infinite sequence, even if it is given by an algorithm, can be regarded<br />

as the result of a step-by-step construction. This is the reason we do not distinguish infinite<br />

sequences of natural numbers by their manner of construction.<br />
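The idea that only a finite part of a sequence is ever available, however it is produced, can be illustrated with Python iterators. This is a loose analogy rather than a model of choice sequences: the function name `first_exceeding` and both example sequences are made up for the illustration, and the search does not terminate on a sequence that never exceeds the bound.

```python
from itertools import chain, count, repeat


def first_exceeding(alpha, bound):
    """Return (n, inspected): the least n with alpha(n) > bound, and how many
    entries of alpha were read to find it. Only a finite prefix is consulted."""
    for n, value in enumerate(alpha):
        if value > bound:
            return n, n + 1


# A sequence given by an algorithm: alpha(n) = n * n.
squares = (n * n for n in count())

# A different sequence that merely agrees with the squares on the first five entries.
patched = chain([0, 1, 4, 9, 16], repeat(7))

result_squares = first_exceeding(squares, 10)
result_patched = first_exceeding(patched, 10)
```

Both calls read the same five entries and return the same answer, since the function cannot tell the two sequences apart beyond the prefix it has inspected. This is the kind of behavior the continuity principle asserts for all functions on infinite sequences.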

Let N be the set of natural numbers. X^N is the set of infinite sequences from X.<br />

In particular N^N is called Baire space and 2^N is called Cantor space. X^n is the set<br />

<strong>of</strong> sequences from X <strong>of</strong> length n and X



The strict fan <strong>the</strong>orem<br />

For a fan S and a decidable bar B in S, <strong>the</strong>re is a bounded sub-bar B ′ ⊆ B in S.<br />

While König’s lemma and the strict fan theorem are equivalent in classical mathematics,<br />

they are not in intuitionistic mathematics. Indeed, we can construct a so-called<br />

intuitionistic counterexample, i.e., a fan T which has sequences of any finite length<br />

but for which we cannot prove that T has an infinite path, i.e., α : N → N such that the<br />

initial segment of α of length n lies in T for all n. Let i^n ∈ {0, 1}^n be such that i^n(k) = i for all k < n and let i^N ∈ {0, 1}^N be such<br />

that i^N(n) = i for all n. Define T ⊆ {i^n : i < 2, n ∈ N} by<br />

0^n ∈ T ↔ there is no k < n such that p_{k+i} = 9 for all i < 99, or the least such k is even,<br />

1^n ∈ T ↔ there is no k < n such that p_{k+i} = 9 for all i < 99, or the least such k is odd,<br />

where p_k denotes the k-th digit of the decimal expansion of π. We can easily see that T<br />

is a fan which has sequences of any finite length and that if T has an infinite path α, then<br />

α = 0^N or α = 1^N. Assume that T has an infinite path α. If α(0) = 0 (or 1), then we<br />

must have a proof of the statement “if there is an uninterrupted run of 99 consecutive 9s<br />

in the decimal expansion of π, the least such run starts at an even (resp. odd) digit.” Up<br />

to now, we do not have any proof of such statements, and so we cannot assert that T has an infinite path.<br />

(If such a proof is found in the future, we can find another so-called counterexample using another<br />

unsolved problem in a similar way.)<br />
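Membership of a given finite sequence in T is decidable, because only finitely many digits of π have to be inspected. The following Python sketch illustrates this with a scaled-down version of the example. It rests on several assumptions that are not in the text: the run length is a parameter, defaulting to 2 instead of 99, so that the effect is visible in a short table; the first 50 decimal digits of π are tabulated by hand; and digits are indexed from 0, starting with the first digit after the decimal point.

```python
# First 50 digits of π after the decimal point (3.14159...), indexed from 0.
PI_DIGITS = "14159265358979323846264338327950288419716939937510"


def least_run_start(n, run_length):
    """Least k < n such that digits k, ..., k + run_length - 1 are all 9, else None."""
    for k in range(n):
        window = PI_DIGITS[k:k + run_length]
        if len(window) < run_length:
            raise ValueError("not enough digits tabulated for this n")
        if window == "9" * run_length:
            return k
    return None


def in_T(i, n, run_length=2):
    """Decide whether the constant sequence i^n (i is 0 or 1) belongs to T."""
    k = least_run_start(n, run_length)
    if k is None:
        return True       # no run found below n: both 0^n and 1^n are in T
    return k % 2 == i     # otherwise only the branch matching the parity of k
```

With run length 2, the first run of 9s in the table starts at index 43, so both `0^n` and `1^n` are in T for n ≤ 43, and only `1^n` survives from n = 44 on. For run length 99 the same procedure still decides each single question of the form i^n ∈ T, but no finite amount of such checking tells us which branch, if either, is ever cut off.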

3 Determinacy in intuitionistic ma<strong>the</strong>matics<br />

In this section, we introduce the notion of determinacy and its variants.<br />

For A ⊆ N^N, the game G(A) in N^N is defined as follows. Two players, called players<br />

I and II, starting with player I, alternately choose a natural number to construct α ∈ N^N.<br />

Player I wins if and only if <strong>the</strong> resulting play α is in A. Player II wins if and only if player<br />

I does not win. A strategy for player I (resp. II) is a function which assigns a natural<br />

number to each even-(resp. odd-)length sequence in N



Veldman (2004) gave three formalizations of determinacy in intuitionistic mathematics.<br />

G(A) is strongly determinate if, in G(A), either player I or player II has a winning<br />

strategy. This is the simplest formalization, but almost no game is strongly determinate.<br />

G(A) is determinate from the viewpoint of player I if the following holds: if, for every strategy τ of player<br />

II, there is α ∈_II τ with α ∈ A, then player I has a winning strategy in G(A). This<br />

statement corresponds to the classical statement “if player II has no winning strategy,<br />

then player I has one in G(A),” which is classically equivalent to “G(A) is determinate.”<br />

To describe the last formalization, we need a new notion. An anti-strategy for player I in G(A) is<br />

a function η which assigns some α ∈_II τ to each strategy τ for player II in G(A). An anti-strategy<br />

η for player I secures A if, for any strategy τ for player II, η(τ) ∈ A. G(A)<br />

is predeterminate from the viewpoint of player I if the following holds: if he has an anti-strategy securing A,<br />

then he has a winning strategy in G(A).<br />

Note that G(A) is predeterminate from the viewpoint of player I if G(A) is determinate<br />

from his viewpoint.<br />
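For reference, the three notions can be written side by side. This is only a sketch in our own notation, not Veldman's exact formulation: we assume that α ∈_I σ (resp. α ∈_II τ) means that α is a play in which player I follows σ (resp. player II follows τ), and that a winning strategy for player I is a strategy σ all of whose plays lie in A.<br />

```latex
\begin{align*}
\text{strongly determinate:}\quad
  & \exists\sigma\,\forall\alpha \in_{\mathrm{I}} \sigma\,[\alpha \in A]
    \;\lor\;
    \exists\tau\,\forall\alpha \in_{\mathrm{II}} \tau\,[\alpha \notin A] \\
\text{determinate (player I):}\quad
  & \forall\tau\,\exists\alpha \in_{\mathrm{II}} \tau\,[\alpha \in A]
    \;\rightarrow\;
    \exists\sigma\,\forall\alpha \in_{\mathrm{I}} \sigma\,[\alpha \in A] \\
\text{predeterminate (player I):}\quad
  & \exists\eta\,\forall\tau\,[\eta(\tau) \in_{\mathrm{II}} \tau \land \eta(\tau) \in A]
    \;\rightarrow\;
    \exists\sigma\,\forall\alpha \in_{\mathrm{I}} \sigma\,[\alpha \in A]
\end{align*}
```

The hypothesis of the third line yields that of the second by taking α = η(τ), which is why determinacy from the viewpoint of player I implies predeterminacy.<br />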

Moreover, for a game G(X) in N^N (or in a spread [S]), the second axiom of continuous<br />

choice yields the converse, i.e., predeterminacy implies determinacy. Indeed, a strategy for<br />

a player can be regarded as a function from N to N, and if there is α ∈_II τ with<br />

α ∈ X for every strategy τ for player II, then by the second axiom of continuous choice an<br />

anti-strategy for player I securing X is given by a code η of a continuous function.<br />

The intuitionistic determinacy theorem (Veldman, 2004, Theorem 3.5) If [S] is a II-finitary<br />

branching spread, i.e., S is a spread-law such that, for every odd-length s ∈ S,<br />

there are at most finitely many n with s ∗ 〈n〉 ∈ S, then G_{[S]}(A) is predeterminate from<br />

the viewpoint of player I for every A ⊆ [S].<br />

In particular, if A ⊆ {0, 1}^N, then G_{{0,1}^N}(A) is predeterminate from the viewpoint of player<br />

I. Veldman (2004) also gave A ⊆ N^N such that G(A) is not predeterminate from the<br />

viewpoint of player I.<br />

Remark The notion of predeterminacy can also be formalized from the viewpoint of player II,<br />

and we can obtain results similar to the last theorem.<br />

4 Variations of games and predeterminacy<br />

In this section, we consider other variations of games in intuitionistic mathematics.<br />

For these games, we can define the three formalizations of determinacy in the same way.<br />

4.1 2-length games in {0, 1}^N × {0, 1}<br />

This subsection treats one of the simplest cases in which fewer strategies are allowed than<br />

in the classical context. {0, 1}^N × {0, 1} denotes the topological product of Cantor<br />

space and the discrete space {0, 1}.<br />

For given A ⊆ {0, 1}^N × {0, 1}, the game G1(A) is defined as follows:<br />

• Player I chooses α ∈ {0, 1}^N.<br />

• Player II chooses i ∈ {0, 1}.<br />

• Player I wins if (α, i) ∈ A and player II wins if player I does not win.<br />


Although {0, 1}^N × {0, 1} is homeomorphic to Cantor space, we must be<br />

sensitive to the order type of the indexing set for the sequences.<br />

In this game, a strategy for player I is his initial move α, and a strategy for player<br />

II is a function from {0, 1}^N to {0, 1}. The continuity principle forces all the strategies<br />

for player II to be continuous, and so we may regard a strategy τ for player II as a code<br />

of a continuous function such that (τ|α)(0) ∈ {0, 1} for all α ∈ {0, 1}^N. B = {s ∈<br />

{0, 1} 0} is a decidable bar in <strong>the</strong> fan {0, 1} N . Then, by <strong>the</strong> strict fan <strong>the</strong>orem,<br />

there is a bounded sub-bar B′ ⊆ B. Take n such that lh(s) < n for every s ∈ B′, and write n_τ for this n.<br />

Then, {0, 1}^n is also a bar in {0, 1}^N, and, for every α, β ∈ {0, 1}^N, ᾱn = β̄n implies<br />

(τ|α)(0) = (τ|β)(0). Thus we can regard τ as a function from {0, 1}^{n_τ} to {0, 1}, which can<br />

be coded by a natural number. Because an anti-strategy η for player I is a function from<br />

the set of all strategies for player II to the set of plays in this game, it can be regarded as<br />

a function from N with the discrete topology to {0, 1}^N × {0, 1}.<br />
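To make the phrase "coded by a natural number" concrete, here is a minimal sketch of one possible coding. The paper does not fix a particular coding, so the Cantor pairing and the truth-table encoding below, as well as the names `pair`, `unpair`, `encode` and `decode`, are our own assumptions chosen for illustration.<br />

```python
from itertools import product
from math import isqrt


def pair(a, b):
    """Cantor pairing of two natural numbers (an assumed choice of coding)."""
    return (a + b) * (a + b + 1) // 2 + b


def unpair(z):
    """Inverse of the Cantor pairing."""
    w = (isqrt(8 * z + 1) - 1) // 2
    b = z - w * (w + 1) // 2
    return w - b, b


def encode(n, f):
    """Code a function f : {0,1}^n -> {0,1} as one natural number.

    The truth table of f is packed into the bits of an integer,
    which is then paired with the input length n.
    """
    bits = 0
    for idx, s in enumerate(product((0, 1), repeat=n)):
        bits |= f(s) << idx
    return pair(n, bits)


def decode(code):
    """Recover n and the function {0,1}^n -> {0,1} from its code."""
    n, bits = unpair(code)
    index = {s: idx for idx, s in enumerate(product((0, 1), repeat=n))}
    return n, (lambda s: (bits >> index[tuple(s)]) & 1)
```

With such a coding, the set of strategies for player II in G1 becomes a subset of N, which is what allows an anti-strategy to be treated as a function on N with the discrete topology.<br />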

The following examples show that even games for simple sets, such as open or closed sets, need<br />

not be predeterminate from the viewpoint of player I.<br />

Example 1 An open game G1(A) which is not predeterminate from the viewpoint of<br />

player I: Define Ai = {0^n ∗ 〈1, i〉 : n ∈ N} and A = {(α, i) : ∃n[ᾱn ∈ Ai]}. Then A is<br />

open. Let η be the anti-strategy for player I which assigns (0^{n_τ} ∗ 〈1, τ(0^{n_τ})〉 ∗ 0^N, τ(0^{n_τ})) to<br />

each strategy τ for player II. Then η(τ) ∈ A for each strategy τ for player II, and so η is<br />

an anti-strategy for player I securing A. On the other hand, player I has no<br />

winning strategy in G1(A): a winning initial move α would need both (α, 0) ∈ A and (α, 1) ∈ A, which is impossible because the entry following the first 1 in α cannot be both 0 and 1.<br />
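The anti-strategy of Example 1 can be checked mechanically for small moduli. The sketch below is only an illustration under our own assumptions: infinite sequences are modelled as Python functions from indices to bits, a strategy for player II is given by a table on {0, 1}^n, and the helper names (`make_tau`, `eta`, `in_A`, `secures_A`) are hypothetical.<br />

```python
from itertools import product


def make_tau(n, table):
    """Strategy for player II that inspects only the first n bits of alpha."""
    def tau(alpha):
        return table[tuple(alpha(k) for k in range(n))]
    return tau


def eta(tau, n):
    """Anti-strategy of Example 1 for a strategy tau with modulus n."""
    i = tau(lambda k: 0)  # tau(0^n): the reply to any alpha starting with 0^n

    def alpha(k):  # alpha = 0^n * <1, i> * 0^N
        if k == n:
            return 1
        if k == n + 1:
            return i
        return 0

    return alpha, i


def in_A(alpha, i, depth):
    """Search the first `depth` entries for a prefix of the form 0^m * <1, i>."""
    for m in range(depth):
        if alpha(m) == 1:
            return alpha(m + 1) == i
    return False


def secures_A(n):
    """Check eta against every strategy for II that reads exactly n bits."""
    inputs = list(product((0, 1), repeat=n))
    for values in product((0, 1), repeat=len(inputs)):
        tau = make_tau(n, dict(zip(inputs, values)))
        alpha, i = eta(tau, n)
        # The play must follow tau and must land in A.
        if tau(alpha) != i or not in_A(alpha, i, n + 1):
            return False
    return True
```

Calling `secures_A(n)` for small n exhaustively confirms that the play η(τ) both follows τ and lies in A; it says nothing about winning strategies, whose absence is argued in the text.<br />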

Example 2 A closed game G1(B) which is not predeterminate from the viewpoint of<br />

player I: Let T be an intuitionistic counterexample to König’s lemma, i.e., an unbounded<br />

binary tree without infinite paths. Let Ti = {t ∗ i^n | t ∈ T ∧ n ∈ N}. Then B =<br />

{(α, i) | ∀n[ᾱn ∈ Ti]} is a closed set. If player I had a winning strategy α in G1(B), α<br />

would be an infinite path of T. Thus player I cannot have a winning strategy in G1(B).<br />

On the other hand, player I has an anti-strategy securing B. Fix an enumeration of T and<br />

let t_n be the first s ∈ T with lh(s) = n with respect to this enumeration. Let<br />

η be the anti-strategy for player I which assigns (t_n ∗ (τ(t_n))^N, τ(t_n)) to each strategy<br />

τ : {0, 1}^n → {0, 1} for player II. Clearly η secures B.<br />

4.2 (ω + 1)-length games in {0, 1}^N × {0, 1}<br />

In this subsection, we consider another kind of game in {0, 1}^N × {0, 1}.<br />

For given A ⊆ {0, 1}^N × {0, 1}, the game G2(A) is defined as follows.<br />

• Player I and player II alternately choose i ∈ {0, 1} to form α ∈ {0, 1}^N.<br />

• After α is formed, player I chooses i ∈ {0, 1}.<br />

• Player I wins G2(A) if and only if (α, i) ∈ A.<br />

In this game, a strategy σ for player I is a pair (σ0, σ1) of functions σ0 : ⋃_{n∈N} {0, 1}^{2n} →<br />

{0, 1} and σ1 : {0, 1}^N → {0, 1}. By the strict fan theorem, we can regard σ1, as in<br />

the last subsection, as a function from {0, 1}^n to {0, 1} for some n ∈ N.<br />

A strategy for player II is a function τ : ⋃_{n∈N} {0, 1}^{2n+1} → {0, 1}, which can be<br />

regarded as an element of {0, 1}^N. Then an anti-strategy η for player I is a function from<br />


{0, 1}^N to {0, 1}^N × {0, 1}, which can be regarded as a pair (η0, η1) of codes of continuous<br />

functions such that, for any strategy τ for player II, (η0|τ, (η1|τ)(0)) ∈_II τ. By the<br />

strict fan theorem, there is n such that for any strategies τ and τ′, τ̄n = τ̄′n implies<br />

(η1|τ)(0) = (η1|τ′)(0), and so we can regard η1 as a function from {0, 1}^n to {0, 1}.<br />
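The way a play of G2 arises from the strategies can be sketched on finite prefixes. This is a toy model under our own assumptions: σ0 and τ are Python functions on tuples of bits, σ1 is taken in its finite form on {0, 1}^n, and the names `play_prefix` and `g2_outcome` are hypothetical.<br />

```python
def play_prefix(sigma0, tau, length):
    """First `length` bits of the play alpha when I follows sigma0 and II follows tau."""
    alpha = []
    for k in range(length):
        # Player I moves at even positions, player II at odd positions.
        mover = sigma0 if k % 2 == 0 else tau
        alpha.append(mover(tuple(alpha)))
    return tuple(alpha)


def g2_outcome(sigma0, sigma1, tau, n):
    """Finite data of a G2 play: the first n bits of alpha and player I's final move."""
    prefix = play_prefix(sigma0, tau, n)
    return prefix, sigma1(prefix)
```

For example, if player I always plays 1 and player II always negates the previous bit, the first six bits of the play are 1, 0, 1, 0, 1, 0.<br />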

Theorem 1 For any C ⊆ {0, 1}^N × {0, 1}, G2(C) is predeterminate from the viewpoint<br />

of player I.<br />

Proof. For i ∈ {0, 1}, set Ci = {α : (α, i) ∈ C}. Assume that η = (η0, η1) is an<br />

anti-strategy for player I securing C and that η1 can be regarded as a function from {0, 1}^n<br />

to {0, 1} for some n. Note that, in G_{{0,1}^N}(C0 ∪ C1), η0 is an anti-strategy for player I<br />

securing C0 ∪ C1. Let σ0 be the winning strategy for player I in G_{{0,1}^N}(C0 ∪ C1) constructed in the proof of<br />

the intuitionistic determinacy theorem. Set Pσ0 = {α : α ∈_I σ0}.<br />

Note that Pσ0 is a spread. By the proof of the intuitionistic determinacy theorem, for any<br />

α ∈ Pσ0, there exists a strategy δ for player II with η0|δ = α. By the second axiom of<br />

continuous choice, there exists a code ζ of a continuous function such that, for any<br />

α ∈ Pσ0, ζ|α is a strategy for player II with η0|(ζ|α) = α. By the strict fan theorem, there<br />

exists a natural number N such that, for any α and β in Pσ0, ᾱN = β̄N implies that ζ|α and<br />

ζ|β have the same initial segment of length n. Then we can define σ1 : Pσ0 → {0, 1} by σ1(α) = η1(s), where s is the initial segment of ζ|α of length n, since σ1(α) is<br />

determined by ᾱN. Define a new strategy σ = (σ0, σ1) for player I in G2(C). Then, for<br />

any (α, i) ∈_I σ, the strategy δ = ζ|α for player II satisfies (α, i) = (η0|δ, (η1|δ)(0)), and so<br />

σ is a winning strategy for player I in G2(C). □<br />

Comparing this theorem with the examples in the last subsection, we can conclude that<br />

predeterminacy depends on how the players construct the sequence rather than on which sequences<br />

they can construct.<br />

4.3 (ω + 2)-length games in {0, 1}^N × {0, 1}^2<br />

Next we consider slightly longer games.<br />

For a given set A ⊆ {0, 1}^N × {0, 1}^2, consider the following game G3(A).<br />

• First, player I and player II alternately choose n ∈ {0, 1} to form α ∈ {0, 1}^N.<br />

• After α is formed, player I chooses i ∈ {0, 1} and player II chooses j ∈ {0, 1}.<br />

• Player I wins if (α, 〈i, j〉) ∈ A and player II wins if player I does not win.<br />

Similarly to the previous subsection, a strategy σ for player I is a pair (σ0, σ1), where<br />

σ0 is a function from ⋃_{n∈N} {0, 1}^{2n} to {0, 1} and σ1 is a function from {0, 1}^N to {0, 1}.<br />

We can regard σ1 as a function from {0, 1}^n to {0, 1} for some n ∈ N.<br />

A strategy τ for player II is a pair (τ0, τ1), where τ0 is a function from ⋃_{n∈N} {0, 1}^{2n+1}<br />

to {0, 1} and τ1 is a function from {0, 1}^N × {0, 1} to {0, 1}. Note that since τ1 is<br />

continuous, its restriction τ1,i to {0, 1}^N × {i} is also continuous, and so we can regard τ1<br />

as a pair (τ1,0, τ1,1) of functions from {0, 1}^{n_i} to {0, 1} for some n_0 and n_1.<br />

Hence, the set of strategies for player II can be regarded as {0, 1}^N × N, and so an anti-strategy<br />

for player I can be regarded as a function η from {0, 1}^N × N to {0, 1}^N × {0, 1}^2<br />

such that η(τ) ∈_II τ for each strategy τ for player II.<br />

As in <strong>the</strong> case <strong>of</strong> G1(X), we have <strong>the</strong> following examples. For any s ∈ {0, 1}



Example 3 Recall Ai defined in Example 1. Then the open game G3(A′) defined by<br />

A′ = {(α, 〈i, j〉) : ∃n[(ᾱn)′ ∈ Aj]} is not predeterminate from the viewpoint of player I.<br />

Example 4 Recall Ti defined in Example 2. Then the closed game G3(B′) defined by<br />

B′ = {(α, 〈i, j〉) : ∀n[(ᾱn)′ ∈ Tj]} is not predeterminate from the viewpoint of player I.<br />

5 Predeterminacy in classical mathematics<br />

In this section, we consider predeterminacy in classical mathematics in order to investigate<br />

the role of classical principles in predeterminacy. Note that all the definitions<br />

and statements in this section are made in classical mathematics, which includes the<br />

countable axiom of choice.<br />

Recall that, in intuitionistic mathematics, an anti-strategy is a function η such that<br />

η(τ) ∈_II τ for each strategy τ for player II. We translate this definition into classical<br />

mathematics, noting that every function on N^N is continuous in intuitionistic<br />

mathematics:<br />

Let G(X) be any of the games treated in the previous sections. An anti-strategy for player<br />

I in G(X) is a continuous function which assigns some α ∈_II τ to every continuous strategy<br />

τ for player II in G(X). An anti-strategy η for player I in G(X) secures X if η(τ) ∈ X<br />

for all continuous strategies τ for player II. G(X) is predeterminate from the viewpoint<br />

of player I if the following holds:<br />

if player I has an anti-strategy η securing X, then player I has a winning<br />

strategy in G(X).<br />

Note that the ordinary determinacy statement can be read as “if there is a<br />

function η such that η(τ) ∈_II τ and η(τ) ∈ X for all strategies τ for player II, then player<br />

I has a winning strategy in G(X).”<br />

For X ⊆ N^N, strategies for the players in the game G(X) can be regarded as functions from N<br />

to N, and so all the strategies are continuous. Therefore the condition “continuous” for<br />

strategies has no effect in the games G(X), but it does in the games G1(X), G2(X) and G3(X).<br />

Moreover, the continuity in the definition of anti-strategy is essential in the following<br />

discussion.<br />

As mentioned in (Veldman, 2004, 1.1), the intuitionistic determinacy theorem also holds<br />

in classical mathematics. In particular, for all A ⊆ {0, 1}^N, G_{{0,1}^N}(A) is predeterminate<br />

from the viewpoint of player I in classical mathematics.<br />

Now we consider, in classical mathematics, the predeterminacy of the games G1(X), G2(X) and G3(X)<br />

defined in the last section. By König’s lemma, the<br />

classical counterpart of the strict fan theorem, a continuous<br />

function {0, 1}^N → {0, 1} or {0, 1}^N → {0, 1}^N is, also in classical mathematics, given by its code η as defined<br />

in Section 2. In particular, a strategy for player II in G1(A) can be seen as a function<br />

τ : {0, 1}^n → {0, 1} for some n, and an anti-strategy for player I in G2(A) can be seen as<br />

a pair (η0, η1) of a code η0 of a continuous function and a function η1 : {0, 1}^m → {0, 1} for some m.<br />

The game G1(A) is not predeterminate from the viewpoint of player I, where A is<br />

defined in Example 1. For closed games, the situation differs: whereas<br />

Example 2 is a closed game which is not predeterminate from the viewpoint of player I<br />

in intuitionistic mathematics, we will show that there is no such closed game in<br />

classical mathematics.<br />



For X ⊆ {0, 1} N × {0, 1} and s ∈ {0, 1}


6 Further problems<br />


Predeterminacy of closed games G3(X) in classical mathematics The first problem<br />

the author is interested in is whether the closed games G3(X) are predeterminate<br />

in classical mathematics. It should be possible to settle this by analyzing the properties of continuous<br />

functions on Cantor space.<br />

Classical investigation of predeterminacy We can consider various formalizations of<br />

predeterminacy in classical mathematics other than the one defined in Section 5, e.g.,<br />

If player I has an anti-strategy η such that η(τ) ∈ A for each continuous strategy<br />

τ for player II, then player I has a continuous winning strategy in G(A).<br />

Note that the requirement that the winning strategy be continuous is newly added. Again, for a game G(X) in N^N, this modification<br />

has no effect. However, we can easily find X ⊆ {0, 1}^N which is not predeterminate in<br />

this sense but which is predeterminate in the sense of Section 5. The author expects that<br />

the investigation of these variations will clarify how continuity constrains functions on Baire<br />

space or Cantor space.<br />

Constructive reverse mathematical analysis of predeterminacy Constructive reverse<br />

mathematics measures the strength of mathematical statements in terms of nonconstructive<br />

principles, using constructive mathematics as a base theory. Constructive mathematics<br />

is mathematics which is based on intuitionistic logic but which does not<br />

adopt the axioms introduced in Section 2. Therefore it is included both in classical mathematics<br />

and in intuitionistic mathematics.<br />

(1) The role of the second axiom of continuous choice for predeterminacy Under<br />

the second axiom of continuous choice, predeterminacy implies determinacy. This implication<br />

needs only a fragment of the second axiom of continuous choice, and it is natural<br />

to ask exactly how strong a fragment is required. If we measure the strength of fragments<br />

by the complexity of R in the axiom, the difficulty lies in the reduction of general formulas<br />

of the form ∀α∃βR(α, β) to the form ∀τ∃σ∀α(α ∈_I σ ∧ α ∈_II τ → R′(α)).<br />

(2) Equivalences between predeterminacy and intuitionistic axioms Veldman<br />

(200x) proposed intuitionistic second order arithmetic and proved that the predeterminacy<br />

of open subsets of II-finitary branching spreads in N is equivalent to the strict fan theorem<br />

over the system BIM, which corresponds to RCA0, a popular classical base theory in the field<br />

called Friedman-Simpson reverse mathematics (cf. Simpson, 1999). The author of the<br />

present paper is now looking for similar equivalences beyond open sets. The first task in<br />

this direction is to find a suitable intuitionistic axiom to compare with. One candidate<br />

is the almost-fan theorem proposed in (Veldman, 2001).<br />

(3) The role of LEM for predeterminacy In the proof of Theorem 2, we use the law<br />

of excluded middle. It seems impossible to prove the theorem without this classical law, because<br />

we have the set B of Example 2 in intuitionistic mathematics. The next natural question<br />

is which fragment of the classical laws (such as excluded middle or double negation<br />



elimination) is necessary and sufficient for determinacy or predeterminacy statements.<br />

Akama, Berardi, Hayashi and Kohlenbach (2004) discovered a hierarchy consisting of<br />

these fragments over Heyting arithmetic HA, which is the constructive counterpart of<br />

Peano arithmetic. The author of the present paper is trying to measure predeterminacy and determinacy<br />

statements along this hierarchy.<br />

(4) Equivalences between predeterminacy and classical axioms Since we treat<br />

predeterminacy also in classical mathematics, it is natural to consider a Friedman-Simpson<br />

reverse mathematical study of predeterminacy. Using constructive mathematics<br />

as a base theory, we can make a finer reverse mathematical study of predeterminacy.<br />

Acknowledgements<br />

Some parts of this paper were done as the final assignment of the master class 2006/2007 in<br />

logic at the Mathematical Research Institute, the Netherlands. The author would like to express<br />

her gratitude to the supervisor, Dr. Wim Veldman, who introduced her to the appeal<br />

of intuitionistic mathematics.<br />

References<br />


Akama, Y., Berardi, S., Hayashi, S. and Kohlenbach, U. (2004). An arithmetical hierarchy<br />

of the law of excluded middle and related principles, in H. Ganzinger (ed.),<br />

Proceedings of the Nineteenth Annual IEEE Symp. on Logic in Computer Science,<br />

LICS 2004, IEEE Computer Society Press, pp. 192–201.<br />

Nemoto, T. (2008). Determinacy of Wadge classes and subsystems of second order arithmetic.<br />

Accepted for publication in Math. Log. Q., available at<br />

http://www.math.tohoku.ac.jp/~sa4m20/wadge.pdf.<br />

Nemoto, T., Ould MedSalem, M. and Tanaka, K. (2007). Infinite games in the Cantor<br />

space and subsystems of second order arithmetic, Math. Log. Q. 53: 226–236.<br />

Simpson, S. G. (1999). Subsystems of second order arithmetic, Springer.<br />

Veldman, W. (2001). Almost the fan theorem, Technical report, Department of Mathematics,<br />

University of Nijmegen.<br />

Veldman, W. (2004). The problem of the determinacy of infinite games from an intuitionistic<br />

point of view, Technical report, Department of Mathematics, University of<br />

Nijmegen. To appear in the proceedings of Logic, Games and Philosophy: Foundational<br />

Perspectives, Prague 2004.<br />

Veldman, W. (200x). Brouwer’s fan theorem as an axiom and as a contrast to Kleene’s<br />

alternative. Preprint.<br />


LANGUAGE TECHNOLOGIES FOR INSTRUCTIONAL RESOURCES IN<br />

BULGARIAN<br />

Ivelina Nikolova<br />

University of Sofia<br />

Abstract. This paper describes a system that applies language technologies to instructional<br />

materials in order to provide computer-aided design of test items. The approach<br />

employs lexical and syntactic information obtained with techniques such as POS tagging,<br />

constituency parsing and term extraction. The system compiles a list of central terms<br />

for the instructional materials, creates drafts of fill-in-the-blank questions and suggests possible<br />

distractors. The experiment is carried out on geography, biology and history textbooks<br />

for Bulgarian high schools.<br />

1 Introduction and related work<br />

Asking questions is a way to keep students’ attention in class and verify their understanding.<br />

Depending on the type of education and the goal of the teacher, questions can<br />

be asked in different forms: orally, as a short written examination, as a game,<br />

etc. One common technique is asking multiple-choice questions, which has become<br />

even more popular in recent years because it is also applicable to e-learning.<br />

However, designing thousands of tests is a time- and effort-consuming educational activity.<br />

All questions in a test should be carefully tuned to the target group of test-takers<br />

and should neither underestimate nor overestimate their knowledge. Hence the teaching experts<br />

who prepare the tests must have much broader knowledge of the field than<br />

the content explicitly included in the particular textbook, and they have to tune<br />

the tests to the knowledge of the test-takers. One of the most difficult tasks in producing<br />

test items is to decide whether a question really has its answer in the instructional<br />

materials.<br />

These difficulties gave rise to a relatively new research area dealing with support for<br />

the generation of test items and the suggestion of answers and distractors. Generation of multiple-choice<br />

questions with the help of NLP technologies is an active area in which different tools for<br />

text processing are used to transform the facts from the instructional materials<br />

into questions which can be used for student assessment. One of the most interesting approaches<br />

in this respect is presented by Mitkov, Ha and Karamis (2006), who apply<br />

language technologies (LT) to the generation of test items for English, focusing on the automatic<br />

choice of distractors. They report that test development is sped up<br />

about 6-10 times compared to manual test elicitation. Their approach is not domain-specific<br />

and can be applied to any area. Other authors actively working in the area are<br />

Aldabe, De Lacalle and Maritxalar (2007), who focus on different types of<br />

question models with application primarily in language learning. We are not familiar<br />

with any related work concerning this activity for learning materials in Bulgarian except<br />

for the previous work of the author (Nikolova, 2007). Our efforts are thus strongly inspired<br />

by the growing interest in this field, which is due to its significant practical importance.<br />

On the other hand, we are motivated and encouraged by the presence of sophisticated<br />



LT for the Bulgarian language, which enable relatively complex text preprocessing, so the<br />

automatic acquisition of learning objects from raw texts does not start from scratch.<br />

This article presents the idea of the author’s master’s thesis, which is still work in<br />

progress. The aim is to develop a workbench supporting test designers with language technologies<br />

applied to the instructional materials. The task has three aspects: (1) suggestion<br />

of key terms for (2) question generation and (3) distractor suggestion. For our purpose<br />

the text is preprocessed by a number of already available LT modules, and lexical and<br />

syntactic features are extracted and kept in a metadata format. Those features are used<br />

later on for the generation of the draft learning objects. The experiment described in the<br />

article has been applied to three different domains: Geography, Biology and History.<br />

The materials are taken from textbooks for the 9th, 10th and 11th grade respectively.<br />

The remaining part of this article is organised as follows: we first sketch the general<br />

architecture of the system in section 2; in section 3 we describe the data processing;<br />

section 4 explains in detail the experiment done so far; section 5 concerns the evaluation<br />

at the current stage of the experiment; section 6 presents the conclusion and issues for<br />

future work.<br />

Figure 1: Workbench supporting the development of multiple-choice test items.<br />

2 Workbench description<br />


The system suggests draft learning objects to the test designers in order to help them during<br />

the preparation of test items. As shown in Fig. 1, the instructional materials are supplied<br />

by the test maker. They are preprocessed and two main data sets are created: (a) a<br />

list of key terms (terms central to the supplied text), built as<br />

explained later in section 4.1, and (b) lexical and syntactic information about the supplied<br />

text, which is kept in a metadata format. Then the user may obtain all possible questions<br />

generated from the supplied material or only those related to a certain key term she is interested<br />

in. If the system does not find appropriate sentences which contain the term and<br />

match its internal question templates (explained later in section 4.2), it returns a list of<br />

<strong>13</strong>6


pointers to <strong>the</strong> text, containing <strong>the</strong> local context in which <strong>the</strong> term appears and a list <strong>of</strong><br />

related concepts, generated by <strong>the</strong> same model as <strong>the</strong> distractors are.<br />

3 Data processing<br />

Our task is to support test makers in building educational resources,<br />
namely test questions and a vocabulary of important concepts for the domain. We do this<br />
by applying language technologies to the raw instructional materials and obtaining linguistic<br />
resources, which are loaded into a workbench that helps the test designers during their<br />
work. The processing passes through several phases, as shown in Fig. 2.<br />

Figure 2: Data processing.<br />

The instructional material is taken in plain text format and is first processed with<br />
an NP extractor, which extracts nouns and noun phrases in order to build a list of<br />
potential key terms to be suggested to the test designers. At the same time<br />
as the extracted terms are marked, an inverted index is produced. It contains a list<br />
of the extracted NPs (nouns and noun phrases) and their corresponding absolute positions<br />
in the text. A threshold for the importance of the extracted terms is set, and all NPs with<br />
a frequency higher than the threshold are included in the list of key terms. In addition, all<br />
NPs that contain a noun which is a key term are also included in the list of key terms.<br />
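
The indexing and thresholding step can be sketched as follows. This is only a minimal illustration under stated assumptions, not the system's implementation: the input is assumed to be a list of (NP, position) pairs already produced by an NP extractor, and the function names are hypothetical.<br />

```python
from collections import defaultdict


def build_inverted_index(np_occurrences):
    """Map each extracted NP to the sorted list of its absolute positions.

    np_occurrences: iterable of (phrase, position) pairs; this input format
    is an assumption made for the sketch.
    """
    index = defaultdict(list)
    for phrase, position in np_occurrences:
        index[phrase].append(position)
    return {phrase: sorted(positions) for phrase, positions in index.items()}


def frequent_nps(index, threshold):
    """Return the NPs whose frequency is higher than the threshold."""
    return {phrase for phrase, positions in index.items()
            if len(positions) > threshold}


# Invented toy data: three occurrences of "economy", one of "river".
occurrences = [("economy", 3), ("river", 12), ("economy", 40), ("economy", 97)]
index = build_inverted_index(occurrences)
key_terms = frequent_nps(index, threshold=2)  # {"economy"}
```

Because the index stores positions, the same structure can later serve as the list of pointers to the local context of a term.<br />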

During the next phase the raw text is tagged for POS categories. We found it<br />
practical to use the SVMTool (Giménez and Márquez, 2004), which was trained<br />
on the newspaper part of BulTreeBank¹. The proper names recognised by the tagger<br />
were added to the list of key terms, and the output was then processed with Dan Bikel's multilingual<br />
statistical parsing engine (Bikel, 2004), which is an implementation<br />
and extension of the Collins parser (Collins, 1999). The parsing model<br />
was trained on BulTreeBank. All the syntactic and lexical information obtained in these<br />
phases is kept in metadata format and used later to produce draft learning objects<br />
(key terms, test items), which are suggested to the test designers.<br />

¹ HPSG-based Syntactic Treebank of Bulgarian (BulTreeBank), http://bultreebank.org/<br />

4 The experiment<br />

4.1 Key terms suggestion<br />

We build our approach on the understanding that questions given to the learner concern<br />
terms that are central to the domain. These terms serve as a basis<br />
for the learned material and represent a specific domain vocabulary. Here they<br />
are referred to as key terms. Although verbs might also qualify as good key terms<br />
in some domains, in this experiment we consider only nouns and noun phrases<br />
as potential key terms. They were extracted by the classic frequency-based approach to automatic term<br />
extraction. To overcome the problem of inflection in<br />
the language, the raw texts were first lemmatized and then processed with the NP extractor<br />
Morena. Once we had obtained a list of nouns (LN) and a list of noun phrases (LNP), we had to<br />
rank them in order to extract only the most important ones, which are the focus of our<br />
approach and of users' queries. We applied two different techniques for measuring term<br />
importance over LN: simple frequency counting and TF-IDF weighting. Like<br />
Mitkov et al. (2006), we noticed that TF-IDF produces worse results, as it tends to<br />
give a low score to frequently used words (for example ���������� - economy) which are<br />
actually quite important in instructional materials (it is common to repeat the<br />
same information to the learners to help them remember it better). At the<br />
same time, sorting the list of nouns by frequency after removing the stop words<br />
gave quite satisfactory results.<br />
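
The effect can be illustrated with a toy computation. The sketch assumes the common tf · log(N/df) weighting, which may differ from the variant used in the experiment; the three lemmatized "sections" are invented, and the function name is hypothetical.<br />

```python
import math
from collections import Counter


def tf_idf(term, doc, docs):
    """tf * log(N / df): one common TF-IDF variant (an assumption here)."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0


# Invented sections of a textbook, already lemmatized and stop-word filtered.
docs = [
    ["economy", "market", "economy", "trade"],
    ["economy", "climate", "relief"],
    ["economy", "population", "economy"],
]

raw_frequency = Counter(word for doc in docs for word in doc)

economy_score = tf_idf("economy", docs[0], docs)  # 0.0: the term is in every section
market_score = tf_idf("market", docs[0], docs)    # positive: the term is in one section
```

A term that is repeated in every section is the most frequent one overall, yet its TF-IDF score is zero, which is the behaviour described above.<br />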

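Word frequency f_i | Number of words w_f with frequency f_i | Relation<br />
55 | 1 | w_f ≤ f_i<br />
46 | 1 | w_f ≤ f_i<br />
22 | 6 | w_f ≤ f_i<br />
20 | 1 | w_f ≤ f_i<br />
18 | 1 | w_f ≤ f_i<br />
16 | 1 | w_f ≤ f_i<br />
14 | 1 | w_f ≤ f_i<br />
12 | 5 | w_f ≤ f_i<br />
10 | 5 | w_f ≤ f_i<br />
8 | 6 | w_f ≤ f_i<br />
6 | 8 | w_f ≥ f_i<br />
4 | 44 | w_f ≥ f_i<br />
2 | 174 | w_f ≥ f_i<br />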

Table 1: Word frequency distribution in a text with length about 1000 words.<br />

To set the threshold between important and less important terms, in previous experiments<br />
we examined test items prepared manually by the test designers<br />
for the same material as the corpora we are processing. The test items were processed<br />
with an NP extractor. We checked how frequent the NPs extracted from the test items were<br />
in the whole corpus, and the lowest frequency was accepted as the threshold. After repeating<br />
the same procedure for corpora from different domains, we noticed that the importance border<br />
lies near the term frequency that equals the number of words having that frequency. For<br />
example, for a comparatively short text we obtain the distribution in Table 1, where the<br />
threshold is set to frequency f = 7.<br />
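
One way to read this heuristic is sketched below. The rule used here (scan the frequencies from high to low and take the midpoint around the first frequency that has more words than its own value) is our interpretation of the observation, not a procedure stated by the experiment; the function name is hypothetical and the data are those of Table 1.<br />

```python
def frequency_threshold(words_per_frequency):
    """Estimate the importance border from a frequency-of-frequencies table.

    words_per_frequency maps a frequency f to the number of words w_f that
    occur exactly f times. The midpoint rule is an assumption of this sketch.
    """
    previous = None
    for f in sorted(words_per_frequency, reverse=True):
        if words_per_frequency[f] > f:  # first frequency with w_f > f
            return f if previous is None else (previous + f) / 2
        previous = f
    return previous  # no crossing point (None for an empty table)


# Distribution from Table 1 (text of about 1000 words).
table_1 = {55: 1, 46: 1, 22: 6, 20: 1, 18: 1, 16: 1, 14: 1,
           12: 5, 10: 5, 8: 6, 6: 8, 4: 44, 2: 174}

threshold = frequency_threshold(table_1)  # 7.0, between the rows for 8 and 6
```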

Once the threshold is set, we consider all terms above it to be key terms, which<br />
should be suggested to the test makers. We then add all NPs that contain key terms<br />
to the list of key terms. For example, along with the term (economy) from the materials<br />
in geography we add the following NPs:<br />

������� ���������� (rural economy),<br />

�������� ���������� (world economy),<br />

���������� ���������� (national economy),<br />

������� ���������� (market economy),<br />

���������� ������� ���������� (national market economy),<br />

���������� �������� ���������� (contemporary world economics),<br />

������� ���������� (Japanese economy),<br />

��������� ���������� (natural economy),<br />

���������� ������� ������� ���������� (contemporary modern rural economy)<br />

Removing <strong>the</strong> NPs containing stop words prevented <strong>the</strong> use <strong>of</strong> phrases like �������<br />

���������� (<strong>the</strong>ir economy). After <strong>the</strong> POS tagging <strong>the</strong> recognised proper nouns were<br />

also added to <strong>the</strong> list <strong>of</strong> key terms and <strong>the</strong> final list <strong>of</strong> key terms was formed.<br />
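
The assembly of the final key term list can be sketched as follows. This is an illustration only: phrases are assumed to be lemmatized and whitespace-tokenised, English glosses stand in for the Bulgarian terms, and the function name is hypothetical.<br />

```python
def final_key_terms(key_nouns, noun_phrases, stop_words, proper_nouns=()):
    """Combine key nouns, NPs that contain them and recognised proper nouns.

    NPs containing a stop word are skipped, so that phrases such as
    "their economy" do not enter the list.
    """
    terms = set(key_nouns)
    for phrase in noun_phrases:
        words = phrase.split()
        if any(word in stop_words for word in words):
            continue  # drop NPs that contain a stop word
        if any(word in key_nouns for word in words):
            terms.add(phrase)
    terms.update(proper_nouns)
    return terms


terms = final_key_terms(
    key_nouns={"economy"},
    noun_phrases=["rural economy", "world economy", "their economy", "natural zone"],
    stop_words={"their"},
    proper_nouns=["Japan"],  # invented example of a recognised proper noun
)
# terms == {"economy", "rural economy", "world economy", "Japan"}
```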

4.2 Question generation<br />

To select clauses that are appropriate for question generation, a module processes<br />
the lexico-syntactic information collected during the preprocessing phase and decides<br />
that a clause is eligible if:<br />

(1) it contains at least one key term,<br />

(2) the term is in an NPA phrase of its VPS² (the NPA phrase is the subject daughter of the<br />
VPS phrase) and<br />

(3) the clause is finite.<br />

If the three conditions are met, we consider that the term is in the subject phrase of<br />
the sentence, which means that it is central to the meaning of the sentence, and we apply a<br />
rule that replaces the focal term with a blank. The system additionally checks that<br />
the sentence does not refer to figures, tables or appendices.<br />

For example in <strong>the</strong> materials <strong>of</strong> Biology <strong>the</strong> terms �������������� (heredity) and<br />

������������ (inheritance) are key terms. And we have <strong>the</strong> following information about<br />

<strong>the</strong> constituents for one <strong>of</strong> <strong>the</strong> sentences which contain <strong>the</strong> terms.<br />

(S (VPS (NPA (N (NN ������������)) (PP (Prep (IN ��)) (Ncfsd ����������������) (CoordP (Conj (C (CC<br />

�)) (Ncnsd ��������������)) (ConjArg (NPA (N (NN ��������)) (PP (Prep (IN �)) (N (NN ���������))))))))<br />

(VPC (V (T (RP ��)) (Pron (Ppxta ��)) (V (VB ��������))) (NPA (A (JJ �����)) (N (NN �����))))) (PUNC .))<br />

Whichever of the two terms is chosen by the user, the system will try to produce a stem<br />
from this sentence because it satisfies the three necessary conditions. It will replace<br />
the chosen key term with a blank and suggest the key term as the answer.<br />

E.g. ������������ �� ��� � �������������� �������� � ��������� �� �� �������� ����� ������<br />

(Due to ... and inheritance <strong>the</strong> species remain unchanged for long periods.)<br />

correct answer: ���������������� (the heredity)<br />

² NPA: head-adjunct noun phrase; VPS: head-subject verb phrase. For full definitions see HPSG-based<br />
Syntactic Treebank of Bulgarian (BulTreeBank), BulTreeBank Project Technical Report 05, 2004,<br />
http://bultreebank.org/TechRep/BTB-TR05.pdf<br />

In <strong>the</strong> following sentence, again <strong>the</strong> key term �������������� is present.<br />

(S (VPS (NPA (CoordP (ConjArg (NPA (N (NN �����������)) (PP (Prep (IN ��)) (Ncfsd ����������������)<br />

(CoordP (Conj (C (CC �))) (Ncfsd �������������))))) (Conj (C (CC �))) (ConjArg (N (NN ������������)))) (PP<br />

(Prep (IN )) (Ncmpd ) (Pron (Ppetdp3 )))) (VPC (V (VB )) (NPA (A (JJ )) (N (NN )) (IN ))) (Ncfsd )) (PUNC .))<br />

The term is part of the subject phrase, so it is possible to make a fill-in-the-blank<br />
question, where the blank replaces the focal term ����������������.<br />

����������� �� ��� � ������������� � ������������ �� ���������������� �� �� ��������� ������<br />

�� �����������<br />

(The study <strong>of</strong> ... and variability and <strong>the</strong> discovery <strong>of</strong> <strong>the</strong>ir regularities are <strong>the</strong> basic tasks <strong>of</strong> genetics.)<br />

correct answer: ����������������(heredity)<br />

Except for replacing the focal term with a blank, we do not apply any other transformation<br />
to the chosen sentence.<br />

4.3 Distractor generation<br />

In our application we need to suggest distractors in two cases: (1) when<br />
questions are generated automatically and (2) when a key term was chosen by the designer<br />
but no questions could be generated for it. In the second case only related concepts<br />
are shown to the user (they are extracted by the same principle as distractors, which is<br />
why we explain their construction in this section).<br />

In well-designed multiple-choice tests, the distractors are always semantically close<br />
to the correct answer (as well as to each other, in a sense). To find such distractors,<br />
in previous studies we tried paragraph clustering in order to identify groups of text<br />
sections with similar topics, but on short texts this method does not give<br />
promising results. We therefore chose a rather simple working solution. We examined<br />
existing tests for beginner level and noticed that most of the distractors<br />
looked very similar at first sight. They were mostly phrases with the same noun and<br />
different modifiers or, conversely, phrases with the same modifier and different nouns.<br />
We therefore adopted the practice of suggesting as distractors NPs that contain the<br />
same noun as the key term chosen by the user, but with a different modifier.<br />
Conversely, we also replace the noun of the chosen key<br />
term and suggest phrases with the same modifier and a different noun. All these phrases are<br />
taken from the NP list generated in the first stage.<br />


Such an example is:<br />

Constant modifier | Constant noun<br />
�������� �������� (natural complex) | ������� ���������� (rural economy)<br />
�������� ���� (natural zone) | �������� ���������� (world economy)<br />
�������� ��������� (natural component) | ���������� ���������� (national economy)<br />
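
The selection of distractor candidates can be sketched as follows. The sketch assumes that the last word of an NP is its head noun and everything before it is the modifier, which fits the modifier-noun examples above but is a simplification; English glosses are used and the function names are hypothetical.<br />

```python
def split_np(phrase):
    """Split an NP into (modifier, noun), taking the last word as the noun."""
    *modifier, noun = phrase.split()
    return " ".join(modifier), noun


def suggest_distractors(key_term, noun_phrases):
    """Return (same-noun candidates, same-modifier candidates) for a key term."""
    modifier, noun = split_np(key_term)
    same_noun, same_modifier = [], []
    for phrase in noun_phrases:
        if phrase == key_term:
            continue
        other_modifier, other_noun = split_np(phrase)
        if other_noun == noun and other_modifier != modifier:
            same_noun.append(phrase)
        elif modifier and other_modifier == modifier and other_noun != noun:
            same_modifier.append(phrase)
    return same_noun, same_modifier


np_list = ["natural complex", "natural zone", "natural component",
           "rural economy", "world economy", "national economy",
           "natural economy"]
same_noun, same_modifier = suggest_distractors("natural economy", np_list)
# same_noun:     rural economy, world economy, national economy
# same_modifier: natural complex, natural zone, natural component
```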

5 Evaluation<br />

At the current stage the system has been tested by three teachers who are professional<br />
test designers. Each of them is a specialist in one of the three areas and also has a degree<br />
in one of the others. They experimented with materials in the three domains:<br />
biology, geography and history. Each designer had to choose 20 key terms in total and<br />
to evaluate the questions produced by the system for<br />
the chosen key terms with a YES/NO mark (YES: acceptable question, with or without the need for<br />
changes; NO: unacceptable question).<br />

From the materials in biology and geography useful definitions were extracted, and<br />
they were appreciated by the designers, while for the history domain mainly proper names<br />
were helpful. On average, 61% of the generated fill-in-the-blank questions were reported as<br />
acceptable by the designers (with or without post-editing). The professionals<br />
reported that the context and the distractors helped them a lot, because they gave them<br />
more options for finding the information needed to correct a question that was not well formed.<br />
The rest of the questions were discarded mainly for three reasons: some<br />
of the sentences had a general meaning and did not represent a specific definition; in<br />
others the blank was ambiguous, with too many possible<br />
correct answers; and in others the chosen term was not central to the sentence that<br />
was chosen.<br />

The designers were especially satisfied with the high quality of the key terms, which<br />
served as a cross-reference over the whole material. They found them useful for<br />
systematising the topics on which the student could be examined. This saved<br />
them time, because they could use the vocabulary of key terms as a summary of the<br />
contents. A deeper analysis of the speed-up of the process will be done after improving<br />
the user interface of the system.<br />

The test designers were certain that question items prepared in this way are useful only<br />
for beginner-level testing, where deep understanding is not required and learners<br />
are taught mostly basic definitions.<br />

6 Conclusion and future work<br />

This experiment represents a step towards automatic test generation, and it shows the<br />
advances gained by using more sophisticated tools and deeper processing of the instructional<br />
materials.<br />

Although the approach is intended to be domain independent, we consider Biology and<br />
Geography more suitable, as they produce better results than History. One of the reasons is that<br />
in history pure one-sentence definitions are rarely found and many references<br />
are normally used. In this domain the proper names, which were also included<br />
in the list of key terms, played an important role.<br />

As this article presents work in progress, we plan to go deeper in the data analysis<br />
by adding dependency parsing. We will then be able to examine the subject and object clauses and<br />
make additional inferences. We will also try different techniques for distractor selection,<br />
such as term similarity measures over the corpus, and different types of questions.<br />
We plan to improve the user interface, because it is a major factor in the<br />
efficiency of the test designers' work. Overall, we plan a deeper evaluation of the<br />
system, including classical test theory and error analysis, in order to improve the produced<br />
items.<br />

7 Acknowledgements<br />

My thanks go to my supervisor Galia Angelova and to Atanas Chanev, who kindly<br />
provided models for the SVMTool and Dan Bikel's parser for Bulgarian.<br />

References<br />

Aldabe, I., De Lacalle, M. L. and Maritxalar, M. (2007). Automatic acquisition of didactic<br />
resources: generating test-based questions, in I. F. de Castro (ed.), Proceedings of<br />
SINTICE 07, pp. 105–111.<br />

Bikel, D. (2004). A distributional analysis of a lexicalized statistical parsing model, in<br />
D. Lin and D. Wu (eds), Proceedings of EMNLP.<br />
URL: http://www.cis.upenn.edu/~dbikel/software.html#stat-parser<br />

Collins, M. (1999). Head-Driven Statistical Models for Natural Language Parsing, PhD<br />
thesis, University of Pennsylvania.<br />

Giménez, J. and Márquez, L. (2004). SVMTool: A general POS tagger generator based on<br />
support vector machines, Proceedings of the 4th International Conference LREC'04.<br />

Mitkov, R., Ha, L. A. and Karamanis, N. (2006). A computer-aided environment for generating<br />
multiple-choice test items, Natural Language Engineering 12: 177–194.<br />

Nikolova, I. (2007). Supporting the development of multiple-choice tests in Bulgarian<br />
by language technologies, in E. Paskaleva and M. Slavcheva (eds), Proceedings of<br />
the Workshop A Common Natural Language Processing Paradigm for Balkan Languages,<br />
pp. 31–34.<br />


WORD SPACE MODELS OF<br />

SEMANTIC SIMILARITY AND RELATEDNESS<br />

Yves Peirsman<br />

University <strong>of</strong> Leuven & Research Foundation – Flanders<br />

Abstract. Word Space Models provide a convenient way <strong>of</strong> modelling word meaning in<br />

terms <strong>of</strong> a word’s contexts in a corpus. This paper investigates <strong>the</strong> influence <strong>of</strong> <strong>the</strong> type <strong>of</strong><br />

context features on <strong>the</strong> kind <strong>of</strong> semantic information that <strong>the</strong> models capture. In particular,<br />

we make a distinction between semantic similarity and semantic relatedness. It is shown<br />

that <strong>the</strong> strictness <strong>of</strong> <strong>the</strong> context definition correlates with <strong>the</strong> models’ ability to identify<br />

semantically similar words: syntactic approaches perform better than bag-<strong>of</strong>-word models,<br />

and small context windows are better than larger ones. For semantic relatedness, however,<br />

syntactic features and small context windows are at a clear disadvantage. Second-order bag-of-word<br />
models perform below average across the board.<br />

1 Introduction<br />

Word Space Models have become <strong>the</strong> standard approach to <strong>the</strong> computational modelling<br />

<strong>of</strong> lexical semantics (Landauer and Dumais, 1997; Lin, 1998; Schütze, 1998; Padó and<br />

Lapata, 2007). They indeed <strong>of</strong>fer a convenient way <strong>of</strong> capturing <strong>the</strong> meaning <strong>of</strong> a word<br />

simply on <strong>the</strong> basis <strong>of</strong> <strong>the</strong> contexts in which it is used in a corpus. In that way, <strong>the</strong>y can<br />

retrieve <strong>the</strong> most similar words for a given target word. Yet, <strong>the</strong>re is no agreement on how<br />

context should be defined exactly. Context features vary from sentences or paragraphs to<br />

single words, with or without <strong>the</strong> addition <strong>of</strong> syntactic relations. While all <strong>the</strong>se features<br />

definitely capture some semantic information, it is only to be expected that <strong>the</strong> choice <strong>of</strong><br />

context definition has an influence on <strong>the</strong> kind <strong>of</strong> semantic relatives that <strong>the</strong> Word Space<br />

Models will find.<br />

It is well known that words may be semantically related along a number <strong>of</strong> dimensions<br />

(Cruse, 1986). In <strong>the</strong> NLP literature, similarity takes up a central position, with synonymy<br />

as <strong>the</strong> most obvious example. But <strong>the</strong>re are o<strong>the</strong>r types <strong>of</strong> semantic relations, too. For<br />

instance, two words like doctor and hospital have a clear connection, although <strong>the</strong>y are<br />

in no way semantically similar. Recovering this semantic relatedness from a corpus may<br />

have to proceed along different lines than <strong>the</strong> modelling <strong>of</strong> semantic similarity. Specific<br />

Word Space Models may thus have a bias towards one or <strong>the</strong> o<strong>the</strong>r <strong>of</strong> <strong>the</strong>se relations. In <strong>the</strong><br />

literature, however, <strong>the</strong> investigation <strong>of</strong> this semantic behaviour <strong>of</strong> Word Space Models<br />

has only recently come to <strong>the</strong> fore (Sahlgren, 2006; Peirsman, Heylen and Speelman,<br />

2007).<br />

In this paper, we investigate eleven Word Space Models, representing three broad<br />

classes, with respect to <strong>the</strong>ir performance in <strong>the</strong> fields <strong>of</strong> semantic similarity and semantic<br />

relatedness. It will be shown that <strong>the</strong>re is no such thing as a single best Word Space<br />

Model: <strong>the</strong> ranking <strong>of</strong> <strong>the</strong> approaches depends on <strong>the</strong> type <strong>of</strong> semantic information we<br />

want to find. The paper is structured as follows: in <strong>the</strong> next section, we will introduce <strong>the</strong><br />

different context models and <strong>the</strong> two types <strong>of</strong> semantic relationship that we investigate.<br />

Section 3 <strong>the</strong>n presents <strong>the</strong> precise setup <strong>of</strong> our experiments, while section 4 discusses<br />

<strong>the</strong>ir results. Section 5 wraps up with conclusions and an outlook for future research.<br />


2 Word Space Models<br />

2.1 Competing definitions <strong>of</strong> context<br />

All Word Space Models <strong>of</strong> lexical semantics rely on <strong>the</strong> so-called distributional hypo<strong>the</strong>sis<br />

(Harris, 1954), which claims that words with similar meanings occur in similar contexts.<br />

From this hypo<strong>the</strong>sis, it follows that semantic similarity can be modelled in terms <strong>of</strong><br />

contextual or distributional similarity. This is done by constructing for each target word<br />

a so-called context vector, which contains <strong>the</strong> scores <strong>of</strong> its target word for all possible<br />

context features. These scores can be <strong>the</strong> number <strong>of</strong> times that <strong>the</strong> contextual feature<br />

co-occurs with <strong>the</strong> target, or more <strong>of</strong>ten, some kind <strong>of</strong> weighted frequency that captures<br />

<strong>the</strong> statistical link between <strong>the</strong> target word and that feature. The distributional similarity<br />

between two words is <strong>the</strong>n calculated as <strong>the</strong> similarity between <strong>the</strong>ir vectors, on <strong>the</strong> basis<br />

<strong>of</strong> a function like <strong>the</strong> cosine. In this way, it is possible to find for each target word <strong>the</strong> n<br />

most distributionally similar words in any given corpus. We call <strong>the</strong>se words <strong>the</strong> nearest<br />

neighbours <strong>of</strong> <strong>the</strong> target.<br />
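
The construction described above can be sketched for the simplest case. The example uses raw co-occurrence counts in a symmetric window rather than a weighted frequency, and the window size, toy corpus and function names are assumptions of the sketch.<br />

```python
import math
from collections import Counter, defaultdict


def context_vectors(tokens, window=2):
    """Build one context vector per word from counts of neighbouring words."""
    vectors = defaultdict(Counter)
    for i, target in enumerate(tokens):
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                vectors[target][tokens[j]] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[feature] * v[feature] for feature in u if feature in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbours(target, vectors, n=3):
    """Return the n words whose context vectors are closest to the target's."""
    target_vector = vectors[target]
    scores = {word: cosine(target_vector, vector)
              for word, vector in vectors.items() if word != target}
    return sorted(scores, key=scores.get, reverse=True)[:n]


tokens = "doctor treats patient nurse treats patient".split()
vectors = context_vectors(tokens, window=1)
neighbours = nearest_neighbours("doctor", vectors, n=2)
```

In practice the counts would be replaced by a weighting that captures the statistical link between target and feature, and the vectors would be built from a large corpus.<br />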

Based on <strong>the</strong> definition <strong>of</strong> context, it is possible to define a hierarchy <strong>of</strong> Word Space<br />

Models, each with its own kind <strong>of</strong> contextual features. At <strong>the</strong> top <strong>of</strong> <strong>the</strong> tree we make a distinction<br />

between document-based and word-based approaches. Document-based models<br />

use sentences, paragraphs or documents as dimensions, and count how <strong>of</strong>ten a target word<br />

appears in each <strong>of</strong> <strong>the</strong>se entities in <strong>the</strong> corpus (Landauer and Dumais, 1997; Sahlgren,<br />

2006). Word-based models, by contrast, take not <strong>the</strong> context itself, but features from this<br />

context as dimensions. They can be subdivided into syntactic and bag-<strong>of</strong>-word models.<br />

So-called bag-<strong>of</strong>-word or co-occurrence models take into account all words within a predefined<br />

distance <strong>of</strong> <strong>the</strong> target word (generally with <strong>the</strong> exception <strong>of</strong> semantically empty<br />

words like articles, etc.), whereas syntactic models consider only those words to which<br />

<strong>the</strong> target is syntactically related. Sometimes <strong>the</strong> features <strong>of</strong> such syntactic models consist<br />

<strong>of</strong> <strong>the</strong>se syntactically related words alone (Padó and Lapata, 2007), sometimes <strong>the</strong>y are<br />

formed by the word plus its relation (Lin, 1998). Finally, we can distinguish between first-order<br />
and second-order approaches. First-order bag-of-word approaches count the context<br />

words directly (Levy and Bullinaria, 2001), while second-order bag-<strong>of</strong>-word approaches<br />

sum <strong>the</strong> vectors <strong>of</strong> <strong>the</strong>se context words. In this last case, <strong>the</strong> target’s context vector thus<br />

contains frequency information about <strong>the</strong> context words <strong>of</strong> its (first-order) context words<br />

(Schütze, 1998). Although it is in principle possible to construct second-order syntactic<br />

models, to our knowledge no implementation has been presented in <strong>the</strong> literature.<br />
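The difference between first-order and second-order bag-of-word vectors can be illustrated with a minimal sketch. The function names, the toy sentence and the window size are assumptions made for illustration only; no stop list or frequency cut-off is applied here.

```python
from collections import Counter

def first_order_vectors(tokens, window):
    """Count, for every word, the words found within `window` positions of it."""
    vectors = {}
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        vectors.setdefault(target, Counter()).update(context)
    return vectors

def second_order_vectors(first_order):
    """Sum the first-order vectors of each word's context words.

    A context word seen `count` times contributes its vector `count` times.
    """
    second = {}
    for target, contexts in first_order.items():
        total = Counter()
        for context_word, count in contexts.items():
            for feature, freq in first_order[context_word].items():
                total[feature] += count * freq
        second[target] = total
    return second

# Hypothetical toy corpus, treated as one article without sentence boundaries.
tokens = "the bird can fly and the plane can fly".split()
first = first_order_vectors(tokens, window=1)
second = second_order_vectors(first)
```

In this toy corpus, bird and plane never co-occur with fly inside a one-word window, so fly is absent from their first-order vectors, yet it appears in their second-order vectors through the shared context word can.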

2.2 Semantic similarity and semantic relatedness

While it is claimed that all Word Space Models capture some kind of semantic information, so far we have only very limited knowledge about the influence of the context definition on the types of semantic relationship that the models find. In this paper we investigate two such types: semantic similarity and semantic relatedness. The first applies to synonyms (e.g., plane and airplane), hyponyms and hypernyms (e.g., bird and blackbird) and co-hyponyms (e.g., blackbird and robin), that is, to two words with a relationship of similarity between the concepts they refer to. Semantic relatedness, by contrast, exists between words whose concepts are not necessarily similar, but still related, for instance because they belong to the same script, frame or lexical field. This is true for pairs like bird and beak or plane and pilot. Note that it is not possible to draw a clear boundary between semantic similarity and semantic relatedness. Take the word pair pepper–salt, for instance. These two words are clearly semantically similar, since they both refer to spices. At the same time, however, they are also semantically related: not only do they both belong to the lexical fields of food or spices, they also often co-occur in the phrase salt and pepper. Instead of mutually exclusive classes, semantic similarity and relatedness can thus better be thought of as the two ends of a continuum, or two perpendicular axes in a two-dimensional plane.

For many NLP applications, similarity might be the most important relation to model. In typical Query Expansion, for instance, only semantically similar words (synonyms or possibly hyponyms) make for a desired extension of a search query. Similarly, in Question Answering a word in the question should only be matched with semantically similar words in the database where the computer looks for the answer. Semantic similarity, however, is just one way in which words may be related in our mental lexicon, as suggested by psycholinguistic association experiments. According to Aitchison (2003), the four major types of associations that people give in response to a cue word are, in order of frequency, co-ordination (co-hyponyms like pepper and salt), collocation (like salt and water), superordination (hypernyms like butterfly and insect) and synonymy (like starved and hungry). A similar observation is made by Schulte im Walde and Melinger (2005). Comparing the results of their German verb association experiment with GermaNet, they note that only 6% of the associations are synonyms, 14% are hypernyms and 16% are hyponyms, while no less than 54% of the associations are unrelated to their cue words in the GermaNet taxonomy. Although part of this can be explained by the incompleteness of the database, such results will be difficult to replicate with models of semantic similarity. After all, these are meant to prefer synonyms over hypernyms and co-hyponyms, and even exclude collocates altogether. The best Word Space Models of semantic similarity may thus not be the best models of relatedness, and vice versa.

Despite the wealth of research into Word Space Models, studies into their semantic characteristics are scarce. Most often one model is applied to a specific computational-linguistic task, and “comparisons between the (...) models have been few and far between in the literature” (Padó and Lapata, 2007, p. 166). Sahlgren (2006) is one exception to this rule. Focusing on document-based and first-order bag-of-word models, he showed that the latter are better geared towards the modelling of paradigmatic (similarity) relations, while the former have a clear bias towards syntagmatic relations. Unfortunately, Sahlgren left out a number of popular word space approaches, like those based on syntactic relations or second-order co-occurrences. Peirsman et al. (2007) also included syntactic models, but concentrated on similarity relations only. This article thus sets out to fill these gaps in the literature, by discussing a wide variety of model types from the perspectives of similarity as well as relatedness.

3 Experimental setup


We investigate three classes of Word Space Models, for a total of eleven approaches: five first-order bag-of-word models, five second-order bag-of-word models and one syntactic model. Our corpus is the 300-million-word Twente Nieuws Corpus of Dutch newspaper articles, collected at the University of Twente and parsed by the Alpino parser at the University of Groningen. As our test set, we selected from this corpus the 10,000 most frequent nouns. For each of these, we had all models retrieve the 100 most similar neighbours from the 9,999 remaining nouns in the set.

The bag-of-word models, both first-order and second-order, varied the size of the context window they took into account (1, 3, 5, 10 or 20 words to either side of the target), for a total of ten models. Sentence boundaries were ignored; article boundaries were not.

The syntactic model considered eight different types of syntactic dependency relations, in which the target word could be (1) the subject of verb v, (2) the direct object of verb v, (3) a prepositional complement of verb v introduced by preposition p, (4) the head of an adverbial prepositional phrase (PP) of verb v introduced by preposition p, (5) modified by adjective a, (6) postmodified by a PP with head n introduced by preposition p, (7) modified by an apposition with head n, or (8) coordinated with head n. Each specific instantiation of the variables v, p, a, or n was responsible for a new context feature.

The other parameter settings were shared by all eleven models:

• Dimensionality: For all approaches, we used the 2,000 most frequent contextual features in the corpus as dimensions. This is a simple but common way of reducing the otherwise huge dimensionality of the vectors, which leads to state-of-the-art results, particularly for the syntactic model (Levy and Bullinaria, 2001; Padó and Lapata, 2007). For the syntactic model these dimensions are the 2,000 most frequent syntactic features, like subj of fly. For the bag-of-word models, they are formed by the 2,000 most frequent words in the corpus. Function words and other semantically empty words were excluded a priori on the basis of a stop list.

• Frequency cut-off: Depending on the context size, we established a cut-off value n, so that the models ignored those features that occurred together with the target fewer than n times. For context size 3, this cut-off was fixed at 3; for the larger context sizes it lay at 5. The syntactic model and the bag-of-word model with context size 1 did not use a cut-off, since a cut-off led to data sparseness.

• Frequency weighting: As is usual in the literature, the context vectors of the target words did not contain the simple frequencies of the features. Instead, they listed the point-wise mutual information between each feature and the target word. This measure expresses whether the two occur together more or less often in the corpus than we expect on the basis of their individual relative frequencies.

• Similarity measure: Finally, the distributional similarity between two target words was measured by the cosine between their context vectors.
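The frequency weighting and similarity measure listed above can be sketched as follows. This is an illustrative sketch, not the implementation used in the experiments: the function names and toy counts are hypothetical, probabilities are estimated from the toy co-occurrence table itself, and negative PMI values are kept, which the text does not specify.

```python
import math

def pmi_vectors(counts):
    """Turn raw counts {target: {feature: n}} into point-wise mutual information.

    PMI(t, f) = log2( P(t, f) / (P(t) * P(f)) ), all estimated from `counts`.
    """
    total = sum(n for feats in counts.values() for n in feats.values())
    target_totals = {t: sum(feats.values()) for t, feats in counts.items()}
    feature_totals = {}
    for feats in counts.values():
        for f, n in feats.items():
            feature_totals[f] = feature_totals.get(f, 0) + n
    return {
        t: {
            f: math.log2(n * total / (target_totals[t] * feature_totals[f]))
            for f, n in feats.items()
        }
        for t, feats in counts.items()
    }

def cosine(u, v):
    """Cosine between two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest_neighbours(target, vectors, n):
    """Return the n words whose vectors have the highest cosine with the target."""
    others = [w for w in vectors if w != target]
    others.sort(key=lambda w: cosine(vectors[target], vectors[w]), reverse=True)
    return others[:n]

# Hypothetical co-occurrence counts for three target nouns.
counts = {
    "blackbird": {"sing": 8, "fly": 2},
    "robin": {"sing": 6, "fly": 2},
    "plane": {"fly": 8, "land": 4},
}
weighted = pmi_vectors(counts)
print(nearest_neighbours("blackbird", weighted, 1))
```

A frequency cut-off could be added by dropping every feature whose count falls below the threshold before the weights are computed.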

4 Results

4.1 Semantic similarity


We evaluated the ability of our models to find semantically similar words on the basis of a comparison with Dutch EuroWordNet (Vossen, 1998). This lexical database contains more than 34,000 sets of noun synonyms and the relations that exist between them. Two evaluation measures were applied. First, we focused on the general ability of our models to capture semantic similarity. Then we looked into the distribution of four more specific similarity relations.

Figure 1: Wu & Palmer similarity scores between target and nearest neighbour. syn: syntactic model; cn: first-order bag-of-words; ccn: second-order bag-of-words; n: context size (number of words on either side of target). Axes: word space models (syn, c1, c3, c5, c10, c20, cc1, cc3, cc5, cc10, cc20) on the x-axis, Wu & Palmer score from 0.0 to 1.0 on the y-axis.

The general performance of the models was quantified by the average Wu and Palmer score between a target word and its single nearest neighbour (Wu and Palmer, 1994). This Wu and Palmer score is a popular way of measuring the semantic similarity between two words, based on their depth and their distance from each other in a taxonomic structure like EuroWordNet. If either the target or its nearest neighbour was not present in the database, the pair was simply ignored. In order to make the results perfectly comparable across models, we restricted the results to the 4,183 target words with a nearest neighbour in EuroWordNet for all models. The resulting Wu and Palmer scores are given in Figure 1.
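As an illustration of how such a score behaves, the sketch below uses the common formulation 2 · depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest node that dominates both words. The toy taxonomy and all names are hypothetical, every node has a single parent, and the root is given depth 1; how multiple hypernyms or multiple word senses in EuroWordNet were handled is not stated in the text.

```python
# Hypothetical toy taxonomy, stored as child -> parent.
PARENT = {
    "animal": "entity",
    "bird": "animal",
    "blackbird": "bird",
    "robin": "bird",
    "insect": "animal",
    "butterfly": "insect",
}

def path_to_root(node):
    """Return the node followed by all of its ancestors, root last."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(node):
    """Number of nodes on the path from the node to the root (root = 1)."""
    return len(path_to_root(node))

def wu_palmer(a, b):
    """Wu and Palmer similarity based on the lowest common subsumer."""
    ancestors_a = set(path_to_root(a))
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("blackbird", "robin"))      # co-hyponyms under bird
print(wu_palmer("blackbird", "butterfly"))  # related only through animal
```

Pairs that share a deep common ancestor score higher than pairs that only meet near the root, which is why a drop in the average score signals less taxonomically similar neighbours.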

Figure 1 shows a clear decrease in Wu and Palmer score as the definition of context becomes less strict. A Friedman test indeed confirms the influence of the type of Word Space Model on performance (Friedman chi-squared = 3541.575, df = 10, p-value < .001). The syntactic model achieves the highest average similarity score by far, followed by the first-order bag-of-word models and finally the second-order bag-of-word models. Moreover, small contexts appear to model semantic similarity better than large ones. A test of multiple comparisons after Friedman showed that the differences between all pairs of models are indeed statistically significant at the .05 level, except for those between context sizes 1 and 3 (both first-order and second-order) and that between the first-order model with context size 20 and the second-order model with context size 5.
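For readers unfamiliar with the Friedman test, the sketch below computes its statistic on a hypothetical toy table. Each target word is treated as a block in which the models are ranked by score; the statistic is then derived from the rank sums, with k − 1 degrees of freedom for k models (10 for the eleven models here). The sketch assumes no tied scores within a block and therefore omits the tie correction that statistical packages apply.

```python
def friedman_chi_squared(blocks):
    """Friedman statistic for `blocks`, one list of k model scores per target word.

    Simplification: assumes no tied scores within a block (no tie correction).
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0] * k
    for scores in blocks:
        order = sorted(range(k), key=lambda j: scores[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    sum_of_squares = sum(r * r for r in rank_sums)
    statistic = 12 / (n * k * (k + 1)) * sum_of_squares - 3 * n * (k + 1)
    return statistic, k - 1  # chi-squared value and degrees of freedom

# Hypothetical Wu and Palmer scores of three models for four target words.
toy_scores = [
    [0.9, 0.6, 0.3],
    [0.8, 0.7, 0.2],
    [0.7, 0.5, 0.4],
    [0.6, 0.8, 0.1],
]
statistic, df = friedman_chi_squared(toy_scores)
print(statistic, df)
```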

Of course, this general similarity score does not give any information about what specific type of similarity relation the models find. We therefore defined four taxonomic similarity relations, again with EuroWordNet as a gold standard. Synonyms were defined as words in the same synonym set as the target word, hypernyms as words exactly one node above the target, hyponyms as words one node below and co-hyponyms as words one node below any of the target’s hypernyms. Together, these relations make up the target’s EuroWordNet environment. Note that our strict definition of these relationships does not allow for more than one or two steps in the tree, and thus disregards possible hypernyms or hyponyms that are more than one step away from the target. This approach ensures the reliability of our gold standard, but constitutes a test that a relatively low percentage of nearest neighbours will pass. Figure 2 shows how the single nearest neighbours of our target words are distributed over the four similarity relations. Again we restricted ourselves to the 4,183 target words with a neighbour in EuroWordNet for all models.

Figure 2: Distribution of semantic similarity relations for all models. Axes: word space models (syn, c1, c3, c5, c10, c20, cc1, cc3, cc5, cc10, cc20) on the x-axis, frequency from 0 to 2500 on the y-axis; legend: cohyponym, hyperonym, hyponym, synonym.
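The four relations that make up a target's environment can be expressed as a small classification routine. The synonym sets, the hypernym table and the function name below are hypothetical stand-ins for EuroWordNet, chosen only to mirror the definitions given above.

```python
# Hypothetical toy resource: synonym sets and direct hypernyms per word.
SYNSETS = [{"plane", "airplane"}]
HYPERNYMS = {
    "blackbird": {"bird"},
    "robin": {"bird"},
    "bird": {"animal"},
    "plane": {"vehicle"},
    "airplane": {"vehicle"},
}

def relation(target, neighbour):
    """Classify a neighbour relative to the target, or return None if the
    neighbour lies outside the target's environment."""
    if target == neighbour:
        return None
    if any(target in s and neighbour in s for s in SYNSETS):
        return "synonym"
    if neighbour in HYPERNYMS.get(target, set()):
        return "hypernym"      # exactly one node above the target
    if target in HYPERNYMS.get(neighbour, set()):
        return "hyponym"       # exactly one node below the target
    if HYPERNYMS.get(target, set()) & HYPERNYMS.get(neighbour, set()):
        return "co-hyponym"    # one node below one of the target's hypernyms
    return None

print(relation("blackbird", "robin"))   # co-hyponym
print(relation("robin", "animal"))      # None: two steps up is not counted
```

The last call shows the strictness of the definition: animal is a hypernym of robin two steps up, so it falls outside the environment.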

Not surprisingly, the number of retrieved similarity relations mirrors the general Wu and Palmer similarity score. Again the syntactic model performs best: 51.2% of its single nearest neighbours that occur in EuroWordNet are situated in the environment of the target word. This precision drops to between 40.5% and 26.4% for the first-order bag-of-word methods and even lower for the second-order models. As above, the performance of the models seems to depend on the strictness of their context definition. The more strictly they define context (i.e., syntactic context rather than a bag of words, smaller context windows rather than large ones), the more examples of semantic similarity they find. This pattern remains unchanged when a larger number of nearest neighbours is taken into account.

With one exception, the distribution of the four relations is comparable across the different models. Co-hyponyms figure most prominently among the nearest neighbours, followed by synonyms, hypernyms and hyponyms. Only the syntactic model finds an unexpectedly high number of hypernyms. This can probably be explained by the way syntactic relations are typically inherited in a taxonomy: all characteristics of a (prototypical) concept (can fly, for instance) also apply to its hypernyms, so that these are often most similar in terms of syntactic distribution in a corpus.

4.2 Semantic relatedness


The results in the previous section do not necessarily express the overall quality of the investigated Word Space Models. It is possible that the models that scored relatively badly in the similarity experiments are simply biased towards a different kind of semantic relation. In this second round of experiments we therefore turn our attention from semantic similarity to semantic relatedness.

Figure 3: Evolution of the precision, recall and F-score of the first-order bag-of-word model with context size 5 in its retrieval of associations. Axes: number of nearest neighbours from 0 to 100 on the x-axis, score from 0.0 to 0.4 on the y-axis; curves: precision, recall, F-score.

For this task, we relied on a psycholinguistic experiment on human associations, described in De Deyne and Storms (in press). In this experiment, participants were asked to list three different word associations for 1,424 cue words. Each word was presented to at least 82 participants, resulting in a total of 381,909 responses. For instance, aap (‘monkey’) triggered the response zoo (‘zoo’) 27 times, aarde (‘earth’) prompted planeet (‘planet’) 14 times and bikini (‘bikini’) elicited vakantie (‘holiday’) 6 times. These examples show that this experiment taps into a different kind of semantic relationship than the previous one. Note that, for now, we ignore the fact that association strength is often asymmetric (Michelbacher, Evert and Schütze, 2007).

In order to make the results comparable to those in section 4.1, we reduced the data set to those cue words and associations that belong to the 10,000 most frequent nouns in our corpus. This gave a gold standard of 768 cue words with a total of 31,862 different cue–association pairs. When these associations are checked against EuroWordNet, we indeed find that only 8% belong to the EuroWordNet environment of their target word. Of these, 9% are synonyms, 19% are hypernyms, 16% are hyponyms and 56% are co-hyponyms.

We evaluated the Word Space Models against this gold standard by counting the number of associations that they find as the nearest neighbours to the cue words. If we consider just one nearest neighbour, the results already show a considerable difference from the previous experiments. As the top chart in Figure 4 indicates, the syntactic model still performs best, with 340 associations (a precision of .443), followed by the first-order and then the second-order bag-of-word models. However, within the bag-of-word models, the ideal context size has changed. The first-order bag-of-word models with context sizes 10 and 20 have 299 and 293 associations among their single nearest neighbours, respectively. For 768 targets, this gives precision values of .389 and .382. Then we find context sizes 5 (n = 281, P = .366), 3 (n = 269, P = .350) and 1 (n = 228, P = .297). Larger contexts thus outperform their smaller competitors here. Note that the two best models share only 90 correct predictions, which indicates that they have different preferences among the associations. A look at the data suggests that the syntactic model indeed picks out those associations that are also semantically similar to their target word, while the first-order bag-of-word models with large contexts cover collocational relatedness better. With the second-order models, finally, context size 3 seems optimal.
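The precision values quoted above follow directly from the reported counts: with one neighbour per cue word, precision is the number of retrieved associations divided by the 768 cue words. The short check below recomputes them; the dictionary keys reuse the model labels from the figures, and the variable names are illustrative.

```python
CUES = 768  # cue words in the gold standard

# Associations found among the single nearest neighbours, as reported above.
hits_at_one = {"syn": 340, "c10": 299, "c20": 293, "c5": 281, "c3": 269, "c1": 228}

precision_at_one = {model: round(hits / CUES, 3) for model, hits in hits_at_one.items()}
for model, precision in precision_at_one.items():
    print(model, precision)
```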

When we consider one nearest neighbour, the models cannot find more than 768 associations, and recall thus stays extremely low. We therefore increased the number of nearest neighbours from 1 to 100 and calculated the precision, recall and F-score at each step. By way of example, Figure 3 plots the evolution of these values for the best-performing model. The bottom bar chart in Figure 4, then, shows the maximum F-score of all the models. The syntactic approach has lost its lead, which suggests that it is able to model only a small number of associations well, probably those that also score highly on similarity. Instead it is now the first-order bag-of-word model with context size 5 that outclasses all others, with an F-score of .127 (P = .112, R = .148) at 55 neighbours. Extending the context window to 10 words brings the F-score down to .122 (P = .102, R = .150, 61 neighbours); reducing the window to 3 words takes it to .120 (P = .104, R = .143, 57 neighbours). Next, we have the bag-of-word model with context size 20 (F = .115, P = .102, R = .133, 54 neighbours) and only then the syntactic model (F = .111, P = .102, R = .123, 50 neighbours). Large contexts now score slightly worse than intermediate ones, which probably strike the best balance between similarity relations and collocational links. Second-order models never attain an F-score above .10, and neither do the smallest context windows, which are thus clearly biased towards similarity.
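The evaluation at increasing numbers of neighbours can be sketched as below. The sketch assumes that precision and recall are pooled over all cue words (correct neighbours divided by all retrieved neighbours, and by all gold cue–association pairs, respectively), which is consistent with the reported figures but not stated explicitly. The cue words come from the examples in the text; the neighbour lists and function names are hypothetical.

```python
def precision_recall_f(neighbours, gold, k):
    """Pooled precision, recall and F-score over the k nearest neighbours.

    neighbours: {cue: ranked list of neighbours}
    gold:       {cue: set of associations}
    """
    retrieved = correct = 0
    for cue, ranked in neighbours.items():
        top = ranked[:k]
        retrieved += len(top)
        correct += sum(1 for word in top if word in gold[cue])
    relevant = sum(len(associations) for associations in gold.values())
    p = correct / retrieved if retrieved else 0.0
    r = correct / relevant if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def best_k(neighbours, gold, max_k):
    """Return the k (1..max_k) with the highest F-score."""
    return max(range(1, max_k + 1), key=lambda k: precision_recall_f(neighbours, gold, k)[2])

# Gold associations from the examples in the text; neighbour lists are made up.
gold = {"aap": {"zoo"}, "aarde": {"planeet"}, "bikini": {"vakantie"}}
neighbours = {
    "aap": ["gorilla", "zoo"],
    "aarde": ["planeet", "maan"],
    "bikini": ["badpak", "zwembroek"],
}
print(precision_recall_f(neighbours, gold, 1))
print(precision_recall_f(neighbours, gold, 2))
print(best_k(neighbours, gold, 2))
```

Adding neighbours raises recall while precision tends to fall, so the F-score peaks at an intermediate number of neighbours, as in Figure 3.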

4.3 Discussion


In part, our experiments have confirmed earlier results in the literature. For instance, Sahlgren (2006) already noted that with first-order bag-of-word models, larger contexts score better in his association experiment, while smaller contexts score better in the synonymy test. Peirsman et al. (2007) found even better results for a syntactic model in Dutch, at least with respect to semantic similarity evaluated against EuroWordNet. Both findings are borne out by our experiments.

At the same time, our results add some new insights to these earlier observations. We have shown that the syntactic model and the bag-of-word models with context size 1 are most biased towards semantic similarity. The syntactic model scored best in our first round of experiments, while the results of the bag-of-word models with context size 1 were either not statistically different from or better than those of models with larger context windows. When it came to the discovery of semantic associations, however, context size 1 proved the least advisable choice, and the syntactic model was outperformed by all first-order bag-of-word models with an intermediate or large context window. Second-order bag-of-word models scored below average in both experiments. They probably only show their power when data sparseness is an issue, as with Word Sense Discrimination (Schütze, 1998) or with corpora smaller than ours.

5 Conclusions and future research

In this paper, we investigated the influence of the context definition on the ability of several Word Space Models to capture two kinds of semantic information: semantic similarity and semantic relatedness. We studied a total of eleven Word Space Models: one syntactic approach and ten bag-of-word models with context sizes 1, 3, 5, 10 and 20, first-order as well as second-order. Both for semantic similarity and semantic relatedness, first-order models clearly beat their second-order competitors. However, while syntactic models gave the best results for semantic similarity, first-order bag-of-word approaches with intermediate to large context windows fared better in the retrieval of associated words.

Figure 4: Frequency of associations among single nearest neighbours (top) and maximal F-scores for all models (bottom). Axes: word space models (syn, c1, c3, c5, c10, c20, cc1, cc3, cc5, cc10, cc20) on the x-axis of both panels; frequency from 0 to 400 on the y-axis of the top panel, F-score from 0.00 to 0.12 on the y-axis of the bottom panel.

In the short term, we aim to extend the repository of Word Space Models that we are investigating; document-based models and second-order syntactic models are particularly high on our list. In the longer term, we will try to determine whether the differences we observed in the modelling of semantic relations between word types also play a role in Word Sense Discrimination. In this task, all contexts of a word are clustered in order to automatically find the multiple senses of that word. Given the results here, we suspect that different kinds of polysemy or homonymy may not demand the same context definitions.

References


Aitchison, J. (2003). Words in the Mind. An Introduction to the Mental Lexicon, Oxford: Blackwell.

Cruse, D. A. (1986). Lexical Semantics, London: Cambridge University Press.


De Deyne, S. and Storms, G. (in press). Word associations: Norms for 1,424 Dutch words<br />

in a continuous task, Behaviour Research Methods.<br />

Harris, Z. (1954). Distributional structure, Word 10(2–3): 146–162.<br />

Landauer, T. K. and Dumais, S. T. (1997). A solution to Plato’s problem: The Latent<br />

Semantic Analysis theory of the acquisition, induction, and representation of knowledge,<br />

Psychological Review 104: 211–240.<br />

Levy, J. P. and Bullinaria, J. A. (2001). Learning lexical properties from word usage<br />

patterns: Which context words should be used, in R. French and J. Sougne (eds),<br />

Connectionist Models of Learning, Development and Evolution: Proceedings of the<br />

6th Neural Computation and Psychology Workshop, London: Springer, pp. 273–282.<br />

Lin, D. (1998). Automatic retrieval and clustering of similar words, Proceedings of<br />

COLING-ACL98, Montreal, Canada, pp. 768–774.<br />

Michelbacher, L., Evert, S. and Schütze, H. (2007). Asymmetric association measures,<br />

Proceedings of the International Conference on Recent Advances in Natural Language<br />

Processing (RANLP-07), Borovets, Bulgaria.<br />

Padó, S. and Lapata, M. (2007). Dependency-based construction of semantic space models,<br />

Computational Linguistics 33(2): 161–199.<br />

Peirsman, Y., Heylen, K. and Speelman, D. (2007). Finding semantically related words in<br />

Dutch. Co-occurrences versus syntactic contexts, Proceedings of the CoSMO Workshop,<br />

Roskilde, Denmark, pp. 9–16.<br />

Sahlgren, M. (2006). The Word-Space Model. Using Distributional Analysis to Represent<br />

Syntagmatic and Paradigmatic Relations Between Words in High-dimensional<br />

Vector Spaces, PhD thesis, Stockholm University.<br />

Schulte im Walde, S. and Melinger, A. (2005). Identifying Semantic Relations and Functional<br />

Properties of Human Verb Associations, Proceedings of the joint Conference<br />

on Human Language Technology and Empirical Methods in Natural Language Processing,<br />

Vancouver, Canada, pp. 612–619.<br />

Schütze, H. (1998). Automatic word sense discrimination, Computational Linguistics<br />

24(1): 97–124.<br />

Vossen, P. (ed.) (1998). EuroWordNet: a Multilingual Database with Lexical Semantic<br />

Networks for European Languages, Dordrecht: Kluwer.<br />

Wu, Z. and Palmer, M. (1994). Verb semantics and lexical selection, Proceedings of the<br />

32nd Annual Meeting of the Association for Computational Linguistics (ACL-94),<br />

Las Cruces, NM, pp. 133–138.<br />

EXAMINING THE NOTICING FUNCTION OF OUTPUT<br />

Maren Schierloh<br />

Michigan State University<br />

Abstract. Following Izumi and Bigelow’s research (Izumi and Bigelow, 2000), this study<br />

re-investigates the noticing function of output; that is, whether producing the target language<br />

focuses learners’ attention on second language (L2) structures in subsequent input. Izumi<br />

and Bigelow found no effects <strong>of</strong> output on ei<strong>the</strong>r noticing or acquisition. They attributed<br />

<strong>the</strong>ir findings to limitations in operationalizing noticing via underlining, coupled with <strong>the</strong><br />

relative difficulty <strong>of</strong> <strong>the</strong> target-structure (past-hypo<strong>the</strong>tical-conditional). Under <strong>the</strong> premise<br />

that <strong>the</strong> learner’s developmental level and attentional resources may constrain noticing, this<br />

partial replication addresses whe<strong>the</strong>r a less difficult structure may yield greater noticing and,<br />

consequently, greater L2 gains. Fifteen intermediate ESL learners were randomly assigned<br />

to two experimental groups (EGs) and one control group (CG). The first EG was given opportunities<br />

for output that elicited <strong>the</strong> past hypo<strong>the</strong>tical conditional (more difficult structure),<br />

while <strong>the</strong> second EG had opportunities to produce <strong>the</strong> present hypo<strong>the</strong>tical conditional (less<br />

difficult structure). The CG was not prompted to produce output that required use <strong>of</strong> ei<strong>the</strong>r<br />

structure. All groups engaged in follow-up reading and underlining activities. The reading<br />

texts modeled target-like use <strong>of</strong> <strong>the</strong> relevant structure for both EGs. Methodological<br />

triangulation measured noticing through underlining <strong>of</strong> <strong>the</strong> target-structure and stimulated<br />

recall to elicit data about cognitive processes involved. Additionally, noticing and L2 gains<br />

were assessed based on participants’ performance on subsequent essay-writing activities and<br />

posttests. Quantitative raw data revealed no effect <strong>of</strong> output (EGs vs. CG) or difficulty-level<br />

(EG1 vs. EG2) on <strong>the</strong> underlining <strong>of</strong> target forms in subsequent texts. Qualitative stimulated<br />

recall data, however, showed that output influences subsequent noticing <strong>of</strong> certain input<br />

elements; e.g. ’This is a good word for my essay’. Overall findings suggest that output<br />

can trigger noticing <strong>of</strong> vocabulary and fur<strong>the</strong>r illustrate how methodological triangulation<br />

can enhance insights into learners’ L2 processes. Thus, this study has ramifications for both<br />

classroom practices and research methodology.<br />

1 Introduction<br />

In <strong>the</strong> past decade <strong>of</strong> second language acquisition (SLA) research, <strong>the</strong> notion that noticing<br />

is essential for <strong>the</strong> acquisition <strong>of</strong> new linguistic systems has been a matter <strong>of</strong> debate<br />

(Jourdenais, 2001; Leow, 2002; Robinson, 2001; Schmidt, 2001; Simard and Wong, 2001;<br />

Tomlin and Villa, 1994; Truscott, 1998). Much <strong>of</strong> <strong>the</strong> argumentation is grounded in <strong>the</strong><br />

difficulty <strong>of</strong> operationalizing and measuring <strong>the</strong> second language (L2) learner’s internal<br />

cognitive processes. Research in SLA and cognitive science has raised questions as to<br />

the type and amount of ’attention’ necessary for language learning, the specific aspects of<br />

language that are more likely to be noticed, and the extent to which the developmental<br />

level of the learner determines what is noticed.<br />

Recently, researchers have turned <strong>the</strong>ir attention to <strong>the</strong> role output plays in noticing.<br />

The oral or written production of language may induce learners to consciously realize<br />

<strong>the</strong> gap between what <strong>the</strong>y want to say and what <strong>the</strong>y can say. This noticing <strong>of</strong> linguistic<br />

limitations may prompt learners to seek solutions in subsequent input. A study by<br />

Izumi and Bigelow (2000) centered on <strong>the</strong> noticing function <strong>of</strong> output. They investigated<br />

whe<strong>the</strong>r L2 written output promotes noticing <strong>of</strong> form in subsequent text. They compared<br />

an experimental group, which produced output, to a control group, which did not produce<br />

any output but engaged in comprehension-based activities instead. The noticing <strong>of</strong> <strong>the</strong><br />

participants was operationalized through <strong>the</strong> participants’ underlining <strong>of</strong> <strong>the</strong> target structure<br />

in written text. Both groups underlined <strong>the</strong> same amount and Izumi and Bigelow<br />

concluded that output does not trigger noticing. Because Izumi and Bigelow’s inquiry<br />

may inform L2 pedagogy, the present study partially replicates<br />

<strong>the</strong>ir study by asking analogous research questions and by implementing a similar design.<br />

Yet, to achieve a more valid measure <strong>of</strong> noticing, this study uses stimulated recall to tap<br />

into learners’ cognitive processes. In addition to <strong>the</strong> stimulated recall data, this study<br />

quantitatively and qualitatively analyzes <strong>the</strong> data from learners’ underlining and written<br />

production to better examine a possible relationship between output, noticing and L2 development.<br />

This research also addresses whe<strong>the</strong>r a cognitively less demanding structure<br />

may have an effect on noticing by <strong>the</strong> learner. The following section provides a review<br />

<strong>of</strong> <strong>the</strong> literature on noticing, followed by sections detailing <strong>the</strong> difficulties associated with<br />

measuring noticing, <strong>the</strong> role output plays in noticing as well as <strong>the</strong> role <strong>of</strong> <strong>the</strong> learner<br />

level. The third section states the research questions and hypotheses, the fourth details the research<br />

methodology, and the subsequent sections provide a discussion of findings and limitations and a conclusion.<br />

2 Review <strong>of</strong> <strong>the</strong> Literature<br />

2.1 Noticing and SLA<br />

Since Schmidt (1990) first proposed his well-known “noticing hypo<strong>the</strong>sis”, a large body<br />

<strong>of</strong> SLA and cognitive science research has focused on <strong>the</strong> role <strong>of</strong> noticing, or conscious<br />

attention 1 , in promoting L2 development (Alanen, 1995; Leow, 2002; Rosa and O’Neill,<br />

1999). The noticing hypo<strong>the</strong>sis claims that noticing requires awareness and is a necessary<br />

condition for second language acquisition. Yet, some research findings are not in line with<br />

<strong>the</strong> premise that conscious attention is a necessary prerequisite for L2 acquisition (Gass,<br />

Svetics and Lemelin, 2003; Robinson, 1995).<br />

Truscott rejects the crucial role of noticing in the L2 learning process from a theoretical<br />

perspective, maintaining that noticing advances only metalinguistic knowledge, not<br />

competence. He further contends that “awareness is not only unnecessary but also unhelpful”<br />

(Truscott, 1998, page 126). Such a narrow account of the role of noticing in SLA<br />

is certainly challenged by substantial L2 research data indicating that noticing facilitates<br />

L2 learning (Ellis, 1994; Long, 1996; Robinson, 1995; Swain and Lapkin, 1998).<br />

2.1.1 Operationalizing and Measuring Noticing<br />

At <strong>the</strong> heart <strong>of</strong> <strong>the</strong> ongoing debate on <strong>the</strong> role <strong>of</strong> noticing in SLA is <strong>the</strong> difficulty in<br />

operationalizing it, which requires introspection and assessment <strong>of</strong> learner-internal cognitive<br />

activities. For example, Schmidt (2001) operationalized noticing in terms of the<br />

learners’ self-reporting either during or immediately after exposure to the input. Yet the<br />

lack of self-reporting should not be interpreted as a lack of awareness, as some thinking<br />

processes may be difficult to verbalize (Jourdenais, 2001; Schmidt, 2001). As such, <strong>the</strong><br />

challenge facing <strong>the</strong> measurement <strong>of</strong> noticing is to accurately link observable behaviors<br />

by language learners to <strong>the</strong> construct <strong>of</strong> noticing. Methodologies used to qualitatively and<br />

1 Due to the terminological vagueness of ’noticing’ resulting from related terms such as ’attention’<br />

(Leow, 2002) and ’awareness’ (Tomlin and Villa, 1994) in the noticing literature, Schmidt’s definition of<br />

noticing as ’conscious attention’ has been adopted for the present study (Schmidt, 2001). Schmidt equates<br />

consciousness with awareness and/or attention.<br />

quantitatively account for a learner’s noticing of specific target language features fall<br />

into two categories: online, which measure the language learner’s noticing during performance<br />

of a certain language task, and offline, which employ post-treatment assessment<br />

of noticing. Neither online nor offline methodologies enable an absolute account of the<br />

learners’ attentional processes.<br />

Online methodologies include, for example, think-aloud protocols which require <strong>the</strong><br />

participants to monitor and orally self-report <strong>the</strong>ir mental processes while <strong>the</strong>y perform a<br />

certain language task. Izumi and Bigelow used <strong>the</strong> online methodology <strong>of</strong> participants’<br />

underlining <strong>of</strong> “<strong>the</strong> word, words, or parts <strong>of</strong> <strong>the</strong> words that are [felt to be] particularly<br />

necessary for subsequent production” (Izumi and Bigelow, 2000, page 250). Izumi and<br />

Bigelow characterize underlining as an au<strong>the</strong>ntic procedure readers naturally do during<br />

a reading task, and argue that <strong>the</strong> marking <strong>of</strong> words would not occur without conscious<br />

awareness <strong>of</strong> <strong>the</strong> importance <strong>of</strong> that particular word or phrase. In partially replicating<br />

Izumi and Bigelow, <strong>the</strong> present study utilizes underlining as one integral attribute <strong>of</strong> <strong>the</strong><br />

triangulated measurement <strong>of</strong> noticing.<br />

The advantage <strong>of</strong> online measures, as opposed to post-exposure measures, is <strong>the</strong>ir instantaneous<br />

access to L2 processing, thus minimizing <strong>the</strong> risk <strong>of</strong> possible memory decay<br />

by the L2 learner (Gass and Mackey, 2000). Yet, stimulated recall has evolved as a<br />

sound and widely used offline method to obtain data on the language learner’s thought<br />

processes. During stimulated recall, the learner is prompted with a stimulus (e.g. the learner’s<br />

written products or a video displaying the learner while engaging in the language task)<br />

and is asked to report on his/her thought processes while performing the language task.<br />

Note, however, that the lack of evidence of noticing in an online or offline protocol does not<br />

necessarily imply absence of noticing.<br />

2.1.2 Developmental Level as a Factor in Noticing<br />

In addition to <strong>the</strong> concern over how noticing data should be collected and analyzed, current<br />

SLA research has scrutinized connections between <strong>the</strong> difficulty level <strong>of</strong> <strong>the</strong> target<br />

language input and <strong>the</strong> learner’s attentional resources (Ellis, 1994; Gass et al., 2003; Philp,<br />

2003; VanPatten, 1996). Long (1996), for instance, found that <strong>the</strong> pr<strong>of</strong>iciency <strong>of</strong> <strong>the</strong><br />

learner may modulate noticing. Advanced learners may benefit from <strong>the</strong> increasing automaticity<br />

which allows <strong>the</strong>m to attend to more complex structures. A recent study by Philp<br />

(2003) similarly revealed that the developmental level of the learners was one factor<br />

determining accurate recall of the reformulation by the native speaker. Thus, developmental<br />

readiness may constrain <strong>the</strong> learner’s attention to aspects <strong>of</strong> more difficult structures. In<br />

a similar vein, Robinson (1995) argued that <strong>the</strong> extent to which a language learner may<br />

notice a particular form <strong>of</strong> <strong>the</strong>ir linguistic limitations is dependent on <strong>the</strong> demands <strong>of</strong> <strong>the</strong><br />

pedagogical task.<br />

In <strong>the</strong> research by Izumi and Bigelow (2000), <strong>the</strong> study to be partially replicated here,<br />

<strong>the</strong> past hypo<strong>the</strong>tical conditional was selected as <strong>the</strong> target structure, based on <strong>the</strong> rationale<br />

that this structure poses some difficulty to <strong>the</strong> learner, which may trigger noticing <strong>of</strong><br />

linguistic limitations. Yet, given learner level and attentional capacities for the target structure,<br />

it is of present interest whether reduced cognitive demands may yield greater noticing<br />

and, in turn, greater L2 gains.<br />

2.2 The Noticing Function <strong>of</strong> Output<br />

Underlying <strong>the</strong> relationship between noticing and SLA is <strong>the</strong> question <strong>of</strong> under what circumstances<br />

L2 learners may notice linguistic forms. Is it through input or through output,<br />

or both in combination? While <strong>the</strong> essential role <strong>of</strong> input for SLA is universally accepted,<br />

<strong>the</strong> sufficiency <strong>of</strong> input for acquisition has been debated since Swain first proposed her<br />

Output Hypo<strong>the</strong>sis (Swain, 1985) in reaction to Krashen’s view <strong>of</strong> primacy <strong>of</strong> comprehensible<br />

input (Krashen, 1982). While Swain does not negate <strong>the</strong> importance <strong>of</strong> input,<br />

she argues that “L2 output pushes learners to process language more deeply (with more<br />

mental effort) than does input” (Swain, 1995). A series <strong>of</strong> studies by Swain and Lapkin<br />

revealed noticing as one <strong>of</strong> <strong>the</strong> main reasons why producing output mediates L2 development<br />

(Swain and Lapkin, 1995; Swain and Lapkin, 1998). As such, <strong>the</strong>ir argument corresponds<br />

to Schmidt’s Noticing Hypo<strong>the</strong>sis. Because output focuses <strong>the</strong> learner’s attention<br />

on <strong>the</strong> L2 structures <strong>the</strong>y produce (<strong>the</strong>ir interlanguage), it enables <strong>the</strong>m to compare <strong>the</strong>ir<br />

interlanguage to <strong>the</strong> target language <strong>the</strong>y receive, <strong>the</strong>reby attending to <strong>the</strong>ir linguistic limitations<br />

(Gass and Varonis, 1994). If relevant input is immediately available afterwards,<br />

<strong>the</strong> noticing <strong>of</strong> <strong>the</strong> gap may cause <strong>the</strong> learner to process <strong>the</strong> subsequent input with more<br />

focused attention. This hypothesis was investigated by Izumi and Bigelow (2000),<br />

whose study constitutes the basis of the research reported here.<br />

2.3 Izumi and Bigelow 2000<br />

Izumi and Bigelow addressed <strong>the</strong> issue <strong>of</strong> output and noticing in <strong>the</strong>ir study guided by<br />

two questions: (1) “Do output activities promote <strong>the</strong> noticing <strong>of</strong> linguistic form in subsequent<br />

input?” and (2) “Do <strong>the</strong>se output-input-activities result in improved production <strong>of</strong><br />

<strong>the</strong> target form?” (Izumi and Bigelow, 2000, page 247). They compared an EG, which<br />

was engaged in output tasks (essay writing and text reconstruction) to a CG, which did<br />

not produce any written output. Both groups received <strong>the</strong> same textual input for <strong>the</strong> subsequent<br />

reading and underlining activity; however, <strong>the</strong> groups were given different purposes<br />

for underlining, which may have influenced participants’ attentional focus. In the<br />

present study, all participants received <strong>the</strong> same instructions for <strong>the</strong> reading and underlining<br />

activity. In Izumi and Bigelow (2000), noticing <strong>of</strong> <strong>the</strong> target form (past hypo<strong>the</strong>tical<br />

conditional in English 2 ) was assessed through underlining and through <strong>the</strong> demonstration<br />

<strong>of</strong> uptake (correct use <strong>of</strong> <strong>the</strong> target form by <strong>the</strong> learner) as a complementary measure <strong>of</strong><br />

noticing and acquisition <strong>of</strong> that form. The study presented here did not treat uptake as a<br />

distinct measurement <strong>of</strong> noticing or acquisition, but qualitatively examined to what extent<br />

learners’ uptake corresponds to prior noticing action. Izumi and Bigelow attributed their<br />

non-significant findings to <strong>the</strong> relative difficulty <strong>of</strong> <strong>the</strong> target structure. Thus, this study<br />

investigates learners’ noticing when engaging with a less complex yet similar structure:<br />

<strong>the</strong> present hypo<strong>the</strong>tical conditional 3 .<br />

Except for one statistically significant increase in performance from the pretest to the<br />

second posttest of the experimental group, Izumi and Bigelow found no statistically<br />

significant between-group differences on any measure. Both groups underlined nearly the<br />

same percentage of conditional-related forms. They concluded that output did not draw<br />

the learner’s attention to the targeted form, and non-significant results were attributed to<br />

2 e.g. If Lisa had traveled to Spain, she would have seen the Olympic Games.<br />

3 e.g. If Lisa traveled to Spain, she would see the Olympic Games.<br />

effects <strong>of</strong> input flood and individual variation. I argue that underlining as a single measure<br />

gives an insufficient account <strong>of</strong> learners’ cognitive processes, and I hypo<strong>the</strong>size that <strong>the</strong><br />

output treatment could have been observed to trigger noticing if additional qualitative and<br />

quantitative measures had been employed. Therefore, <strong>the</strong> present study follows Izumi and<br />

Bigelow’s suggestion to implement “methodological triangulation as <strong>the</strong> research design<br />

allows” (Izumi and Bigelow, 2000, page 271) by operationalizing noticing through target-structure<br />

underlining and reporting of conscious attention during the stimulated recall<br />

session. In o<strong>the</strong>r words, through triangulated data collection, noticing is investigated<br />

from multiple perspectives.<br />
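The triangulation described above can be illustrated with a minimal sketch. It assumes one record per learner that combines the underlining measure with the stimulated recall report; the class name, field names and category labels are hypothetical and are not taken from the study's materials.

```python
from dataclasses import dataclass


@dataclass
class NoticingRecord:
    """Hypothetical per-learner record combining two noticing measures."""
    underlined_target_forms: int  # target-structure tokens the learner underlined
    target_forms_in_text: int     # target-structure tokens present in the reading text
    recall_mentions_target: bool  # learner reported attending to the form in stimulated recall

    def underlining_rate(self) -> float:
        """Proportion of available target forms that were underlined."""
        if self.target_forms_in_text == 0:
            return 0.0
        return self.underlined_target_forms / self.target_forms_in_text

    def evidence(self) -> str:
        """Label which measures (if any) show evidence of noticing."""
        underlined = self.underlined_target_forms > 0
        if underlined and self.recall_mentions_target:
            return "converging"
        if underlined:
            return "underlining only"
        if self.recall_mentions_target:
            return "recall only"
        return "none"


# Illustrative values only, not data from the study.
record = NoticingRecord(underlined_target_forms=0,
                        target_forms_in_text=8,
                        recall_mentions_target=True)
print(record.underlining_rate(), record.evidence())
```

A label of "none" in such a record would only mean that neither measure captured noticing; as noted in Section 2.1.1, a lack of evidence does not necessarily imply absence of noticing.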

3 Research Questions and Hypo<strong>the</strong>ses<br />

In order to validly replicate Izumi and Bigelow’s study (Izumi and Bigelow, 2000), similar<br />

research questions are pursued along with <strong>the</strong>ir congruent hypo<strong>the</strong>ses:<br />

RQ1: Do output activities promote noticing <strong>of</strong> linguistic form in subsequent input?<br />

RQ2: Do <strong>the</strong>se output-input activities result in improved production <strong>of</strong> <strong>the</strong> target<br />

form?<br />

It is hypo<strong>the</strong>sized that <strong>the</strong> experimental groups, which are required to produce output,<br />

would show greater noticing <strong>of</strong> <strong>the</strong> target-structure contained in <strong>the</strong> input than <strong>the</strong> control<br />

group, which does not produce output requiring <strong>the</strong> use <strong>of</strong> <strong>the</strong> target-structure. Fur<strong>the</strong>rmore,<br />

on <strong>the</strong> posttests, <strong>the</strong> experimental groups are expected to demonstrate greater gains<br />

in accuracy <strong>of</strong> <strong>the</strong>ir use <strong>of</strong> <strong>the</strong> target form than <strong>the</strong> control group. Given that prior research<br />

found <strong>the</strong> language learner’s developmental level to be associated with attentional<br />

resources available for the target-structure, it is hypothesized that a less difficult target-structure<br />

promotes greater noticing and greater L2 gains. Thus, the present study is further<br />

guided by <strong>the</strong> following research question:<br />

RQ3: Does <strong>the</strong> present hypo<strong>the</strong>tical conditional, as a less difficult structure, promote<br />

greater noticing compared to <strong>the</strong> past-hypo<strong>the</strong>tical-conditional structure?<br />

4 Methodology<br />

4.1 Participants<br />

Fifteen intermediate ESL learners enrolled in <strong>the</strong> second semester ESL academic writing<br />

class at Michigan State University have participated up to this point. <strong>Student</strong>s’ enrollment<br />

in <strong>the</strong> ESL academic writing class is determined by a placement test or by passing <strong>the</strong><br />

previous course. The ESL learners were from a variety of L1 backgrounds including<br />

Cantonese, Japanese, Korean and Arabic with an average of 8.7 years of previous English<br />

study 4 . Three students have lived in the United States for more than two years, and the<br />

remaining students have resided there for at least a year. Upon completion of the questionnaire,<br />

participants were randomly assigned to one <strong>of</strong> <strong>the</strong> two experimental groups (EGs) or to<br />

<strong>the</strong> single control group (CG).<br />
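The random assignment step could be carried out along the following lines. This is only a sketch under stated assumptions: the paper does not report how the randomization was performed or how large each group was, so the equal group sizes, the participant labels and the seed below are hypothetical.

```python
import random


def assign_groups(participant_ids, groups=("EG1", "EG2", "CG"), seed=None):
    """Randomly assign participants to groups of (near-)equal size.

    Assumption: groups are balanced; the study does not report group sizes.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled participants to the groups in turn.
    return {pid: groups[index % len(groups)]
            for index, pid in enumerate(shuffled)}


# Fifteen hypothetical participant labels, P01 to P15.
participants = [f"P{n:02d}" for n in range(1, 16)]
assignment = assign_groups(participants, seed=2008)
```

Fixing the seed makes the allocation reproducible, which can be useful when the assignment has to be documented alongside the data.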

4 It needs to be noted that <strong>the</strong> different native languages <strong>of</strong> <strong>the</strong> learners affect <strong>the</strong>ir proximity to (distance<br />

from) English, which could make some structures easier (more difficult) to process for some learners than<br />

for o<strong>the</strong>rs. The native language <strong>of</strong> <strong>the</strong> participant was not systematically investigated here.<br />

4.2 Procedures<br />

The experiment followed a pretest-posttest design. The researcher met one-on-one with<br />

each participant for about 1 or 1.5 hours depending on whe<strong>the</strong>r <strong>the</strong> participants chose<br />

to take part in <strong>the</strong> stimulated recall session or not. The participants were informed <strong>of</strong><br />

<strong>the</strong> sequence <strong>of</strong> <strong>the</strong> activities before <strong>the</strong>y completed <strong>the</strong> pretest (see Appendix A for an<br />

example) and <strong>the</strong> reading and writing activities. Participants assigned to <strong>the</strong> first experimental<br />

group (EG1) composed an essay that elicited <strong>the</strong> past hypo<strong>the</strong>tical conditional<br />

(Appendix B), whereas participants assigned to <strong>the</strong> second experimental group (EG2)<br />

composed an essay that elicited <strong>the</strong> present hypo<strong>the</strong>tical conditional. Participants in <strong>the</strong><br />

control group (CG) engaged in a writing task that did not require <strong>the</strong> use <strong>of</strong> ei<strong>the</strong>r structure.<br />

Each participant subsequently received input that modeled <strong>the</strong> correct use <strong>of</strong> <strong>the</strong><br />

relevant target structures (Appendix C); yet, for <strong>the</strong> CG, <strong>the</strong> reading text did not serve<br />

as a model. All groups were instructed to either “underline what [they] feel is important<br />

for re-writing the essay” or “underline what [they] feel is important for writing an essay<br />

about this topic”. By leaving the words to be underlined unspecified, the learner’s attentional<br />

foci were not predisposed. Before <strong>the</strong> participants carried out <strong>the</strong> actual task, <strong>the</strong><br />

grammar-focused underlining was demonstrated to <strong>the</strong> students using a passage that did<br />

not contain <strong>the</strong> target-structure 5 . Following <strong>the</strong> reading and underlining activity, all participants<br />

in the EGs reproduced their initial essay, whereas the CG wrote about<br />

<strong>the</strong> EGs’ initial essay topic for <strong>the</strong> first time. The immediate posttest was administered<br />

upon completion <strong>of</strong> <strong>the</strong> second essay writing activity or after <strong>the</strong> stimulated recall session<br />

depending on whe<strong>the</strong>r or not participants took part in <strong>the</strong> stimulated recall interview. The<br />

delayed posttest was given after one week had passed 6 . Four participants <strong>of</strong> each EG and<br />

three participants of the CG volunteered to be videotaped during the reading activity.<br />

To better track <strong>the</strong>ir focus during <strong>the</strong> reading and underlining task, <strong>the</strong> videotaped participants<br />

were asked to read aloud. Immediately following completion <strong>of</strong> <strong>the</strong> second essay,<br />

<strong>the</strong> videotape was rewound and played to <strong>the</strong> learner. While watching <strong>the</strong> videotape, <strong>the</strong><br />

researcher stopped <strong>the</strong> tapes after episodes that appeared to involve noticing <strong>of</strong> linguistic<br />

features (i.e. underlining or hesitation), asking <strong>the</strong> student to describe his/her thoughts<br />

during that time. English was used during all interactions between <strong>the</strong> participants and<br />

<strong>the</strong> researchers, which were audio recorded for transcription purposes.<br />

5 Results and Discussion<br />


The first research question asked whe<strong>the</strong>r output activities promote noticing <strong>of</strong> grammatical<br />

features in subsequent input. In a restricted way, <strong>the</strong> hypo<strong>the</strong>sis predicting greater<br />

noticing <strong>of</strong> <strong>the</strong> target forms for <strong>the</strong> EGs than <strong>the</strong> CGs was not confirmed (p 0.5) 7 . All<br />

participants underlined vocabulary items ra<strong>the</strong>r than <strong>the</strong> grammatical cues in <strong>the</strong> reading<br />

text. However, <strong>the</strong> present study does show that output had an effect on learners’<br />

attentional foci and input processing. While no participant appeared to notice <strong>the</strong> target<br />

form, most participants’ attention was drawn to <strong>the</strong> vocabulary in order to process <strong>the</strong><br />

main message <strong>of</strong> <strong>the</strong> input passages. The predicted effect <strong>of</strong> output in promoting noticing<br />

5 Modeling familiarizes <strong>the</strong> learners with <strong>the</strong> underlining procedure and increases precision <strong>of</strong> <strong>the</strong> measure<br />

6 Three students did not show up for <strong>the</strong> delayed posttest<br />

7 Wilcoxon-signed-rank tests were used for within-group comparisons<br />


<strong>of</strong> <strong>the</strong> correct use <strong>of</strong> conditional sentences was not supported in this study. Similarly,<br />

<strong>the</strong> output-input-output treatment did not alter <strong>the</strong> students’ level <strong>of</strong> performance on <strong>the</strong> immediate<br />

and delayed posttests when compared to <strong>the</strong> input-output treatment. However,<br />

output appeared to trigger noticing <strong>of</strong> vocabulary, style, and some content issues. This<br />

finding will be discussed in more detail below.<br />

The second research question addressed <strong>the</strong> acquisition issue and inquired whe<strong>the</strong>r<br />

output-input activities result in improved production <strong>of</strong> <strong>the</strong> target form. The present<br />

study did not yield clear results in support <strong>of</strong> such a relationship, mainly because <strong>the</strong><br />

noticing scores could not be sufficiently squared with posttest scores as <strong>the</strong>re was a lack<br />

<strong>of</strong> grammar-related noticing with all candidates during <strong>the</strong> treatment phase. Put ano<strong>the</strong>r<br />

way, <strong>the</strong> posttests do not provide a measure <strong>of</strong> <strong>the</strong> effect <strong>of</strong> noticing. Future research will<br />

need to use correlation analyses in order to square <strong>the</strong> underlining, stimulated recall, and<br />

second essay scores (as a measure <strong>of</strong> noticing) with gain on individualized vocabulary<br />

posttests. While data from <strong>the</strong> underlining, second essay, and stimulated recall point to a<br />

link between noticing and <strong>the</strong> subsequent use <strong>of</strong> noticed items, it would be premature<br />

to claim a causal relationship between noticing and acquisition.<br />

Under <strong>the</strong> premise that attentional resources constrain noticing, <strong>the</strong> third research question<br />

asked whe<strong>the</strong>r a less difficult structure promotes greater noticing than a more difficult<br />

structure. The results <strong>of</strong> this study suggest that <strong>the</strong> less difficult structure had no effect<br />

on noticing or L2 learning 8 . There was no notable difference between EG1 and EG2 performance<br />

on any measure. Of course, any interpretation <strong>of</strong> <strong>the</strong> test-, noticing-, or essay<br />

scores would be unconvincing, given that only three candidates could be compared to<br />

ano<strong>the</strong>r set <strong>of</strong> three candidates. The small number <strong>of</strong> participants notwithstanding, one<br />

possible explanation for this finding might be that <strong>the</strong> less difficult structure was not significantly<br />

easier. The production and processing <strong>of</strong> <strong>the</strong> present-hypo<strong>the</strong>tical- conditional<br />

may have been just as cognitively demanding as <strong>the</strong> past-hypo<strong>the</strong>tical-conditional. Thus,<br />

<strong>the</strong> results <strong>of</strong> this research cannot validate (nor invalidate) <strong>the</strong> claim that <strong>the</strong> demands<br />

<strong>of</strong> <strong>the</strong> targeted grammatical structure or <strong>the</strong> complexity <strong>of</strong> <strong>the</strong> pedagogical task have no<br />

effect on learners’ attentional resources. In order to tap into a possible relationship between<br />

cognitive demands and noticing, <strong>the</strong> fact that one task is indeed cognitively more<br />

demanding must first be established. Before this research project is fur<strong>the</strong>r pursued, <strong>the</strong><br />

relative difficulty <strong>of</strong> both structures needs to be evaluated with a larger number <strong>of</strong> ESL<br />

learners.<br />

Although <strong>the</strong> research hypo<strong>the</strong>ses <strong>of</strong> this study have not been supported, this research<br />

demonstrates that noticing has occurred. These insights contrast with Izumi and Bigelow’s<br />

conclusion that output does not trigger learners’ noticing (Izumi and Bigelow, 2000). The<br />

present study demonstrates that output treatment influences learners’ subsequent cognitive<br />

processes, e.g., “that is good way to say. I also wanna say something like that, but my<br />

essay is not so good, so I try to remember.” As such, output focused <strong>the</strong> learner’s attention<br />

on specific linguistic features in <strong>the</strong> output, and those noticed features were <strong>the</strong>n compared<br />

to <strong>the</strong> features <strong>the</strong> learner had produced in <strong>the</strong>ir first writing activity. Yet, <strong>the</strong> data from<br />

this study leave us wondering whe<strong>the</strong>r (and to what extent) <strong>the</strong> noticed features were<br />

incorporated into <strong>the</strong> interlanguage system. Chaudron (1985) argued that L2 learning involves<br />

two stages: first, <strong>the</strong> perception <strong>of</strong> input (noticing), and second, <strong>the</strong> integration <strong>of</strong><br />

intake into <strong>the</strong> learner’s interlanguage system. Gass and Varonis (1994) similarly suggested<br />

that learners need to apperceive input before it can become intake. According to<br />

8 Mann-Whitney-U tests were used for between-group comparisons<br />

Ellis, “Intake occurs when learners take features into <strong>the</strong>ir short- or medium-term memories,<br />

whereas interlanguage change occurs only when <strong>the</strong>y become part <strong>of</strong> long term<br />

memory” (Ellis, 1997, page 119). Accordingly, <strong>the</strong> learner has to convert from preliminary<br />

to final intake. It would be interesting to investigate whe<strong>the</strong>r <strong>the</strong> learners in this<br />

study process <strong>the</strong> new linguistic items (e.g., vocabulary) beyond noticing and immediate<br />

intake, in order to contribute to <strong>the</strong>ory building on input, intake, and L2 acquisition. A<br />

possible way <strong>of</strong> approaching this would be to include a delayed essay production task to<br />

see whe<strong>the</strong>r, and to what extent, <strong>the</strong> learners arrived at <strong>the</strong> “final intake stage”.<br />

The overall findings <strong>of</strong> <strong>the</strong> present study indicate that learners processed <strong>the</strong> input<br />

primarily for meaning. Although no form-focused comparisons were invoked, EG candidates<br />

noticed a difference between <strong>the</strong>ir word choice and style and those <strong>of</strong> <strong>the</strong> native<br />

speaker. These findings are in line with VanPatten (1996), who proposed <strong>the</strong> following input processing<br />

principles:<br />

1. The Primacy <strong>of</strong> Meaning Principle: Learners process input for meaning before <strong>the</strong>y<br />

process it for form.<br />

2. The Primacy <strong>of</strong> Content Words Principle: Learners process content words in <strong>the</strong><br />

input before anything else.<br />

3. The Lexical Preference Principle: Learners will tend to rely on lexical items as<br />

opposed to grammatical form to get meaning when both encode <strong>the</strong> same semantic<br />

information 9 .<br />

Applying VanPatten’s principles to <strong>the</strong> present study, it might be that all participants<br />

processed meaningful elements in <strong>the</strong> input while reading <strong>the</strong> input text. This may explain<br />

why <strong>the</strong>y did not underline grammatical elements such as modals like would and could,<br />

auxiliaries and past participles. Because learners were not capable <strong>of</strong> attending to both vocabulary<br />

and grammar at once, <strong>the</strong> past/present hypo<strong>the</strong>tical conditional may have been processed<br />

only peripherally.<br />

Based on <strong>the</strong> input processing principles, VanPatten (1996) investigated <strong>the</strong> effects<br />

<strong>of</strong> processing instruction, revealing that learners’ focal attention during processing can<br />

be directed toward <strong>the</strong> relevant grammatical items and, in turn, enhance L2 learning.<br />

Follow-up research should investigate whe<strong>the</strong>r input enhancement or specific instructions<br />

to underline grammatical structures (e.g., <strong>the</strong> past/present hypo<strong>the</strong>tical conditional) would<br />

enhance noticing, intake, and L2 acquisition. The present study deliberately gave no specific<br />

underlining instructions beyond asking learners to underline what is important for subsequent<br />

production: <strong>the</strong> study’s objective was to see whe<strong>the</strong>r output that requires<br />

use <strong>of</strong> a particular structure results in underlining <strong>of</strong> that particular structure in <strong>the</strong> subsequent<br />

input passage. If <strong>the</strong> learners had been told to underline grammatical structures, <strong>the</strong>ir<br />

attentional foci would have been predisposed, as was <strong>the</strong> case in Izumi and Bigelow<br />

(2000).<br />

The findings in <strong>the</strong> present study also raise important methodological issues that should<br />

be addressed in future studies that investigate <strong>the</strong> role <strong>of</strong> noticing in SLA. First and foremost,<br />

this study has shown that triangulated or multiple data-elicitation measures can<br />

9 Only <strong>the</strong> relevant subset <strong>of</strong> <strong>the</strong> entire set <strong>of</strong> input processing principles is presented here<br />


provide a much more complex picture <strong>of</strong> learners’ internal processes. In this study, <strong>the</strong><br />

underlining, <strong>the</strong> essays, <strong>the</strong> test scores, and <strong>the</strong> verbal reports from <strong>the</strong> stimulated recall<br />

session, all helped to puzzle out <strong>the</strong> role <strong>of</strong> output and noticing in second language<br />

acquisition. Although verbal stimulated recall reports cannot provide a complete reflection<br />

<strong>of</strong> actual internal processing, <strong>the</strong>y provided useful information as to how learners’<br />

minds process language information when <strong>the</strong> learners articulated <strong>the</strong>ir concerns (e.g., “I<br />

wanna say something like that, but my essay is not so good, so I try to remember”) or<br />

when <strong>the</strong>y made comparisons to <strong>the</strong>ir first essay (e.g., “This is a big word I want to remember”).<br />

Learners underlined <strong>the</strong> words that captured <strong>the</strong> author’s key message, and<br />

<strong>the</strong>ir comments reflected <strong>the</strong>ir intent in seeking meaning and better vocabulary for use in<br />

<strong>the</strong>ir second essay. The stimulated recall protocols obtained in this study collectively<br />

demonstrate that <strong>the</strong> learners did not attend to grammatical features. Additionally, <strong>the</strong><br />

data from <strong>the</strong> first and second essays illustrate that students improved <strong>the</strong>ir expression<br />

and word choice, but not <strong>the</strong>ir grammatical accuracy. Izumi and Bigelow were unable to<br />

draw such conclusions as <strong>the</strong>y limited <strong>the</strong>ir measurement <strong>of</strong> noticing and <strong>the</strong>ir measurement<br />

<strong>of</strong> acquisition to <strong>the</strong> underlining <strong>of</strong> conditional related items and posttest scores,<br />

respectively. Izumi and Bigelow found that output does not prompt <strong>the</strong> learners to “notice<br />

<strong>the</strong> gap”. The present study, by contrast, reveals that some learners were aware that <strong>the</strong>y<br />

could not express <strong>the</strong>mselves as fully as <strong>the</strong>y wished (e.g., “I want to say negotiate in<br />

my essay, but I don’t remember it”). They noticed <strong>the</strong>ir restricted lexicon and searched<br />

for more appropriate words in <strong>the</strong> input passage. In o<strong>the</strong>r words, students became aware <strong>of</strong> lexical<br />

gaps, which drew <strong>the</strong>ir attention to vocabulary in subsequent input.<br />

6 Limitations and Future Research<br />

Although <strong>the</strong> present study sheds some light on meaning-focused processing and noticing<br />

as well as methodological issues, <strong>the</strong>re are some limitations that need to be acknowledged.<br />

First and foremost, <strong>the</strong> small number <strong>of</strong> participants clearly limits <strong>the</strong> generalization <strong>of</strong><br />

findings to a broader variety <strong>of</strong> L2 learners 10 . Proceeding with this research up to a minimum<br />

<strong>of</strong> twenty-one participants will reveal whe<strong>the</strong>r <strong>the</strong> current trends hold true. Fur<strong>the</strong>r<br />

study may include a short questionnaire asking non-stimulated recall participants what <strong>the</strong>y noticed<br />

and what <strong>the</strong>y assume <strong>the</strong> purpose <strong>of</strong> <strong>the</strong> reading and writing tasks<br />

was.<br />

The testing instruments employed in this study are limited in length and scope, which<br />

may have impacted <strong>the</strong> measurement <strong>of</strong> L2 attainment. Whereas a more comprehensive<br />

test <strong>of</strong> <strong>the</strong> past-hypo<strong>the</strong>tical-conditional may yield more valid results, it may also prompt<br />

participants to pay closer attention to <strong>the</strong> form in <strong>the</strong> input passage. Consequently, a tenable<br />

comparison between output and no-output treatment would be difficult, as all groups<br />

would produce <strong>the</strong> target form to <strong>the</strong> same extent. As mentioned earlier, in order to better<br />

understand <strong>the</strong> relationship between attention and learning, future research may develop<br />

tests that examine students’ acquisition <strong>of</strong> noticed vocabulary items. For such measurement,<br />

individualized delayed posttests in which <strong>the</strong> noticed (underlined and commented)<br />

items are assessed in terms <strong>of</strong> adequate usage and comprehension would be appropriate.<br />

10 The fact that <strong>the</strong> participants were willing to take part in <strong>the</strong> study outside <strong>of</strong> class time may have led<br />

to a participant body that is more motivated and eager to improve than <strong>the</strong> average intermediate ESL learner<br />



7 Conclusions<br />

The purpose <strong>of</strong> this study was to investigate <strong>the</strong> effects <strong>of</strong> output and cognitive demands<br />

on noticing and second language acquisition. It makes two contributions. First,<br />

this study has demonstrated how multiple perspectives can help to obtain insights into<br />

learners’ cognitive processes. Secondly, <strong>the</strong> results <strong>of</strong> this study support <strong>the</strong> noticing<br />

function <strong>of</strong> output to some extent. Output-input treatment has been shown to trigger comparison<br />

<strong>of</strong> <strong>the</strong> learner’s interlanguage lexicon with language produced by a native speaker.<br />

Fur<strong>the</strong>rmore, this study demonstrates that learners primarily attend to meaning, which<br />

is in line with VanPatten’s input processing principles (VanPatten, 1996). However, <strong>the</strong><br />

overall results do not allow for clear conclusions. Much more research is needed to find<br />

<strong>the</strong> extent to which learners notice specific features in <strong>the</strong> input as well as to explore <strong>the</strong><br />

very mechanisms <strong>of</strong> noticing. Until <strong>the</strong>n, our understanding <strong>of</strong> what takes place in <strong>the</strong><br />

learner’s head remains complex and opaque.<br />

References<br />


Alanen, R. (1995). Input enhancement and rule representation in second language acquisition,<br />

in R. Schmidt (ed.), Attention and Awareness in Foreign Language Learning,<br />

University <strong>of</strong> Hawai’i Press, Honolulu.<br />

Chaudron, C. (1985). Intake: On models and methods for discovering learners’ processing<br />

<strong>of</strong> input, Studies in Second Language Acquisition 7(1): 1–14.<br />

Ellis, R. (1994). Factors in <strong>the</strong> incidental acquisition <strong>of</strong> second language vocabulary from<br />

oral input: A review essay, Applied Language Learning 5(1): 1–32.<br />

Ellis, R. (1997). SLA Research and Language Teaching, University Press, Oxford.<br />

Gass, S. and Mackey, A. (2000). Stimulated recall methodology in second language<br />

research, Lawrence Erlbaum Associates, London.<br />

Gass, S., Svetics, I. and Lemelin, S. (2003). Differential effects <strong>of</strong> attention, Language<br />

Learning 53(3): 497–545.<br />

Gass, S. and Varonis, E. M. (1994). Input, interaction and second language production,<br />

Studies in Second Language Acquisition 16(3): 283–302.<br />

Izumi, S. and Bigelow, M. (2000). Does output promote noticing and second language<br />

acquisition?, TESOL Quarterly 34(2): 239–287.<br />

Jourdenais, R. (2001). Cognition, instruction and protocol analysis, in P. Robinson (ed.),<br />

Cognition and Second Language Instruction, Cambridge University Press, New<br />

York.<br />

Krashen, S. (1982). Principles and Practice in Second Language Acquisition, Pergamon,<br />

Oxford.<br />

Leow, R. P. (2002). Models, attention, and awareness in SLA, Studies in Second Language<br />

Acquisition 24(1): 1<strong>13</strong>–119.<br />


Long, M. (1996). The role <strong>of</strong> linguistic environment in second language acquisition,<br />

in W. C. Ritchie and T. K. Bhatia (eds), The Handbook <strong>of</strong> Language Acquisition,<br />

Academic Press, San Diego.<br />

Philp, J. (2003). Constraints on noticing <strong>the</strong> gap, Studies in Second Language Acquisition<br />

25(1): 99–126.<br />

Robinson, P. (1995). Attention, memory, and <strong>the</strong> noticing hypo<strong>the</strong>sis, Language Learning<br />

45(2): 283–331.<br />

Robinson, P. (2001). Individual differences, cognitive abilities, aptitude complexes and<br />

learning conditions in second language acquisition, Second Language Research<br />

17(4): 368–392.<br />

Rosa, E. and O’Neill, M. D. (1999). Explicitness, intake and <strong>the</strong> issue <strong>of</strong> awareness,<br />

Studies in Second Language Acquisition 21(4): 511–556.<br />

Schmidt, R. (1990). The role <strong>of</strong> consciousness in second language learning, Applied<br />

Linguistics 11(2): 129–158.<br />

Schmidt, R. (2001). Attention, in P. Robinson (ed.), Cognition and Second Language<br />

Instruction, Cambridge University Press, New York.<br />

Simard, D. and Wong, W. (2001). Alertness, orientation and detection, Studies in Second<br />

Language Acquisition 23(1): 103–124.<br />

Swain, M. (1985). Communicative competence: Some roles <strong>of</strong> comprehensible input and<br />

comprehensible output in its development, in S. Gass and C. Madden (eds), Input in<br />

Second Language Acquisition, Heinle & Heinle, Boston.<br />

Swain, M. (1995). Three functions <strong>of</strong> output in second language learning, in G. Cook and<br />

B. Seidlh<strong>of</strong>er (eds), Principles and practice in applied linguistics: Studies in honor<br />

<strong>of</strong> H. Widdowson, University Press, Oxford.<br />

Swain, M. and Lapkin, S. (1995). Problems in output and <strong>the</strong> cognitive processes <strong>the</strong>y<br />

generate: A step towards second language learning, Applied Linguistics 16(3): 371–<br />

391.<br />

Swain, M. and Lapkin, S. (1998). Interaction and second language learning: Two adolescent<br />

French immersion students working toge<strong>the</strong>r, Modern Language Journal<br />

82(3): 320–337.<br />

Tomlin, R. and Villa, V. (1994). Attention in cognitive science and second language<br />

acquisition, Studies in Second Language Acquisition 16(2): 183–204.<br />

Truscott, J. (1998). Noticing in second language acquisition: A critical review, Second<br />

Language Research 24(2): 103–<strong>13</strong>5.<br />

VanPatten, B. (1996). Input processing and grammar instruction in second language<br />

acquisition, Ablex, Westport.<br />



CDIPROVER3: A TOOL FOR PROVING DERIVATIONAL COMPLEXITIES<br />

OF TERM REWRITING SYSTEMS<br />

Andreas Schnabl<br />

University <strong>of</strong> Innsbruck<br />

Abstract. This paper describes cdiprover3, a tool for proving termination <strong>of</strong> term rewrite<br />

systems by polynomial interpretations and context dependent interpretations. The methods<br />

used by cdiprover3 induce small bounds on <strong>the</strong> derivational complexity <strong>of</strong> <strong>the</strong> considered<br />

system. We explain <strong>the</strong> tool in detail, and give an overview <strong>of</strong> <strong>the</strong> employed pro<strong>of</strong> methods.<br />

1 Introduction<br />


Term rewriting is a Turing complete model <strong>of</strong> computation, which is conceptually closely<br />

related to declarative and (first-order) functional programming. One <strong>of</strong> its most studied<br />

properties, termination, is also a central problem in computer science. This property is<br />

undecidable in general, but many partial decision methods have been developed in <strong>the</strong><br />

last decades. Beyond showing termination <strong>of</strong> a given rewriting system, some <strong>of</strong> <strong>the</strong>se<br />

methods can also give bounds on different measures <strong>of</strong> its complexity. As suggested in<br />

(H<strong>of</strong>bauer and Lautemann, 1989), a natural way <strong>of</strong> measuring <strong>the</strong> complexity <strong>of</strong> a term<br />

rewrite system is to analyze its derivational complexity. The derivational complexity is<br />

a function which relates <strong>the</strong> size <strong>of</strong> a term and <strong>the</strong> maximal number <strong>of</strong> rewrite steps that<br />

can be executed starting from any term <strong>of</strong> that size in <strong>the</strong> given rewrite system. We<br />

are particularly interested in small, i.e. polynomial, upper bounds on this function. An<br />

alternative to measuring derivational complexity is <strong>the</strong> constructor discipline<br />

mentioned in (Lescanne, 1995). There, one looks at <strong>the</strong> complexity <strong>of</strong> <strong>the</strong> function<br />

that is encoded by a constructor system. It is ei<strong>the</strong>r measured by <strong>the</strong> number <strong>of</strong> rewrite<br />

steps needed to bring <strong>the</strong> term into normal form (Bonfante, Cichon, Marion and Touzet,<br />

n.d.; Avanzini and Moser, 2008), or by counting <strong>the</strong> number <strong>of</strong> steps needed by some<br />

evaluation mechanism different from standard term rewriting (Marion, 2003; Bonfante,<br />

Marion and Péchoux, 2007).<br />

In this paper, we describe cdiprover3, a tool which uses polynomial and context dependent<br />

interpretations in order to prove termination and complexity bounds <strong>of</strong> term<br />

rewrite systems. The tool, its predecessors, and full experimental data are available at<br />

http://cl-informatik.uibk.ac.at/~aschnabl/experiments/cdi/ .<br />

Polynomial interpretations, introduced in (Lankford, 1979), are a standard direct termination<br />

pro<strong>of</strong> method. Besides showing termination <strong>of</strong> rewrite systems, <strong>the</strong>y also provide<br />

an easy way to extract upper bounds on <strong>the</strong> derivational complexity (H<strong>of</strong>bauer and<br />

Lautemann, 1989). However, as noticed in (H<strong>of</strong>bauer, 2001), this <strong>of</strong>ten heavily overestimates<br />

<strong>the</strong> derivational complexity. Context dependent interpretations, also introduced in<br />

(H<strong>of</strong>bauer, 2001), are an effort to improve <strong>the</strong>se upper bounds.<br />



The remainder <strong>of</strong> this paper is organised as follows: Section 2 outlines <strong>the</strong> basics <strong>of</strong><br />

term rewriting needed to state all relevant results. In Section 3, we briefly describe polynomial<br />

and context dependent interpretations, which are used by cdiprover3. Section<br />

4 describes <strong>the</strong> implementation <strong>of</strong> cdiprover3, and mentions some experimental results.<br />

In Section 5, we explain <strong>the</strong> input and output <strong>of</strong> cdiprover3 in detail. Last, in<br />

Section 6, we state conclusions and potential future work.<br />

2 Term Rewriting<br />


In this section, we review some basics <strong>of</strong> term rewriting. We only cover <strong>the</strong> concepts<br />

which are relevant to this paper. A general introduction to term rewriting can be found in<br />

(Baader and Nipkow, 1998; TeReSe, 2003), for instance.<br />

A term rewrite system (TRS) R consists <strong>of</strong> a signature F, a countably infinite set <strong>of</strong><br />

variables V disjoint from F, and a finite set <strong>of</strong> rewrite rules l → r, where l and r are terms<br />

such that l ∉ V and all variables which occur in r also occur in l. The signature F defines<br />

a set <strong>of</strong> function symbols, and assigns to each function symbol f its arity. We assume that<br />

every signature contains at least one function symbol <strong>of</strong> arity 0. The set <strong>of</strong> terms built<br />

from F and V is denoted by T (F, V). The set <strong>of</strong> terms T (F) without any variables is<br />

called <strong>the</strong> set <strong>of</strong> ground terms over F. A function symbol is defined if it occurs at <strong>the</strong><br />

root <strong>of</strong> a left hand side <strong>of</strong> a rewrite rule. All non-defined function symbols are called<br />

constructors. A constructor based term is a term containing exactly one defined function<br />

symbol, which appears at <strong>the</strong> root <strong>of</strong> that term. We call <strong>the</strong> total number <strong>of</strong> function<br />

symbol and variable occurrences in a term t its size, denoted by |t|. A substitution is a<br />

mapping σ : Dom(σ) → T (F, V), where Dom(σ) is a finite subset <strong>of</strong> V. The result <strong>of</strong><br />

replacing all occurrences <strong>of</strong> variables x ∈ Dom(σ) in a term t by σ(x) is denoted by tσ.<br />

A context is a term C[□] containing a single occurrence <strong>of</strong> a fresh function symbol □ <strong>of</strong><br />

arity 0. If we replace □ with a term t, we denote <strong>the</strong> resulting term by C[t]. Given a TRS<br />

R and two terms s, t, we say that s rewrites to t (s →R t) if <strong>the</strong>re exist a context C, a<br />

substitution σ and a rewrite rule l → r in R such that s = C[lσ] and t = C[rσ]. The<br />

transitive closure <strong>of</strong> this relation is $\to_R^+$. The reflexive and transitive closure is $\to_R^*$. We<br />

write $\to_R^n$ to express n-fold composition <strong>of</strong> $\to_R$. A TRS R is terminating if <strong>the</strong>re exists<br />

no infinite chain <strong>of</strong> terms $t_0, t_1, \ldots$ such that $t_i \to_R t_{i+1}$ for each $i \in \mathbb{N}$. For a terminating<br />

TRS R, <strong>the</strong> derivation length <strong>of</strong> a ground term t is defined as $\mathrm{dl}_R(t) = \max\{n \mid \exists s : t \to_R^n s\}$.<br />

The derivational complexity is <strong>the</strong> function $\mathrm{dc}_R : \mathbb{N} \to \mathbb{N}$ which maps n to<br />

$\max\{\mathrm{dl}_R(t) \mid |t| = n\}$.<br />

3 Used Termination Proof Methods<br />

3.1 Polynomial Interpretations<br />

An F-algebra A for some signature F consists of a carrier A and interpretation functions<br />
$\{f_A : A^n \to A \mid f \in F, n = \mathrm{arity}(f)\}$. Given an assignment α : V → A, we denote the<br />
evaluation of a term t into A by [α]A(t). It is defined inductively as follows:<br />
[α]A(x) = α(x) for x ∈ V<br />
[α]A(f(t1, . . . , tn)) = fA([α]A(t1), . . . , [α]A(tn)) for f ∈ F<br />


A well-founded monotone F-algebra is a pair (A, >) where A is an F-algebra and > is<br />
a well-founded proper order such that for every function symbol f ∈ F, fA is monotone<br />
with respect to >. It is compatible with a TRS R if for every rewrite rule l → r in R<br />
and every assignment α, [α]A(l) > [α]A(r) holds. It is a well-known fact that a TRS R<br />
is terminating if and only if there exists a well-founded monotone algebra that is compatible<br />
with R. A polynomial interpretation (Lankford, 1979) is an interpretation into a<br />
well-founded monotone algebra (A, >) such that A ⊆ N, > is the standard order on the<br />
natural numbers, and fA is a polynomial for every function symbol f. If a polynomial<br />
interpretation is compatible with a TRS R, then we clearly have dlR(t) ≤ [α]A(t) for all<br />
terms t.<br />

Example 1. Consider the TRS R with the following rewrite rules over the signature containing<br />
the function symbols 0 (arity 0), s (arity 1), + and - (arity 2). The system is<br />
example SK90/2.11.trs in the termination problems database 1 (TPDB), which is the<br />
standard benchmark for termination provers:<br />
+(0, y) → y<br />
+(s(x), y) → s(+(x, y))<br />
-(0, y) → 0<br />
-(x, 0) → x<br />
-(s(x), s(y)) → -(x, y)<br />
The following interpretation functions build a compatible polynomial interpretation A<br />
over the carrier N:<br />
+A(x, y) = 2x + y, -A(x, y) = 3x + 3y, sA(x) = x + 2, 0A = 1<br />
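The compatibility claim of Example 1 can be spot-checked numerically. The following is only a minimal sketch: it evaluates both sides of every rule under the interpretation functions given above and compares them on a finite grid of natural numbers. The names `RULES` and `violations` and the grid limit are assumptions made for illustration, and a finite check of this kind is not a proof of compatibility.

```python
from itertools import product

# Interpretation functions from Example 1 (carrier: natural numbers).
def plus(x, y):
    return 2 * x + y

def minus(x, y):
    return 3 * x + 3 * y

def succ(x):
    return x + 2

ZERO = 1

# Each rule maps to (value of lhs, value of rhs) under an assignment (x, y).
RULES = {
    "+(0,y) -> y": (lambda x, y: plus(ZERO, y), lambda x, y: y),
    "+(s(x),y) -> s(+(x,y))": (lambda x, y: plus(succ(x), y),
                               lambda x, y: succ(plus(x, y))),
    "-(0,y) -> 0": (lambda x, y: minus(ZERO, y), lambda x, y: ZERO),
    "-(x,0) -> x": (lambda x, y: minus(x, ZERO), lambda x, y: x),
    "-(s(x),s(y)) -> -(x,y)": (lambda x, y: minus(succ(x), succ(y)),
                               lambda x, y: minus(x, y)),
}

def violations(limit=20):
    """Return all (rule, x, y) with 0 <= x, y < limit where lhs > rhs fails."""
    return [(name, x, y)
            for name, (lhs, rhs) in RULES.items()
            for x, y in product(range(limit), repeat=2)
            if not lhs(x, y) > rhs(x, y)]

if __name__ == "__main__":
    print(violations())
```

An empty list means that no sampled assignment violates the strict decrease required by compatibility.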

A strongly linear interpretation is a polynomial interpretation such that every interpretation<br />
function fA has the form $f_A(x_1, \ldots, x_n) = \sum_{i=1}^{n} x_i + c$, c ∈ N. A surprisingly<br />
simple property is that compatibility with a strongly linear interpretation induces a linear<br />
upper bound on the derivational complexity (Schnabl, 2007).<br />
A linear polynomial interpretation is a polynomial interpretation where each interpretation<br />
function fA has the shape $f_A(x_1, \ldots, x_n) = \sum_{i=1}^{n} a_i x_i + c$, ai ∈ N, c ∈ N.<br />
For instance, the interpretation given in Example 1 is a linear polynomial interpretation.<br />
Because of its simplicity, this class of polynomial interpretations is the one most commonly<br />
used in automatic termination provers. As illustrated by Example 2 below, if only<br />
a single one of the coefficients ai in any of the functions fA is greater than 1, there might<br />
already exist derivations whose length is exponential in the size of the starting term.<br />

Example 2. Consider the TRS S with the following single rule over the signature containing<br />
the function symbols a, b (arity 1), and c (arity 0). The system is example<br />
SK90/2.50.trs in the TPDB:<br />
a(b(x)) → b(b(a(x)))<br />

The following interpretation functions build a compatible linear polynomial interpretation<br />
A over N:<br />
aA(x) = 3x, bA(x) = x + 1, cA = 0<br />

If we start a rewrite sequence from the term $a^n(b(c))$, we reach the normal form $b^{2^n}(a^n(c))$<br />
after $2^n - 1$ rewriting steps. Therefore, the derivational complexity of S is at least exponential.<br />
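The step count stated in Example 2 can be reproduced with a few lines of code. The sketch below assumes that terms over the unary symbols a, b and the constant c are written as strings, so that the rule a(b(x)) → b(b(a(x))) becomes the string rewrite 'ab' → 'bba'. The function names are illustrative only, and the leftmost redex is chosen arbitrarily.

```python
def start_term(n):
    """The term a^n(b(c)), written as a string of symbols."""
    return "a" * n + "bc"

def normalize(word):
    """Apply 'ab' -> 'bba' at the leftmost redex until none is left.

    Returns the normal form together with the number of rewrite steps.
    """
    steps = 0
    while "ab" in word:
        word = word.replace("ab", "bba", 1)
        steps += 1
    return word, steps

if __name__ == "__main__":
    for n in range(1, 6):
        normal_form, steps = normalize(start_term(n))
        print(n, steps, normal_form)
```

For each n the loop prints the number of steps and the normal form, which can be compared with the closed forms given in the example.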

1 http://www.lri.fr/~marche/tpdb/.<br />



3.2 Context Dependent Interpretations<br />

Even though polynomial interpretations provide an easy way to obtain an upper bound<br />
on the derivational complexity of a TRS, they are not very suitable for proving polynomial<br />
derivational complexity. Strongly linear interpretations only capture linear derivational<br />
complexity, but even a slight generalization already admits examples of exponential<br />
derivational complexity, as illustrated by Example 2. In (Hofbauer, 2001), context dependent<br />
interpretations are introduced. They use an additional parameter (usually denoted<br />
by ∆) in the interpretation functions, which changes in the course of evaluating the interpretation<br />
of a term, thus making the interpretation dependent on the context. This way of<br />
computing interpretations also allows us to bridge the gap between linear and polynomial<br />
derivational complexity.<br />

Definition 3. A context-dependent interpretation C for some signature F consists of functions<br />
$\{f_C[\Delta] : (\mathbb{R}_0^+)^n \to \mathbb{R}_0^+ \mid f \in F, n = \mathrm{arity}(f), \Delta \in \mathbb{R}^+\}$ and<br />
$\{f_C^i : \mathbb{R}^+ \to \mathbb{R}^+ \mid f \in F, i \in \{1, \ldots, \mathrm{arity}(f)\}\}$. Given a ∆-assignment $\alpha : \mathbb{R}^+ \times V \to \mathbb{R}_0^+$, the evaluation of a<br />
term t by C is denoted by [α, ∆]C(t). It is defined inductively as follows:<br />
$[\alpha, \Delta]_C(x) = \alpha(\Delta, x)$ for x ∈ V<br />
$[\alpha, \Delta]_C(f(t_1, \ldots, t_n)) = f_C[\Delta]([\alpha, f_C^1(\Delta)]_C(t_1), \ldots, [\alpha, f_C^n(\Delta)]_C(t_n))$ for f ∈ F<br />

Definition 4. For each $\Delta \in \mathbb{R}^+$, let $>_\Delta$ be the order defined by $a >_\Delta b \iff a - b \geq \Delta$.<br />
A context-dependent interpretation C is compatible with a TRS R if for all rewrite rules<br />
l → r in R, all $\Delta \in \mathbb{R}^+$, and every ∆-assignment α, we have $[\alpha, \Delta]_C(l) >_\Delta [\alpha, \Delta]_C(r)$.<br />
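Definitions 3 and 4 can be exercised on a small case. The sketch below is an illustration under stated assumptions, not the tool's implementation: it takes the TRS of Example 1 together with the interpretation functions that the paper lists later in Example 8, evaluates terms recursively while updating ∆ for each argument position, and tests the order of Definition 4 at sample points. The data layout (`INTERP`, `RULES`) and the function names are hypothetical, and exact rationals stand in for the reals.

```python
from fractions import Fraction

# Assumed interpretation (the one listed in Example 8). For each symbol:
# the function f_C[Δ] and the parameter updates f_C^i(Δ), one per argument.
INTERP = {
    "+": (lambda d, x, y: (1 + d) * x + y + d,
          [lambda d: d / (1 + d), lambda d: d]),
    "-": (lambda d, x, y: x + y + d, [lambda d: d, lambda d: d]),
    "s": (lambda d, x: x + d + 1, [lambda d: d]),
    "0": (lambda d: Fraction(0), []),
}

# Rules of Example 1; variables are strings, applications are tuples.
RULES = [
    (("+", ("0",), "y"), "y"),
    (("+", ("s", "x"), "y"), ("s", ("+", "x", "y"))),
    (("-", ("0",), "y"), ("0",)),
    (("-", "x", ("0",)), "x"),
    (("-", ("s", "x"), ("s", "y")), ("-", "x", "y")),
]

def evaluate(term, delta, alpha):
    """Compute [alpha, delta]_C(term); delta should be a Fraction."""
    if isinstance(term, str):
        return alpha(delta, term)
    f, updates = INTERP[term[0]]
    # The i-th argument is evaluated under the updated parameter f_C^i(delta).
    args = [evaluate(t, upd(delta), alpha)
            for t, upd in zip(term[1:], updates)]
    return f(delta, *args)

def compatible_at(delta, alpha):
    """Check [l] - [r] >= delta for every rule at one sample point."""
    return all(evaluate(l, delta, alpha) - evaluate(r, delta, alpha) >= delta
               for l, r in RULES)

if __name__ == "__main__":
    # A sample Δ-assignment, chosen arbitrarily for the demonstration.
    alpha = lambda d, v: {"x": Fraction(3), "y": Fraction(5)}[v] * (1 + d)
    print(all(compatible_at(Fraction(k, 4), alpha) for k in range(1, 40)))
```

Sampling finitely many values of ∆ and one assignment only illustrates the definitions; compatibility itself quantifies over all positive reals and all ∆-assignments.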

Definition 5. A ∆-linear interpretation is a context dependent interpretation C whose<br />
interpretation functions have the form<br />
$f_C[\Delta](z_1, \ldots, z_n) = \sum_{i=1}^{n} a_{(f,i)} z_i + \sum_{i=1}^{n} b_{(f,i)} z_i \Delta + c_f \Delta + d_f$<br />
$f_C^i(\Delta) = \dfrac{\Delta}{a_{(f,i)} + b_{(f,i)} \Delta}$<br />
with $a_{(f,i)}, b_{(f,i)}, c_f, d_f \in \mathbb{N}$ and $a_{(f,i)} + b_{(f,i)} \neq 0$ for all f ∈ F, 1 ≤ i ≤ n. If we have<br />
$a_{(f,i)} \in \{0, 1\}$ for all f, i, we also call it a ∆-restricted interpretation.<br />

We consider ∆-linear interpretations because of the similarity between the functions<br />
fC[∆] and the interpretation functions of linear polynomial interpretations. Another point<br />
of interest is that the simple syntactical restriction to ∆-restricted interpretations yields a<br />
quadratic upper bound on the derivational complexity. Moreover, because of the special<br />
shape of ∆-linear interpretations, we need no additional monotonicity criterion for our<br />
main theorems:<br />
Theorem 6 (Moser and Schnabl, 2008). Let R be a TRS and suppose that there exists<br />
a compatible ∆-linear interpretation. Then R is terminating and $\mathrm{dc}_R(n) = 2^{O(n)}$.<br />
Theorem 7 (Schnabl, 2007). Let R be a TRS and suppose that there exists a compatible<br />
∆-restricted interpretation. Then R is terminating and $\mathrm{dc}_R(n) = O(n^2)$.<br />



Example 8. Consider the TRS given in Example 1 again. A compatible ∆-restricted (and<br />
∆-linear) interpretation C is built from the following interpretation functions:<br />
$+_C[\Delta](x, y) = (1 + \Delta)x + y + \Delta$, $+_C^1(\Delta) = \dfrac{\Delta}{1 + \Delta}$, $+_C^2(\Delta) = \Delta$<br />
$-_C[\Delta](x, y) = x + y + \Delta$, $-_C^1(\Delta) = \Delta$, $-_C^2(\Delta) = \Delta$<br />
$s_C[\Delta](x) = x + \Delta + 1$, $s_C^1(\Delta) = \Delta$, $0_C[\Delta] = 0$<br />
Note that this interpretation gives a quadratic upper bound on the derivational complexity.<br />
However, from the polynomial interpretation given in Example 1, we can only infer an exponential<br />
upper bound (Hofbauer and Lautemann, 1989). Consider the term $P_{n,n}$, where<br />
we define $P_{0,n} = s^n(0)$ and $P_{m+1,n} = +(P_{m,n}, 0)$. We have $|P_{n,n}| = 3n + 1$. For every<br />
m, n ∈ N, $P_{m+1,n}$ rewrites to $P_{m,n}$ in n + 1 steps. Therefore, $P_{n,n}$ reaches its normal form<br />
$s^n(0)$ after n(n + 1) rewriting steps. Hence, the derivational complexity is also $\Omega(n^2)$ for<br />
this example, so the inferred bound $O(n^2)$ is tight.<br />
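The lower-bound argument of Example 8 can be replayed on small instances. The sketch below is a naive illustration and unrelated to how cdiprover3 works: it encodes the rules of Example 1, enumerates all one-step successors of a ground term, and computes the derivation length by exhaustive memoized search. The term encoding (nested tuples) and all function names are assumptions, and the search is only practical for small n.

```python
from functools import lru_cache

# Variables are strings; applications are tuples (symbol, arg1, ..., argn).
RULES = [
    (("+", ("0",), "y"), "y"),
    (("+", ("s", "x"), "y"), ("s", ("+", "x", "y"))),
    (("-", ("0",), "y"), ("0",)),
    (("-", "x", ("0",)), "x"),
    (("-", ("s", "x"), ("s", "y")), ("-", "x", "y")),
]

def match(pattern, term, subst):
    """Extend subst so that pattern instantiated by it equals term, else None."""
    if isinstance(pattern, str):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def substitute(subst, term):
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(substitute(subst, a) for a in term[1:])

def successors(term, rules):
    """All terms reachable from term in exactly one rewrite step."""
    if isinstance(term, str):
        return []
    result = []
    for lhs, rhs in rules:
        subst = match(lhs, term, {})
        if subst is not None:
            result.append(substitute(subst, rhs))
    for i in range(1, len(term)):
        for new_arg in successors(term[i], rules):
            result.append(term[:i] + (new_arg,) + term[i + 1:])
    return result

def derivation_length(term, rules):
    """Length of a longest rewrite sequence starting from term."""
    @lru_cache(maxsize=None)
    def dl(t):
        return max((1 + dl(u) for u in successors(t, rules)), default=0)
    return dl(term)

def size(term):
    return 1 if isinstance(term, str) else 1 + sum(size(a) for a in term[1:])

def P(m, n):
    """P_{0,n} = s^n(0) and P_{m+1,n} = +(P_{m,n}, 0)."""
    term = ("0",)
    for _ in range(n):
        term = ("s", term)
    for _ in range(m):
        term = ("+", term, ("0",))
    return term

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, size(P(n, n)), derivation_length(P(n, n), RULES))
```

The printed sizes and derivation lengths can be compared with 3n + 1 and n(n + 1) from the example.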

4 Implementation<br />


cdiprover3 is written entirely in OCaml 2 . It employs the libraries of the termination<br />
prover TTT2 3 . From these libraries, functionality for handling TRSs and SAT encodings,<br />
and an interface to the SAT solver MiniSAT 4 are used. Not counting these libraries, the tool<br />
consists of about 1700 lines of OCaml code. About 25% of that code is devoted to<br />
the manipulation of polynomials and extensions of polynomials that stem from our use<br />
of the parameter ∆. Another 35% is used for constructing parametric interpretations<br />
and building suitable Diophantine constraints (see below) which enforce the necessary<br />
conditions for termination. Using TTT2’s library for propositional logic and its interface<br />
to MiniSAT, 15% of the code deals with encoding Diophantine constraints into SAT. The<br />
remaining code is used for parsing input options and the given TRS, generating output,<br />
and controlling the program flow.<br />

In order to find polynomial interpretations automatically, Diophantine constraints are<br />
generated according to the procedure described in (Contejean, Marché, Tomás and Urbain,<br />
2005). Putting an upper bound on the coefficients makes the problem finite. Essentially<br />
following (Fuhs, Giesl, Middeldorp, Schneider-Kamp, Thiemann and Zankl, 2007),<br />
we then encode the (finite domain) constraints into a propositional satisfiability problem.<br />
This problem is given to MiniSAT. From a satisfying assignment for the SAT problem,<br />
we construct a polynomial interpretation which is monotone and compatible with the<br />
given TRS.<br />
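To see why bounding the coefficients makes the search finite, the constraints can be written out by hand for a small case. The sketch below does this for the single rule of Example 2 with parametric interpretations a(x) = a1·x + a0 and b(x) = b1·x + b0. It is a simplification in two respects: plain enumeration replaces the SAT encoding used by the tool, and the constraints are derived manually by comparing coefficients instead of being generated by the procedure of Contejean et al. The function name and the tuple layout of the result are hypothetical.

```python
from itertools import product

def find_linear_interpretation(bound):
    """Search coefficients in 0..bound that orient a(b(x)) -> b(b(a(x))).

    Returns (a1, a0, b1, b0) for a(x) = a1*x + a0 and b(x) = b1*x + b0,
    or None if no solution exists below the given bound.
    """
    for a1, a0, b1, b0 in product(range(bound + 1), repeat=4):
        # Monotonicity: every variable coefficient must be at least 1.
        if a1 < 1 or b1 < 1:
            continue
        # [a(b(x))] = a1*b1*x + (a1*b0 + a0)
        lhs_x, lhs_c = a1 * b1, a1 * b0 + a0
        # [b(b(a(x)))] = b1*b1*a1*x + (b1*b1*a0 + b1*b0 + b0)
        rhs_x, rhs_c = b1 * b1 * a1, b1 * b1 * a0 + b1 * b0 + b0
        # Diophantine constraints: the difference lhs - rhs has a
        # non-negative x-coefficient and a strictly positive constant.
        if lhs_x >= rhs_x and lhs_c > rhs_c:
            return a1, a0, b1, b0
    return None

if __name__ == "__main__":
    for bound in (1, 3, 7):
        print(bound, find_linear_interpretation(bound))
```

Enumeration grows exponentially in the number of coefficients, which is one reason a SAT encoding is the more practical choice for realistic systems.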

This procedure is also the basis of the automatic search for ∆-linear and ∆-restricted<br />
interpretations. The starting point of that search is an interpretation with uninstantiated<br />
coefficients. If we want to be able to apply Theorem 6 or 7, we need to find coefficients<br />
which make the resulting interpretation compatible with the given TRS. Furthermore,<br />
we need to make sure that no divisions by zero occur in the interpretation functions.<br />
Again, we encode these properties into Diophantine constraints on the coefficients of a<br />
∆-linear or ∆-restricted interpretation. The encoding is an adaptation of the procedure in<br />

2 http://caml.inria.fr.<br />

3 http://colo6-c703.uibk.ac.at/ttt2.<br />

4 http://minisat.se.<br />



Table 1: Performance of cdiprover3<br />
Method | SL | SL+∆-restricted | ∆-linear | ∆-restricted | ∆-restricted | ∆-restricted | ∆-restricted<br />
-i -b X | 31 | 31 | 31 | 3 | 7 | 15 | 31<br />
# success | 41 | 87 | 83 | 83 | 86 | 86 | 86<br />
average success time | 20 | 3010 | 5527 | 3652 | 4041 | 4008 | 3986<br />
# timeout | 0 | 237 | 797 | 144 | 189 | 221 | 238<br />

(Contejean et al., 2005) to context-dependent interpretations. It is described in detail in<br />
(Schnabl, 2007; Moser and Schnabl, 2008). Once we have built the constraints, we continue<br />
using the same techniques as for searching for polynomial interpretations: we encode<br />
the constraints in a propositional satisfiability problem, apply the SAT solver, and use a<br />
satisfying assignment to construct a context-dependent interpretation.<br />

Table 1 shows experimental results of applying cdiprover3 to the 957 known terminating<br />
examples of the TPDB. The tests were performed single-threaded on a 2.40 GHz<br />
Intel® Core™ 2 Duo with 2 GB of memory. For each system, cdiprover3 was given<br />
a timeout of 60 seconds. All times in the table are given in milliseconds. The method<br />
SL denotes strongly linear interpretations. In all tests, we called cdiprover3 with the<br />
options -i -b X (see Section 5 below), where X is specified in the second row of the<br />
table. As we can see, cdiprover3 is currently able to prove polynomial derivational<br />
complexity for 87 of the 368 known terminating non-duplicating rewrite systems of the<br />
TPDB (duplicating rewrite systems have at least exponential derivational complexity, so<br />
this restriction is harmless here). The results indicate that an upper bound of 7 on the coefficient<br />
variables suffices to capture all examples in our test set. Therefore, 3 and 7 seem<br />
to be good candidates for default values of the -b option. However, it should be noted<br />
that our handling of the divisions introduced by the functions $f_C^i$ is computationally rather<br />
expensive, which is indicated by the number of timeouts and the average time needed<br />
for successful proofs. This also explains the slight decrease in performance when we<br />
extend the search space to ∆-linear interpretations. However, there is one system which<br />
can be handled by ∆-linear interpretations, but not by ∆-restricted interpretations: system<br />
SK90/2.50 in the TPDB, which we mentioned in Example 2.<br />

5 Using cdiprover3<br />


cdiprover3 is called from the command line. The basic usage pattern for cdiprover3<br />
is<br />
$ ./cdiprover3 [options] file timeout<br />
• timeout specifies the maximum number of seconds until cdiprover3 stops<br />
looking for a suitable interpretation.<br />
• file specifies the path to the file which contains the considered TRS.<br />
• For options, the following switches are available:<br />
-c class defines the desired subclass of the searched polynomial or context-dependent<br />
interpretation. The following values of class are legal:<br />


linear, simple, simplemixed, quadratic: These classes correspond to the respective<br />
subclasses of polynomial interpretations, as defined in (Steinbach,<br />
1992). Linear polynomial interpretations imply an exponential upper<br />
bound on the derivational complexity. The other classes imply a double<br />
exponential upper bound, cf. (Hofbauer and Lautemann, 1989).<br />
pizerolinear, pizerosimple, pizerosimplemixed, pizeroquadratic: For these<br />
values, cdiprover3 tries to find a polynomial interpretation with the<br />
following restrictions: defined function symbols are interpreted by linear,<br />
simple, simple-mixed, or quadratic polynomials, respectively. Constructors<br />
are interpreted by strongly linear polynomials. These interpretations<br />
guarantee that the derivation length of all constructor based terms is polynomial<br />
(Bonfante et al., n.d.).<br />
sli: This option corresponds to strongly linear interpretations. As mentioned<br />
in Section 3, they induce a linear upper bound on the derivational complexity<br />
of a compatible TRS.<br />
deltalinear: This value specifies that the tool should search for a ∆-linear<br />
interpretation. By Theorem 6, compatibility with such an interpretation<br />
implies an exponential upper bound on the derivational complexity.<br />
deltarestricted: This option corresponds to ∆-restricted interpretations. By<br />
Theorem 7, they induce a quadratic upper bound.<br />

-b X sets the upper bound for the coefficient variables to X. The default value<br />
for this bound is 3.<br />

-i This switch activates an incremental strategy for handling the upper bound on<br />
the coefficient variables. First, cdiprover3 tries to find a solution using<br />
an intermediate upper bound of 1 (which corresponds to encoding each coefficient<br />
variable by one bit). Whenever the tool fails to find a proof for some<br />
upper bound b, it is checked whether b is equal to the bound specified by the<br />
-b option. If that is the case, then the search for a proof is given up. Otherwise,<br />
b is set to the minimum of the bound specified by the -b option and<br />
2(b+1)−1 (which corresponds to increasing the number of bits used for each<br />
coefficient variable by 1).<br />
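The sequence of bounds visited by the incremental strategy follows directly from this description. The sketch below only reproduces that arithmetic. The function name is hypothetical, and the argument check for bounds below 1 is an assumption, since the text does not say how the tool treats such values.

```python
def bound_schedule(max_bound):
    """Upper bounds tried by the incremental strategy described for -i.

    Starts at 1 and moves to min(max_bound, 2*(b+1) - 1) after each failed
    attempt, stopping once the bound given by -b has been reached.
    """
    if max_bound < 1:
        raise ValueError("max_bound is assumed to be at least 1")
    b = 1
    schedule = [b]
    while b != max_bound:
        b = min(max_bound, 2 * (b + 1) - 1)
        schedule.append(b)
    return schedule

if __name__ == "__main__":
    for max_bound in (3, 7, 31):
        print(max_bound, bound_schedule(max_bound))
```

Each entry in the schedule corresponds to one more bit per coefficient variable, except possibly the last one, which is capped by the -b bound.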

If the -c switch is not specified, then the standard strategy for proving polynomial<br />
derivational complexity is employed. First, cdiprover3 looks for a strongly linear<br />
interpretation. If that is not successful, then a suitable ∆-restricted interpretation is<br />
searched for. The input TRS files are expected to have the same format as the files in the<br />
TPDB. The format specification for this database is available at<br />
http://www.lri.fr/~marche/tpdb/format.html.<br />

The output given by cdiprover3, as exemplified by Example 9, is structured as<br />
follows. The first line contains a short answer to the question whether the given TRS<br />
is terminating: YES, MAYBE, or TIMEOUT. TIMEOUT means that cdiprover3 was<br />
still busy after the specified timeout. MAYBE means that a termination proof could not<br />
be found, and cdiprover3 gave up before time ran out. The answer YES indicates<br />
that an interpretation of the given class has been found which guarantees termination of<br />
the given TRS. It is followed by the inferred bound on the derivational complexity and a<br />



listing of the interpretation functions. After the interpretation functions, the elapsed time<br />
between the call of cdiprover3 and the output of the proof is given. In all cases, the<br />
answer is concluded by statistics stating the total number of monomials in the constructed<br />
Diophantine constraints, and the upper bound for the coefficients that was used in the last<br />
call to MiniSAT.<br />

Example 9. Given the TRS shown in Example 1, cdiprover3 produces the output<br />
shown in Figure 1. The interpretations in Example 8 and in the output are equivalent.<br />
Note that the parameter ∆ in the interpretation functions fC[∆] is treated like another<br />
argument of the function. The interpretation functions $f_C^i$ are represented by f tau i in the<br />
output.<br />

6 Conclusion<br />

In this paper, we have presented the (as far as we know) first tool which is specifically<br />
designed for automatically proving polynomial derivational complexity of term rewriting.<br />
We have also given a brief introduction to the applied proof methods. With our current<br />
implementation, we are able to prove polynomial derivational complexity for 87 of<br />
the 368 known terminating non-duplicating rewrite systems of the TPDB. By adding new<br />
termination methods to our tool which can prove polynomial derivational complexity of<br />
rewrite systems, we could extend the range of problems that the prover can solve. The<br />
matchbounds technique comes to mind here, which induces a linear upper bound on the<br />
derivational complexity of the considered system (Geser, Hofbauer, Waldmann and Zantema,<br />
2007; Korp and Middeldorp, 2007). Another avenue for future work is the search for<br />
other subclasses of context-dependent interpretations which imply non-quadratic and non-linear,<br />
but polynomial upper bounds on the derivational complexity. A further possibility<br />
would be to find more efficient ways of handling the divisions introduced by the functions<br />
$f_C^i$. Results in this area would help to further improve the power of cdiprover3.<br />

References<br />


Avanzini, M. and Moser, G. (2008). Complexity analysis by rewriting, Proc. 9th FLOPS, Vol. 4989 of LNCS, pp. 130–146.<br />
Baader, F. and Nipkow, T. (1998). Term Rewriting and All That, Cambridge University Press.<br />
Bonfante, G., Cichon, A., Marion, J.-Y. and Touzet, H. (n.d.). Algorithms with polynomial interpretation termination proof, J. Funct. Program. (1): 33–53.<br />
Bonfante, G., Marion, J.-Y. and Péchoux, R. (2007). Quasi-interpretation synthesis by decomposition, Proc. 4th ICTAC, Vol. 4711 of LNCS, pp. 410–424.<br />
Contejean, E., Marché, C., Tomás, A. P. and Urbain, X. (2005). Mechanically proving termination using polynomial interpretations, J. Autom. Reason. 34(4): 325–363.<br />
Fuhs, C., Giesl, J., Middeldorp, A., Schneider-Kamp, P., Thiemann, R. and Zankl, H. (2007). SAT solving for termination analysis with polynomial interpretations, Proc. SAT 2007, Vol. 4501 of LNCS, pp. 340–354.<br />


Geser, A., Hofbauer, D., Waldmann, J. and Zantema, H. (2007). On tree automata that certify termination of left-linear term rewriting systems, Inf. Comput. 205(4): 512–534.<br />
Hofbauer, D. (2001). Termination proofs by context-dependent interpretations, Proc. 12th RTA, Vol. 2051 of LNCS, pp. 108–121.<br />
Hofbauer, D. and Lautemann, C. (1989). Termination proofs and the length of derivations, Proc. 3rd RTA, Vol. 355 of LNCS, pp. 167–177.<br />
Korp, M. and Middeldorp, A. (2007). Proving termination of rewrite systems using bounds, Proc. 18th RTA, Vol. 4533 of LNCS, pp. 273–287.<br />
Lankford, D. (1979). On proving term-rewriting systems are noetherian, Technical Report MTP-2, Math. Dept., Louisiana Tech. University.<br />
Lescanne, P. (1995). Termination of rewrite systems by elementary interpretations, Formal Aspects of Computing 7(1): 77–90.<br />
Marion, J.-Y. (2003). Analysing the implicit complexity of programs, Inf. Comput. 183(1): 2–18.<br />
Moser, G. and Schnabl, A. (2008). Proving quadratic derivational complexities using context dependent interpretations, Proc. 19th RTA. Accepted for publication.<br />
Schnabl, A. (2007). Context Dependent Interpretations 5 , Master’s thesis, Universität Innsbruck.<br />
Steinbach, J. (1992). Proving polynomials positive, Proc. 12th FSTTCS, Vol. 652 of LNCS, pp. 191–202.<br />
TeReSe (2003). Term Rewriting Systems, Vol. 55 of Cambridge Tracts in Theoretical Computer Science, Cambridge University Press.<br />

5 Available online at http://cl-informatik.uibk.ac.at/~aschnabl/<br />



Figure 1: Output produced by cdiprover3.<br />

$ cat tpdb-4.0/TRS/SK90/2.11.trs<br />

(VAR x y)<br />

(RULES<br />

+(0,y) -> y<br />

+(s(x),y) -> s(+(x,y))<br />

-(0,y) -> 0<br />

-(x,0) -> x<br />

-(s(x),s(y)) -> -(x,y)<br />

)<br />

(COMMENT Example 2.11 (Addition and Subtraction) in \cite{SK90})<br />

$ ./cdiprover3 -i tpdb-4.0/TRS/SK90/2.11.trs 60<br />

YES<br />

QUADRATIC upper bound on the derivational complexity<br />

This TRS is terminating using the deltarestricted interpretation<br />

-(delta, X1, X0) = + 1*X0 + 1*X1 + 0 + 0*X0*delta + 0*X1*delta + 1*delta<br />

s(delta, X0) = + 1*X0 + 1 + 0*X0*delta + 1*delta<br />

0(delta) = + 0 + 0*delta<br />

+(delta, X1, X0) = + 1*X0 + 1*X1 + 0 + 0*X0*delta + 1*X1*delta + 1*delta<br />

- tau 1(delta) = delta/(1 + 0 * delta)<br />

- tau 2(delta) = delta/(1 + 0 * delta)<br />

s tau 1(delta) = delta/(1 + 0 * delta)<br />

+ tau 1(delta) = delta/(1 + 1 * delta)<br />

+ tau 2(delta) = delta/(1 + 0 * delta)<br />

Time: 0.024418 seconds<br />

Statistics:<br />


Number of monomials: 187<br />

Last formula building started for bound 1<br />

Last SAT solving started for bound 1<br />



THE RANK(S) OF A TOTALLY LEXICALIST SYNTAX<br />

Éva Szilágyi<br />

University of Pécs<br />

Abstract. Our project implements a totally lexicalist grammar. The syntax has now<br />
been worked out; in this approach it resembles a dependency grammar, except that word<br />
order is also handled. In harmony with the idea of total lexicalism, neither PS-trees nor<br />
transformations exist. We use rank parameters, similar to those of Optimality Theory, to express word order<br />
variation in a language. A special kind of rank parameter accounts for Hungarian focus phenomena,<br />
which cause radical surface changes in word order (beyond intonational effects).<br />
The system is implemented in a relational database (SQL).<br />

1 Introduction<br />


Predicates seek their arguments in every language of the world, and adjuncts likewise<br />
seek their joining points. We claim that only 8-10 operations are at work in languages,<br />
but that their effectiveness differs. This can be ordered by rank parameters: a universal<br />
tool (as in Optimality Theory (Archangeli and Langendoen, 1997)) with language-specific<br />
settings. Our project aims to develop an MT system based on GASG (Generalized Argument<br />
Structure Grammar), a totally lexicalist theory (Alberti, 1999). We are primarily linguists,<br />
so our main goal is linguistic. Lexicalist theories are successful nowadays,<br />
and we aim to try out this extreme form of lexicalism both theoretically and practically.<br />
For us this is more important than efficiency in size, speed or time.<br />

The lexicon is stored in a relational database. The essence of a relational database lies in the<br />
definition of relations. Relations describe facts and also constitute the database itself. Each<br />
entity is an n-tuple: the elements of a tuple stand in a relation and constitute a record.<br />
The elements are attributes, which constitute the fields of a record. A relation is a table<br />
in the database, where each row (record) is an n-tuple and each column is an attribute<br />
(Halassy, 1994). We chose Microsoft SQL Server 2005 for our implementation, so we have a<br />
complete and complex database management framework.<br />
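The relational view described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the database system named in the text, and the table name `lexical_item` and its columns are hypothetical; the actual schema of the lexicon is not given here.

```python
import sqlite3

# Hypothetical table: one relation of the lexicon, with three attributes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lexical_item ("
    " id INTEGER PRIMARY KEY,"
    " form TEXT NOT NULL,"
    " category TEXT NOT NULL)"
)
rows = [(1, "lakik", "verb"), (2, "okos", "adjective"), (3, "az", "article")]
conn.executemany("INSERT INTO lexical_item VALUES (?, ?, ?)", rows)

# Each row (record) is a 3-tuple; each column is an attribute.
cursor = conn.execute("SELECT id, form, category FROM lexical_item ORDER BY id")
attributes = [column[0] for column in cursor.description]
records = cursor.fetchall()

print(attributes)
for record in records:
    print(record)
```

Each printed record is one n-tuple of the relation, and the attribute list corresponds to the columns of the table.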

A morphophonological component has been carried over from our former project. Rules<br />
of syntax are now being built in. The main component will be the semantic component:<br />
the implementation of the DRT-based (Kamp, van Genabith and Reyle, 2004; Asher and<br />
Lascarides, 2003) ReALIS dynamic semantic system (Alberti, 2005).<br />

GASG is a monostratal declarative grammar which is considered to be ”totally lexicalist”.<br />
Total lexicalism means that all information is in the description of the lexical<br />
items, and that the combination of lexical elements is driven exclusively by unification. Thus, it can<br />
be regarded as a modified unificational categorial grammar (even function application<br />
is omitted). It continues radical lexicalism, introduced by Karttunen (1986), which states<br />
that if the lexicon is sufficiently rich, sentences can be produced by unification, so that<br />
phrase structure is practically redundant; moreover, phrase structure leads to spurious ambiguities.<br />
Work in computational linguistics (for example Schneider (2005)) has also come to the conclusion that<br />
reducing phrase structure could be useful. Many applications rely on phrase structure<br />
because a dependency grammar that does not restrict word order is otherwise not efficient<br />
in computation. GASG accounts for word order by rank parameters, so giving up phrase<br />
structure does not result in exponential running time of the analysis algorithm.<br />

Thus, the ’rules’ mentioned above are not really rules, but properties which can be unified.<br />
The arguments an item requires and their realizations are properties, too. Word order requirements<br />
are also properties: requirements of different strengths. Our grammar model uses rank<br />
parameters to express word order, which means that a requirement can not only be<br />
fulfilled or violated, but can also compete with (partially) incompatible requirements.<br />
A special variant of these rank parameters also covers the cases where focus (or<br />
another operator) ”re-orders” the words (compared to a neutral sentence). In written<br />
Hungarian sentences there is no other sign of focus (in spoken sentences there is emphasis<br />
as well).<br />

2 Rank parameters<br />

Primitive syntactic relations (such as preceding or following one another) can be treated as<br />
requirements of immediate precedence in the description of the lexical item. This is because an<br />
element that stands in a relationship with a head wants to be its neighbour. To give a short example<br />
from Hungarian: a definite article needs a noun immediately after itself (1a). If there is an adjective,<br />
it also needs the noun to be immediately after itself (1b). If the noun has a<br />
possessive suffix, the suffix wants the possessor between the article and the adjective (1c).<br />
Another adjective, expressing nationality, has to be before the noun (1d). The two adjectives<br />
cannot both immediately precede the noun: nationality gets priority in this case. Since sentences are linear,<br />
a head can in theory have only two neighbours, and in practice languages usually pick their<br />
complements from one direction.<br />

(1) a. a tanárom<br />

<strong>the</strong> teacher-Poss1Sg<br />

’my teacher’<br />

b. az okos tanárom<br />

<strong>the</strong> clever teacher-Poss1Sg<br />

’my clever teacher’<br />

c. az én okos tanárom / *az okos én tanárom<br />

<strong>the</strong> I clever teacher-Poss1Sg / <strong>the</strong> clever I teacher-Poss1Sg<br />

’my clever teacher’<br />

d. az én okos magyar tanárom<br />

<strong>the</strong> I clever Hungarian teacher-Poss1Sg<br />

’my clever Hungarian teacher’<br />

These relations can be expressed by a parameter called a rank parameter: a number<br />
expressing how close two lexical items need to be to each other in order to express the relationship<br />
between them. With it we can calculate how a requirement can be satisfied<br />
indirectly (or partially). In (1a), and for the nationality adjective in (1d), the<br />
requirement is satisfied directly. The requirement of the article in (1b),<br />
or of the adjective okos ’clever’ in (1d), is satisfied indirectly.<br />


Figure 1: Indirect satisfaction in (1d).<br />

Rank parameters also show in which direction the satisfying word should be. The direction is expressed<br />
by a character: a, b or c, referring to a following position, a preceding position, or either.<br />
We distinguish two types of rank parameters according to the way their requirements are satisfied.<br />
Recessive rank parameters (r) express neighbourhood relations (as in (1a-d)), and<br />
they are satisfied either if the two elements are immediately adjacent or if another element with a stronger<br />
rank (a smaller number) is wedged in between them 1 . In Figure 1, the strength-5 requirement of the determiner<br />
az ’the’ towards the noun tanárom ’teacher-Poss1Sg’ is satisfied in this way. This is a case of partial<br />
or indirect satisfaction (Alberti, 1999) (see further examples in (6-7)). Of conflicting<br />
dominant rank parameters (d), only the strongest one can be satisfied; all others are deleted<br />
(see Section 6).<br />
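The notation used for rank parameters (type, strength, direction, as in r5a or d6a) and the wedging condition can be sketched as follows. This is only an illustration under our own assumptions: the class `Rank` and the functions `parse_rank`, `direction_ok` and `satisfied` are hypothetical names, word positions are plain integers, and the perceptual limits on wedging mentioned in footnote 1 are not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rank:
    kind: str       # "r" = recessive, "d" = dominant
    strength: int   # a smaller number means a stronger rank
    direction: str  # "a" = following, "b" = preceding, "c" = either

def parse_rank(code: str) -> Rank:
    """Split a code such as 'r7c' into type, strength and direction."""
    return Rank(code[0], int(code[1:-1]), code[-1])

def direction_ok(rank: Rank, source_pos: int, target_pos: int) -> bool:
    """The target is the word that satisfies the requirement of the source."""
    if rank.direction == "a":
        return target_pos > source_pos
    if rank.direction == "b":
        return target_pos < source_pos
    return target_pos != source_pos

def satisfied(rank: Rank, source_pos: int, target_pos: int, wedged=()) -> bool:
    """wedged: strengths of the ranks by which the intervening words are required.

    With nothing wedged in, the requirement is satisfied directly; otherwise
    every intervening word must be required with a stronger (smaller) rank.
    """
    if not direction_ok(rank, source_pos, target_pos):
        return False
    return all(strength < rank.strength for strength in wedged)

# "be a szobába": r7c between be (0) and szobába (2), article wedged in with r5a.
print(satisfied(parse_rank("r7c"), 0, 2, wedged=[5]))
# A strength-7 element cannot be wedged into a strength-6 requirement.
print(satisfied(parse_rank("d6a"), 0, 2, wedged=[7]))
```

The two calls correspond to the comparisons the text reports for be a szobába (r5a against r7c) and for example (11).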

Dominant parameters come, language-specifically, from either syntax or semantics. For<br />
example, in Hungarian the subject of a sentence precedes the verb by virtue of a dominant semantic<br />
rank parameter, and in no language is it marked by a morpheme (thus, it is not a separate<br />
lexical item). By contrast, the subject obligatorily precedes the verb in English, even if it is<br />
semantically empty. Dominant parameters also play an important part in Hungarian<br />
focus phenomena (see examples (6-11)).<br />

3 Predicates and arguments, heads and complements<br />

Argument structures are treated as entities. Their elements are drawn from a stock table<br />
of argument types. An argument is therefore formed by a relationship between an argument<br />
structure and an argument type. For example, the Hungarian verb lakik ’live’ has<br />
two arguments: the one who lives somewhere and the place where that one lives.<br />

Argument types are described by a numerical parameter which places the argument on<br />
a scale from agent-like to patient-like. Types that fall outside the central frame,<br />
which describes relations between subjects and objects, get a neutral parameter.<br />

In Hungarian we treat nominal parts of speech as having more than one argument<br />
structure: they can be arguments themselves, which is their basic role and in most languages<br />
their only one (2a), or they can be nominal predicates, because the copula is phonetically<br />
null in Hungarian in the present tense third person singular (2b). We also count the short<br />
possessive form here, which looks for a possessive suffix (2c).<br />

(2) a. Péter Budapesten lakik.<br />

Peter-NOM Budapest-SUPERESS live-3Sg<br />

’Peter lives in Budapest.’<br />

1 Wedging in has perceptual limits.<br />

b. Annak a fiúnak a neve Péter.<br />

That-DAT <strong>the</strong> boy-DAT <strong>the</strong> name-Poss3Sg Peter-NOM<br />

’That boy’s name is Peter.’<br />

c. Péter kalapja.<br />

Peter-NOM hat-Poss3Sg<br />

’Peter’s hat’<br />

We store the required complements in the same way: there is a case frame, in which the<br />
word ’case’ has an extended meaning. We also record here forms such as infinitives and<br />
postpositional phrases, as well as fixed phrases that can replace a case-suffixed word form<br />
(compare (3a) with (3b)). Cases are therefore stored as a relationship between the case frame<br />
and a case type.<br />

(3) a. Péter elárult pár dolgot Mariról.<br />

Peter-NOM disclose-Past3Sg couple thing-ACC Mary-DELAT<br />

’Peter disclosed a couple <strong>of</strong> things about Mary.’<br />

b. Péter elárult pár dolgot Marival kapcsolatban.<br />

Peter-NOM disclose-Past3Sg couple thing-ACC Mary-INS relation-INESS<br />

’Peter disclosed a couple <strong>of</strong> things about Mary/related to Mary.’<br />

Sometimes the lexical item does not select a particular case for its argument. The verb<br />
lakik ’live’ has two cases for its arguments: the first argument gets the nominative case,<br />
through the linkage between the argument and the case. The other one is of a wildcard type: ’not<br />
specified’. If this argument is not filled, the sentence may be ungrammatical, even<br />
though at this point we do not know the exact case (case type) in which it is realized. Therefore,<br />
argument types and case types can be linked, too. For the ’PLACE’ type argument, several<br />
case types can be selected, as these examples show:<br />

(4) a. Péter egy szép házban lakik.<br />

Peter-NOM a nice house-INESS live-3Sg<br />

’Peter lives in a nice house’<br />

b. Péter Budapesten lakik.<br />

Peter-NOM Budapest-SUPERESS live-3Sg<br />

’Peter lives in Budapest.’<br />

c. Péter az iskola mellett lakik.<br />

Peter-NOM <strong>the</strong> school-NOM next-POSTPOS live-3Sg<br />

’Peter lives next to <strong>the</strong> school.’<br />
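The linkage illustrated in (4) can be sketched with two small mappings. This is a simplified in-memory stand-in for the lexicon tables, not the actual database design: the names `CASE_FRAME`, `ARG_TYPE_CASES` and `allowed_cases`, as well as the argument label RESIDENT, are hypothetical, and the case types listed for PLACE are only the three attested in (4).

```python
# Case selected by the lexical item for each argument; None = 'not specified'.
CASE_FRAME = {
    "lakik": {"RESIDENT": "NOM", "PLACE": None},
}

# Case types linked to an argument type (only those shown in example (4)).
ARG_TYPE_CASES = {
    "PLACE": {"INESS", "SUPERESS", "POSTPOS"},
}

def allowed_cases(verb: str, argument: str) -> set:
    """Return the case types in which the argument may be realized."""
    selected = CASE_FRAME[verb][argument]
    if selected is not None:
        return {selected}                # the lexical item fixes the case
    return set(ARG_TYPE_CASES[argument])  # fall back on the argument type

print(allowed_cases("lakik", "RESIDENT"))
print(sorted(allowed_cases("lakik", "PLACE")))
```

When the verb leaves the case unspecified, the lookup falls back on the cases linked to the argument type, which is why házban, Budapesten and az iskola mellett can all fill the PLACE argument.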

Syntax may account for adjuncts too. A suffixed noun is an adjunct when the suffix is<br />
compositional, but any such compositional element that is required<br />
by another element is a complement. In the adjunct case the suffix (or the lexical item, e.g. ott ’there’) states about<br />
itself that it is an adjunct requiring a noun.<br />

4 Rank parameters in operation<br />

Rank parameters are established empirically, from linguistic description. The following Hungarian<br />
examples show how they work.<br />


In Hungarian a head-complement relation is given by a rank parameter of strength 7. We<br />
do not give any direction because (since lexical items are morphemes) the place of the<br />
complement is underspecified at this point. Semantic requirements look for an aspectualizing<br />
argument in the pre-verbal position. There is always an argument giving aspect:<br />
usually it is a pre-verb (5a) 2 or a bare NP (5b), and occasionally it can be the verb itself (5c)<br />
(Alberti, 2004).<br />

(5) a. Péter megírta a leckét.<br />

Peter-NOM Perf+write-Past3Sg <strong>the</strong> homework-ACC<br />

’Peter has written <strong>the</strong> homework.’<br />

b. Már három hete újságot árulok.<br />

Already three week newspaper-ACC sell-1Sg<br />

’I have been selling newspapers for three weeks already.’<br />

c. Péter csalódik Mariban.<br />

Peter-NOM get-disappointed-3Sg Mary-INESS<br />

’Peter gets disappointed in Mary.’<br />

Pre-verbs have two rank parameters, both recessive. In neutral sentences like (6a)<br />
the pre-verb el ’away’ must precede indul ’starts going’, as required by a strong (r2b) rank<br />
parameter. The emphasis is on the pre-verb and the verb has no emphasis, so in practice<br />
they form one phonological word. In other cases, as in (6b), the pre-verb may follow the<br />
verb by virtue of a weaker (r3a) rank parameter. In that case they are separate phonological words.<br />

(6) a. Péter elindul horgászni.<br />

Peter-NOM away+go-3Sg fish-INF<br />

’Peter goes fishing.’<br />

b. Péter ’horgászni indul el. / ’Péter indul el horgászni.<br />

Peter-NOM fish-INF go-3Sg away / Peter-NOM go-3Sg away fish-INF<br />

’It is in order to fish that Peter goes away.’ / ’It is Peter who goes fishing.’<br />

Sometimes a particular argument gives aspect. For example, the verb lakik ’live’ has an<br />
argument for ’PLACE’, which is either in the preceding position with a strong (r2b) rank (7a)<br />
or in the following position with a weaker (r3a) rank (7b) 3 .<br />

(7) a. Péter Budapesten lakik.<br />

Peter-NOM Budapest-SUPERESS live-3Sg<br />

’Peter lives in Budapest.’<br />

b. *Péter lakik Budapesten / ’Péter lakik Budapesten.<br />

Peter-NOM live-3Sg Budapest-SUPERESS<br />

’*Peter lives in Budapest.’ / ’It is Peter who lives in Budapest.’<br />

There are even more special cases in which a verb that has a pre-verb still gets its aspect<br />
from another argument.<br />

2 Pre-verbs in Hungarian are considered complements (as in other theories), because they are<br />
separate words. It is a matter of orthography that a pre-verb immediately preceding the verb is<br />
written together with it.<br />

3 In the examples an apostrophe marks strong emphasis. Besides word order, this denotes focus in a Hungarian<br />
sentence.<br />


(8) a. Péter Budapesten szállt meg.<br />

Peter-NOM Budapest-SUPERESS stay-Past3Sg Perf<br />

’Peter stayed in Budapest.’<br />

b. *Péter megszállt Budapesten / Péter ’megszállt Budapesten.<br />

Peter-NOM Perf+stay-Past3Sg Budapest-SUPERESS<br />

’*Peter stayed in Budapest.’ / ’What Peter did in Budapest was that he stayed <strong>the</strong>re.’<br />

As we can see in (8b), the first sentence, without emphasis, is ungrammatical. The<br />
second variant is grammatical, but never neutral: a focus pushes the locative<br />
back behind the verb, so only the weaker requirement can be satisfied (see the next two sections).<br />
The aspect-giving argument has to be stored with two rank parameters in every case.<br />
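How the two stored rank parameters of an aspect-giving argument play out in (6) to (8) can be sketched as below. The sketch is deliberately simplified: it only looks at immediate adjacency and ignores wedging, and the names `ASPECT_RANKS` and `satisfied_rank` are hypothetical.

```python
# The two ranks stored with an aspect-giving argument, strongest first
# ("b" = immediately before the verb, "a" = immediately after the verb).
ASPECT_RANKS = ("r2b", "r3a")

def satisfied_rank(words, giver, verb):
    """Return the strongest stored rank satisfied by the given word order."""
    g, v = words.index(giver), words.index(verb)
    for code in ASPECT_RANKS:
        if code[-1] == "b" and g == v - 1:
            return code
        if code[-1] == "a" and g == v + 1:
            return code
    return None  # neither requirement is met by simple adjacency

# (6a) neutral order: the pre-verb directly precedes the verb.
print(satisfied_rank(["Péter", "el", "indul", "horgászni"], "el", "indul"))
# (6b) with focus: only the weaker requirement can be met.
print(satisfied_rank(["Péter", "horgászni", "indul", "el"], "el", "indul"))
# (8a): the locative, not the pre-verb, is the aspect-giving argument.
print(satisfied_rank(["Péter", "Budapesten", "szállt", "meg"], "Budapesten", "szállt"))
```

Storing both ranks lets the same lexical entry serve neutral and focused sentences; which one applies is decided only when the word order is checked.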

5 Focus in Hungarian<br />

Focus in Hungarian is signalled by emphasis and word order (Kiss, 2000). In the<br />
following examples (9a) is a neutral sentence and (9b-c) are variants with focus<br />
on different complements of the verb.<br />

(9) a. Mari süteményt süt Péternek.<br />

Mary-NOM cookie-ACC bake-3Sg Peter-DAT<br />

’Mary is baking cookies for Peter.’<br />

b. Mari ’Péternek süt süteményt.<br />

’It is Peter for whom Mary is baking cookies.’<br />

c. Mari ’süteményt süt Péternek (és nem kenyeret).<br />

’It is cookies (and not bread) that Mary is baking for Peter.’<br />

In our solution focus is a separate lexical item 4 , because it influences other elements<br />
in the sentence through its own requirements. It looks for two other elements: the focused<br />
element and a verb. Focus imposes a strong dominant rank parameter on the verb, requiring it to be in the<br />
following position (d6a). 5<br />

In the previous section we claimed that the aspect-giving argument (mostly a pre-verb)<br />
has to be stored with two rank parameters. In neutral sentences (as in (6a)) the stronger<br />
(r2b) rank parameter is satisfied. But when there is a focus (see (6b)), this stronger requirement of<br />
the pre-verb cannot be satisfied. The weaker (r3a) requirement is still there, and it can be<br />
satisfied.<br />

6 Processing<br />

The search starts from the finite verb. Elements that turn out not to be required by the<br />
verb or any of its complements (mostly adjuncts) are licensed if they find an element to<br />
attach to.<br />

The first step of the process is to check dominant rank parameters. (In Figure 2, the focused<br />
element tortát ’cake-ACC’ directly precedes the verb hozott ’bring-Past3Sg’.) Then<br />
all conflicting requirements are deleted:<br />

4 Although it is phonetically null in Hungarian, in some languages it is a morpheme (e.g. Eskimo,<br />
Quechua, Tamil). This is why we consider it a separate lexical item.<br />

5 The progressive form of telic situations may work in the same way.<br />


Figure 2: Processing.<br />

1. Ranks applying to the same element from the same element (in Figure 2, r3a between<br />
the pre-verb be ’in’ and the verb; only r7b remains);<br />

2. All other ranks between the two elements (r7a from the verb to the focused tortát,<br />
and r7c from tortát to the verb);<br />

3. Ranks applying to another element with the reverse direction (the r7b rank of the verb to<br />
the subject Péter ’Peter-NOM’ changes 6 to r7c, because the subject can be anywhere<br />
around the verb if there is a focus);<br />

4. The dominant rank parameter wins if there are two conflicting requirements of the<br />
same element (between Péter ’Peter-NOM’ and the verb hozott ’bring-Past3Sg’<br />
there are r7c and d7a; because of the focus the former remains in this sentence, but<br />
in a neutral sentence d7a applies).<br />
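One possible reading of how dominant requirements are resolved (the strongest of conflicting dominant ranks survives, and a surviving dominant rank overrides a recessive one between the same two elements) is sketched below. This is our interpretation of steps 1 to 4 for the Péter / hozott case only, not the implemented procedure; the function name `resolve_dominants`, the triple format and the label FOCUS for the phonetically null focus item are assumptions.

```python
def strength(code: str) -> int:
    """Numeric strength of a rank code such as 'd6a' (smaller = stronger)."""
    return int(code[1:-1])

def resolve_dominants(requirements):
    """requirements: list of (source, target, rank code) triples."""
    dominants = [r for r in requirements if r[2].startswith("d")]
    recessives = [r for r in requirements if r[2].startswith("r")]

    # Dominant ranks competing for the same target and direction conflict:
    # only the strongest one is kept, the others are deleted.
    best = {}
    for source, target, code in dominants:
        key = (target, code[-1])
        if key not in best or strength(code) < strength(best[key][2]):
            best[key] = (source, target, code)
    kept = list(best.values())

    # A surviving dominant rank wins over recessive ranks of the same pair.
    dominated_pairs = {frozenset((s, t)) for s, t, _ in kept}
    kept += [r for r in recessives if frozenset((r[0], r[1])) not in dominated_pairs]
    return kept

focused = [("FOCUS", "hozott", "d6a"), ("Péter", "hozott", "d7a"), ("hozott", "Péter", "r7c")]
neutral = [("Péter", "hozott", "d7a"), ("hozott", "Péter", "r7c")]
print(resolve_dominants(focused))
print(resolve_dominants(neutral))
```

With a focus present, d6a removes the weaker d7a, so r7c is what remains between Péter and the verb; without a focus, d7a applies, in line with step 4.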

The next step is to check recessive rank parameters: either the two elements are direct<br />
neighbours, or there is another element between them which is required with a stronger rank<br />
parameter (this may bring adjoining elements with it). In the example this goes as follows:<br />

1. egy tortát ’a cake-ACC’, a szobába ’the room-ILLAT’ and hozott be ’bring-Past3Sg in’<br />
are direct neighbours;<br />

2. in be a szobába ’in the room-ILLAT’ the definite article is in between, but it has a<br />
stronger rank parameter (r5a against r7c);<br />

3. Péter and hozott have egy tortát in between, owing to the strength-6 rank parameter imposed by<br />
the focus.<br />

In our system, contrary to phrase-structure grammars, any element can be focused.<br />
Sometimes the verb does not immediately follow the focused element. An adjoining<br />
word may follow it, wedging itself in by means of a stronger rank parameter, as (10) shows:<br />

(10) a. Péter egy lánnyal találkozott.<br />

Peter-NOM a girl-INS meet-Past3Sg<br />

’Peter met a girl.’<br />

6 In practice its direction is deleted; see Section 2.<br />

b. Péter egy ’okos lánnyal találkozott.<br />

Peter-NOM a clever girl-INS meet-Past3Sg<br />

’It was a clever girl whom Peter met.’<br />

c. Péter ’két okos lánnyal találkozott.<br />

Peter-NOM two clever girl-INS meet-Past3Sg<br />

’It was two clever girls whom Peter met.’<br />

(11) a. Péter olvasott egy verset Adytól.<br />

Peter-NOM read-Past3Sg a poem-ACC Ady-ABL<br />

’Peter read a poem by Ady.’<br />

b. *Péter egy ’verset Adytól olvasott.<br />

Peter-NOM a poem-ACC Ady-ABL read-Past3Sg<br />

In (11) the focused element (verset ’poem-ACC’) has a complement (Adytól ’Ady-ABL’),<br />
but complements are required with a rank parameter of strength 7, which is weaker<br />
than the strength-6 rank parameter between the focus and the verb, so the complement cannot<br />
be wedged in between them.<br />

7 Conclusion<br />

We are working on the implementation of this system, in which predicate-argument and<br />
head-complement relations, adjuncts and word order are all handled in the lexicon. Rank<br />
parameters account for word order variation in a language, and for other phenomena such as<br />
scrambling (which shows clear differences between languages) or focus and the progressive<br />
(which are sometimes invisible). The next step will be a semantic component, because<br />
we believe that intelligent applications can be built only on a genuinely linguistic basis, which<br />
requires fine-grained semantics.<br />

Acknowledgements<br />

I am grateful to the Hungarian National Scientific Research Fund (OTKA K60595) for<br />
its contribution to my costs.<br />

References<br />

Alberti, G. (1999). GASG: The grammar <strong>of</strong> total lexicalism, Working Papers in <strong>the</strong> Theory<br />

<strong>of</strong> Grammar 6(1). Theoretical Linguistics Programme, Budapest University and<br />

Research Institute for Linguistics, Hungarian Academy <strong>of</strong> Sciences.<br />

Alberti, G. (2004). Climbing for aspect with no rucksack, in K. É. Kiss and H. van<br />

Riemsdijk (eds), Verb Clusters; A study <strong>of</strong> Hungarian, German and Dutch, Linguistics<br />

Today 69, John Benjamins, Amsterdam/Philadelphia, pp. 253–289.<br />

Alberti, G. (2005). ReALIS. Doctoral dissertation at Hungarian Academy <strong>of</strong> Sciences,<br />

ms. HAS Research Institute for Linguistics and University <strong>of</strong> Pécs.<br />

URL: http://lingua.btk.pte.hu/gelexi.asp<br />

Archangeli, D. and Langendoen, T. D. (eds) (1997). Optimality Theory: an Overview,<br />

Blackwell, Oxford.<br />


Asher, N. and Lascarides, A. (2003). Logics <strong>of</strong> Conversation, Cambridge University<br />

Press, Cambridge.<br />

Halassy, B. (1994). Az adatbázis-tervezés alapjai és titkai [Basics and Secrets <strong>of</strong> Designing<br />

a Database], IDG, Budapest.<br />

Kamp, H., van Genabith, J. and Reyle, U. (2004). Discourse representation <strong>the</strong>ory. ms.<br />

to appear in Handbook <strong>of</strong> Philosophical Logic.<br />

URL: http://www.ims.uni-stuttgart.de/~hans<br />

Karttunen, L. (1986). Radical lexicalism, Report No. CSLI-86-68, CSLI Publications.<br />

Kiss, K. É. (2000). Az egyszerű mondat szerkezete [The Structure of the Simple Sentence],<br />

in F. Kiefer (ed.), Strukturális magyar nyelvtan I. Mondattan [Structural Hungarian<br />

Grammar Vol. 1 Syntax], Vol. 7., Akadémiai Kiadó, Budapest, pp. 79–177.<br />

Schneider, G. (2005). A broad-coverage, representationally minimal LFG parser: Chunks<br />

and f-structures are sufficient, in M. Butt and T. H. King (eds), <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong><br />

LFG05 Conference, CSLI Publications, University <strong>of</strong> Bergen, pp. 388–407.<br />



EXPRESSING CONJUNCTIVE AND AGGREGATE QUERIES OVER<br />

ONTOLOGIES WITH CONTROLLED ENGLISH<br />

Camilo Thorne<br />

Free University <strong>of</strong> Bozen-Bolzano<br />

Abstract. We propose to characterize <strong>the</strong> computational complexity <strong>of</strong> answering questions<br />

in ontology-mediated controlled language interfaces to structured data sources by expressing<br />

ontology-based data access in controlled English. This means: compositionally mapping a<br />

controlled subset <strong>of</strong> English into knowledge bases and formal queries for which <strong>the</strong> computational<br />

complexity <strong>of</strong> ontology-based data access is known. In <strong>the</strong> present paper, we extend<br />

this approach to conjunctive queries and to conjunctive queries with aggregation functions.<br />

1 Introduction<br />

Lately, there has been renewed interest within the computational linguistics community<br />
(Minock, 2005; Lesmo and Robaldo, 2007) in natural language interfaces to databases<br />
(NLIDBs), whose aim is to manage relational databases (DBs) with natural language (NL).<br />
In particular, robust interfaces supporting controlled fragments (CLs)<br />
of English and based on ontologies, computational semantics and deep semantic parsing<br />
have been developed by, for instance, the Attempto project (Bernstein et al., 2003;<br />
Fuchs et al., 2005). Controlled languages are fragments of NL tailored to fit data management<br />
tasks, typically by restricting their vocabulary (and syntax), thereby<br />
stripping them of ambiguity, whether structural or semantic. Controlled languages allow<br />
a trade-off between the coverage and the accuracy of the translation of questions into<br />
formal queries. Ontologies (conceptualizations of the domain) play an intermediate<br />
role between the CL’s vocabulary and the domain terminology.<br />

However, some important issues regarding controlled English interfaces have not been,<br />
to the best of our knowledge, fully addressed. One of them is the tractability or intractability<br />
of processing CL information requests and utterances, viz., how difficult is<br />
declaring and accessing structured data with a controlled English interface? By difficulty<br />
we mean computational complexity. We believe that one way of addressing this issue<br />
consists in expressing ontology-based data access with CLs. By this we mean designing<br />
declarative and interrogative controlled subsets of English that map compositionally,<br />
through a semantic mapping ⟦·⟧ (taken from NL formal semantics), into formal queries,<br />
ontologies and database facts, their meaning representations (MRs). Ontology-based data<br />
access provides the logical underpinning of accessing structured data w.r.t. ontologies, and<br />
its computational complexity provides a measure of how difficult a task it might be.<br />

The main purpose of this paper is twofold. On the one hand, we will say what it means to<br />
express ontology-based data access in a CL. On the other hand, we will proceed to express<br />
in controlled English a class of formal queries known as conjunctive queries. Conjunctive<br />
queries are attractive in that with them we reach an optimal computational complexity. Last<br />
but not least, we will extend our controlled language to cover aggregate queries, which<br />
are conjunctive queries to which the basic SQL aggregation functions COUNT, MIN, MAX<br />
and SUM have been added.<br />


2 Ontology Based Data Access<br />

Accessing and declaring data w.r.t. an ontology or conceptualization can be characterized<br />

in terms <strong>of</strong> formal logic as follows (Rosati, 2007). A relational query q <strong>of</strong> arity n is a<br />

formal expression q(x) ← Qyβ(x, y), where q(x) is <strong>the</strong> head and x denotes a sequence<br />

<strong>of</strong> n variables, <strong>the</strong> query’s distinguished variables, and Qyβ(x, y) is <strong>the</strong> body, a first order<br />

logic (FOL) quantified boolean combination <strong>of</strong> relational atoms where <strong>the</strong> distinguished<br />

variables occur free and <strong>the</strong> o<strong>the</strong>rs (<strong>the</strong> sequence y) bound to a quantifier. Qy denotes<br />

<strong>the</strong> sequence <strong>of</strong> its quantifier prefixes. When no confusion arises, we shall abbreviate<br />

Qyβ(x, y) with Φ[x]. A query is said to be boolean if its arity is n = 0. A collection<br />

<strong>of</strong> such queries is called a query language. A relational database (DB) D is a finite set<br />

<strong>of</strong> ground atoms over a schema R := {R1, ..., Rn}, where, for i ∈ [1, n], Ri is a relation<br />

symbol <strong>of</strong> arity m ≥ 1, and over a countably infinite domain Dom <strong>of</strong> constants. The<br />

active domain adom(D) <strong>of</strong> D is <strong>the</strong> set <strong>of</strong> constants that occur in D (a finite subset <strong>of</strong><br />

Dom). An ontology O is a set <strong>of</strong> FOL axioms that make explicit a certain number <strong>of</strong><br />

constraints holding over a domain. They are typically defined over some fragment <strong>of</strong><br />

FOL called an ontology language. This language should be rich enough to express DBs<br />

(i.e., DB atoms). The pair 〈O, D〉 is called a knowledge base (KB), and can be seen as a<br />

FOL logical <strong>the</strong>ory: a set <strong>of</strong> ground atoms (<strong>the</strong> DB) plus a set <strong>of</strong> axioms (<strong>the</strong> ontology).<br />

A ground substitution is a function σ(.) from Var(q), the set of variables of q, into Dom.<br />

Ground substitutions are extended to sequences of variables in the standard way. KBs and substitutions<br />

give rise to the certain answers semantics of a query q of arity n over a KB 〈O, D〉, denoted<br />

q(〈O, D〉). It consists in collecting the values in adom(D) of all the ground substitutions<br />

σ(.) for which 〈O, D〉 logically entails qσ, where qσ denotes the grounding of q by σ(.).<br />

Formally, q(〈O, D〉) := {σ(x) ∈ adom(D)^n | σ s.t. 〈O, D〉 |= qσ}. To investigate its<br />

computational complexity we must look at <strong>the</strong> associated recognition problem:<br />

Definition 1. (QA) The KB query answering (QA) decision problem is <strong>the</strong> FOL entailment<br />

problem stated as follows: given a KB 〈O, D〉, a sequence c ∈ Dom^n of n constants, a<br />

CQ q <strong>of</strong> arity n and distinguished variables x, check if <strong>the</strong>re exists a ground substitution<br />

σ(.) s.t. σ(x) = c and 〈O, D〉 |= qσ holds, where qσ is <strong>the</strong> grounding <strong>of</strong> q by σ(.).<br />

When we focus on #(adom(D)) (<strong>the</strong> number <strong>of</strong> constants <strong>of</strong> D) while considering<br />

constant both size(q) (<strong>the</strong> number <strong>of</strong> symbols <strong>of</strong> <strong>the</strong> query) and #(O) (<strong>the</strong> number <strong>of</strong><br />

axioms), we speak, in a manner set by (Vardi, 1982), <strong>of</strong> <strong>the</strong> data complexity <strong>of</strong> QA. Such<br />

complexity will depend on <strong>the</strong> query language and <strong>the</strong> ontology language chosen (Rosati,<br />

2007).<br />
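As an illustration of the definitions above, the following Python sketch computes the answers of a conjunctive query by enumerating ground substitutions over the active domain. It assumes an empty ontology, so that entailment reduces to membership of the grounded atoms in the DB; the names DB, adom and answers and the toy facts are hypothetical and serve only as an example.<br />

```python
from itertools import product

# A database is a finite set of ground atoms: (relation, tuple of constants).
# Toy facts, assumed for illustration only.
DB = {
    ("loves", ("Mary", "John")),
    ("hates", ("John", "Sue")),
    ("loves", ("Sue", "Mary")),
    ("hates", ("Mary", "Sue")),
}


def adom(db):
    """Active domain: the constants that occur in the database."""
    return {c for _, args in db for c in args}


def answers(head_vars, body, db, constants=frozenset()):
    """Answers of a conjunctive query over db, assuming an empty ontology.

    body is a list of atoms (relation, terms); a term counts as a variable
    unless it is listed in `constants`.
    """
    variables = sorted(
        {t for _, terms in body for t in terms if t not in constants}
    )
    result = set()
    # Enumerate every ground substitution sigma: variables -> adom(db).
    for values in product(sorted(adom(db)), repeat=len(variables)):
        sigma = dict(zip(variables, values))
        grounded = {
            (rel, tuple(sigma.get(t, t) for t in terms)) for rel, terms in body
        }
        # With no axioms, the grounded body is entailed iff all atoms are facts.
        if grounded <= db:
            result.add(tuple(sigma[v] for v in head_vars))
    return result


# Boolean query: is there an x loved by Mary who hates some y?
boolean_q = [("loves", ("Mary", "x")), ("hates", ("x", "y"))]
print(answers([], boolean_q, DB, constants={"Mary"}))

# Unary query: which x love some y who hates x?
unary_q = [("loves", ("x", "y")), ("hates", ("y", "x"))]
print(answers(["x"], unary_q, DB))
```

Since the loop ranges over adom(D)^k for the k variables of a fixed query, its running time is polynomial in #(adom(D)), which is the parameter that data complexity measures.<br />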

The certain answers semantics can provide a formal semantics for ontology mediated<br />

CL data access interfaces, and QA’s data complexity can provide both a measure of their difficulty and<br />

a criterion for optimality. To implement this strategy we need, we believe, to go through<br />

two stages: (i) We need to choose an ontology language and a query language for which<br />

the computational complexity of QA is known and for which data complexity is optimal.<br />

(ii) We need to express QA in controlled English.<br />

3 Expressing QA with Controlled English<br />

A compositional translation ⟦.⟧, as proposed and conceived by Montague (Montague,<br />

1970), is a function that homomorphically maps a fragment of natural language (English<br />


in our case) into, basically, FOL augmented with <strong>the</strong> types, <strong>the</strong> lambda abstraction and <strong>the</strong><br />

function application constructs of the simply typed λ-calculus, a.k.a. λ-FOL. Such translations assign<br />

to each NL utterance a λ-FOL formula: its meaning representation (MR). The key feature of<br />

compositional translations is that <strong>the</strong>y can be made to map declarative fragments <strong>of</strong> NL<br />

into ontology languages and interrogative fragments into query languages.<br />

Definition 2. (Expressing QA) Given an ontology language L and a query language Q,<br />

expressing QA in controlled English consists in: (i) Defining a grammar G and a compositional<br />

translation ⟦.⟧ for a controlled declarative fragment L(G) s.t. ⟦.⟧ maps L(G)<br />

into L. (ii) Defining a grammar G′ and a compositional translation ⟦.⟧ for a controlled<br />

interrogative fragment L(G′) s.t. ⟦.⟧ maps L(G′) into Q.<br />

We have dealt elsewhere with the problem of expressing KBs and ontology languages<br />

by expressing, in particular, the DL-Lite_{R,⊓} ontology language or logic and, in general,<br />

the DL-Lite family of DLs (Calvanese, De Giacomo, Lembo, Lenzerini and Rosati,<br />

2007). Description logics (DLs) are knowledge representation logics that conceptually<br />

model a domain in terms of classes, roles (binary relations among classes) and inheritance<br />

relations between classes and roles. In (Bernardi, Calvanese and Thorne, 2007; Thorne,<br />

2007) we define a declarative CL, Lite-English, and a compositional translation ⟦.⟧, and<br />

show that:<br />

Theorem 1. (Bernardi et al., 2007) For every sentence S in the CL Lite-English,<br />

there exists a DL-Lite_{R,⊓} assertion α s.t. ⟦S⟧ = α. Conversely, every DL-Lite_{R,⊓}<br />

assertion α is the image by ⟦.⟧ of some sentence S in Lite-English.<br />

To get the whole picture we now need to look at query languages. It turns out that<br />

QA for DL-Lite_{R,⊓} is optimal w.r.t. data complexity, falling under LOGSPACE (actually,<br />

AC^0), a minimal complexity class, when we choose as query language the class of<br />

relational queries known as rule-based conjunctive queries (CQs). Conjunctive queries<br />

are queries over a schema R whose body is a conjunction of existentially quantified relational<br />

atoms. Expressing query languages w.r.t. which QA’s computational complexity is<br />

optimal can shed light on the conditions under which accessing data w.r.t. an<br />

ontology with CL might be a relatively easy task.<br />

4 Expressing Conjunctive Queries<br />

In this section we will show how to express graph-shaped simple conjunctive queries, a<br />

subclass <strong>of</strong> <strong>the</strong> class <strong>of</strong> CQs, for which QA is optimal too. A typical boolean graph-shaped<br />

query over, say, <strong>the</strong> constant Mary and <strong>the</strong> binary predicates loves and hates is<br />

(1) q() ← ∃x∃y(loves(Mary, x) ∧ hates(x, y))<br />

which we would like to express through <strong>the</strong> CL Y/N-question<br />

(2) Does Mary love somebody who hates somebody?<br />

And a typical non-boolean graph-shaped query over <strong>the</strong> same set <strong>of</strong> relational symbols<br />

(i.e., <strong>the</strong> schema {loves, hates}) is<br />

(3) q(x) ← ∃y(loves(x, y) ∧ hates(y, x))<br />


(Lexical rule)    (Value of ⟦.⟧ on word and category)<br />
Det → some    λP.λQ.∃x(P(x) ∧ Q(x)): (e → t) → ((e → t) → (e → t))<br />
Pro_i → somebody    λP.∃xP(x): (e → t) → t<br />
Pro^-_i → anybody    λP.∃xP(x): (e → t) → t<br />
Coord → and    λP.λQ.∃x(P(x) ∧ Q(x)): (e → t) → ((e → t) → (e → t))<br />
Relpro_i → who    λP.λx.P(x): (e → t) → (e → t)<br />
Pro_i → him    λP.P(x): (e → t) → t<br />
Pro_i → himself    λP.P(x): (e → t) → t<br />
Intpro → which    λP.λQ.λx.P(x) ∧ Q(x): (e → t) → (e → t)<br />
Intpro_i → who_i    λP.λx.P(x): (e → t) → (e → t)<br />
NPgap_i → ɛ    λP.P(x): (e → t) → t<br />
N_i → man,...    λx.man(x): e → t,...<br />
IV_i → runs,...    λx.run(x): e → t,...<br />
IV^-_i → run,...    λx.run(x): e → t,...<br />
TV_{i,j} → loves,...    λα.λx.α(λy.loves(x, y)): ((e → t) → t) → (e → t),...<br />
TV^-_{i,j} → love,...    λα.λx.α(λy.loves(x, y)): ((e → t) → t) → (e → t),...<br />
TV^p_{i,j} → loved,...    λα.λx.α(λy.loves(x, y)): ((e → t) → t) → (e → t),...<br />
Adj_i → mortal,...    λx.mortal(x): e → t,...<br />
Pn_i → Mary,...    λP.P(Mary): (e → t) → t,...<br />

Table 1: Lexical rules for GCQ-English.<br />

which we would like to express through <strong>the</strong> CL Wh-question (containing an anaphoric<br />

pronoun)<br />

(4) Who loves somebody who hates him?<br />

Definition 3. (GCQs) A non-boolean graph-shaped simple conjunctive query (GCQ) of<br />

arity ≤ 1 is a CQ over a schema R composed of relation symbols of arity ≤ 2 of the form<br />

q := q(x) ← Φ[x] where the body Φ[x] is inductively defined as:<br />

Φ[x] := A_{i_0}(x) ∧ ... ∧ A_{i_m}(x) ∧ R_{j_0}(x, x) ∧ ... ∧ R_{j_m}(x, x) ∧ R_{j_0}(x, c) ∧ ... ∧ R_{j_m}(x, c).<br />

Φ[x] := Φ′[x] ∧ ∃y(A_{i_0}(x) ∧ ... ∧ A_{i_m}(x) ∧ R_{j_0}(x, y) ∧ ... ∧ R_{j_m}(x, y) ∧ R_{j_0}(y, x) ∧ ... ∧ R_{j_m}(y, x) ∧ Φ′′[y]).<br />

Note that we allow in this definition for empty sequences of conjuncts, e.g., |A_{i_0}(x) ∧ ... ∧<br />

A_{i_m}(x)| ≥ 0 (where |.| is the function that returns the number of predicates in the body<br />

of a relational query). A boolean GCQ is a query of the form q := q() ← ∃yΦ[y], where<br />

Φ[y] is the body of a non-boolean GCQ.<br />
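As a worked instance of Definition 3 (a sketch; the instantiation is an illustrative assumption about how the clauses are meant to be used), the query conveyed by question (4) can be obtained from the inductive clause by taking Φ′[x] and Φ′′[y] to be empty, no unary atoms, one binary atom over (x, y) and one binary atom over (y, x):<br />

```latex
\begin{align*}
\Phi[x] &:= \exists y\,\bigl(\mathit{loves}(x, y) \wedge \mathit{hates}(y, x)\bigr)\\
q(x) &\leftarrow \Phi[x]
\end{align*}
```

Prefixing this body with an existential quantifier over x gives the corresponding boolean GCQ.<br />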

4.1 Expressing Conjunctive Queries with GCQ-English<br />

GCQs are captured by the interrogative CL GCQ-English. Questions in GCQ-English<br />

fall into two main classes: (i) Wh-questions, which map into non-boolean GCQs, and<br />

(ii) Y/N-questions, which map into boolean GCQs. For simplicity, we assume grammars<br />

to be phrase structure grammars augmented with semantic actions. Phrase structure<br />

grammars are composed <strong>of</strong> two sets <strong>of</strong> rewriting rules: lexical rules (a.k.a. lexicons)<br />

and phrase-structure rules. Table 2 shows <strong>the</strong> phrase-structure rules <strong>of</strong> GCQ-English’s<br />

grammar and Table 1 its lexicon. Moreover, <strong>the</strong> latter is divided into two sets: a closed<br />

set <strong>of</strong> function word rules, that express (at <strong>the</strong> semantical level) logical operations and<br />

connectives, and an open set <strong>of</strong> content word rules (nouns, adjectives, verbs), a feature we<br />

convey through dots.<br />



(Rule)    (Semantic Action)<br />
Q_wh → Intpro N_i Sgap_i ?    ⟦Q_wh⟧ := ⟦Intpro⟧(⟦N_i⟧)(⟦Sgap_i⟧)<br />
Q_wh → Intpro_i Sgap_i ?    ⟦Q_wh⟧ := ⟦Intpro_i⟧(⟦Sgap_i⟧)<br />
Q_Y/N → does NP^-_i VP^-_i ?    ⟦Q_Y/N⟧ := ⟦NP^-_i⟧(⟦VP^-_i⟧)<br />
Q_Y/N → is NP_i VP_i ?    ⟦Q_Y/N⟧ := ⟦NP_i⟧(⟦VP_i⟧)<br />
Sgap_i → NPgap_i VP_i    ⟦Sgap_i⟧ := ⟦NPgap_i⟧(⟦VP_i⟧)<br />
VP_i → VP_i Coord VP_i    ⟦VP⟧ := ⟦Coord⟧(⟦VP⟧)(⟦VP⟧)<br />
VP^-_i → VP^-_i Coord VP^-_i    ⟦VP^-_i⟧ := ⟦Coord⟧(⟦VP_i⟧)(⟦VP^-_i⟧)<br />
VP_i → TV_{i,j} NP_j    ⟦VP_i⟧ := ⟦TV_{i,j}⟧(⟦NP_j⟧)<br />
VP_i → is Adj_i    ⟦VP_i⟧ := ⟦Adj_i⟧<br />
VP_i → is a N_i    ⟦VP_i⟧ := ⟦N_i⟧<br />
VP^-_i → IV^-_i    ⟦VP^-_i⟧ := ⟦IV^-_i⟧<br />
VP_i → IV_i    ⟦VP_i⟧ := ⟦IV_i⟧<br />
VP^-_i → TV^-_{i,j} NP_j    ⟦VP^-_i⟧ := ⟦TV^-_{i,j}⟧(⟦NP_j⟧)<br />
VP_i → VP^p_i    ⟦VP_i⟧ := ⟦VP^p_i⟧<br />
VP^p_i → TV^p_{i,j} NP_j    ⟦VP^p_i⟧ := ⟦TV^p_{i,j}⟧(⟦NP_j⟧)<br />
NP^-_i → Det^- N_i    ⟦NP^-_i⟧ := ⟦Det^-⟧(⟦N_i⟧)<br />
NP_i → Pro_i    ⟦NP_i⟧ := ⟦Pro_i⟧<br />
NP_i → Det N_i    ⟦NP_i⟧ := ⟦Det⟧(⟦N⟧)<br />
NP_i → Pn_i    ⟦NP_i⟧ := ⟦Pn_i⟧<br />
NP_i → Pro_i    ⟦NP_i⟧ := ⟦Pro_i⟧<br />
N_i → Adj N_i    ⟦N_i⟧ := ⟦Adj⟧(⟦N_i⟧)<br />
N_i → N_i Relpro_i Sgap_i    ⟦N_i⟧ := ⟦Relpro_i⟧(⟦N_i⟧)(⟦Sgap_i⟧)<br />

Table 2: Phrase structure rules for GCQ-English.<br />

The empty expression ɛ is what in linguistic <strong>the</strong>ory is called a trace, a placeholder for<br />

<strong>the</strong> antecedent <strong>of</strong> <strong>the</strong> relative pronoun. Symbols occurring in <strong>the</strong> phrase-structure rewriting<br />

rules are called components and represent <strong>the</strong> syntactic chunks into which sentences<br />

can be analysed. Symbols that rewrite into words, that is, symbols in <strong>the</strong> lexicon, are<br />

called categories or terminal components and represent parts <strong>of</strong> speech, that is, verbs,<br />

common and proper nouns, pronouns, adjectives, etc. Some basic morpho-syntactic and<br />

semantic features are attached to (some) components. The feature .^- means that the component<br />

is of negative polarity; the feature .^p, associated with verbs and verb phrase components,<br />

indicates that such a component is to be inflected in the passive voice. Absence of<br />

features indicates that components are in positive polarity and verbs and verb phrases in<br />

<strong>the</strong> active voice. Fur<strong>the</strong>rmore, indexes are assigned to components following <strong>the</strong> standard<br />

set by (Pratt, 2001) to: (i) Resolve intrasentential anaphora: anaphoric pronouns (”him”,<br />

”himself”) resolve with <strong>the</strong>ir nearest (antecedent) head noun. (ii) Indicate gap-filler dependencies.<br />

For simplicity, verbs are in 3rd person singular and in present tense.<br />

A quick glance at the grammar rules of GCQ-English will convince the reader that,<br />

for instance, <strong>the</strong> (English) question<br />

(5) Does John love Mary?<br />

and <strong>the</strong> question<br />


(6) Which man is mortal and loves somebody who hates him?<br />

lie within GCQ-English. By <strong>the</strong> same token, it is easy to see that <strong>the</strong> question<br />

(7) *Which teacher gives a lesson to his pupils?<br />


lies outside this CL. Why? Because we have no possessive adjectives (e.g., ”his”) and no<br />

ditransitive verbs (e.g., ”gives”).<br />

Semantic actions mean that we define the translation ⟦.⟧ by recursion over the syntactic<br />

components of GCQ-English in such a way that the application of each grammar rule,<br />

lexical or otherwise, ”triggers” ⟦.⟧ (Jurafsky and Martin, 2000). The intermediate values<br />

of this function are called partial MRs. When we reach the Q_wh component in a Wh-question,<br />

⟦.⟧ will map the λ-FOL expression obtained, of the form ⟦Q_wh⟧ = λx.Φ[x]: e → t,<br />

into the GCQ q(x) ← Φ[x], where Φ[x] denotes a conjunction of existentially quantified<br />

atoms where variable x occurs free. In the case of a Y/N-question, the λ-FOL expression<br />

⟦Q_Y/N⟧ = Φ: t will be mapped into the boolean GCQ q() ← Φ, where Φ stands for a<br />

conjunction of existentially quantified atoms with no free variables. Types ensure that ⟦.⟧<br />

always terminates. We can compute, given a GCQ-English question Q, ⟦Q⟧ as follows:<br />

(i) We compute the parse tree of Q. (ii) We compute ⟦Q⟧ bottom-up, from leaves to root,<br />

as in Figure 1. We start by assigning a λ-expression to the leaves. Then, at each internal<br />

node, we unify types and compute the λ-application and the β-reduction of its children.<br />

We omit types for reasons of space. In the end we obtain, at the root of the tree, a GCQ.<br />

In Figure 1, the circle delimits an island; the dotted line, a gap-filler dependency forced by the<br />

use of the pronoun.<br />

Figure 1: Translating ”Who loves Mary?”.<br />
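The bottom-up computation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation behind Figure 1: meanings are modelled as closures that build formula strings, the helper names (pn, tv, npgap, who, to_gcq) are hypothetical, and the trace is simplified so that it hands the VP predicate through and lets the interrogative pronoun bind the gap variable, instead of inserting a free variable x as the lexical entry λP.P(x) does.<br />

```python
# Meanings are Python closures; formulas are plain strings.
# All encodings below are hypothetical and for illustration only.


def pn(name):
    """Pn -> name, with meaning  lambda P. P(name)."""
    return lambda P: P(name)


def tv(rel):
    """TV -> rel, with meaning  lambda alpha. lambda x. alpha(lambda y. rel(x, y))."""
    return lambda alpha: lambda x: alpha(lambda y: f"{rel}({x}, {y})")


def npgap(P):
    """NPgap -> epsilon. Simplified: pass the VP predicate through so that
    the gap variable stays open until the interrogative pronoun binds it."""
    return P


def who(P):
    """Intpro -> who, with meaning  lambda P. lambda x. P(x)."""
    return lambda x: P(x)


def to_gcq(q_wh, var="x"):
    """Map  lambda x. Phi[x]  to the query  q(x) <- Phi[x]."""
    return f"q({var}) ← {q_wh(var)}"


# Bottom-up over the parse tree of "Who loves Mary?"
vp = tv("loves")(pn("Mary"))   # VP   -> TV NP
sgap = npgap(vp)               # Sgap -> NPgap VP
q_wh = who(sgap)               # Qwh  -> Intpro Sgap ?
print(to_gcq(q_wh))
```

Python's own function application plays the role of β-reduction here, so each assignment corresponds to one internal node of the parse tree.<br />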

Lemma 1. (Expressing GCQs) For every question Q in GCQ-English, there exists a<br />

GCQ q s.t. ⟦Q⟧ = q. Conversely, every GCQ q is the image by ⟦.⟧ of some question Q in<br />

GCQ-English.<br />

Pro<strong>of</strong>. (Sketch) We prove each implication separately:<br />

(⇒) We need to show that for every Wh-question Q in GCQ-English there exists a<br />

GCQ q of distinguished variable x and body Φ[x] s.t. ⟦Q⟧ = q(x) ← Φ[x]. Given<br />

that the only recursive components in GCQ-English’s grammar are verb phrases<br />

(VPs) and nominals (Ns), this can be proved by an easy simultaneous induction on<br />

Ns and VPs and by discarding all possible parse states where components do not<br />

satisfy co-indexing, polarity and voice constraints. For Y/N-questions we reason<br />

analogously.<br />



(⇐) We will prove, by induction on the body Φ[x] of a non-boolean GCQ q of distinguished<br />

variable x, that we can construct a question Q s.t. q is the image of<br />

Q by ⟦.⟧. The result will then follow both for boolean and non-boolean GCQs. Recall<br />

that Ns translate into unary predicates, TVs into binary predicates and Pns into<br />

constants:<br />

– (Basis) q(x) ← Φ[x] is the image of the question ”which A_{i_0} who is a A_{i_1} who<br />

. . . who is a A_{i_m} R_{j_0}s himself and . . . and R_{j_m}s himself and R_{j_0}s c and . . . and<br />

R_{j_m}s c and is R_{j_0}d by c and . . . and is R_{j_m}d by c?”.<br />

– (Inductive step) q(x) ← Φ[x] is the image of the question ”which Φ′[x] R_{j_0}s<br />

and R_{j_1}s and . . . and R_{j_m}s some A_{i_0} who is a A_{i_1} and who is a A_{i_2} and . . . and<br />

who is a A_{i_m} and who R_{j_0}s him and . . . and who R_{j_m}s him and who Φ′′[y]?”,<br />

by induction hypothesis on Φ′[x] and Φ′′[y]. ✷<br />

Theorem 2. (Expressing QA) The QA problem for Lite-English and GCQ-English<br />

is in LOGSPACE w.r.t. data complexity.<br />

Pro<strong>of</strong>. It follows immediately from Theorem 1 and Lemma 1. ✷<br />

5 Expressing Aggregate Queries<br />

The question we now need to answer is: how can we expand <strong>the</strong> coverage <strong>of</strong> our CL<br />

without compromising the tractability of QA? In this section we propose to cover graph-shaped<br />

aggregate queries, that is, GCQs augmented with (some of) the basic SQL aggregation<br />

functions, COUNT, MIN, MAX and SUM. These functions are defined on finite subsets<br />

<strong>of</strong> Dom ∪ Q, i.e., on DB domains plus <strong>the</strong> linearly ordered set <strong>of</strong> rational numbers and<br />

take values in Q, that is, <strong>the</strong>y compute a rational number. For <strong>the</strong> purposes <strong>of</strong> <strong>the</strong> current<br />

paper, we will restrict our analysis to only two <strong>of</strong> <strong>the</strong>m, namely MAX and MIN, although<br />

this analysis can be easily generalized to cover all <strong>of</strong> <strong>the</strong>se functions.<br />

Aggregates arise frequently in domains and systems containing numerical data, e.g.<br />

geographical domains and systems. One <strong>of</strong> <strong>the</strong>m, <strong>the</strong> GEOQUERY geography database<br />

system, comes with a NL interface that supports NL questions expressing such functions<br />

(Mooney, 2007). The corpus of these questions showed that user questions basically<br />

convey either a CQ or a CQ with aggregation functions (see Table 3).<br />

            CQs      Aggregations   Negation<br />
Questions   34.54%   65.35%         0.11%<br />

Table 3: Frequency of CQs in GEOQUERY.<br />

Most importantly, answering CQs (and a fortiori GCQs) with aggregation functions over DL-Lite ontologies<br />

is polynomial w.r.t. data complexity. So, what do these queries look like and what<br />

kind of questions do we want to have in our CL? We would like to capture queries over<br />

unary predicates computing a maximum like<br />

(8) q(max(n)) ← height(n) ∧ odd(n)<br />

with a CL Wh-question like<br />


(9) Which is <strong>the</strong> greatest height that is odd?<br />



(Rule)    (Semantic action)<br />
VP_{i,j} → COP NP_j    ⟦VP_{i,j}⟧ := ⟦COP⟧(⟦NP_j⟧)<br />

(Lexical rule)    (Value of ⟦.⟧ on word and category)<br />
Det → the greatest    λP.max(P): (Q → t) → Q<br />
Det → the smallest    λP.min(P): (Q → t) → Q<br />
Det → some    λP.λQ.∃n(P(n) ∧ Q(n)): (Q → t) → ((Q → t) → (Q → t))<br />
Pro_i → something    λP.∃nP(n): (Q → t) → t<br />
Pro^-_i → anything    λP.∃nP(n): (Q → t) → t<br />
Pro_i → it    λP.P(n): (Q → t) → t<br />
Pro_i → itself    λP.P(n): (Q → t) → t<br />
Coord → and    λP.λQ.∃n(P(n) ∧ Q(n)): (Q → t) → ((Q → t) → (Q → t))<br />
Relpro_i → that    λP.λn.P(n): (Q → t) → (Q → t)<br />
Intpro_i → which    λP.λn.P(n): (Q → t) → (Q → t)<br />
COP_{i,j} → is    λn.λm.n ≈ m: Q → (Q → t)<br />
NPgap_i → ɛ    λP.P(n): (Q → t) → t<br />
N_i → height,...    λn.height(n): Q → t,...<br />
Adj → odd,...    λn.odd(n): Q → t,...<br />

Table 4: Grammar rules for AGCQ-English.<br />

Or queries computing a sum<br />

(10) q(sum(n)) ← height(n) ∧ odd(n)<br />

with <strong>the</strong> question<br />


(11) Which is <strong>the</strong> sum <strong>of</strong> all heights that are odd?<br />

Definition 4. (AGCQs) A graph-shaped conjunctive aggregate query (AGCQ) over a relational<br />

schema R is a query <strong>of</strong> <strong>the</strong> form q(α(n)) ← Φ[n], where α ∈ {min, max}, n<br />

is q’s distinguished variable, a numerical variable, and Φ[n] is the body of a non-boolean<br />

GCQ. Note that <strong>the</strong>re are no boolean AGCQs.<br />
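A query of this form can be evaluated directly once the extensions of its predicates are known. The Python sketch below does so for bodies that are conjunctions of unary atoms, such as height(n) ∧ odd(n), assuming an empty ontology; the function name agcq and the toy extensions are hypothetical.<br />

```python
from fractions import Fraction

# Hypothetical toy extensions of two unary predicates over the rationals.
height = {Fraction(3), Fraction(4), Fraction(7), Fraction(10)}
odd = {Fraction(1), Fraction(3), Fraction(7), Fraction(9)}


def agcq(alpha, first, *rest):
    """Evaluate q(alpha(n)) <- A1(n) and ... and Ak(n), alpha in {min, max}.

    Each argument after alpha is the extension of one unary predicate.
    Returns None when no value satisfies the body, since the aggregate
    is undefined on the empty set.
    """
    candidates = set(first).intersection(*rest)
    return alpha(candidates) if candidates else None


greatest_odd_height = agcq(max, height, odd)
smallest_odd_height = agcq(min, height, odd)
print(greatest_odd_height, smallest_odd_height)
```

Rational numbers are represented with fractions.Fraction so that the values stay exact, in line with aggregates taking values in Q.<br />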

5.1 Expressing Aggregate Queries with AGCQ-English<br />

To express AGCQs in CL we extend GCQ-English into a new fragment of English<br />

called AGCQ-English as follows. Aggregation functions min and max are conveyed,<br />

in English, by, respectively, definite NPs like ”<strong>the</strong> smallest N” and ”<strong>the</strong> greatest N”, only<br />

this time <strong>the</strong>y must denote not a set <strong>of</strong> properties, but, instead, a numeric value. The<br />

symbol N stands for a nominal component that denotes sets <strong>of</strong> numerical values. The<br />

rest <strong>of</strong> <strong>the</strong> expression behaves in a manner similar to a determiner. We must thus start by<br />

enriching our set <strong>of</strong> primitive λ-FOL types from {e, t} into {e, t, Q} and allow for new<br />

determiners <strong>of</strong> type (Q → t) → Q.<br />

Definition 5. (Aggregate Determiners) An aggregate determiner is any <strong>of</strong> <strong>the</strong> following:<br />

(i) The determiner ”<strong>the</strong> greatest”, associated to max and <strong>of</strong> partial MR λP.max(P ): (Q →<br />

t) → Q. (ii) The determiner ”<strong>the</strong> smallest”, associated to <strong>the</strong> aggregation function min<br />

and <strong>of</strong> partial MR λP.min(P ): (Q → t) → Q.<br />

Once aggregate determiners have been introduced, <strong>the</strong>re are three steps left to finish<br />

covering aggregate queries with AGCQ-English. (i) We introduce a new interrogative<br />


Figure 2: Translating ”Which is <strong>the</strong> greatest height?”.<br />

pronoun ”which” of semantics λP.λn.P(n): (Q → t) → (Q → t), where P is a predicate<br />

symbol <strong>of</strong> type Q → t (<strong>the</strong> type <strong>of</strong> sets <strong>of</strong> numbers) and n a variable <strong>of</strong> type Q (<strong>the</strong><br />

type <strong>of</strong> numeric values). (ii) We introduce new entries for function words to take into<br />

account <strong>the</strong> new basic type Q. (iii) We introduce <strong>the</strong> identity predicate ”is” (<strong>of</strong> category<br />

COP, for copula) <strong>of</strong> semantics λn.λm.m ≈ n: Q → (Q → t). The reader can see in Table<br />

4 the (new) lexical rules that extend CL coverage to aggregations. The semantic mapping<br />

⟦.⟧ is then computed in the standard way over the parse tree of an AGCQ-English<br />

question, only it will now output, at the root of the tree, a λ-FOL expression of the form<br />

λm.m ≈ α(λn.Φ[n]): Q → t that ⟦.⟧ will proceed to map into q(α(n)) ← Φ[n]. The<br />

reader can see a sample run <strong>of</strong> <strong>the</strong> procedure in Figure 2. Whence:<br />

Lemma 2. For every question Q in AGCQ-English, there exists an AGCQ q s.t. ⟦Q⟧ = q.<br />

Conversely, every AGCQ q is the image by ⟦.⟧ of some question Q in AGCQ-English.<br />

Pro<strong>of</strong>. (Sketch) As before, <strong>the</strong> first implication is proved by simultaneous induction on<br />

<strong>the</strong> Ns and TVs <strong>of</strong> question Q. The second implication is proved by induction on <strong>the</strong> body<br />

<strong>of</strong> AGCQs q. ✷<br />

Theorem 3. QA is in P for Lite-English and AGCQ-English.<br />

Pro<strong>of</strong>. It follows from Theorem 1 and Lemma 2. ✷<br />

6 Conclusions and Fur<strong>the</strong>r Work<br />

We have provided a certain number <strong>of</strong> guidelines on how to characterize <strong>the</strong> computational<br />

complexity <strong>of</strong> CL interfaces to ontology-driven data access and management systems.<br />

This is achieved by expressing QA in controlled English. We have also shown that<br />

we reach tractability when we choose DL-Lite as ontology language and GCQs and<br />

AGCQs as query languages, for which two CLs, GCQ-English and AGCQ-English,<br />

have been introduced. As fur<strong>the</strong>r work we plan to extend <strong>the</strong> coverage <strong>of</strong> AGCQ-English<br />

to <strong>the</strong> rest <strong>of</strong> <strong>the</strong> basic SQL functions, namely COUNT and SUM and to substantiate (or<br />

validate) the intuitiveness of these CLs and of their English constructs by analysing more<br />

question corpora.<br />



Acknowledgements<br />

I would like to thank my supervisors, R. Bernardi and D. Calvanese, toge<strong>the</strong>r with I. Pratt,<br />

for <strong>the</strong>ir help and suggestions.<br />

References<br />


Bernardi, R., Calvanese, D. and Thorne, C. (2007). Lite Natural Language, <strong>Proceedings</strong><br />

<strong>of</strong> <strong>the</strong> 7th International Workshop on Computational Semantics (IWCS-7).<br />

Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M. and Rosati, R. (2007).<br />

Tractable Reasoning and Efficient Query Answering in Description Logics: The<br />

DL-Lite Family, JAR.<br />

Jurafsky, D. and Martin, J. (2000). Speech and Language Processing, Prentice Hall.<br />

Lesmo, L. and Robaldo, L. (2007). Use <strong>of</strong> Ontologies in Practical NL Query Interpretation,<br />

<strong>Proceedings</strong> <strong>of</strong> AI*IA 2007.<br />

Minock, M. (2005). A Phrasal Approach to Natural Language Interfaces over Databases,<br />

Natural Language Processing and Information Systems, 10th International Conference<br />

on Applications <strong>of</strong> Natural Language to Information Systems (NLDB 2005).<br />

Montague, R. (1970). Universal Grammar, Theoria (36).<br />

Mooney, R. J. (2007). Learning for Semantic Parsing, <strong>Proceedings</strong> <strong>of</strong> CICLing2007.<br />

Pratt, I. (2001). On <strong>the</strong> Semantic Complexity <strong>of</strong> some Fragments <strong>of</strong> English, Technical<br />

report, Department <strong>of</strong> Computer Science – University <strong>of</strong> Manchester.<br />

Rosati, R. (2007). The Limits <strong>of</strong> Querying Ontologies, <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> Eleventh International<br />

Conference on Database Theory (ICDT 2007).<br />

Thorne, C. (2007). Managing Structured Data with Controlled English - An Approach<br />

Based on Description Logics, <strong>Proceedings</strong> <strong>of</strong> <strong>ESSLLI</strong> 2007 <strong>Student</strong> <strong>Session</strong>.<br />

Vardi, M. (1982). The Complexity <strong>of</strong> Relational Query Languages, <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong><br />

Fourteenth Annual ACM Symposium on Theory <strong>of</strong> Computing.<br />



INTERROGATION IN DYNAMIC EPISTEMIC LOGIC ∗<br />

Christina Unger – Gianluca Giorgolo<br />

UiL-OTS, Universiteit Utrecht<br />

1 Introduction<br />

Questions still exhibit an aura of mystery and challenge, as is commonly found with<br />

natural language phenomena that lie on the border between semantics and speech acts.<br />

Several treatments have been proposed within denotational semantics; nevertheless, there<br />

is no strong consensus about what kind of semantic object their meaning is. One of the<br />

early and most well-known approaches to <strong>the</strong> semantics <strong>of</strong> interrogatives was introduced<br />

by (Hamblin, 1973) and fur<strong>the</strong>r developed by (Karttunen, 1977). Their line <strong>of</strong> work reduces<br />

<strong>the</strong> meaning <strong>of</strong> interrogatives to propositions by letting questions denote <strong>the</strong> set<br />

<strong>of</strong> possible or true answers. A slightly different approach is <strong>the</strong> partition semantics by<br />

(Higginbotham and May, 1981) and (Groenendijk and Stokh<strong>of</strong>, 1984). It is based on <strong>the</strong><br />

intuition that <strong>the</strong> meaning <strong>of</strong> questions are partitions <strong>of</strong> <strong>the</strong> logical space constituted by<br />

<strong>the</strong> mutually exclusive possibilities that can serve as answers.<br />

In this paper we propose Dynamic Epistemic Logic (DEL) as a powerful tool for a unified treatment of all question types. We start with an epistemic interpretation of Propositional Dynamic Logic and show how to use it to formalize yes/no questions and answerhood. We then extend it with public announcements and add public questions as well as a way to embed questions. Finally, we sketch how this can be generalized to constituent questions along the lines of Groenendijk & Stokhof. In the last section we give an outlook on additional benefits of using DEL as a framework, e.g. the interaction of questions and presuppositions.<br />

2 Yes/no questions<br />

The basis for our investigations is propositional dynamic logic (PDL), an extension of propositional logic with programs, under an epistemic interpretation (E-PDL in what follows). If P is a set of propositions and A a set of relational atoms, with p ∈ P an arbitrary proposition and i ranging over A, the language is given by<br />

φ ::= ⊤ | p | ¬φ | φ ∧ φ | [π] φ<br />

π ::= i | π ; π | π ∪ π | π∗ | TEST φ<br />

The main idea of giving this PDL language an epistemic interpretation (see e.g. van Benthem, van Eijck and Kooi, 2006) is that relational atoms represent the epistemic accessibilities of single agents. The picture is thus the following: the state of knowledge of a group of agents is modeled as a multimodal S5 Kripke model M = (W, V, R), where W is a non-empty set of worlds, V is a valuation function that assigns to every basic proposition the set of all worlds where that proposition is true, and R is a function that assigns to every agent i an equivalence relation ∼i, where w ∼i w′ expresses that i cannot distinguish between w and w′, i.e. that w and w′ are epistemic alternatives for i.<br />

∗ For valuable comments we are very grateful to the referees and to Jan van Eijck.<br />

The semantics is defined with respect to a model M = (W, V, R), with the usual interpretation for ⊤, p, negation, and conjunction. The interpretation of [π] φ is given by:<br />

M, w |= [π] φ iff for all w′ with (w, w′) ∈ ⟦π⟧^M : M, w′ |= φ<br />

where ⟦π⟧^M is the meaning of the epistemic construct π, given as follows. Basic epistemic constructs i are interpreted by ∼i, and composed ones by means of regular operations on relations: ⟦π ; π′⟧^M = ⟦π⟧^M ◦ ⟦π′⟧^M, where ◦ is relational composition; ⟦π ∪ π′⟧^M = ⟦π⟧^M ∪ ⟦π′⟧^M; ⟦π∗⟧^M = (⟦π⟧^M)∗, where ∗ is the reflexive transitive closure; and TEST is the usual test of dynamic logics, with ⟦TEST φ⟧^M = {(w, w) | w ∈ W and M, w |= φ}. The epistemic modalities thus express knowledge of an agent or a group of agents, including higher-order knowledge. Common knowledge among a group of agents is given by the reflexive transitive closure of the union of all individual accessibilities of agents in the group: [(i ∪ j ∪ . . .)∗] φ expresses that φ is common knowledge.¹ As an example, consider the following model (arrows in both directions are drawn as simple lines, and reflexive arrows are not drawn):<br />

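[Figure: epistemic model with three worlds w0, w1 and w2; one edge is labelled i, j and two edges are labelled j. The truth values of p, q and r at each world are not recoverable from the extracted text.]<br />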

Connections between two worlds with label i indicate that agent i cannot distinguish these worlds. What is depicted is a knowledge state in which it is common knowledge among i and j that p (i.e. both know that p, both know that they both know p, etc.) and that r → q, and where i also knows q but does not know r, whereas j is ignorant about both q and r.<br />
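The semantics above can be made concrete with a small model checker. The following Python sketch is only an illustration: the tuple encoding of formulas and programs, the helper names, and in particular the valuation of the three-world model are assumptions of this sketch (the valuation is merely one choice that fits the description of the example model), not something taken from the paper.<br />

```python
def compose(r, s):
    """Relational composition: follow r, then s."""
    return {(u, w) for (u, v) in r for (v2, w) in s if v == v2}

def star(r, worlds):
    """Reflexive transitive closure of r over the given worlds."""
    closure = {(w, w) for w in worlds} | set(r)
    while True:
        bigger = closure | compose(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger

def rel(model, prog):
    """Denotation of a program as a set of world pairs."""
    worlds, val, acc = model
    kind = prog[0]
    if kind == "agent":
        return acc[prog[1]]
    if kind == "seq":
        return compose(rel(model, prog[1]), rel(model, prog[2]))
    if kind == "union":
        return rel(model, prog[1]) | rel(model, prog[2])
    if kind == "star":
        return star(rel(model, prog[1]), worlds)
    if kind == "test":
        return {(w, w) for w in worlds if holds(model, w, prog[1])}
    raise ValueError(kind)

def holds(model, w, f):
    """Truth of formula f at world w."""
    worlds, val, acc = model
    kind = f[0]
    if kind == "top":
        return True
    if kind == "atom":
        return w in val[f[1]]
    if kind == "not":
        return not holds(model, w, f[1])
    if kind == "and":
        return holds(model, w, f[1]) and holds(model, w, f[2])
    if kind == "box":
        return all(holds(model, v, f[2]) for (u, v) in rel(model, f[1]) if u == w)
    raise ValueError(kind)

def equivalence(cells):
    """S5 accessibility relation built from a partition into cells."""
    return {(u, v) for cell in cells for u in cell for v in cell}

# Assumed model M = (W, V, R): p holds everywhere, q at w0 and w1, r at w0 only.
W = {"w0", "w1", "w2"}
V = {"p": {"w0", "w1", "w2"}, "q": {"w0", "w1"}, "r": {"w0"}}
R = {"i": equivalence([{"w0", "w1"}, {"w2"}]),
     "j": equivalence([{"w0", "w1", "w2"}])}
M = (W, V, R)

p, q, r = ("atom", "p"), ("atom", "q"), ("atom", "r")
r_implies_q = ("not", ("and", r, ("not", q)))
everyone = ("star", ("union", ("agent", "i"), ("agent", "j")))

common_p = holds(M, "w0", ("box", everyone, p))
common_r_implies_q = holds(M, "w0", ("box", everyone, r_implies_q))
i_knows_q = holds(M, "w0", ("box", ("agent", "i"), q))
i_knows_r = holds(M, "w0", ("box", ("agent", "i"), r))
j_knows_q = holds(M, "w0", ("box", ("agent", "j"), q))
```

Under this assumed valuation, the first three checks succeed and the last two fail, in line with the description of the knowledge state.<br />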

2.1 Direct yes/no questions and answerhood<br />

Now it is possible to formulate the ideas of Groenendijk & Stokhof's partition semantics in E-PDL. Given the language above, we can define an additional process F φ:<br />

F φ =def (TEST φ ; G ; TEST φ) ∪ (TEST ¬φ ; G ; TEST ¬φ), where G = W × W<br />

With respect to the semantics given above, F φ denotes an equivalence relation:<br />

⟦F φ⟧^M = {(w, w′) | M, w |= φ iff M, w′ |= φ}<br />

1 The possibility to express common knowledge makes E-PDL more expressive than simple multi-agent<br />

epistemic logics that arise from extending propositional logic with knowledge modalities.<br />
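That F φ denotes exactly the relation "agrees on φ" can be checked mechanically on a finite model. The sketch below is an illustration in Python under simplifying assumptions: formulas are represented directly by their denotations (sets of worlds), and the names `identity_on`, `compose`, `partition_relation` and `cell` are chosen for this sketch only.<br />

```python
def compose(r, s):
    """Relational composition: follow r, then s."""
    return {(u, w) for (u, v) in r for (v2, w) in s if v == v2}

def identity_on(worlds, prop):
    """TEST phi: the identity relation on the worlds where phi holds."""
    return {(w, w) for w in worlds if w in prop}

def partition_relation(worlds, prop):
    """F phi as the union of two test-guarded copies of G = W x W."""
    g = {(u, v) for u in worlds for v in worlds}
    neg = worlds - prop
    yes = compose(compose(identity_on(worlds, prop), g), identity_on(worlds, prop))
    no = compose(compose(identity_on(worlds, neg), g), identity_on(worlds, neg))
    return yes | no

def cell(relation, w):
    """[w]F phi: the partition cell containing w."""
    return frozenset(v for (u, v) in relation if u == w)

# Hypothetical four-world model in which p holds at w0 and w1.
worlds = {"w0", "w1", "w2", "w3"}
p = {"w0", "w1"}

f_p = partition_relation(worlds, p)
same_truth_value = {(u, v) for u in worlds for v in worlds
                    if (u in p) == (v in p)}
```

Here `f_p` and `same_truth_value` coincide, and the two cells are the p-worlds and the ¬p-worlds.<br />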

The process F thus partitions the model with respect to a formula. Using the restriction operation on relations, we will refer to the partition cell containing a world w (in a model M) as [w]F φ. An example is given in the following figure, where the model was partitioned with respect to p.<br />

[Figure: a model partitioned by F p into a cell of p-worlds and a cell of ¬p-worlds; the cell containing the world w is marked [w]F p.]<br />

It is important to note that a question is not a formula, as in Groenendijk and Stokhof (1984), but a process. This might seem like a minor difference, but it will allow us in the next section to use questions as updates and thereby talk about the communicative act of questioning.<br />

Defining the process F is already enough to define what it means for a formula to be a true or a possible answer to a question.<br />

Definition 1. A formula ψ is a true answer to the question whether φ, w.r.t. a model M and a world w, if for all w′ ∈ W : M, w′ |= ψ iff w′ ∈ {v | (w, v) ∈ [w]F φ}.<br />

Definition 2. A formula ψ is a possible (or: appropriate) answer to the question whether φ, w.r.t. a model M, if there is some w ∈ W such that ψ is a true answer to the question whether φ w.r.t. M and w.<br />
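Definitions 1 and 2 translate almost literally into set comparisons. The following Python sketch assumes, for illustration only, that formulas are given by their denotations (sets of worlds); the function names and the three-world model are hypothetical.<br />

```python
def partition_cell(worlds, phi, w):
    """[w]F phi: all worlds that agree with w on phi."""
    return frozenset(v for v in worlds if (v in phi) == (w in phi))

def is_true_answer(worlds, psi, phi, w):
    """Definition 1: psi holds at exactly the worlds in the cell of w."""
    return frozenset(psi) == partition_cell(worlds, phi, w)

def is_possible_answer(worlds, psi, phi):
    """Definition 2: psi is a true answer with respect to some world."""
    return any(is_true_answer(worlds, psi, phi, w) for w in worlds)

# Hypothetical model: p holds only at w1, q holds only at w0.
worlds = {"w0", "w1", "w2"}
p = {"w1"}
not_p = worlds - p
q = {"w0"}

not_p_true_at_w0 = is_true_answer(worlds, not_p, p, "w0")
p_true_at_w0 = is_true_answer(worlds, p, p, "w0")
p_possible = is_possible_answer(worlds, p, p)
q_possible = is_possible_answer(worlds, q, p)
```

In this model ¬p is the true answer at w0, p is a possible but not a true answer at w0, and q is not an answer to whether p at all, because its denotation matches no cell.<br />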

In other words, a possible answer is a formula with a denotation that spans exactly one of the partition cells induced by the question; it is a true answer if this partition cell is the actual one with respect to a particular world w. The somewhat small difference to the picture of Groenendijk & Stokhof is that we do not rely on entailment between questions to define answerhood. Entailment between questions can nevertheless be explicated: for the question whether φ to entail the question whether ψ, we require that for all M:<br />

⟦F φ⟧^M ⊆ ⟦F ψ⟧^M<br />

Up to now, we have a logic with basic propositions and boolean combinations, together with epistemic operations on these that represent the knowledge of (groups of) agents. This gave us a way to talk about questions as partitioning processes and about formulas being answers to questions. However, it tells us nothing about what it means to pose a question in a communication, about its effects, or about what it means to answer it, because we have no means yet to talk about communicative actions. For that we move to a Dynamic Epistemic Logic that also contains public announcements and public questions.<br />

3 Questioning and answering<br />

Dynamic Epistemic Logic provides a logical framework for reasoning about the knowledge of (groups of) agents and the change of this knowledge due to communication. Another modal operator, taken from Public Announcement Logic (Plaza, 1989), is added to E-PDL; it models the event of all agents being told simultaneously and transparently that a certain formula holds. Change of knowledge induced by announcements corresponds to updates of knowledge states. Analogously, we add a modal operator for public questions and a one-place predicate to turn a formula φ into a formula WH φ. Thus, the language is extended in the following way:<br />

φ ::= ... | [!φ] φ | [?φ] φ | WH φ<br />

The communicative effect of a public announcement is given by a restriction operation on epistemic models:<br />

M, w |= [!φ] φ′ iff M, w |= φ implies M | φ, w |= φ′<br />

where M | φ is the restriction of M to φ, i.e. the epistemic model M′ = (W′, V′, R′) with W′ = {w ∈ W | M, w |= φ}, V′ the restriction of V to W′, and R′ the result of restricting each ∼i to W′ × W′.<br />
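The restriction M | φ is easy to state operationally. The Python sketch below is a minimal illustration, assuming that a model is a triple (W, V, R) of plain sets and dictionaries and that formulas are given by their denotations; the helper names and the two-world model are hypothetical.<br />

```python
def restrict(model, phi):
    """M | phi: keep only the worlds in the denotation phi."""
    worlds, val, acc = model
    kept = worlds & phi
    new_val = {name: ws & kept for name, ws in val.items()}
    new_acc = {agent: {(u, v) for (u, v) in r if u in kept and v in kept}
               for agent, r in acc.items()}
    return kept, new_val, new_acc

def knows(model, agent, prop, w):
    """[agent] prop at w: prop holds at every alternative of w."""
    return all(v in prop for (u, v) in model[2][agent] if u == w)

def after_announcement(model, phi, w, check):
    """[!phi] psi at w: true if phi fails at w, else psi is checked in M | phi."""
    if w not in phi:
        return True
    return check(restrict(model, phi), w)

# Hypothetical model: agent a cannot tell u (where p holds) from v.
W = {"u", "v"}
V = {"p": {"u"}}
R = {"a": {(x, y) for x in W for y in W}}
M = (W, V, R)

knew_before = knows(M, "a", V["p"], "u")
knows_after = after_announcement(
    M, V["p"], "u", lambda m, w: knows(m, "a", m[1]["p"], w))
vacuous_at_v = after_announcement(M, V["p"], "v", lambda m, w: False)
```

Before the announcement a does not know p at u; afterwards only u survives, so a knows p. At v the announcement formula is false, so [!p] ψ holds vacuously for any ψ.<br />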

The interpretation of the question update has to be different, because asking a question in a communicative situation obviously has no effect on the knowledge of agents. It rather creates or shifts the focus of the conversation, in many cases to particular alternatives among which the answer lies. To model the focus of a conversation we add to R in the model an additional accessibility relation FOCUS, which initially denotes W × W and is visible to all agents. The question update can then be interpreted as resetting FOCUS:<br />

M, w |= [?φ] ψ iff M[FOCUS := ⟦F φ⟧^M], w |= ψ<br />

where M[FOCUS := ⟦F φ⟧^M] is like M except that the denotation of the relation FOCUS is set to the denotation of F φ.<br />

Answering can then simply be seen as the announcement of an answer. Note that an appropriate answer automatically resets the FOCUS relation to its default value (i.e. neutral focus W × W), because the update with the answer will eliminate all but one partition cell. From the definition of an appropriate answer it also follows immediately that the following holds.<br />

Proposition 1. If ψ is an appropriate answer to the question whether φ, then for all w : M, w |= [?φ][!ψ] φ or M, w |= [?φ][!ψ] ¬φ.<br />

It expresses that appropriate answers do indeed answer the question. (For announcements in general it is of course not the case that either [!ψ] φ or [!ψ] ¬φ holds.)<br />

This is best demonstrated by an example. Assume two agents, Turing (t) and Church (c), and let p and q be propositions (for example, p for "For every program we can decide whether it halts" and q for "There is no general algorithm to decide whether a property of natural numbers is true or not"). The knowledge state of Turing and Church is depicted in the leftmost model of the figure below. For example, Turing does not know p, but he knows that Church knows whether p. These are preconditions usually considered to hold for questions to be felicitous. So Turing decides to ask whether p.² After updating with this question, appropriate answers will be p and ¬p (or formulas equivalent to these). Assuming that w0 is our world of reference, the true answer is ¬p. Announcement of the true answer eliminates world w1, which results in Turing knowing ¬p. In fact he now also happens to know q, whereas Church is still ignorant about it.<br />
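The Turing and Church scenario can be replayed step by step. The Python sketch below is an illustration under explicit assumptions: the valuation (p only at w1, q only at w0) and the accessibility cells are one choice consistent with the prose description, since the original figure's valuation is not available here, and the functions `ask` and `announce` are hypothetical names for the two updates.<br />

```python
def equivalence(cells):
    """S5 accessibility relation built from a partition into cells."""
    return {(u, v) for cell in cells for u in cell for v in cell}

def agrees_on(worlds, prop):
    """Denotation of F phi: pairs of worlds that agree on phi."""
    return {(u, v) for u in worlds for v in worlds
            if (u in prop) == (v in prop)}

def ask(model, prop):
    """?phi: reset FOCUS to F phi; knowledge relations stay untouched."""
    worlds, val, acc = model
    return worlds, val, {**acc, "FOCUS": agrees_on(worlds, prop)}

def announce(model, prop):
    """!phi: restrict worlds, valuation and all relations, FOCUS included."""
    worlds, val, acc = model
    kept = worlds & prop
    return (kept,
            {name: ws & kept for name, ws in val.items()},
            {name: {(u, v) for (u, v) in r if u in kept and v in kept}
             for name, r in acc.items()})

def knows(model, agent, prop, w):
    """[agent] prop at w."""
    return all(v in prop for (u, v) in model[2][agent] if u == w)

# Assumed initial state: Turing confuses w0 and w1, Church confuses w0 and w2.
W = {"w0", "w1", "w2"}
V = {"p": {"w1"}, "q": {"w0"}}
R = {"t": equivalence([{"w0", "w1"}, {"w2"}]),
     "c": equivalence([{"w0", "w2"}, {"w1"}]),
     "FOCUS": {(u, v) for u in W for v in W}}
M0 = (W, V, R)
not_p = W - V["p"]

M1 = ask(M0, V["p"])        # Turing asks whether p
M2 = announce(M1, not_p)    # the true answer at w0 is announced

t_knows_not_p_after_question = knows(M1, "t", not_p, "w0")
t_knows_not_p_after_answer = knows(M2, "t", not_p, "w0")
t_knows_q_after_answer = knows(M2, "t", V["q"], "w0")
c_knows_q_after_answer = knows(M2, "c", V["q"], "w0")
focus_is_neutral_again = M2[2]["FOCUS"] == {(u, v) for u in M2[0] for v in M2[0]}
```

With these assumptions the question changes only FOCUS, the answer eliminates w1, Turing ends up knowing ¬p and q, Church remains ignorant about q, and FOCUS restricted to the surviving worlds is again the full relation.<br />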

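[Figure: three models over the worlds w0, w1 and w2 with edges labelled t and c. Left: the initial knowledge state of Turing and Church. Middle: the state after the question update ?p. Right: the state after the announcement !¬p, in which only w0 and w2 remain, connected by an edge labelled c. The truth values of p and q at each world are not recoverable from the extracted text.]<br />

3.1 Embedded yes/no questions<br />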

This approach is not at all restricted to direct questions but can straightforwardly deal with embedded questions as well, by means of the predicate WH. Embedded questions differ from direct questions in that they seem to refer to a specific partition cell (namely the true one) and not the partitioning as a whole, yet do not give away which partition cell is the true one. We already have the means to achieve this:<br />

⟦WH φ⟧^{M,w} =def {w′ | (w, w′) ∈ [w]F φ}<br />

The formula WH φ can now be embedded in other formulas, for example in statements about the knowledge of agents. E.g., "Ian knows whether Penicillin was flown in" can be represented as [i] (WH p). In general, the following facts hold.<br />

Proposition 2. [i] (WH φ) |= [i] φ ∨ [i] ¬φ<br />

I.e. if an agent knows whether a formula is the case, he either knows the formula or its negation.<br />

Proposition 3. [i] (WH φ) ⊭ φ and analogously [i] (WH φ) ⊭ ¬φ<br />

I.e. the statement that an agent knows whether a formula is the case does not provide the information whether the formula or its negation is the case.<br />
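Propositions 2 and 3 can be illustrated on a small model. The Python sketch below rests on one reading of the definition, stated here as an assumption: [i] (WH φ) holds at w if every i-alternative of w lies in the partition cell of w. The model and all names are hypothetical.<br />

```python
def wh(worlds, phi, w):
    """Denotation of WH phi at w: the partition cell of w under F phi."""
    return {v for v in worlds if (v in phi) == (w in phi)}

def knows(acc, prop, w):
    """[i] prop at w, for the accessibility relation acc of agent i."""
    return all(v in prop for (u, v) in acc if u == w)

def knows_whether(worlds, acc, phi, w):
    """[i] (WH phi) at w, under the reading assumed in this sketch."""
    return knows(acc, wh(worlds, phi, w), w)

# Hypothetical model: p holds only at a; agent i confuses b and c.
worlds = {"a", "b", "c"}
p = {"a"}
not_p = worlds - p
acc_i = {(u, v) for cell in [{"a"}, {"b", "c"}] for u in cell for v in cell}

# Proposition 2: knowing whether p implies knowing p or knowing not-p.
prop2 = all(knows(acc_i, p, w) or knows(acc_i, not_p, w)
            for w in worlds if knows_whether(worlds, acc_i, p, w))

# Proposition 3: knowing whether p is compatible with p and with not-p.
holds_where_p = any(knows_whether(worlds, acc_i, p, w) and w in p
                    for w in worlds)
holds_where_not_p = any(knows_whether(worlds, acc_i, p, w) and w not in p
                        for w in worlds)
```

The agent knows whether p at every world of this model, although p is true at a and false at b and c, so the embedded question does not reveal which cell is the actual one.<br />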

4 Constituent questions<br />

Let us briefly sketch how our approach can be extended to the predicate logic case, to also account for constituent questions along the lines of Groenendijk & Stokhof's work. For this, we start with a first order dynamic logic:<br />

t ::= c | x<br />

φ ::= ⊤ | t | P t . . . t | ¬φ | φ ∧ φ | ∃x φ | [π] φ<br />

π ::= i | π ; π | π ∪ π | π∗ | TEST φ<br />

where c ranges over constants and x ranges over variables. To this language, public announcements, questions, and a question embedding predicate are added, as above:<br />

φ ::= ... | [!φ] φ | [?φ] φ | WH φ<br />

This language is interpreted with respect to a first order model M with fixed domain, a world w, and a variable assignment g, as usual.<br />

2 Actually announcements and questions are not parametrized with respect to an agent. We will shortly come back to that in section 5.<br />

Like Groenendijk & Stokhof, we add another operator to bind variables.<br />

Definition 3. If φ is a formula in which all and only the variables x1, . . . , xn have one or more free occurrences, then Qx1 . . . xn φ is a formula.<br />

A partitioning process over such a formula will model constituent questions. E.g. F Qx P x corresponds to the question about which entity lies in the denotation of P. The semantics can be adopted from Groenendijk and Stokhof (1997):<br />

⟦Qx1 . . . xn φ⟧^{M,w,g} = {w′ | ⟨Qx1 . . . xn φ⟩^{M,w′,g} = ⟨Qx1 . . . xn φ⟩^{M,w,g}}<br />

where<br />

⟨Qx1 . . . xn φ⟩^{M,w,g} = {(g′(x1), . . . , g′(xn)) | M, w, g′ |= φ, where g′(x) = g(x) for all x ≠ x1, . . . , xn}<br />

I.e. ⟦Qx1 . . . xn φ⟧^{M,w,g} is the set of all worlds in which the same entities belong to the extension of φ as in w. For example, for a question like "Who is coming to the party?" we get<br />

⟦F Qx P x⟧^{M,g} = {(w, w′) | ⟦Qx P x⟧^{M,w,g} = ⟦Qx P x⟧^{M,w′,g}}<br />

This means that F Qx P x partitions the model with respect to all possible extensions of the predicate P. Thus, "John is coming to the party" is indeed a true answer to the question if j is in the extension of P in the actual world.<br />

Notice that in the case of closed formulas φ, the process F φ models a yes/no question as above.<br />
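The partition induced by a constituent question can be computed directly from the extensions of P. The Python sketch below is an illustration under assumptions of its own: a two-element domain, one world per possible extension of P, and hypothetical helper names.<br />

```python
from itertools import combinations

# Assumed domain and worlds: one world for each possible extension of P.
domain = ["john", "mary"]
extensions = [frozenset(c) for n in range(len(domain) + 1)
              for c in combinations(domain, n)]
worlds = {f"w{k}": ext for k, ext in enumerate(extensions)}

def question_relation(worlds):
    """F Qx Px: pairs of worlds in which P has the same extension."""
    return {(u, v) for u in worlds for v in worlds if worlds[u] == worlds[v]}

def cell(relation, w):
    """The partition cell containing w."""
    return frozenset(v for (u, v) in relation if u == w)

focus = question_relation(worlds)
cells = {cell(focus, w) for w in worlds}

actual = "w1"   # in w1 only John is coming
john_comes = frozenset(w for w, ext in worlds.items() if "john" in ext)
only_john_comes = frozenset(w for w, ext in worlds.items() if ext == {"john"})

john_comes_covers_actual_cell = cell(focus, actual) <= john_comes
john_comes_is_exactly_the_cell = cell(focus, actual) == john_comes
only_john_is_exactly_the_cell = cell(focus, actual) == only_john_comes
```

In this model every world forms its own cell. The denotation of "John is coming" contains the actual cell without coinciding with it, whereas "only John is coming" picks out exactly the actual cell.<br />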

4.1 Answerhood<br />

In the case of constituent questions, we have to distinguish between partial and exhaustive true answers. This does not pose a problem, assuming that the result of announcing an exhaustive answer to a question φ is equal to [w]F φ, where w is the actual world, while the result of announcing a partial answer is a subset of FOCUS that contains [w]F φ.<br />

4.2 Embedded constituent questions<br />

The WH predicate introduced in section 3 for embedding yes/no questions also works for embedding constituent questions. Consider, for example, the embedding of "who is coming to the party" (as in "Ian knows who is coming to the party"): its interpretation is<br />

⟦WH Qx P x⟧^{M,w,g} = {w′ | (w, w′) ∈ [w]F Qx P x}<br />

As mentioned above, F Qx P x partitions the model with respect to all possible extensions of P. Thus [w]F Qx P x corresponds to the true exhaustive answer. Therefore, "Ian knows who is coming to the party" means that Ian knows the exhaustive answer to the question about who is coming to the party.<br />


5 Further research<br />

One goal for a semantics of direct questions that we have not touched upon yet is offering updates YES and NO as answers to yes/no questions. Having them correspond to the actual use of natural language "yes" and "no", however, is quite a complex matter. Leaving this discussion aside, the most straightforward way to accommodate the possibility of simple yes/no answers in our system would be to add a yes/no predicate to the language, which a question whether φ sets to the denotation of φ. YES would then correspond to the announcement of this predicate, whereas NO would be the announcement of its negation. Having such a predicate provides another possibility to specify answerhood: a formula ψ would be an answer to the question whether φ if it were equivalent to the predicate or its negation. This would actually suffice to also get all the partitioning effects and propositions we talked about in section 2. As long as only alternative questions are addressed, it is just a question of design whether to use such a predicate or a relation as we did. The advantage of our proposal, however, is that it can be extended to the predicate logic case for constituent questions. It thus allows one to use one uniform mechanism underlying both kinds of questions.<br />

Possibly the main benefit of our proposal is the general benefit of Dynamic Epistemic Logic for natural language analysis: DEL provides a powerful framework for the formalization of pragmatic concepts that play a role in communicative situations. Based on this, it can also serve to explore the interaction between these phenomena. Let us illustrate this by briefly looking at presuppositions. An update with a question ψ that carries a presupposition φ can be expressed as the update (TEST φ ; ?ψ), which first tests whether the presupposition is satisfied and, if so, updates with the actual question. If the presupposition test fails, the whole update is not successful. Such a presupposition could, for example, be that the speaker who asks does not know the answer but holds it possible that someone among the addressees knows it. For this, one would need a means to parametrize announcements (and question updates) to a certain agent, which, to our knowledge, has not been done yet. On the other hand, one can ask whether a proposition is the case which itself contains a presupposition, e.g. "Did John stop smoking?". Adopting the treatment of presuppositions in Eijck and Unger (2007), asking a proposition ψ that carries a presupposition φ would set FOCUS to the denotation of F (Cφ ∧ ψ) (where Cφ abbreviates that φ is common knowledge). If Cφ is false, however, the partitioning induced by Cφ ∧ ψ consists of just one partition cell, and thus fails to create a focus with which an informative answer would be possible.<br />

Furthermore, incorporating not only knowledge but also belief would allow for modeling, among the presuppositions, different expectations in question pairs like "Are you going to Groningen?" and "Aren't you going to Groningen?".<br />

Last but not least, another possible line of research is to investigate how our FOCUS relation connects to Rooth's theory of focus interpretation (Rooth, 1992), and thus to explore the connection between questions and information-structural focus.<br />

References<br />

Eijck, J. v. and Unger, C. (2007). The epistemics of presupposition projection, in M. Aloni, P. Dekker and F. Roelofsen (eds), Proceedings of the Sixteenth Amsterdam Colloquium, December 17–19, 2007, ILLC, Amsterdam, pp. 235–240.<br />

Groenendijk, J. and Stokhof, M. (1984). Studies on the Semantics of Questions and the Pragmatics of Answers, PhD thesis, Universiteit van Amsterdam.<br />

Groenendijk, J. and Stokhof, M. (1997). Questions, in J. van Benthem and A. ter Meulen (eds), Handbook of logic and language, Elsevier, chapter 19, pp. 1055–1124.<br />

Hamblin, C. (1973). Questions in Montague English, Foundations of Language 10: 41–53.<br />

Higginbotham, J. and May, R. (1981). Questions, quantifiers and crossing, Linguistic Review 1: 41–79.<br />

Karttunen, L. (1977). Syntax and semantics of questions, Linguistics and Philosophy 1: 1–44. Also published in: Portner & Partee (eds.): Formal Semantics. The Essential Readings. Blackwell, 2003, pp 382–420.<br />

Plaza, J. (1989). Logics of public communications, in M. Emrich, M. Pfeifer, M. Hadzikadic and Z. Ras (eds), Proceedings of the 4th International Symposium on Methodologies for Intelligent Systems, pp. 201–216.<br />

Rooth, M. (1992). A theory of focus interpretation, Natural Language Semantics 1(1): 75–116.<br />

van Benthem, J., van Eijck, J. and Kooi, B. (2006). Logics of communication and change, Information and Computation 204(11): 1620–1662.<br />


THE SEMANTIC CHANGE OF THE FRENCH -AGE-DERIVATION<br />

Melanie Uth<br />

University <strong>of</strong> Stuttgart<br />

Abstract. In this paper, I will investigate the diachrony of the French -age-derivation. I will argue that -age originally served to derive kind terms that have been reinterpreted as group terms and as true event nominalizations. The main hypothesis will be that the reinterpretation of the -age-suffixation was enabled by the fact that the original derivatives and the new derivatives share important features of their abstract conceptual representations. This approach to the diachrony of the -age-suffixation predicts that even nowadays, true event nominalizations in -age focus on the atelic parts of the event denoted by the base verb. As such, the proposal constitutes (further) evidence in favor of the hypothesis that -age may be differentiated from its rival -ment by means of its specific aspectual characteristics.<br />

1 Introduction<br />

Modern French -ment and -age are often described as competing nominalization suffixes, since they frequently attach to the same verbal bases. For example, Lüdtke (1987) lists 187 doublets, e.g. gonflement/gonflage ('inflation'), and alludes to "the numerically most important overlap in the French lexicon" (ibid.: 103). By contrast, in Old French the -ment-suffixation was one of the standard procedures for deverbal event nominalization, while event nominalizations in -age were marginal. In our database, we attest 99.5% event nominalizations in -ment as opposed to 0.5% deverbal nominalizations in -age. In this paper, I will investigate the conditions that enabled the propagation of true event nominalizations in -age from Old to Modern French. I will argue that -age could develop into an event nominalization suffix next to -ment because the two suffixes systematically differ as concerns the abstract conceptual representation of their derivatives. In section 2, I will concentrate on the Latin antecedents of the French -age-derivation, the relational adjectives in -aticu, as well as on their substantivized forms that are transferred to Old French. Section 3 focuses on the genuinely French formations, i.e. group terms and true event nominalizations. In section 4, I will argue that the genesis of group terms in -age resulted from a reinterpretation of the borrowed terms that was enabled by the fact that the original derivatives and the new derivatives share important features of their abstract conceptual representations. In section 5, I hypothesize that an analogous reinterpretation occurred in the deverbal domain, predicting that true event nominalizations in -age only attach to bases that are in some sense atelic. Finally, we will consider different analyses of Modern French -age-nominalizations, showing that these may indeed be characterized by the salience of atelic aspectual values.<br />

2 The antecedents of New French -age: relational -aticu-adjectives and borrowed<br />

substantivizations<br />

The French -age-suffixation developed from the Latin denominal relational adjectives in<br />

-aticu that served to sub-classify the type of object or event denoted by the head nouns<br />

(e.g. census terraticus ‘tax on land’). Denominal relational adjectives establish a relation<br />

between the head noun and the base noun whose exact semantic function needs to be<br />

specified by the semantics of the derivational constituents, and possibly by further contextual<br />

influences, cf. e.g. Fradin (2008: 3ff.). For the present purposes, the crucial point is<br />

that relational adjectives classify types of nouns (in the sense of Vergnaud & Zubizarreta<br />

(1992)), instead of concrete tokens.<br />

In recent work on kind terms it is common to treat “kinds” on a par with “classes” or<br />

“types” of objects (in the relevant sense). For example, Krifka et al. (1995) as well as<br />

Chierchia (1998) assume that common nouns generally have both a kind-referring function<br />

and a predicative function, predicates and kinds being related by virtue of the realization<br />

relation R in the sense that every object y in the extension of the predicate δ is<br />

an instantiation of the kind x (ιx.∀y[δp(y) ↔ R(y,x)]). 1 Building on this approach to<br />

kinds, we may conceive of the relational -aticu-adjectives as deriving terms referring to<br />

(sub-)kinds, in such a way that census terraticus is a kind of census, just as e.g. porcus<br />

silvaticus (‘wild pig’) is a kind of porcus and canis venaticus (‘staghound’) is a kind of<br />

canis. 2<br />

During the transition from Latin to Old French, the -aticu-adjectives were substantivized<br />

and resulted in designations of taxes, rights, status etc. that are lexicalized and<br />

entail the traditional head noun as a semantic constituent (cf. Fleischman (1990: 10ff.)):<br />

(1) TAX (chevage = bounty, capitation; from chief ):<br />

la fud subjecte e rendid chevage . . .<br />

there was-3Sg subjected and paid-3Sg capitation . . .<br />

‘he subjected it and paid capitation’ (NCA: reis)<br />

(2) RIGHT (passage = right to cross a territory; from passer ):<br />

si (. . . ) disent . . . , que il queroient passage . . .<br />

prt. say-3Pl that they ask for passing<br />

‘and (they) said that they ask for the right to pass . . . ’ (NCA: clari)<br />

It is important to note that, whereas the base nouns of these lexicalized substantivizations<br />

consistently retain their kind-referring function, the interpretation of the incorporated<br />

head nouns varies between kind-reference and object-reference, depending on the<br />

context. This difference is signalled by the determiner system of Old French, where terms<br />

that do not refer to actual extensions show up as bare nouns (cf. Foulet (1998: 49)):<br />

(3) a. et cele claciele guardoit en zz escrignet k il avoit quanqu<br />

and this little key keep-3Sg in a shrine that he got-3Sg when<br />

estovoit a monniage.<br />

was-3Sg in monasticism.<br />

‘and he kept this little key in a shrine that he got when he lived in monasticism’<br />

(NCA: P. Mouskes)<br />

b. ne fait sanblant que il s en faingne le singne fait dou<br />

not do-3Sg seeming that he refl of it feign-3Sg the monkey done by the<br />

moniage.<br />

monkhood.<br />

‘he did not seem to feign the monkey made by the monkhood’ (NCA: Renart)<br />

1 Contrary to Chierchia (1998), Krifka et al. (1995: 66) “leave it open as to whether every predicate has a<br />

corresponding kind individual”.<br />

2 See McNally & Boleda (2004) for a similar approach to relational adjectives. However, for the time<br />

being the above analysis is meant to refer only to Latin -aticu, not to relational adjectives in general.<br />

Building on a representational format proposed by Fradin (2008: 4), the lexical semantics<br />

of the above substantivizations may be represented as in (4), where the index k is meant<br />

to indicate kind-reference, o signals object-reference, and REL designates the relation that<br />

is introduced by the denominal adjective relating the denotation of the head noun (‘rank’)<br />

to that of the base noun (‘monk’): 3<br />

(4) a. T(moniage) = (λx_k.REL(x_k, y_k) ∧ rank′(x_k) ∧ monk′(y_k))<br />

b. T(moniage) = (λx_o.REL(x_o, y_k) ∧ rank′(x_o) ∧ monk′(y_k))<br />

3 Genuinely French -age-derivatives: group terms and event nominals<br />

Next to the substantivized -age-derivatives borrowed from Latin there are two genuinely<br />

French coinages, i.e. group terms (5) and true event nominalizations (6):<br />

(5) GROUP (porcage, ‘porc’ = herd of swine):<br />

toutes mes bestes et le meilleur porc du porcage<br />

all my beasts and the best pig of the herd of pigs<br />

‘all my beasts and the best pig of the herd of pigs’ (GO: B. deCaux)<br />

(6) EVENT NOMINALIZATION (mariage, ‘marier’ = marriage):<br />

il firent le mariage du dit chevalier et de . . .<br />

they make-3Pl the marriage of the said knight and of . . .<br />

‘they conducted the marriage of the mentioned knight and . . . ’ (NCA: vilhar)<br />

These forms developed from the substantivized -aticu-adjectives by means of three<br />

innovations: the replacement of the semantically incorporated head nouns by<br />

“group of” and “event of”, respectively, the strictly extensional interpretation of these<br />

new “head nouns”, as well as the strictly extensional interpretation of the derivational<br />

bases, in such a way that the new derivatives generally refer to actual objects and events<br />

instead of kinds and event types:<br />

(7) T(porcage) = (λx_o.REL(x_o, y_o) ∧ rank′(x_o) ∧ pork′(y_o))<br />

4 Approaching the origin of the group terms<br />

As regards the (semantic) constituents of the newly coined group nouns in -age, note that<br />

the new “head nouns” are interpreted as denoting singular individuals (one group), while<br />

the kind-denoting base nouns have been reinterpreted as denoting plural individuals (e.g.<br />

several pigs). In the following I would like to argue that this hybrid character of the new<br />

coinages may be traced back to the fact that the kind-reference of the traditional -aticu<br />

base nouns was much more dependent on the instantiations of the respective kinds than<br />

the kind-reference of the head nouns.<br />

In section 2, we already argued that a common noun may in principle denote both a<br />

kind and its instantiations. The relevant definition by Krifka et al. (1995: 66) is<br />

repeated in (8):<br />

3 This relation is specified as REL instead of R in order to distinguish it from the realization relation R<br />

mediating between a kind and its instances (cf. above).<br />

(8) ιx.∀y[δp(y) ↔ R(y,x)]<br />

Roughly: “There is a kind x, such that every y that is in the extension of the<br />

corresponding predicate δp is a realization of x.”<br />
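
As a rough illustration of definition (8), the following Python sketch searches a toy model for the unique kind whose realizations coincide exactly with the extension of a predicate. The domain, the kinds DOG and CAT, the relation R and the helper name iota_kind are invented for illustration only; none of them is taken from Krifka et al. (1995).<br />

```python
# Toy model of (8): iota x . forall y [ delta(y) <-> R(y, x) ].
# All names and data below are hypothetical.

def iota_kind(domain, kinds, delta, R):
    """Return the unique kind x such that, for every y in the domain,
    delta(y) holds exactly when y realizes x. Raise ValueError otherwise."""
    matches = [
        x for x in kinds
        if all(delta(y) == ((y, x) in R) for y in domain)
    ]
    if len(matches) != 1:
        raise ValueError("iota is undefined: no unique kind for this predicate")
    return matches[0]

domain = {"fido", "rex", "tom"}          # individual objects
kinds = {"DOG", "CAT"}                   # kind individuals
R = {("fido", "DOG"), ("rex", "DOG"), ("tom", "CAT")}  # realization relation

def is_dog(y):
    """Predicative use of the common noun 'dog'."""
    return y in {"fido", "rex"}

dog_kind = iota_kind(domain, kinds, is_dog, R)
```

In this toy model the predicate is_dog is mapped to the kind DOG, whereas a predicate whose extension matches no kind (as footnote 1 allows for) makes the lookup fail.<br />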

Furthermore, Krifka et al. (1995: 78ff.) show that, when a common noun shows up with<br />

kind-reference, the interpretation of the corresponding sentences often also involves the<br />

instantiations of the kind. According to these authors, “even kind predicates [such as be<br />

extinct or be widespread (MU)] are related to properties of instances of the kind, if we<br />

engage in an analysis of the lexical meaning of such predicates (. . . ). For example, in<br />

order to show that the dodo is extinct, one has to show that there have been realizations of<br />

this kind in the past, that there are no present realizations of this kind now, and, perhaps,<br />

that there will be no more in the future” (ibid.: 78f.). In the remaining contexts triggering<br />

kind-reference that are discussed by Krifka et al. (1995), the interpretation of the common<br />

noun relies to a still greater extent on the instantiations of the relevant kind. A central<br />

example is the so-called distinguishing property interpretation as in Dutchmen are good<br />

sailors meaning that “the Dutch distinguish themselves from other comparable nations by<br />

having good sailors” (ibid.: 82f.). According to this interpretation, the verbal predicate is<br />

definitely related to properties of instantiations of the kind. 4<br />

Coming back to the Latin -aticu-adjectives, I would like to argue that the interpretation<br />

of their base nouns also essentially relied on the close relation between the kinds and their<br />

instantiations. For example, the kind-reference of the base noun baron of barnage (‘quality<br />

of barons’) builds on the fact that an indefinite number of instantiations of the kind is<br />

said to be distinguishably noble. This analysis largely holds for all borrowed derivatives<br />

derived from bases that refer to human beings, e.g. eschevinage (‘rank of jury men’),<br />

veuvage (‘widowhood’), etc. By contrast, the kind interpretation of fiscal terms such as porcage<br />

(‘tax on swine’) is closely related to the instantiations since these are the ones the taxes<br />

are to be paid for. Note that if we highlight the impact of the instantiations on the referential<br />

characteristics of the -aticu base nouns, this is not to say that the base nouns do<br />

not refer to kinds any longer. We are still faced with an intensional interpretation, i.e. the<br />

instantiations do not need to exist in the actual world. For the sake of convenience, we<br />

will distinguish in what follows between intensionally defined instantiations and extensionally<br />

defined (actual) instances of kinds. Finally, note that the instantiations of a kind<br />

are necessarily non-singular in the sense of Chierchia (1998: 350), who argues that “kinds<br />

(. . . ) will generally have a plurality of instances (even though sometimes they may have<br />

just one or none). But something that is necessarily instantiated by just one individual (e.g.,<br />

the individual concept or transworld line associated with Gennaro Chierchia) would not<br />

qualify as a kind.”<br />

By contrast, if a given head noun like ‘rank’ or ‘quality’ refers to a kind, the entire<br />

kind as a whole is much more salient than the plurality of its instantiations. The second<br />

difference between the -aticu head nouns and the corresponding base nouns is already<br />

illustrated by example (3) above, i.e. whereas the base nouns consistently retain their<br />

kind-referring function, the head nouns refer to actual instances of the kind as soon as the<br />

derivative shows up with an adequate predicate, e.g. faire in (3b), repeated below as<br />

(9):<br />

4 Krifka et al. (1995) argue that the above sentence clearly differs from characterizing sentences such as “Potatoes<br />

contain vitamin C” in that the former, unlike the latter, is not adequately paraphrased by an indefinite<br />

singular NP (cf. “A Dutchman is a good sailor” vs. “A potato contains vitamin C”).<br />

(9) ne fait sanblant que il s en faingne le singne fait<br />

not do-3Sg seeming that he refl. of it feign-3Sg the monkey done<br />

dou moniage. 5<br />

by the monkhood.<br />

‘he did not seem to feign the monkey made by the monkhood’ (cf. (3b))<br />

Hence, we may generalize that the incorporated head nouns of the borrowed -age-derivatives<br />

are interpreted as denoting either the kind as an entirety or a concrete instance<br />

of it, whereas the morphologically established kind-reference of the corresponding base<br />

nouns essentially relies on the (non-singular) instantiations. I would like to argue that<br />

this difference between the interpretation of head nouns and base nouns is reflected at an<br />

abstract conceptual level of semantic representation, in such a way that the concepts denoted<br />

by the head nouns are interpreted as representing bounded entities without internal<br />

structure, whereas the concepts denoted by the base nouns are interpreted as representing<br />

unbounded entities composed of sub-individuals. Relying on Jackendoff (1991), we may<br />

represent the conceptual structure of e.g. le moniage (‘the rank of monks’) as in (10),<br />

where the feature [± b] encodes the distinction between bounded entities like PIG and<br />

non-bounded entities like WATER, while the feature [± i] signals the distinction between<br />

non-structured individuals like BRICK and those that are composed of sub-individuals,<br />

like BUSES or CATTLE (cf. Jackendoff (1991: 20)). PL (“plural”) and REL (“relation”)<br />

denote functions that map between different values of b and i:<br />

(10) conceptual structure of le moniage (‘the rank of monks’):<br />

[+b, -i RANK (REL ([-b, +i MONKS (PL ([+b, -i MONK]))]))]<br />

As regards the group terms in -age, it is interesting to note that recent analyses tend to<br />

define even non-derived group terms like English committee as being semantically hybrid<br />

in a sense reminiscent of our substantivized -aticu-adjectives. For example, Barker (1992)<br />

proposes that a group term denotes an atomic individual that is (merely) related to the<br />

plural individual constituting its members by a membership function f. An argument for<br />

the difference in extension between the group term and the related plural predicate is that<br />

there are properties common to all of the members which are never true of the group. For<br />

example, Bill can be a member of committee A, whereas committee A cannot (cf. ibid.:<br />

73). Likewise a group may have properties that the collection of its members does not<br />

have, e.g. a group has members while a plurality does not.<br />
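
The contrast between a group atom and the plurality of its members can be sketched in a few lines of Python. This is only a toy rendering of the idea attributed to Barker (1992) above: the class Atom, the dictionary f and the two predicates are hypothetical names, and the modelling of pluralities as frozensets is an assumption made here for convenience.<br />

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    """An atomic individual; a group is modelled as just another atom."""
    name: str

bill, ann = Atom("Bill"), Atom("Ann")
committee_a = Atom("committee A")

# Hypothetical membership function f: group atom -> plurality of its members.
# Pluralities are modelled as frozensets of atoms (an assumption of this sketch).
f = {committee_a: frozenset({bill, ann})}

def is_member_of(x, group):
    """True if x belongs to the plurality that f assigns to the group atom."""
    return x in f.get(group, frozenset())

def has_members(x):
    """True only of group atoms, i.e. entities in the domain of f."""
    return x in f

members_of_a = f[committee_a]
```

Under these assumptions is_member_of(bill, committee_a) holds while is_member_of(committee_a, committee_a) does not, and has_members is true of the group atom but false of the plurality members_of_a, which mirrors the two asymmetries mentioned in the text.<br />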

One piece of evidence corroborating the validity of this approach for the group nouns<br />

in -age comes from predicates that directly refer to the members of the group denoted by<br />

the corresponding -age-derivative:<br />

(11) li quens (. . . ) a fait son barnage asanbler.<br />

the count has done his knights assemble<br />

‘the count assembled his knights’ (NCA: elie)<br />

Adopting this approach to group nouns, we may conclude that the abstract conceptual<br />

representation of the genuinely French group nouns in -age is very similar to that of<br />

the borrowed -aticu-substantivizations, since it likewise entails a bounded non-composed<br />

5 We may assume that borrowed -age-derivatives as in (9) require a determiner since the verbal predicate<br />

triggers the type shifting from kind (e) to predicate (e,t), cf. e.g. Chierchia (1998: 353).<br />

concept (in this case the group individual), followed by an unbounded concept that is<br />

composed of sub-individuals (i.e. the members). The relevant conceptual representation<br />

is given in (12), COMP representing Jackendoff’s (1991) “composed of”-function (cf.<br />

ibid.: 23).<br />

(12) [+b, -i GROUP (COMP ([-b, +i BARONS (PL ([+b, -i BARON]))]))]<br />

According to this analysis, the substantivized -aticu-adjectives borrowed from Latin<br />

and the genuinely French group nouns in -age have the same ‘skeleton’ (in the sense of<br />

Lieber (2004)):<br />

(13) a. [+b, -i ([-b, +i (+b, -i) ])] (borrowed substantivizations)<br />

rank monks monk<br />

b. [+b, -i ([-b, +i (+b, -i) ])] (genuinely French group terms)<br />

group barons baron<br />
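
The claim that (10) and (12) share one skeleton can be checked mechanically. The following Python sketch encodes the two structures as nested feature bundles and strips away labels and functions. The class Concept, the helper pl and the function skeleton are hypothetical names; the only property of PL assumed here is the one used in (10) and (12), namely that it maps a [+b, -i] concept to a [-b, +i] concept.<br />

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Concept:
    """A conceptual constituent with the features [±b] and [±i]."""
    label: str
    bounded: bool                      # [+b] vs. [-b]
    internal: bool                     # [+i] vs. [-i]
    function: Optional[str] = None     # e.g. "REL", "COMP", "PL"
    arg: Optional["Concept"] = None    # argument of that function

def pl(c):
    """PL as used in (10) and (12): [+b, -i] -> [-b, +i]."""
    if not c.bounded or c.internal:
        raise ValueError("PL is applied to [+b, -i] concepts in this sketch")
    return Concept(c.label + "S", bounded=False, internal=True,
                   function="PL", arg=c)

def skeleton(c):
    """Feature values from the outermost to the innermost constituent."""
    feats = ("+b" if c.bounded else "-b", "+i" if c.internal else "-i")
    return (feats,) + (skeleton(c.arg) if c.arg is not None else ())

# (10) le moniage and (12) barnage, encoded with the hypothetical classes above.
moniage = Concept("RANK", True, False, "REL", pl(Concept("MONK", True, False)))
barnage = Concept("GROUP", True, False, "COMP", pl(Concept("BARON", True, False)))
```

Both skeleton(moniage) and skeleton(barnage) reduce to the sequence [+b, -i], [-b, +i], [+b, -i], so the two derivatives differ only in their labels and in the function (REL vs. COMP) that links head and base.<br />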

This suggests that the genesis of the group nouns in -age essentially relied on the<br />

abstract conceptual representation of the substantivized -aticu-adjectives.<br />

Still, the differences between the borrowed derivatives and the new ones are remarkable,<br />

the most important development arguably being the change from the kind level to<br />

the level of actual instances that accompanies the replacement of the incorporated head<br />

nouns. One possible approach to this change would be to assume that the incorporated<br />

head nouns were replaced by concepts whose quantitative constitution is closest to the<br />

abstract conceptual representation of the traditional derivatives, hence the introduction of<br />

the group-concept. According to this view, the new coinages represent the default realizations<br />

chosen by the native speakers because of their proximity to the skeleton in<br />

(13a).<br />

However, since we do not have any concrete evidence that could corroborate such<br />

an analysis, this reasoning remains highly speculative. Further investigation is needed<br />

to shed light on the exact diachronic development of the -age-derivatives. For our purposes,<br />

the most important conclusion from the above is that the internally plural shape<br />

of the base nouns exhibited by the substantivized -aticu-adjectives is transferred to the<br />

genuinely French -age-derivatives, in such a way that, from a synchronic point of view on<br />

the group nouns, we may generalize that the attachment of -age necessarily involves the<br />

pluralization of the base noun. This generalization may be captured by assuming that<br />

-age introduces a plural operator *P (in the sense of Link (1983)) into the relevant representation.<br />

In the following, I will argue that the restriction to pluralized bases extends to<br />

the deverbal domain and that it is this restriction that enabled the event nominalization in<br />

-age to become more and more productive despite the existence of the related procedure<br />

in -ment.<br />

5 Etymologically conditioned pluractionality and aspectual properties of New French<br />

-age<br />

Example (14) contrasts a deverbal substantivized -aticu-adjective (14a) with a true event<br />

nominalization in -age (14b). The nominalization in (14a) means ‘right of crossing’, the<br />

head noun as well as the base verb denoting event types. By contrast, the nominalization<br />

in (14b) means ‘the event of passing’, the new head noun as well as the base verb<br />

referring to actual instances of events:<br />

(14) a. si disent que il queroient passage (. . . ).<br />

prt. say-3Pl that they ask for passing<br />

‘they said that they ask for the right to pass’ (cf. (2))<br />

b. cil m abandona le passage de la haie mout doucement.<br />

this me allow-3Sg the passing of the hedge very gently<br />

‘he very gently allowed me to pass the hedge’ (NCA: rose)<br />

Evidently, the borrowed deverbal -age-derivatives exhibit the same semantics as the<br />

borrowed denominal ones. That is, the event type denoted by e.g. passage in (14a) is<br />

closely related to its instantiations, since the right is conceded for instantiations of crossing<br />

that are, furthermore, restricted to distinguished territories. Similarly, fees denoted by<br />

e.g. pressoirage or troillage (‘fee for use of the village press’, cf. Fleischman (1990: 74))<br />

are paid for instantiations of pressing events that display specific properties, etc. Note<br />

that the fees and rights are still estimated and conceded for events in general, i.e. for event<br />

types. Nevertheless, the derivation of the corresponding -aticu-adjectives obviously relied<br />

on (typical) instantiations.<br />

Naturally, the parallel between the denominal and the deverbal borrowed substantivized<br />

-age-derivatives also extends to the head noun. That is, due to the accidental<br />

character of their kind-reference (showing up only since they occur in the realm of an<br />

-aticu-adjective), the head nouns are largely independent of their instantiations, focussing<br />

on the entire type of event as a whole. Furthermore, just as with the denominal derivatives,<br />

the head nouns of the deverbal substantivizations may also be coerced to refer to<br />

actual instances by contextual means:<br />

(15) et rendi chascuns son passage a ceuls qui leur avoient presté.<br />

and gave-3Sg everyone his toll to those that them had lent<br />

‘and everyone returned his toll to those who had lent it to them.’ (NCA: vilhar)<br />

This parallelism of denominal and deverbal -aticu-substantivizations suggests that the<br />

extensional reinterpretation of the deverbal ones (i.e. their shift from kind-reference to<br />

object-reference) may be modelled along the lines of the analysis proposed above for the<br />

group nouns. In order to put forward this hypothesis, I will draw on van Geenhoven (2005),<br />

who introduces the so-called pluractional operator, which corresponds to Link’s plural operator<br />

*P and operates on verbal bases in order to “distribute subevent times in various<br />

ways over the overall event time of an utterance”. Pluractionality and (indefinite) plurality<br />

share the property of cumulative reference, a concept that was originally introduced<br />

to define the reference of mass nouns and indefinite plurals denoting homogeneous pluralities<br />

or masses. The crucial characteristic of an entity being in the extension of, for<br />

example, a mass term, is that its parts, as well as any sum of its parts, are in the extension<br />

of the same term. As is pointed out by Quine (1960: 19), “[s]o called mass terms like ‘water’,<br />

‘footwear’, and ‘red’ have the semantic property of referring cumulatively: any sum<br />

of parts which are water is water.” Evidently, this characteristic can easily be transferred<br />

to the domain of eventualities, in the sense that atelic expressions refer cumulatively to<br />

eventualities, whereas telic expressions refer non-cumulatively to eventualities. Accordingly,<br />

van Geenhoven (2005: 6) takes pluractionality to be “the true source of atelicity”,<br />

covering the “atelic nature” (ibid.) of different lexical items, e.g. simple activity verbs<br />

(to sing), imperfective aspectual markers (Engl. -ing) or frequency adverbs (occasionally).<br />
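
Cumulative reference and pluralization can be made concrete in a small finite model. The Python sketch below assumes that sums are modelled as unions of sets of atoms and that a star operator closes the atomic extension of a predicate under sum; this is a simplification in the spirit of Link’s *P, not his full lattice-theoretic system. The three pig atoms and the function names star and is_cumulative are hypothetical.<br />

```python
from itertools import combinations

def star(atoms):
    """Close a set of atoms under sum: every non-empty combination of atoms
    counts as an individual (sums are modelled as frozensets)."""
    atoms = list(atoms)
    return {
        frozenset(combo)
        for size in range(1, len(atoms) + 1)
        for combo in combinations(atoms, size)
    }

def is_cumulative(extension):
    """A predicate refers cumulatively if the sum of any two entities in its
    extension is again in its extension."""
    return all((x | y) in extension for x in extension for y in extension)

atoms = {"p1", "p2", "p3"}
pig = {frozenset({a}) for a in atoms}   # singular count predicate: atoms only
pigs = star(atoms)                      # pluralized predicate: atoms and sums
```

In this model the singular predicate pig is not cumulative, since the sum of two pigs is not itself a single pig, whereas the starred predicate pigs is. On the hypothesis developed in the text, a pluractional operator would do the same for events, with subevents in place of the atoms.<br />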

Based on this approach, we may hypo<strong>the</strong>size that, due to <strong>the</strong> specific conceptual ’skeleton’<br />

<strong>of</strong> <strong>the</strong>ir antecedents, <strong>the</strong> innovative event nominalizations in -age referring to individual<br />

events are conceptualized as being internally pluractional (PLUR), just as <strong>the</strong><br />

innovative denominal group terms are perceived as having pluralized bases:<br />

(16) conceptual representation <strong>of</strong> passage (’event <strong>of</strong> crossing’):<br />

[+b, -i EVENT (COMP ([-b, +i CROSSING (PLUR ([+b, -i CROSSING]))]))]<br />

Unfortunately, this hypo<strong>the</strong>sis may hardly be verified for Old French, since true event<br />

nominalizations are only marginally represented in Old French corpora. However, evidence<br />

in favour comes from analyses <strong>of</strong> -age and -ment in Modern French, showing that<br />

event nominalizations in -age even nowadays exhibit aspectual characteristics related to<br />

pluractionality. One example is Bally (1965), who argues that -ment-nominalizations are<br />

generally very likely to be punctual or terminative, whereas -age-nominalizations ra<strong>the</strong>r<br />

realize durative and iterative aspectual values. Since aspectual values such as iterativity, continuativity,<br />

durativity etc. may all be traced back to pluractionality (cf. van Geenhoven<br />

(2005: ibd.)), Bally’s differentiation <strong>of</strong> Modern French -age and -ment clearly supports our<br />

analysis.<br />

Interestingly, Martin (2008) <strong>of</strong>fers a detailed analysis <strong>of</strong> various aspectual differences<br />

between -age and -ment that is largely in line with Bally’s classification. For example,<br />

<strong>the</strong> non-terminativity <strong>of</strong> -age is illustrated by <strong>the</strong> complementary distribution <strong>of</strong> -age- and<br />

-tion-nominals in contexts such as (17):<br />

(17) a. Le dénazifiage de l’Allemagne (par X) a abouti à sa dénazification (par X).<br />

’The denazifying <strong>of</strong> Germany (by X) resulted in its denazification (by X).’<br />

b. *La dénazification de l’Allemagne (par X) a abouti à son dénazifiage (par X).<br />

’The denazification <strong>of</strong> Germany (by X) resulted in its denazifying (by X).’<br />

(Martin (2008: 12))<br />

Secondly, Martin argues that -age is able to denote longer eventive chains than -ment,<br />

as is evidenced by <strong>the</strong> fact that -age-nominals derived from unergative intransitive bases exhibit<br />

an iterative interpretation, whereas <strong>the</strong> corresponding -ment-nominals are forced to<br />

show up with plural inflection in iterative contexts:<br />

(18) a. OK Une séance de miaulage. (singular)<br />

’A meowing session’<br />

b. vs. * Une séance de miaulement. (singular)<br />

c. vs. OK Une séance de miaulements. (plural) (Martin (2008: 6))<br />

Thirdly, Martin states that -age, contrary to -ment, prefers internal arguments that are<br />

incrementally affected by <strong>the</strong> event denoted by <strong>the</strong> relevant base verb, a feature that is also<br />

displayed by o<strong>the</strong>r atelic expressions such as <strong>the</strong> English Progressive (cf. van Geenhoven<br />

(2005: 12)).<br />

In my view, <strong>the</strong> above findings may be unified by assuming that <strong>the</strong> diachronically<br />

motivated restriction <strong>of</strong> <strong>the</strong> -age-derivation to pluractional bases carries over to Modern<br />

French where it is reflected by <strong>the</strong> fact that -age-nominalizations exhibit several aspectual<br />

values related to pluractionality, such as iterativity, durativity or imperfectivity. A fur<strong>the</strong>r<br />

advantage <strong>of</strong> this analysis is that we may answer <strong>the</strong> question relating to <strong>the</strong> alleged suffix<br />

rivalry by arguing that -age could develop into an event nominalization suffix since it<br />

displays specific aspectual properties that distinguish it from rival suffixes such as -ment.<br />



6 Conclusion<br />

In this paper, I investigated <strong>the</strong> semantic development <strong>of</strong> <strong>the</strong> French -age-derivation. I<br />

argued that <strong>the</strong> genesis <strong>of</strong> group terms in -age resulted from a reinterpretation <strong>of</strong> <strong>the</strong><br />

borrowed substantivized -aticu-derivatives that was enabled by <strong>the</strong> fact that <strong>the</strong> original<br />

procedure and <strong>the</strong> new procedure share <strong>the</strong> same quantitative structure at an abstract level<br />

<strong>of</strong> conceptual representation. The result <strong>of</strong> this reinterpretation is that denominal -age<br />

is restricted to pluralized bases. With reference to van Geenhoven (2005), I <strong>the</strong>n related<br />

nominal plurality to verbal pluractionality through <strong>the</strong> notion <strong>of</strong> cumulative reference and<br />

I argued that <strong>the</strong> development in <strong>the</strong> deverbal domain strictly parallels <strong>the</strong> development<br />

in <strong>the</strong> denominal domain, in such a way that deverbal -age is restricted to pluractional<br />

bases. This analysis enables us to approach several questions concerning <strong>the</strong> change <strong>of</strong><br />

<strong>the</strong> -age-derivation. First <strong>of</strong> all, <strong>the</strong> change turns out to only affect more concrete levels<br />

<strong>of</strong> word formation, <strong>the</strong> basic skeleton being retained through <strong>the</strong> course <strong>of</strong> <strong>the</strong> diachronic<br />

development. This common ground constitutes both <strong>the</strong> condition that enabled <strong>the</strong> change<br />

to take place and <strong>the</strong> basic frame that determines <strong>the</strong> specific characteristics <strong>of</strong> <strong>the</strong> -age-derivation<br />

to this day. Secondly, we may answer <strong>the</strong> question relating to <strong>the</strong> suffix rivalry<br />

by arguing that -age could develop into an event nominalization suffix since it displays<br />

specific aspectual properties that distinguish it from its alleged rival -ment.<br />

Acknowledgements<br />

I wish to thank Martin Becker, Steffen Heidinger, Fabienne Martin, Achim Stein and<br />

Johannes Wespel for helpful discussions. Many thanks to <strong>the</strong> reviewers for <strong>the</strong>ir helpful<br />

comments, as well as to Fabienne Martin and Dennis Spohr for <strong>the</strong>ir technical support<br />

and <strong>the</strong>ir patience.<br />

References<br />


Bally, C. (1965). Linguistique générale et linguistique française, Francke, Berne.<br />

Barker, C. (1992). Group terms in English, Journal <strong>of</strong> Semantics 9: 69–93.<br />

Blum, C. (2002). Godefroy – Le Dictionnaire de l’Ancienne Langue française du IX e au<br />

XV e siècle [GO], Université Paris-Sorbonne. Electronic edition.<br />

Chierchia, G. (1995). Reference to kinds across languages, Natural Language and Linguistic<br />

Theory 6: 339–405.<br />

Fradin, B. (2008). On <strong>the</strong> semantics <strong>of</strong> denominal adjectives, On Line <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong><br />

6th Mediterranean Morphology Meeting, Sept. 27-30, 2007, Vol. 2, Ithaca.<br />

Jackend<strong>of</strong>f, R. S. (1991). Parts and boundaries, Cognition 41: 9–45.<br />

Krifka, M., Pelletier, F. J., Carlson, G. N., ter Meulen, A., Chierchia, G. and Link, G.<br />

(1995). Genericity: An introduction, in G. N. Carlson and F. J. Pelletier (eds), The<br />

Generic Book, University <strong>of</strong> Chicago Press, Chicago.<br />


Lieber, R. (2004). Morphology and Lexical Semantics, Cambridge University Press, Cambridge.<br />

Link, G. (1983). The logical analysis <strong>of</strong> plurals and mass terms: A lattice <strong>the</strong>oretical<br />

approach, in G. Link, R. Bauerle, C. Schwarze and A. von Stechow (eds), Meaning,<br />

Use and Interpretation <strong>of</strong> Language, Walter de Gruyter, Berlin, pp. 302–323.<br />

Lüdtke, J. (1978). Prädikative Nominalisierungen mit Suffixen im Katalanischen, Spanischen<br />

und Französischen, Niemeyer, Tübingen.<br />

Martin, F. (2008). The semantics <strong>of</strong> eventive suffixes in French. Paper presented to Formal<br />

Semantics in Moscow 4, 5th April 2008.<br />

McNally, L. and Boleda, G. (2004). Relational adjectives as properties <strong>of</strong> kinds, in<br />

O. Bonami and P. Cabredo H<strong>of</strong>herr (eds), Empirical Issues in Syntax and Semantics<br />

5, Papers from CSSP 2003, pp. 179–196.<br />

Quine, W. V. (1960). Word and Object, MIT Press, Cambridge, Mass.<br />

Stein, A. and Kunstmann, P. (2006). Le Nouveau Corpus d’Amsterdam [NCA], Universität<br />

Stuttgart, Institut für Linguistik/Romanistik, Stuttgart.<br />

van Geenhoven, V. (2005). Atelicity, pluractionality and adverbial quantification, in<br />

H. Verkuyl, H. de Swart and A. van Hout (eds), Perspectives on Aspect, Springer,<br />

The Ne<strong>the</strong>rlands, pp. 107–125.<br />

Vergnaud, J. and Zubizarreta, M. (1975). The definite determiner and <strong>the</strong> inalienable<br />

construction in French and English, Linguistic Inquiry pp. 595–652.<br />



ADVERSARY IMPLICATURES ∗<br />

Grégoire Winterstein<br />

Université Paris 7<br />

Abstract. The work reported in this paper deals with a certain preference that speakers show<br />

when reinforcing some conversational implicatures. We look at <strong>the</strong> apparent correlation between<br />

this class <strong>of</strong> inferences and <strong>the</strong> bi-partite classification <strong>of</strong> conversational implicatures<br />

proposed by L. Horn. We <strong>the</strong>n argue for a separation between <strong>the</strong> argumentative and inferential<br />

dimensions <strong>of</strong> an utterance and propose a brief explanation based on propositions by<br />

Ducrot.<br />

In this work we are interested in one aspect <strong>of</strong> what is classically considered as <strong>the</strong> reinforcement<br />

<strong>of</strong> conversational implicatures. In <strong>the</strong> first section we show that <strong>the</strong> felicitous<br />

reinforcement <strong>of</strong> implicatures isn’t free, as it is <strong>of</strong>ten considered to be (e.g. in<br />

(Levinson, 2000)). In some cases speakers show a preference for marking a contrast<br />

when reinforcing inferences, in o<strong>the</strong>rs a contrast can’t be used. We examine <strong>the</strong> properties<br />

<strong>of</strong> each class <strong>of</strong> implicatures defined in this manner. We <strong>the</strong>n look at <strong>the</strong> similarities<br />

between <strong>the</strong> class <strong>of</strong> inferences exhibiting this preference for contrast and <strong>the</strong> Q-based<br />

class <strong>of</strong> implicatures as defined by Horn. Ultimately, we discard <strong>the</strong> similarity as irrelevant<br />

to our purpose. More generally, we argue that a classical neo-gricean approach can’t<br />

give an explanation for <strong>the</strong> facts at hand.<br />

The second section aims at explaining <strong>the</strong>se facts in an argumentative perspective based<br />

on <strong>the</strong> works <strong>of</strong> Anscombre and Ducrot. We claim that some implicatures are in a systematic<br />

rhetorical opposition to <strong>the</strong> utterance <strong>the</strong>y are derived from, a fact which licenses<br />

<strong>the</strong> use <strong>of</strong> a contrast for reinforcement. Besides licensing it, this opposition seemingly<br />

requires <strong>the</strong> presence <strong>of</strong> contrast. We propose two different views to explain this preference.<br />

1 Empirical Domain<br />

1.1 Core data<br />


The data presented in (1) is our prime example <strong>of</strong> study. In (1b) B’s answer is interpreted<br />

as carrying with it <strong>the</strong> implicature in (1c) 1 , a standard example <strong>of</strong> scalar implicature as<br />

presented, among o<strong>the</strong>rs, in (Horn, 1989).<br />

(1) a. A: Do you know whe<strong>the</strong>r John will come?<br />

b. B: It’s possible<br />

c. ❀It’s not sure<br />

d. It’s possible, but it’s not sure<br />

∗ I thank Pascal Amsili, Jacques Jayez, Frédéric Laurens, François Mouret and <strong>the</strong> audiences <strong>of</strong> FSIM’4<br />

and JSM’08 for <strong>the</strong>ir precious help and remarks during <strong>the</strong> preparation <strong>of</strong> this work.<br />

1 We use <strong>the</strong> notation A❀B to mean that <strong>the</strong> utterance <strong>of</strong> A implicates B<br />


The inference (1c) can be reinforced as in (1d). What interests us is that an utterance such<br />

as (2), without an adversative discourse marker, sounds degraded compared to (1d) (as an<br />

answer to (1a)).<br />

(2) B: # It’s possible and it’s not sure<br />

We believe that <strong>the</strong> preference for (1d) over (2) is somehow unexpected. If <strong>the</strong> implicature<br />

(1c) is indeed conveyed by <strong>the</strong> utterance <strong>of</strong> (1b), one has to explain how it can be construed<br />

as “opposed” to <strong>the</strong> utterance that allowed its presence in <strong>the</strong> first place (as suggested by<br />

<strong>the</strong> adversative but). A similar fact is already noted in (Anscombre and Ducrot, 1983)<br />

with <strong>the</strong> following example:<br />

(3) Pierre s’imagine que Jacques et moi sommes de vieilles connaissances, mais pourtant<br />

on ne s’est jamais rencontrés.<br />

Pierre figures that Jacques and I are old-time friends, but we never met.<br />

Anscombre and Ducrot use (3) to illustrate <strong>the</strong> difference between <strong>the</strong>ir notions <strong>of</strong> argumentation<br />

2 and inference. Although <strong>the</strong> first part <strong>of</strong> <strong>the</strong> utterance allows an inference<br />

towards <strong>the</strong> second part, it is never<strong>the</strong>less argumentatively opposed to it and thus licenses<br />

a contrast. (Horn, 1991) shows that more generally any kind <strong>of</strong> content related to an<br />

utterance U (by relations <strong>of</strong> implicature, presupposition, logical entailment. . . ) can be<br />

felicitously and redundantly affirmed as long as it is argumentatively opposed to U. Therefore, as unexpected<br />

as <strong>the</strong> preference for a contrast might be in (1d), <strong>the</strong> situation appears common.<br />

This prompts us to look at <strong>the</strong> argumentative properties <strong>of</strong> <strong>the</strong> implicatures relative to<br />

<strong>the</strong>ir mo<strong>the</strong>r-utterances. More specifically, we’ll be checking whe<strong>the</strong>r certain subtypes <strong>of</strong><br />

implicatures are distinguished by this argumentative behaviour.<br />

On a last note about <strong>the</strong> core-data, we wish to mention <strong>the</strong> case <strong>of</strong> <strong>the</strong> scale <strong>of</strong> quantifiers:<br />

〈all, some〉. Usually, scalar implicatures are exemplified with this latter scale as in<br />

(4).<br />

(4) a. A: How is your experiment going?<br />

b. B: I tested some <strong>of</strong> <strong>the</strong> subjects.<br />

c. ❀B didn’t test all <strong>the</strong> subjects.<br />

d. I tested some <strong>of</strong> <strong>the</strong> subjects, but not all.<br />

e. # I tested some <strong>of</strong> <strong>the</strong> subjects, and not all.<br />
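The derivation of the scalar implicatures in (1c) and (4c) can be sketched in a few lines. The following is a minimal illustration, assuming that a Horn scale is simply a tuple ordered from strongest to weakest item; the names SCALES and scalar_implicatures are hypothetical and not part of any cited account.<br />

```python
# Hypothetical sketch: each Horn scale is ordered from strongest to weakest.
# The two scales below are the ones discussed in examples (1) and (4).
SCALES = [
    ("all", "some"),
    ("sure", "possible"),
]


def scalar_implicatures(uttered):
    """Return the negations of all scale-mates stronger than `uttered`."""
    implicatures = []
    for scale in SCALES:
        if uttered in scale:
            position = scale.index(uttered)
            # Items to the left of the uttered item are stronger alternatives
            # that the speaker could have used but chose not to.
            implicatures.extend(f"not {stronger}" for stronger in scale[:position])
    return implicatures


print(scalar_implicatures("some"))      # ['not all']
print(scalar_implicatures("possible"))  # ['not sure']
print(scalar_implicatures("all"))       # []
```

The sketch only models the inference itself; it says nothing about whether the implicature is argumentatively opposed to its mother-utterance, which is the question at issue here.<br />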

We prefer to rely on (1) because <strong>the</strong> preference for using an adversative appears stronger<br />

in (1d) than in (4d). Nei<strong>the</strong>r (2) nor (4e) can be entirely ruled out. Both can be used as<br />

corrections <strong>of</strong> a previous statement (in those cases <strong>the</strong>y would probably have specific<br />

prosodic patterns). Putting this aside, we also observe that <strong>the</strong> preference for marking<br />

a contrast is less strong for <strong>the</strong> examples with quantifiers. Simple Google searches for<br />

<strong>the</strong> French quelques-uns et pas tous or English some and not all yield several thousands<br />

<strong>of</strong> occurrences, not all <strong>of</strong> <strong>the</strong>m corrections, whereas a search for possible and not certain<br />

only provides results <strong>of</strong> <strong>the</strong> form only possible and not certain. The presence <strong>of</strong> <strong>the</strong> adverb<br />

2 The notion <strong>of</strong> argumentation is rooted in Anscombre and Ducrot’s view on discourse. According to<br />

<strong>the</strong>m a speaker always talks to a point and his utterances argue for a certain conclusion, quite <strong>of</strong>ten <strong>the</strong> topic<br />

<strong>of</strong> <strong>the</strong> discourse, which may or may not be explicit. Merin considers that understanding <strong>the</strong> nature <strong>of</strong> this<br />

topic is what “figuring out <strong>the</strong> speaker’s apparent and real intentions” is about. Anscombre and Ducrot<br />

consider that some linguistic items or structures, such as almost, bear specific argumentative properties and<br />

thus entertain a systematic argumentative opposition or correlation with o<strong>the</strong>r propositions.<br />


only restricts <strong>the</strong> meaning <strong>of</strong> possible and <strong>the</strong>se examples aren’t conclusive compared to<br />

<strong>the</strong> some and not all ones. However, <strong>the</strong> effect <strong>of</strong> only is an interesting one and we shall<br />

return to it below.<br />

1.2 First attempt at a classification<br />

Ra<strong>the</strong>r unsurprisingly, if we look at <strong>the</strong> cancellation <strong>of</strong> <strong>the</strong> implicature (1c), we find that<br />

<strong>the</strong> use <strong>of</strong> an adversative is odd in (5a). A reformulation as in (5b) sounds better.<br />

(5) a. # It’s possible but it’s sure<br />

b. It’s possible and it’s even sure<br />

Such observations have already been made in (Benndorf and Koenig, 1998). Using data<br />

about <strong>the</strong> cancellation <strong>of</strong> implicatures, <strong>the</strong> authors argue for a treatment <strong>of</strong> <strong>the</strong> semantic<br />

contribution <strong>of</strong> <strong>the</strong> adversative but based on Horn’s distinction between Q-based and R-based<br />

implicatures. This distinction appears relevant since R-based implicatures 3 allow a<br />

contrast for <strong>the</strong>ir cancellation as shown with various examples in (6).<br />

(6) a. Gwen took <strong>of</strong>f her socks and jumped into bed, but not in that order<br />

b. Billy cut a finger, but it wasn’t his<br />

c. Sam and Max moved <strong>the</strong> piano, but not toge<strong>the</strong>r<br />

As expected, <strong>the</strong> use <strong>of</strong> an adversative to reinforce <strong>the</strong> same implicatures yields odd sentences, as in (7).<br />

(7) a. # Gwen took <strong>of</strong>f her socks and jumped into bed, but in that order<br />

b. # Billy cut a finger, but it was his<br />

c. # Sam and Max moved <strong>the</strong> piano, but toge<strong>the</strong>r<br />

It should be noted that <strong>the</strong> sentences in (7) are out only under <strong>the</strong> assumption that <strong>the</strong><br />

considered implicatures are present. It is easy to imagine contexts for which all <strong>the</strong>se<br />

sentences are correct. For example, if sentence (7b) is uttered about some mafia henchman<br />

who breaks o<strong>the</strong>r people’s fingers on a daily basis, <strong>the</strong> sentence is quite felicitous but <strong>the</strong><br />

implicature we’re interested in isn’t conveyed in <strong>the</strong> first place.<br />

In (5) we’ve seen that <strong>the</strong> cancellation <strong>of</strong> scalar implicatures doesn’t allow a contrast.<br />

The same goes for all o<strong>the</strong>r types <strong>of</strong> Q-based implicatures 4 : (8a) is a clausal implicature<br />

as first described in (Gazdar, 1979), (8b) is based on an attitude predicate, (8c) is based<br />

on Grice’s maxim <strong>of</strong> Manner ra<strong>the</strong>r than <strong>of</strong> Quantity (and belongs to Levinson’s M-based<br />

implicatures class).<br />

(8) a. Bill is in <strong>the</strong> kitchen or <strong>the</strong> living room, (?but/and in fact) I know which<br />

b. John thinks that Mary is pregnant, (?but/and in fact) she is indeed expecting a<br />

child<br />

c. Sam caused Max’s death, (?but/and in fact) he actually killed him on purpose<br />

3 R-based implicatures are enrichments <strong>of</strong> an utterance related to underspecified aspects <strong>of</strong> <strong>the</strong> propositional<br />

content (temporal ordering, causal relations etc.) They come about in a wide variety <strong>of</strong> shapes. In<br />

(Levinson, 2000) <strong>the</strong>se inferences are called I-based implicatures.<br />

4 For Horn, Q-based implicatures are essentially negative in nature: an implicated meaning is calculated<br />

by taking into account which stronger, or more informative, relevant forms <strong>the</strong> speaker could have uttered<br />

but chose not to. This notion <strong>of</strong> Q-implicatures subsumes Levinson’s Q and M implicatures.<br />


As in (1c) <strong>the</strong> reinforcement <strong>of</strong> <strong>the</strong>se inferences seems better with some contrast 5 .<br />

(9) a. Bill is in <strong>the</strong> kitchen or <strong>the</strong> living room, ?(but) I don’t know which<br />

b. John thinks that Mary is pregnant, ?(but) she’s not<br />

c. Sam caused Max’s death, ?(but) he didn’t kill him on purpose<br />

Relying on <strong>the</strong>se observations, Benndorf and Koenig propose to change <strong>the</strong> classical<br />

description <strong>of</strong> but, as given in (Anscombre and Ducrot, 1977) and reproduced in (10), by<br />

reducing Ducrot’s notion <strong>of</strong> argumentativity to Gricean inferences.<br />

(10) a. A sentence p but q is felicitous iff <strong>the</strong>re is a proposition H such that:<br />

b. p is an argument for H<br />

c. q is an argument for ¬H<br />

d. q argues more strongly for ¬H than p argues for H<br />

Benndorf and Koenig’s description is given in (11), where “world inference” stands for<br />

any inference deriving from world knowledge.<br />

(11) a. A sentence p but q is felicitous iff <strong>the</strong>re is a proposition H such that:<br />

b. H is an R-inference or a “world inference” derived from p<br />

c. q toge<strong>the</strong>r with <strong>the</strong> common ground entails ¬H<br />
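The two felicity conditions in (10) and (11) can be compared side by side in a toy model. This is only a sketch under loud assumptions: argumentative force is encoded as a hypothetical signed number (positive for H, negative for ¬H), the strengths and the inference tables are invented for illustration, and none of the names below come from the cited works.<br />

```python
# Hypothetical toy data: ARGUES[(p, H)] is a signed strength in [-1, 1].
# A positive value means p argues for H, a negative value for not-H.
# The numbers are illustrative assumptions, not measured values.
ARGUES = {
    ("it's possible", "John will come"): 0.4,
    ("it's not sure", "John will come"): -0.6,
    ("it's sure", "John will come"): 0.9,
}

# Hypothetical tables for (11): inferences drawn from p, and the
# propositions that q (with the common ground) entails to be false.
INFERENCES = {"Billy cut a finger": {"the finger was Billy's"}}
ENTAILS_NOT = {"it wasn't his": {"the finger was Billy's"}}


def but_felicitous_10(p, q, hypotheses):
    """Condition (10): some H such that p argues for H, q argues for
    not-H, and q argues more strongly for not-H than p does for H."""
    for h in hypotheses:
        strength_p = ARGUES.get((p, h), 0.0)
        strength_q = ARGUES.get((q, h), 0.0)
        if strength_p > 0 and strength_q < 0 and -strength_q > strength_p:
            return True
    return False


def but_felicitous_11(p, q):
    """Condition (11): some H inferred from p whose negation q entails."""
    return bool(INFERENCES.get(p, set()) & ENTAILS_NOT.get(q, set()))


print(but_felicitous_10("it's possible", "it's not sure", ["John will come"]))  # True
print(but_felicitous_10("it's possible", "it's sure", ["John will come"]))      # False
print(but_felicitous_11("Billy cut a finger", "it wasn't his"))                 # True
```

The design choice to separate the two functions mirrors the contrast in the text: (10) needs a notion of argumentative strength, whereas (11) only needs inference and entailment.<br />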

1.3 The limits <strong>of</strong> a purely Gricean description<br />

The description <strong>of</strong> but given in (11) is attractive because it makes Ducrot’s argumentativity explicit<br />

in terms of well-studied inference mechanisms. However, this proposal raises several<br />

issues.<br />

As noted about (3), Anscombre and Ducrot are adamant about distinguishing inference<br />

and argumentation. A good illustration <strong>of</strong> <strong>the</strong> difference between <strong>the</strong> two is exemplified<br />

in (12).<br />

(12) a. Mary almost fell.<br />

b. → Mary didn’t fall.<br />

c. Mary almost fell but she caught herself.<br />

The utterance <strong>of</strong> (12a) conventionally conveys (12b) 6 and yet a contrast is preferred in<br />

(12c) where <strong>the</strong> first sentence is connected with one entailing (12b) (we don’t use (12b) as<br />

such because <strong>the</strong> repetition <strong>of</strong> <strong>the</strong> lexical material alters <strong>the</strong> judgment on (12c)). The use <strong>of</strong><br />

an adversative shows that (12a) and (12b) are argumentatively opposed. According to <strong>the</strong><br />

description <strong>of</strong> but given in (11), this amounts to saying that on <strong>the</strong> one hand (12a) conventionally<br />

conveys (12b) and at <strong>the</strong> same time R-implicates its opposite. Put more simply, this<br />

means that an utterance could, and should, convey two opposite inferences at <strong>the</strong> same<br />

time. If we adopt <strong>the</strong> classical Gricean view <strong>of</strong> an implicature as a part <strong>of</strong> meaning<br />

mutually recognized by both speaker and addressee, <strong>the</strong>n a speaker uttering (12c) should<br />

be contradicting himself, or at <strong>the</strong> very least sound “dissonant”.<br />

5 Actually <strong>the</strong> versions without any connector might sound acceptable with <strong>the</strong> second conjunct as an<br />

explanation <strong>of</strong> <strong>the</strong> first (especially for (9c)). We acknowledge such readings but won’t deal with <strong>the</strong>m<br />

directly. Our point lies in <strong>the</strong> fact that it’s not possible to reinforce <strong>the</strong>se inferences without enforcing a<br />

discourse relation. A Contrast relation is <strong>the</strong> most “natural” one to convey and it is <strong>the</strong> most compatible<br />

with all studied inferences.<br />

6 For a detailed study <strong>of</strong> <strong>the</strong> properties <strong>of</strong> almost, see (Jayez and Tovena, 2008).<br />


Moreover, should we be able to find a sentence coordinated by but such that <strong>the</strong> second<br />

conjunct is <strong>the</strong> cancellation <strong>of</strong> a Q-based implicature, it would be a counter-example to<br />

<strong>the</strong> description in (11). We believe (<strong>13</strong>) is such an example 7 .<br />

(<strong>13</strong>) a. Mo<strong>the</strong>r: I hope Kevin has been polite with Granny and he has managed to eat<br />

some <strong>of</strong> her terrible cookies.<br />

b. Fa<strong>the</strong>r: The problem is, he did eat some <strong>of</strong> <strong>the</strong>m, but in fact he ate all <strong>of</strong> <strong>the</strong>m<br />

and Granny said that he was greedy.<br />

The use <strong>of</strong> some in answer (<strong>13</strong>b) is such that it excludes that Kevin ate all <strong>of</strong> <strong>the</strong> cookies:<br />

an implicature restricting <strong>the</strong> meaning <strong>of</strong> some seems present. Two options are available:<br />

1. In this particular utterance <strong>the</strong> implicature from some to not all isn’t a scalar implicature<br />

but an R-based one. On one hand, this would be consistent with (11).<br />

On <strong>the</strong> o<strong>the</strong>r hand, <strong>the</strong> presence <strong>of</strong> <strong>the</strong> reformulative item in fact is similar to <strong>the</strong><br />

standard cases <strong>of</strong> scalar implicature cancellation. At this stage, it would mean that<br />

<strong>the</strong>re are two different mechanisms for producing <strong>the</strong> same inference with similar<br />

characteristics except on <strong>the</strong> argumentative side: not a very desirable situation.<br />

2. The implicature is indeed a scalar implicature: in this case <strong>the</strong> argumentative orientation<br />

<strong>of</strong> <strong>the</strong>se inferences isn’t always opposed to <strong>the</strong>ir base-utterance. A simple<br />

Gricean approach is <strong>the</strong>n unable to provide a satisfactory analysis <strong>of</strong> <strong>the</strong> core data<br />

in (1). Since <strong>the</strong> description <strong>of</strong> but is usually given in an argumentative framework,<br />

this isn’t surprising. What has now to be explained is how <strong>the</strong> argumentativity <strong>of</strong><br />

<strong>the</strong>se inferences can be accounted for.<br />

A last observation we’ll make is that explaining <strong>the</strong> core data is much simpler once we<br />

abandon implicatures. Taking <strong>the</strong> meaning <strong>of</strong> some as more than 2 and possibly all, <strong>the</strong>re<br />

is a clear opposition with a not all interpretation. Things are however a bit more tricky: as<br />

shown by (<strong>13</strong>b) <strong>the</strong> argumentative relationship between <strong>the</strong> some and not all propositions<br />

can vary. What we mean to investigate is on one hand <strong>the</strong> effect that this relation has on<br />

<strong>the</strong> discourse relations one can use to connect discourse segments and on <strong>the</strong> o<strong>the</strong>r hand<br />

<strong>the</strong> effect it has, if any, on <strong>the</strong> derivation <strong>of</strong> inferences.<br />

2 The argumentative approach<br />

Based on <strong>the</strong> observations <strong>of</strong> (1.3) we decide not to adopt <strong>the</strong> description <strong>of</strong> but given in<br />

(11) and keep <strong>the</strong> more traditional one in (10). We now have to explore <strong>the</strong> argumentative<br />

properties <strong>of</strong> implicatures. We will start with a short account <strong>of</strong> <strong>the</strong> argumentative<br />

properties <strong>of</strong> R-based implicatures and <strong>the</strong>n have a closer look at Q-based inferences.<br />

2.1 On <strong>the</strong> reinforcement <strong>of</strong> R-based implicatures<br />

We observed that utterances contrasting <strong>the</strong> content <strong>of</strong> an R-based implicature with its<br />

mo<strong>the</strong>r-utterance were odd (cf. (7)) and that felicitously interpreting <strong>the</strong>se utterances<br />

implied contexts such that <strong>the</strong> targeted implicature didn’t arise in <strong>the</strong> first place. For <strong>the</strong>se<br />

7 Attested examples <strong>of</strong> this sort are rare, and even scarcer if we restrict <strong>the</strong>m to <strong>the</strong> specific use <strong>of</strong> but<br />

we’re interested in (namely Anscombre and Ducrot’s but/aber/sino), but we think that <strong>the</strong>y’re possible.<br />



particular inferences, it seems that we can argue for a systematic argumentative orientation<br />

regarding <strong>the</strong>ir mo<strong>the</strong>r-utterance.<br />

Contrary to <strong>the</strong>ir Q-based counterparts, R-based implicatures lack a propositional content<br />

<strong>of</strong> <strong>the</strong>ir own (as noted for example in (Levinson, 2000)). Expressing <strong>the</strong>m linguistically<br />

amounts to explicitly expressing an enriched version <strong>of</strong> <strong>the</strong> mo<strong>the</strong>r-utterance. Thus,<br />

expressing a contrast between an utterance B and <strong>the</strong> linguistic expression I <strong>of</strong> a hypo<strong>the</strong>tical<br />

R-implicature attached to B means contrasting two identical propositions: if B<br />

indeed carries an implicature, its full interpretation is I and B but I should be interpreted<br />

as I but I. The only way to “redeem” <strong>the</strong> sentence is to reject <strong>the</strong> implicature I associated<br />

with B and interpret B literally or with ano<strong>the</strong>r implicature. The description (11) is thus<br />

accounted for as a sufficient condition for <strong>the</strong> felicitous use <strong>of</strong> but, albeit not a necessary<br />

one.<br />

In <strong>the</strong> Relevance Theory approach by Sperber and Wilson (see (Wilson and Sperber,<br />

2005) for an introduction) <strong>the</strong> inferences in (7) belong to <strong>the</strong> realm <strong>of</strong> explicatures (see<br />

(Carston, 2005) for a presentation). A tempting generalization would <strong>the</strong>n be to say that<br />

<strong>the</strong> preference for marking a contrast is limited to <strong>the</strong> sole “real” implicatures and not<br />

observed in <strong>the</strong> case <strong>of</strong> explicatures. The latter wouldn’t be argumentatively opposed<br />

to <strong>the</strong> utterance <strong>the</strong>y’re attached to because <strong>the</strong>y’re enrichments <strong>of</strong> <strong>the</strong> meaning <strong>of</strong> an<br />

utterance. But, according to (Noveck and Sperber, 2007) and (Carston, 2005), most cases<br />

<strong>of</strong> scalar implicatures are really explicatures, including <strong>the</strong> examples in (1). Fur<strong>the</strong>rmore,<br />

in Grice’s famous “garage” example, reproduced in (14), <strong>the</strong> relevant inference is an<br />

implicature, not an explicature, and yet, it is its cancellation that demands a contrast,<br />

not its reinforcement (cf. <strong>the</strong> bracketed part in (14b)).<br />

(14) a. A: I am out <strong>of</strong> petrol.<br />

b. B: There is a garage round <strong>the</strong> corner, [but it’s closed].<br />

Therefore <strong>the</strong> distinction between explicature and implicature in Relevance Theory isn’t<br />

satisfactory to explain our data.<br />

2.2 Q-based inferences<br />


Recent works in experimental pragmatics (see (Breheny, Katsos and Williams, 2005)) distinguish<br />

contexts according to <strong>the</strong>ir relation with a targeted scalar inference: <strong>the</strong>y can be<br />

upper-bounded (allowing an interpretation with <strong>the</strong> implicature), lower-bounded (blocking<br />

an interpretation with <strong>the</strong> implicature) or neutral. These cognitive studies showed that<br />

<strong>the</strong> implicature at hand is only generated in upper-bounded contexts. We restrict our<br />

attention to <strong>the</strong>se upper-bounded contexts and, within <strong>the</strong>m, aim to account for<br />

both <strong>the</strong> cases in which <strong>the</strong> preference is marked and those in which it isn’t. In <strong>the</strong><br />

future we shall try to extend our analysis to all kinds <strong>of</strong> contexts, notably with <strong>the</strong> benefit<br />

<strong>of</strong> experimental data (see (4)).<br />

We will first outline how scalar implicatures are accounted for in an argumentative perspective<br />

and <strong>the</strong>n see that <strong>the</strong> possibility <strong>of</strong> marking a contrast between an implicature and<br />

its mo<strong>the</strong>r-utterance follows directly from <strong>the</strong> described mechanism. We’ll base our presentation<br />

on <strong>the</strong> account by Anscombre and Ducrot who introduced and first formalized<br />

<strong>the</strong> concept <strong>of</strong> argumentativity in discourse; our explanations are compatible with later<br />

argumentative frameworks, such as <strong>the</strong> decision-<strong>the</strong>oretic one proposed in (Merin, 1999).<br />



2.2.1 The derivation <strong>of</strong> Q-Implicatures<br />

The derivation <strong>of</strong> Q-implicatures has undergone various refinements in <strong>the</strong> argumentative<br />

perspective. The main argument behind this approach to implicatures is <strong>the</strong> possibility <strong>of</strong><br />

giving an account <strong>of</strong> various cases where no logical entailment scale is at play although a<br />

preference over propositions is observed (for numerous examples see (Hirschberg, 1985)).<br />

Ducrot, and Merin after him, propose to replace <strong>the</strong> ordering <strong>of</strong> items based on logical relations<br />

by a relevance-based order (Merin’s relevance matches Ducrot’s argumentativity).<br />

The ordering <strong>of</strong> <strong>the</strong> items on a scale is determined by <strong>the</strong>ir argumentative force relative to<br />

<strong>the</strong> topic at hand in discourse. The apparent ordering by informativity (typically assumed<br />

in neo-Gricean approaches) is due to <strong>the</strong> fact that more informative propositions usually<br />

have greater argumentative value. In (Ducrot, 1980, p. 61) <strong>the</strong> derivation <strong>of</strong> an implicature<br />

such as (1b) is as follows:<br />

(15) a. 〈sure, possible〉H is an argumentative scale, i.e. a simple utterance including<br />

sure has more argumentative power, regarding a certain conclusion H, than one<br />

relying on possible, and possible has a semantic “at least” interpretation<br />

b. <strong>the</strong> utterance <strong>of</strong> (1b) gets fur<strong>the</strong>r interpreted by an exhaustivity law, similar to<br />

standard Gricean reasoning, and yields <strong>the</strong> desired meaning: since an utterance<br />

relying on sure would have been argumentatively superior and wasn’t used, one<br />

is entitled to infer that <strong>the</strong> corresponding proposition is false<br />

The point that matters here is that <strong>the</strong> implicatures come about from <strong>the</strong> negation <strong>of</strong> propositions<br />

that are argumentatively superior (this remains valid in Merin’s framework even<br />

though <strong>the</strong> mechanism is different).<br />
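
The derivation in (15) can be illustrated with a small sketch. It is only a toy model under assumptions made for illustration: the scale is given as a list ordered from argumentatively weakest to strongest for a fixed conclusion H, and the function name `q_implicatures` is hypothetical, not taken from Ducrot or Merin.

```python
def q_implicatures(scale, used):
    """Return the negated alternatives inferred from using `used`.

    `scale` lists items from argumentatively weakest to strongest
    relative to a fixed conclusion H (an assumption of this sketch).
    """
    if used not in scale:
        raise ValueError(f"{used!r} is not on the scale")
    position = scale.index(used)
    # Exhaustivity: every argumentatively superior item that the
    # speaker did not use is inferred to be false.
    return [f"not {item}" for item in scale[position + 1:]]


# The scale <sure, possible>_H from (15a), written weakest-first.
scale_h = ["possible", "sure"]
print(q_implicatures(scale_h, "possible"))  # ['not sure']
print(q_implicatures(scale_h, "sure"))      # []
```

Because the ordering is argumentative rather than logical, the same function would apply to scales with no entailment relation between their items, provided the ordering is supplied for the conclusion at hand.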

2.2.2 Results<br />


According to <strong>the</strong> mechanism in (15), Q-based implicatures are necessarily argumentatively<br />

opposed to <strong>the</strong>ir mo<strong>the</strong>r-utterance: <strong>the</strong>y come about from <strong>the</strong> negation <strong>of</strong> a proposition<br />

that is argumentatively superior to <strong>the</strong>ir mo<strong>the</strong>r-utterance and thus <strong>the</strong>y argue in <strong>the</strong><br />

opposite direction. This explains <strong>the</strong> core data straightforwardly but not examples such as<br />

(<strong>13</strong>). If <strong>the</strong> sole way to derive Q-implicatures is through a unique argumentation-driven<br />

mechanism, <strong>the</strong>n <strong>the</strong> scalar implicature in (<strong>13</strong>) isn’t accounted for. As it happens, we can<br />

justify its presence on o<strong>the</strong>r grounds.<br />

In <strong>the</strong> context <strong>of</strong> (<strong>13</strong>) <strong>the</strong> proposition including all isn’t argumentatively superior to<br />

that containing some (i.e. to justify Kevin’s good behaviour, it’s better to say that he only<br />

ate some <strong>of</strong> <strong>the</strong> cookies). The mechanism (15) doesn’t exclude <strong>the</strong> all interpretation. On<br />

<strong>the</strong> o<strong>the</strong>r hand, what <strong>the</strong> speaker asserts sets a lower bound on <strong>the</strong> argumentative force<br />

<strong>of</strong> his assertion: he means to convey something at least as argumentatively strong as his<br />

utterance. Since <strong>the</strong> all-proposition is argumentatively inferior to <strong>the</strong> some-proposition, it<br />

doesn’t belong to <strong>the</strong> speaker’s commitment (in Merin’s terms <strong>the</strong> all-proposition doesn’t<br />

belong to <strong>the</strong> speaker’s upward relevance cone).<br />
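
The lower-bound reasoning about (13) can be sketched as follows. This is a toy illustration, not Merin's formal definition of a relevance cone: the numeric forces are invented, only their ordering matters, and that ordering encodes the assumption that, for the conclusion "Kevin behaved well", the some-proposition argues better than the all-proposition. The function name `commitment` is hypothetical.

```python
def commitment(forces, asserted):
    """Propositions at least as argumentatively strong as the assertion.

    `forces` maps each proposition to an illustrative numeric force
    relative to the conclusion at hand; higher means a better argument.
    """
    floor = forces[asserted]
    # The assertion sets a lower bound: weaker propositions are excluded.
    return {p for p, force in forces.items() if force >= floor}


# Conclusion H: "Kevin behaved well". Values are made up; only the
# ordering (some above all) reflects the context of (13).
forces_h = {"some": 2, "all": 1}
print(commitment(forces_h, "some"))  # {'some'}
```

With the ordering reversed, as in the core data where all argues better than some, the all-proposition would fall inside the returned set, which is why it has to be excluded there by the mechanism in (15) instead.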

The second part <strong>of</strong> (<strong>13</strong>) should thus be treated as a way <strong>of</strong> correcting <strong>the</strong> first part. Such<br />

examples, where semantic and argumentative information are clearly decoupled, could be<br />

an interesting starting point in <strong>the</strong> examination <strong>of</strong> <strong>the</strong> nature <strong>of</strong> correction as compared to<br />

reformulation.<br />

Cases including only, such as (16), can also be explained.<br />


(16) The complete extinction <strong>of</strong> mankind is only possible and not certain.<br />

In (16) only excludes <strong>the</strong> necessity <strong>of</strong> <strong>the</strong> extinction <strong>of</strong> mankind; re-asserting this exclusion<br />

can’t be argumentatively opposed to <strong>the</strong> first part <strong>of</strong> <strong>the</strong> utterance, <strong>the</strong>refore <strong>the</strong> use <strong>of</strong><br />

an adversative should be excluded. However, <strong>the</strong>re is a strong tendency to interpret <strong>the</strong><br />

second conjunct as echoic. This isn’t surprising as <strong>the</strong> second conjunct is redundant. It’s<br />

an open question whe<strong>the</strong>r <strong>the</strong> use <strong>of</strong> only in those examples is limited to echoic<br />

cases (even without <strong>the</strong> second conjunct <strong>of</strong> <strong>the</strong> utterance).<br />

We can add an interesting side-observation to this. As we already remarked, <strong>the</strong> second<br />

conjunct <strong>of</strong> (<strong>13</strong>) demands a reformulative marker <strong>of</strong> some kind to mark <strong>the</strong> cancellation<br />

<strong>of</strong> <strong>the</strong> implicature. This remains valid even in <strong>the</strong> core-cases, as shown in (5b). This<br />

would mean that whereas adversatives aren’t sensitive to <strong>the</strong> presence <strong>of</strong> inferences (but<br />

only to argumentative properties <strong>of</strong> <strong>the</strong> utterances), reformulatives are (and are oblivious<br />

to argumentativity). This is a tentative hypo<strong>the</strong>sis that we shall try to pursue in future<br />

work.<br />

3 Obligatoriness <strong>of</strong> contrast<br />

We gave arguments to explain why <strong>the</strong> examples we’re interested in license a contrast.<br />

We gave no arguments as to why this contrast is preferred when overtly marked. A possibility<br />

we want to examine is <strong>the</strong> application <strong>of</strong> a principle close to Sauerland’s “Maximize<br />

Redundancy”, as stated in (Sauerland, 2008). This principle can be roughly paraphrased<br />

as urging a speaker to prefer, among a set <strong>of</strong> alternatives, a sentence that presupposes<br />

an already existing proposition over a sentence that presupposes nothing (with a pragmatic<br />

approach to presupposition as a proposition that is non-controversially part <strong>of</strong> all<br />

speakers’ Common Ground). Thus, a speaker should prefer saying <strong>the</strong> fa<strong>the</strong>r <strong>of</strong> <strong>the</strong> victim<br />

ra<strong>the</strong>r than a fa<strong>the</strong>r <strong>of</strong> <strong>the</strong> victim because <strong>the</strong> former presupposes a non-controversial<br />

proposition. Uttering <strong>the</strong> latter would suggest that <strong>the</strong> presupposition doesn’t obtain, contrary<br />

to common knowledge. Applied to our case, this means that given two contextually<br />

argumentatively opposed propositions p and q, a speaker will prefer to utter p but q ra<strong>the</strong>r<br />

than p and q. Using a simple conjunction implies that a contrast doesn’t hold between p<br />

and q and thus contradicts intuition, or at least makes <strong>the</strong> speaker sound “dissonant”. At<br />

this stage we need to fur<strong>the</strong>r back up this claim on at least two counts:<br />

1. by ensuring that <strong>the</strong> non-felicitousness <strong>of</strong> (4e) is related to that <strong>of</strong> utterances such as<br />

“a fa<strong>the</strong>r <strong>of</strong> <strong>the</strong> victim”, and that <strong>the</strong> preference is <strong>of</strong> <strong>the</strong> same order <strong>of</strong> magnitude<br />

(as we already mentioned, <strong>the</strong> preference for (4d) is far from absolute)<br />

2. by ensuring that <strong>the</strong> predictions made by <strong>the</strong> Maximization principle apply to <strong>the</strong><br />

cases we study; <strong>the</strong> notion <strong>of</strong> presupposition used by Sauerland is technical and<br />

doesn’t necessarily apply to <strong>the</strong> contrast conveyed by <strong>the</strong> use <strong>of</strong> but (i.e. what is<br />

<strong>of</strong>ten called a conventional implicature ra<strong>the</strong>r than a presupposition)<br />
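
The preference rule paraphrased before the two caveats can be sketched as a toy selection procedure. The sketch assumes, for illustration only, that the contrast conveyed by but can be treated like a presupposition, which is exactly what the second caveat questions. The function name `preferred` and the string encoding of propositions are hypothetical.

```python
def preferred(alternatives, common_ground):
    """Pick the usable alternative that presupposes the most.

    `alternatives` maps a sentence to the set of propositions it
    presupposes; alternatives with an unmet presupposition are dropped.
    """
    usable = {
        sentence: presup
        for sentence, presup in alternatives.items()
        if presup <= common_ground  # all presuppositions already shared
    }
    if not usable:
        return None
    return max(usable, key=lambda sentence: len(usable[sentence]))


common_ground = {"p and q are argumentatively opposed"}
alternatives = {
    "p and q": set(),
    "p but q": {"p and q are argumentatively opposed"},
}
print(preferred(alternatives, common_ground))  # p but q
```

When the opposition is not part of the common ground, only the plain conjunction remains usable, so the sketch returns "p and q" in that case.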

An alternative explanation for <strong>the</strong> preference for a marked contrast would be to consider<br />

this preference as an idiosyncratic property <strong>of</strong> <strong>the</strong> relation at hand. This would be in line<br />

with <strong>the</strong> approach <strong>of</strong> (Asher and Lascarides, 2003), where it is claimed that <strong>the</strong> semantics<br />

<strong>of</strong> <strong>the</strong> relation <strong>of</strong> Contrast (as defined in SDRT) are such that <strong>the</strong> relation requires a<br />

specific clue to be used, ei<strong>the</strong>r an overt cue element such as but or intonation alone. When<br />

two connected discourse segments are such that <strong>the</strong> second denies a default consequence<br />



<strong>of</strong> <strong>the</strong> first, <strong>the</strong> relation <strong>of</strong> Contrast holds and needs to be marked. As an example, <strong>the</strong><br />

first and second segment <strong>of</strong> (17) are opposed: that John doesn’t like hockey is a default<br />

consequence <strong>of</strong> <strong>the</strong> first. Since <strong>the</strong> relation <strong>of</strong> opposition exists, it needs to be overtly<br />

marked.<br />

(17) John hates sports but he likes hockey.<br />

The preference we observe for using an adversative would <strong>the</strong>n be a consequence <strong>of</strong> <strong>the</strong><br />

particular semantics <strong>of</strong> <strong>the</strong> relation <strong>of</strong> Contrast. In our core data if one ignores <strong>the</strong> implicature<br />

<strong>the</strong> needed opposition is more obvious: <strong>the</strong> implicature denies part <strong>of</strong> <strong>the</strong> denotation<br />

<strong>of</strong> its mo<strong>the</strong>r-utterance and somehow contradicts part <strong>of</strong> it in <strong>the</strong> same way that <strong>the</strong> second<br />

conjunct in (17) denies a default consequence <strong>of</strong> <strong>the</strong> first, thus triggering <strong>the</strong> need for a<br />

Contrast marker.<br />

4 Conclusion<br />

We observed what seemed to be a constraint on <strong>the</strong> felicitous reinforcement <strong>of</strong> some<br />

implicatures. Approaches in traditional Gricean terms weren’t sufficient to explain all <strong>the</strong><br />

possible data we encountered. The main conclusion we drew from this data was that,<br />

despite an apparent strong correlation, inference mechanisms couldn’t be at <strong>the</strong> source <strong>of</strong><br />

<strong>the</strong> argumentative orientation <strong>of</strong> an utterance. Therefore, <strong>the</strong> observed constraint doesn’t<br />

seem to apply to <strong>the</strong> reinforcement operation itself but is ra<strong>the</strong>r due to different discourse<br />

coherence mechanisms. We took an argumentative approach and showed that <strong>the</strong> standard<br />

accounts <strong>of</strong> adversatives and implicatures in this approach worked toge<strong>the</strong>r to justify <strong>the</strong><br />

possibility <strong>of</strong> marking a contrast. The actual preference for marking this available contrast<br />

could be related to <strong>the</strong> intrinsic nature <strong>of</strong> <strong>the</strong> Contrast discourse relation.<br />

We mentioned experimental pragmatics as a means to shed more light on <strong>the</strong> phenomenon<br />

we studied. Among <strong>the</strong> different points we intend to study are <strong>the</strong> following:<br />

• According to <strong>the</strong> context <strong>the</strong> preference for a marked contrast should differ. In<br />

particular we expect that in lower-bounded contexts <strong>the</strong> preference might disappear<br />

and that <strong>the</strong> use <strong>of</strong> but is odd or takes longer to process.<br />

• We made a hypo<strong>the</strong>sis about reformulatives such as in fact that needs to be refined.<br />

They appear to be sensitive to informativity scales and somehow indifferent to <strong>the</strong><br />

argumentative orientation <strong>of</strong> <strong>the</strong> propositions <strong>the</strong>y connect. A test in lower-bounded<br />

contexts could prove relevant to determine <strong>the</strong> truth behind this hypo<strong>the</strong>sis.<br />

The results <strong>of</strong> <strong>the</strong>se experiments could provide support for <strong>the</strong> argumentative approach<br />

to semantics and pragmatics we presented, and thus to an explanation <strong>of</strong> <strong>the</strong> main, nontrivial,<br />

fact we observed: an utterance can convey an implicature and yet be argumentatively<br />

opposed to it.<br />

References<br />


Anscombre, J.-C. and Ducrot, O. (1977). Deux mais en français, Lingua 43: 23–40.<br />

Anscombre, J.-C. and Ducrot, O. (1983). L’argumentation dans la langue, Pierre<br />

Mardaga, Liège:Bruxelles.<br />


Asher, N. and Lascarides, A. (2003). Logics <strong>of</strong> Conversation, Cambridge: Cambridge<br />

University Press.<br />

Benndorf, B. and Koenig, J.-P. (1998). Meaning and context : German aber and sondern,<br />

in J.-P. Koenig (ed.), Discourse and cognition : bridging <strong>the</strong> gap, CSLI Publications,<br />

Stanford, pp. 365–386.<br />

Breheny, R., Katsos, N. and Williams, J. (2005). Are generalised scalar implicatures<br />

generated by default? an on-line investigation into <strong>the</strong> role <strong>of</strong> context in generating<br />

pragmatic inferences, Cognition.<br />

Carston, R. (2005). Relevance <strong>the</strong>ory and <strong>the</strong> saying/implicating distinction, in L. Horn<br />

and G. Ward (eds), The handbook <strong>of</strong> Pragmatics, Blackwell.<br />

Ducrot, O. (1980). Les échelles argumentatives, Les Éditions de Minuit.<br />

Gazdar, G. (1979). Pragmatics: Implicature, Presupposition and Logical Form, New York:<br />

Academic Press.<br />

Hirschberg, J. (1985). A <strong>the</strong>ory <strong>of</strong> scalar implicature, PhD <strong>the</strong>sis, Univ. <strong>of</strong> Pennsylvania.<br />

Horn, L. (1989). A natural history <strong>of</strong> negation, The University <strong>of</strong> Chicago Press.<br />

Horn, L. (1991). Given as new: when redundant information isn’t, Journal <strong>of</strong> Pragmatics<br />

15(4): 313–336.<br />

Jayez, J. and Tovena, L. (2008). Presque and almost: how argumentation derives from<br />

comparative meaning, in O. Bonami and P. C. H<strong>of</strong>herr (eds), Empirical Issues in<br />

Syntax and Semantics, Vol. 7, pp. 1–23.<br />

Levinson, S. C. (2000). Presumptive Meanings: The Theory <strong>of</strong> Generalized Conversational<br />

Implicature, MIT Press, Cambridge, MA, USA.<br />

Merin, A. (1999). Information, relevance and social decision-making, in L. Moss,<br />

J. Ginzburg and M. de Rijke (eds), Logic, Language, and computation, Vol. 2, CSLI<br />

Publications, Stanford:CA, pp. 179–221.<br />

Noveck, I. and Sperber, D. (2007). The why and how <strong>of</strong> experimental pragmatics: The<br />

case <strong>of</strong> ’scalar inferences’, in N. Burton-Roberts (ed.), Advances in Pragmatics,<br />

Palgrave Macmillan, Basingstoke.<br />

Sauerland, U. (2008). Implicated presuppositions, in A. Steube (ed.), Sentence and Context,<br />

Language, Context and Cognition, Mouton de Gruyter, Berlin. to appear.<br />

Wilson, D. and Sperber, D. (2005). Relevance <strong>the</strong>ory, in L. Horn and G. Ward (eds), The<br />

handbook <strong>of</strong> pragmatics, Blackwell.<br />



List <strong>of</strong> authors<br />

Martin Avanzini<br />

Martin.Avanzini@student.uibk.ac.at<br />

Institute <strong>of</strong> Computer Science<br />

University <strong>of</strong> Innsbruck<br />

Austria<br />

Timo Baumann<br />

timo@ling.uni-potsdam.de<br />

Institut für Linguistik<br />

Universität Potsdam<br />

Germany<br />

Christopher Brumwell<br />

chrisbrumwell@gmail.com<br />

ILLC<br />

Universiteit van Amsterdam<br />

The Ne<strong>the</strong>rlands<br />

Bert Le Bruyn<br />

Bert.LeBruyn@let.uu.nl<br />

Utrecht Institute <strong>of</strong> Linguistics<br />

Universiteit Utrecht<br />

The Ne<strong>the</strong>rlands<br />

James Burton<br />

jb162@brighton.ac.uk<br />

University <strong>of</strong> Brighton<br />

United Kingdom<br />

Gemma Celestino<br />

gceles@interchange.ubc.ca<br />

Department <strong>of</strong> Philosophy<br />

University <strong>of</strong> British Columbia &<br />

LOGOS Research Group<br />

Canada<br />

Dragan Doder<br />

ddoder@mas.bg.ac.yu<br />

Faculty <strong>of</strong> Mechanical Engineering<br />

Serbian Academy <strong>of</strong> Sciences and Arts<br />

Serbia and Montenegro<br />

Michael Franke<br />

m.franke@uva.nl<br />

ILLC<br />

Universiteit van Amsterdam<br />

The Ne<strong>the</strong>rlands<br />

Gianluca Giorgolo<br />

Gianluca.Giorgolo@let.uu.nl<br />

Utrecht Institute <strong>of</strong> Linguistics<br />

Universiteit Utrecht<br />

The Ne<strong>the</strong>rlands<br />


Michael Hartwig<br />

Michael.Jua.Hartwig@gmail.com<br />

Multimedia University, Cyberjaya<br />

Malaysia<br />

Simon Hopp<br />

simon.hopp@uni-konstanz.de<br />

Fachbereich Sprachwissenschaft<br />

University <strong>of</strong> Konstanz<br />

Germany<br />

Pierre Lison<br />

pierrel@coli.uni-sb.de<br />

Language Technology Lab<br />

Research Center for Artificial Intelligence<br />

Saarbrücken<br />

Germany<br />

Petar Maksimović<br />

petarmax@mi.sanu.ac.yu<br />

Ma<strong>the</strong>matical Institute<br />

Serbian Academy <strong>of</strong> Sciences and Arts<br />

Serbia and Montenegro<br />

Bojan Marinković<br />

bojanm@mi.sanu.ac.yu<br />

Ma<strong>the</strong>matical Institute<br />

Serbian Academy <strong>of</strong> Sciences and Arts<br />

Serbia and Montenegro<br />

Scott Martin<br />

scott@ling.osu.edu<br />

The Ohio State University<br />

United States<br />

Takako Nemoto<br />

nmt0731@yahoo.co.jp<br />

Ma<strong>the</strong>matical Institute<br />

Tohoku university<br />

Japan<br />

Ivelina Nikolova<br />

iva@lml.bas.bg<br />

LMD, IPP<br />

Bulgarian Academy <strong>of</strong> Sciences<br />

Bulgaria<br />

Yves Peirsman<br />

yvespeirsman@gmail.com<br />

QLVL<br />

Katholieke Universiteit Leuven<br />

Belgium


Aleksandar Perović<br />

pera@sf.bg.ac.yu<br />

Faculty <strong>of</strong> Transport and Traffic Engineering<br />

Serbian Academy <strong>of</strong> Sciences and Arts<br />

Serbia and Montenegro<br />

Maren Schierloh<br />

schierl1@msu.edu<br />

Michigan State University<br />

U.S.A.<br />

Andreas Schnabl<br />

andreas.schnabl@uibk.ac.at<br />

Institute <strong>of</strong> Computer Science<br />

University <strong>of</strong> Innsbruck<br />

Austria<br />

Éva Szilágyi<br />

essay229@gmail.com<br />

Department <strong>of</strong> Linguistics<br />

University <strong>of</strong> Pécs<br />

Hungary<br />


Camilo Thorne<br />

camilo.thorne@gmail.com<br />

Faculty <strong>of</strong> Computer Science<br />

Free University <strong>of</strong> Bozen-Bolzano<br />

Italy<br />

Christina Unger<br />

christina.unger@let.uu.nl<br />

Utrecht Institute <strong>of</strong> Linguistics<br />

Universiteit Utrecht<br />

The Ne<strong>the</strong>rlands<br />

Melanie Uth<br />

melanie.uth@ling.uni-stuttgart.de<br />

Institut für Linguistik/Romanistik<br />

University <strong>of</strong> Stuttgart<br />

Germany<br />

Grégoire Winterstein<br />

gregoire.winterstein@linguist.jussieu.fr<br />

Laboratoire de Linguistique Formelle<br />

Université Paris 7<br />

France
